Page MenuHomeFreeBSD

search for bug in r357173
AbandonedPublic

Authored by dougm on Jan 31 2020, 6:34 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mar 18 2024, 5:05 PM
Unknown Object (File)
Mar 7 2024, 6:50 PM
Unknown Object (File)
Mar 7 2024, 11:17 AM
Unknown Object (File)
Jan 20 2024, 9:58 AM
Unknown Object (File)
Dec 20 2023, 3:08 AM
Unknown Object (File)
Oct 4 2023, 12:29 AM
Unknown Object (File)
Aug 24 2023, 10:00 PM
Unknown Object (File)
Aug 9 2023, 12:36 PM
Subscribers

Details

Reviewers
None
Summary

Undo some part of r357173, seeking to find what led a UDP test to fail. The first iteration undoes everything outside of sys/tree.h.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

20200131 09:22:31 all (1/1): udp.sh
Expensive timeout(9) function: 0xffffffff80c5c1c0(0) 0.011447899 s
Expensive timeout(9) function: 0xffffffff80c5c1c0(0) 0.035223522 s
(da0:isci0:0:0:0): WRITE(10). CDB: 2a 00 3e 04 18 6a 00 01 00 00 
(da0:isci0:0:0:0): CAM status: CCB request completed with an error
(da0:isci0:0:0:0): Retrying command, 3 more tries remain
KDB: enter: Break to debugger
[ thread pid 11 tid 100003 ]
Stopped at      kdb_alt_break_internal+0x101:   movq    $0,0x1084d5c(%rip)
db> x/s version
version:        FreeBSD 13.0-CURRENT #0 r357318M: Fri Jan 31 09:01:40 CET 2020\012    pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO\012
db>

Undo those parts of r357173 made to sys/tree.h specifically intended to reduce code duplication in deletion, without undoing those parts specific to augmentation.

I ran a one hour test without seeing any problems.

Change nothing but the parts of tree.h related to deletion, restoring those to their original implementation. Hope to find that udp test still passes.

D23435.67596.diff fails straight away with

20200201 05:35:46 all (1/1): udp.sh
Expensive timeout(9) function: 0xffffffff80c5c2c0(0) 0.010081090 s
(da0:isci0:0:0:0): WRITE(10). CDB: 2a 00 3e 82 fc 4a 00 01 00 00 
(da0:isci0:0:0:0): CAM status: CCB request completed with an error
(da0:isci0:0:0:0): Retrying command, 3 more tries remain
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2125398, size: 8192
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1848888, size: 4096

I'll run some more tests on D23435.67555.diff, just to be sure.

More tests of D23435.67555.diff also show:

20200201 06:11:06 all (1/1): udp.sh
20200201 06:21:51 all (1/1): udp.sh
20200201 06:32:25 all (1/1): udp.sh
20200201 06:43:04 all (1/1): udp.sh
(da0:isci0:0:0:0): WRITE(10). CDB: 2a 00 07 7d 6b 62 00 00 40 00 
(da0:isci0:0:0:0): CAM status: CCB request completed with an error
(da0:isci0:0:0:0): Retrying command, 3 more tries remain

This time, revert the iommu changes. Leave the tree changes alone.

Just to be extra sure, I ran a 2 hour test with r357318 and r357171 reverted. No problems seen there.

With D23435.67605.diff I see

20200201 09:35:36 all (1/1): udp.sh
Feb  1 09:35:58 t2 ntpd[2421]: error resolving pool 0.freebsd.pool.ntp.org: Name does not resolve (8)
Feb  1 09:37:12 t2 syslogd: last message repeated 1 times
(da0:isci0:0:0:0): WRITE(10). CDB: 2a 00 3d 98 6e e2 00 01 00 00 
(da0:isci0:0:0:0): CAM status: CCB request completed with an error
(da0:isci0:0:0:0): Retrying command, 3 more tries remain

Redo. I left an extra AUGMENT hanging around. Undo iommu, undo delete, leave augment.

With D23435.67608.diff I see

20200201 11:32:32 all (1/1): udp.sh
Expensive timeout(9) function: 0xffffffff80c5c1f0(0) 0.029046322 s
(da0:isci0:0:0:0): WRITE(10). CDB: 2a 00 07 7d 6b 62 00 00 40 00 
(da0:isci0:0:0:0): CAM status: CCB request completed with an error
(da0:isci0:0:0:0): Retrying command, 3 more tries remain
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 763617, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 774266, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 775798, size: 61440
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 763579, size: 16384
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 774832, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 763617, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 774266, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 775798, size: 61440
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 763579, size: 16384
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 774832, size: 4096
isci: 1580553435:446627 ISCI Sending reset to device on controller 0 domain 0 CAM index 0
isci: 1580553435:455345 ISCI isci: bus=0 target=0 lun=0 cdb[0]=af91c1f0 terminated
isci: 1580553435:463481 ISCI isci: bus=0 target=0 lun=0 cdb[0]=af9ba1f0 terminated
(da0:isci0:0:0:0): WRITE(10). CDB: 2a 00 3d f6 12 4a 00 00 80 00 
(da0:isci0:0:0:0): CAM status: CCB request terminated by the host
(da0:isci0:0:0:0): Retrying command, 3 more tries remain

Revert everything from the original patch except a comment, and a few places where null pointer checks had "!= NULL" added.

20200201 19:26:43 all (1/1): udp.sh
(da0:isci0:0:0:0): WRITE(10). CDB: 2a 00 07 7d 6b 62 00 00 40 00 
(da0:isci0:0:0:0): CAM status: CCB request completed with an error
(da0:isci0:0:0:0): Retrying command, 3 more tries remain
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 155318, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2494907, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 6079959, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 5729445, size: 53248
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 5696154, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 5696113, size: 16384
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 6321657, size: 24576
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 6251096, size: 16384
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 155318, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2494907, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 6079959, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 5729445, size: 53248
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 5696154, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 5696113, size: 16384
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 6321657, size: 24576
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 6251096, size: 16384
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2279816, size: 4096
isci: 1580582150:626672 ISCI Sending reset to device on controller 0 domain 0 CAM index 0
isci: 1580582150:635403 ISCI isci: bus=0 target=0 lun=0 cdb[0]=5c2c9f0 terminated
isci: 1580582150:643415 ISCI isci: bus=0 target=0 lun=0 cdb[0]=5c389f0 terminated
isci: 1580582150:651442 ISCI isci: bus=0 target=0 lun=0 cdb[0]=615969f0 terminated
isci: 1580582150:659605 ISCI isci: bus=0 target=0 lun=0 cdb[0]=f89e01f0 terminated
isci: 1580582150:667731 ISCI isci: bus=0 target=0 lun=0 cdb[0]=5c191f0 terminated
(da0:isci0:0:0:0): WRITE(10). CDB: 2a 00 40 44 2a f2 00 00 80 00 
(da0:isci0:0:0:0): CAM status: CCB request terminated by the host

A complete undo. Not even comment and style changes get preserved. If this too leads to a test failure, then I consider r357173 absolved. If not, then I question my sanity.

I don't know. This also seems strange to me.
The latest patch also fails:

20200201 20:41:54 all (1/1): udp.sh
Expensive timeout(9) function: 0xffffffff80c5c300(0) 0.113305085 s
Feb  1 20:42:22 t2 ntpd[2407]: error resolving pool 0.freebsd.pool.ntp.org: Name does not resolve (8)
Feb  1 20:43:31 t2 syslogd: last message repeated 1 times
Feb  1 20:44:38 t2 syslogd: last message repeated 1 times
20200201 20:52:41 all (1/1): udp.sh
Feb  1 20:45:43 t2 syslogd: last message repeated 1 times
20200201 21:03:12 all (1/1): udp.sh
20200201 21:13:48 all (1/1): udp.sh
Feb  1 21:15:02 t2 kernel: pid 55566 (swap), jid 0, uid 1004, was killed: out of swap space
20200201 21:24:34 all (1/1): udp.sh
(da0:isci0:0:0:0): WRITE(10). CDB: 2a 00 42 c3 53 12 00 01 00 00 
(da0:isci0:0:0:0): CAM status: CCB request completed with an error
(da0:isci0:0:0:0): Retrying command, 3 more tries remain

I' ll do some longer testing with r357173 reverted. That's the only thing I can think of.

I have now seen the problem with r357173 reverted.
So this explains nicely, I think, how your r357173 is not at fault.

20200201 21:52:30 all (1/1): udp.sh
Expensive timeout(9) function: 0xffffffff80e09ff0(0xfffff8009ad25000) 0.226813966 s
Feb  1 21:53:00 t2 ntpd[2419]: error resolving pool 0.freebsd.pool.ntp.org: Name does not resolve (8)
Feb  1 21:54:08 t2 syslogd: last message repeated 1 times
Feb  1 21:55:14 t2 syslogd: last message repeated 1 times
Feb  1 21:59:36 t2 syslogd: last message repeated 4 times
Feb  1 22:00:58 t2 ntpd[2419]: error resolving pool 0.freebsd.pool.ntp.org: Name does not resolve (8)
Feb  1 22:01:45 t2 syslogd: last message repeated 1 times
20200201 22:03:09 all (1/1): udp.sh
Feb  1 22:02:52 t2 syslogd: last message repeated 1 times
(da0:isci0:0:0:0): WRITE(10). CDB: 2a 00 07 7d 6b 62 00 00 40 00 
(da0:isci0:0:0:0): CAM status: CCB request completed with an error
(da0:isci0:0:0:0): Retrying command, 3 more tries remain

Search abandoned. No evident bug here.