
shortcuts in dmar_gas matching
Closed, Public

Authored by dougm on Jan 28 2020, 3:47 AM.

Details

Summary

In dmar_gas_lowermatch, skip searching a subtree if all its addresses are greater than lowaddr.

In dmar_gas_uppermatch, skip searching a subtree if all of its gaps between allocations are too small.
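
The two shortcuts can be illustrated with a small, self-contained sketch. This is not the code from the patch: the `struct node` layout, the summary fields `first`, `last` and `free_down`, and the functions `node_update()` and `find_low()` are hypothetical names chosen for illustration, and for brevity both checks are folded into a single search routine. The sketch assumes each tree node records, for its subtree, the lowest start address, the highest end address, and the largest gap between neighbouring allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	NOT_FOUND	UINT64_MAX

/* Hypothetical node: one allocated range [start, end), keyed by start. */
struct node {
	uint64_t start, end;
	uint64_t first;		/* lowest start in this subtree */
	uint64_t last;		/* highest end in this subtree */
	uint64_t free_down;	/* largest gap between neighbours in subtree */
	struct node *left, *right;
};

static int visited;		/* nodes examined by find_low(), for illustration */

/* Recompute the summaries of n; its children must already be current. */
static void
node_update(struct node *n)
{
	uint64_t gap;

	assert(n->start < n->end);
	n->first = n->left != NULL ? n->left->first : n->start;
	n->last = n->right != NULL ? n->right->last : n->end;
	n->free_down = 0;
	if (n->left != NULL) {
		gap = n->start - n->left->last;
		n->free_down = n->left->free_down > gap ?
		    n->left->free_down : gap;
	}
	if (n->right != NULL) {
		gap = n->right->first - n->end;
		if (gap > n->free_down)
			n->free_down = gap;
		if (n->right->free_down > n->free_down)
			n->free_down = n->right->free_down;
	}
}

/*
 * Return the lowest address a such that [a, a + size) lies in a gap
 * between two allocations and a + size <= limit, or NOT_FOUND.
 */
static uint64_t
find_low(struct node *n, uint64_t size, uint64_t limit)
{
	uint64_t a;

	if (n == NULL)
		return (NOT_FOUND);
	visited++;
	/* Shortcut: every gap in this subtree is too small. */
	if (n->free_down < size)
		return (NOT_FOUND);
	/* Shortcut: every address in this subtree is at or above limit. */
	if (n->first >= limit)
		return (NOT_FOUND);
	/* Otherwise examine the gaps in ascending address order. */
	a = find_low(n->left, size, limit);
	if (a != NOT_FOUND)
		return (a);
	if (n->left != NULL && n->start - n->left->last >= size &&
	    n->left->last + size <= limit)
		return (n->left->last);
	if (n->right != NULL && n->right->first - n->end >= size &&
	    n->end + size <= limit)
		return (n->end);
	return (find_low(n->right, size, limit));
}
```

To use the sketch, link a few nodes by hand, call `node_update()` on the leaves and then on their parents, and call `find_low()` on the root. When a request is larger than every gap, or the limit lies below every allocation, the search returns after examining only the root, which is the saving the summary describes.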

Diff Detail

Repository: rS FreeBSD src repository - subversion
Lint: Lint Not Applicable
Unit: Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land. Jan 28 2020, 4:44 PM

I probably won't be able to test this before tomorrow.

With "hw.dmar.enable=1" I see:

20200129 15:36:54 all (1/1): udp.sh
Expensive timeout(9) function: 0xffffffff80c5bbb0(0) 0.016073906 s
Expensive timeout(9) function: 0xffffffff80be2300(0xfffff80846234530) 0.024061505 s
Expensive timeout(9) function: 0xffffffff80c5bc80(0) 0.081290860 s
Expensive timeout(9) function: 0xffffffff80c5bc80(0) 0.231948206 s
(da0:isci0:0:0:0): WRITE(10). CDB: 2a 00 07 7d 6b 62 00 00 40 00 
(da0:isci0:0:0:0): CAM status: CCB request completed with an error
(da0:isci0:0:0:0): Retrying command, 3 more tries remain
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1580935, size: 16384
isci: 1580309000:401318 ISCI Sending reset to device on controller 0 domain 0 CAM index 0
isci: 1580309000:410015 ISCI isci: bus=0 target=0 lun=0 cdb[0]=615539f0 terminated
isci: 1580309000:418124 ISCI isci: bus=0 target=0 lun=0 cdb[0]=5c5e9f0 terminated
isci: 1580309000:426163 ISCI isci: bus=0 target=0 lun=0 cdb[0]=3eae1f0 terminated
isci: 1580309000:434210 ISCI isci: bus=0 target=0 lun=0 cdb[0]=615671f0 terminated
isci: 1580309000:442363 ISCI isci: bus=0 target=0 lun=0 cdb[0]=5c461f0 terminated
isci: 1580309000:450402 ISCI isci: bus=0 target=0 lun=0 cdb[0]=41c0a9f0 terminated
isci: 1580309000:458552 ISCI isci: bus=0 target=0 lun=0 cdb[0]=5c471f0 terminated
isci: 1580309000:466601 ISCI isci: bus=0 target=0 lun=0 cdb[0]=6d50b1f0 terminated
isci: 1580309000:474741 ISCI isci: bus=0 target=0 lun=0 cdb[0]=6d50b9f0 terminated
(da0:isci0:0:0:0): WRITE(10). CDB: 2a 00 3e 25 fc aa 00 01 00 00 
(da0:isci0:0:0:0): CAM status: CCB request terminated by the host
(da0:isci0:0:0:0): Retrying command, 3 more tries remain

The lines added here were also in https://reviews.freebsd.org/D23189?id=66970 which passed this test, so I'm stumped at the moment.

After trying a bit harder with r357254 I see the same problem, so it is unrelated to this patch.

Before I commit this patch, can you determine if the problem was present before or after r357173?

I have only one UDP test that has triggered this.
On r357172 this test ran for 5 hours without any problems.
On r357173 I saw the problem after running the test for 30 minutes.

Subsequent testing by Peter appears to have absolved r357173 of blame, so I'm going ahead with this commit.

Do you mean that isci(4) was broken anyway? Were any other drivers reported as broken?

In D23391#514739, @kib wrote:

Do you mean that isci(4) was broken anyway? Were any other drivers reported as broken?

You can see exactly what Peter said at https://reviews.freebsd.org/D23435.

In D23391#514739, @kib wrote:

Do you mean that isci(4) was broken anyway? Were any other drivers reported as broken?

I'm currently double-checking this by once again doing a commit search to see where the problem was introduced.
I have only one test that triggers the problem, udp.sh, and the run time needed to reproduce it varies a lot.

I have now managed to trigger the problem in r357172, so IMHO this definitely exonerates r357173.

(da0:isci0:0:0:0): SCSI status: Check Condition
(da0:isci0:0:0:0): SCSI sense: UNIT ATTENTION asc:29,2 (SCSI bus reset occurred)
(da0:isci0:0:0:0): Field Replaceable Unit: 2
(da0:isci0:0:0:0): Retrying command (per sense data)
Feb  2 18:06:37 t2 kernel: pid 7384 (swap), jid 0, uid 1004, was killed: out of swap space

FreeBSD/amd64 (t2.osted.lan) (ttyu1)

login: root
Password:
Last login: Sun Feb  2 14:36:27 on ttyu1
FreeBSD 13.0-CURRENT (PHO) #0 r357172M: Sun Feb  2 17:54:56 CET 2020
:
:
root@t2:~ # uname -a
FreeBSD t2.osted.lan 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r357172M: Sun Feb  2 17:54:56 CET 2020     pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO  amd64
root@t2:~ # svnlite diff /usr/src/sys
Index: /usr/src/sys/vm/vm_fault.c
===================================================================
--- /usr/src/sys/vm/vm_fault.c  (revision 357172)
+++ /usr/src/sys/vm/vm_fault.c  (working copy)
@@ -1073,12 +1073,14 @@
                    fs->oom < vm_pfault_oom_attempts) {
                        fs->oom++;
                        vm_waitpfault(dset, vm_pfault_oom_wait * hz);
+               } else  {
+                       if (bootverbose)
+                               printf(
+               "proc %d (%s) failed to alloc page on fault, starting OOM\n",
+                                   curproc->p_pid, curproc->p_comm);
+                       vm_pageout_oom(VM_OOM_MEM_PF);
+                       fs->oom = 0;
                }
-               if (bootverbose)
-                       printf(
-"proc %d (%s) failed to alloc page on fault, starting OOM\n",
-                           curproc->p_pid, curproc->p_comm);
-               vm_pageout_oom(VM_OOM_MEM_PF);
                return (KERN_RESOURCE_SHORTAGE);
        }
        fs->oom = 0;
root@t2:~ # 
$

I have only observed problems with isci.