
Eliminate pointless calls to vm_fault_prefault()
Closed, Public

Authored by alc on Sep 30 2018, 5:29 PM.
Details

Summary

This change shows that 75% of the iterations performed within vm_fault_prefault() are pointless and could be eliminated by a two-line change to how we handle copy-on-write faults.
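
(The diagnostic counter patch itself is not attached to this revision; the following is only a hypothetical sketch of how the debug.counters.* values shown in the test plan could be exported via sysctl. All identifiers here are assumptions, not the actual patch.)

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/sysctl.h>

/*
 * Hypothetical diagnostic counters; the increment sites used by the
 * real patch are not shown in this review.
 */
static u_long negmap_count, nonneg_count, negati_count;

SYSCTL_NODE(_debug, OID_AUTO, counters, CTLFLAG_RD, 0,
    "vm_fault_prefault() diagnostic counters");
SYSCTL_ULONG(_debug_counters, OID_AUTO, negmap, CTLFLAG_RD,
    &negmap_count, 0, "exported as debug.counters.negmap");
SYSCTL_ULONG(_debug_counters, OID_AUTO, nonneg, CTLFLAG_RD,
    &nonneg_count, 0, "exported as debug.counters.nonneg");
SYSCTL_ULONG(_debug_counters, OID_AUTO, negati, CTLFLAG_RD,
    &negati_count, 0, "exported as debug.counters.negati");

/* In the measured code path: atomic_add_long(&negati_count, 1); */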

Test Plan

Here are some results for a parallel buildworld.

# /tmp/worldmark_j7_HEAD_counters
Sat Sep 29 18:56:23 CDT 2018
debug.counters.negmap: 18
debug.counters.nonneg: 35826
debug.counters.negati: 114882
vm.pmap.pde.promotions: 152
vm.pmap.pde.p_failures: 19
vm.pmap.pde.mappings: 3
vm.pmap.pde.demotions: 21
vm.reserv.reclaimed: 0
vm.reserv.partpopq: 
DOMAIN    LEVEL     SIZE  NUMBER

     0,      -1, 211548K,    114

vm.reserv.fullpop: 102
vm.reserv.freed: 2335
vm.reserv.broken: 0
18546.859u 1062.556s 57:26.81 568.9%    52927+3525k 52421+80938io 15365pf+0w
Sat Sep 29 19:53:50 CDT 2018
debug.counters.negmap: 90
debug.counters.nonneg: 6555245
debug.counters.negati: 19885568
vm.pmap.pde.promotions: 63267
vm.pmap.pde.p_failures: 9182
vm.pmap.pde.mappings: 186495
vm.pmap.pde.demotions: 4827
vm.reserv.reclaimed: 0
vm.reserv.partpopq: 
DOMAIN    LEVEL     SIZE  NUMBER

     0,      -1, 250360K,    163

vm.reserv.fullpop: 114
vm.reserv.freed: 1241731
vm.reserv.broken: 0
18542.221u 1067.328s 57:14.07 571.0%    52915+3525k 5267+80880io 0pf+0w
Sat Sep 29 20:51:04 CDT 2018
debug.counters.negmap: 90
debug.counters.nonneg: 12969217
debug.counters.negati: 39538213
vm.pmap.pde.promotions: 126439
vm.pmap.pde.p_failures: 18223
vm.pmap.pde.mappings: 373031
vm.pmap.pde.demotions: 9730
vm.reserv.reclaimed: 0
vm.reserv.partpopq: 
DOMAIN    LEVEL     SIZE  NUMBER

     0,      -1, 250308K,    162

vm.reserv.fullpop: 115
vm.reserv.freed: 2477894
vm.reserv.broken: 0

Event Timeline

alc marked an inline comment as done. Sep 30 2018, 5:32 PM
alc added inline comments.
vm/vm_fault.c
1184–1186

The committed change would just be this one snippet with -1 changed to 1.

Can you upload the diff with context?

alc marked an inline comment as done. Sep 30 2018, 6:23 PM
alc added inline comments.
vm/vm_fault.c
1184–1186

To be clear, the point of the faultcount == 0 test is to distinguish between faults that are simultaneously hard faults and copy-on-write faults, and those that are simply copy-on-write faults. In the former case, we still want to prefault the surrounding virtual pages that aren't being written to.

I would add a comment explaining this as part of the committed change.

vm/vm_fault.c
1295

I am confused. Does setting faultcount to -1 change the ahead/behind parameters passed to vm_fault_prefault()? I.e., your current patch is only aimed at counting hard copy-on-write faults, not at measuring timings with the unneeded prefaults eliminated, right?

alc marked an inline comment as done. Sep 30 2018, 7:44 PM
alc added inline comments.
vm/vm_fault.c
1295

No, changing zero to minus one has no effect on the ahead and behind values passed to vm_fault_prefault(). They are going to be PFFOR and PFBAK in either case.

Notice that I'm only changing faultcount to one when its value was still zero at the point where we performed the copy. If the copy-on-write fault was also a hard fault, then the faultcount would have already been updated to a non-zero value.

Essentially, these results argue that soft copy-on-write faults shouldn't call vm_fault_prefault(). Faults of this type currently account for 75% of the work done in vm_fault_prefault(), yet they create almost no mappings. The mappings were already created by the prefaulting of resident pages at execve time or by earlier hard faults.
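
(For context, a paraphrase from memory of the call site near the end of vm_fault(); the exact guard in the tree may differ slightly.)

/*
 * Paraphrase, not the exact source: with faultcount <= 0 the fixed
 * PFBAK/PFFOR window is used, so the diagnostic change from 0 to -1
 * is a no-op here.  The committed change instead sets faultcount to
 * 1 on a soft COW fault, selecting behind/ahead, which remain zero
 * on that path, so the prefault loop performs no iterations.
 */
vm_fault_prefault(&fs, vaddr,
    faultcount > 0 ? behind : PFBAK,
    faultcount > 0 ? ahead : PFFOR);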

This revision is now accepted and ready to land. Sep 30 2018, 7:56 PM

Could you guys try running with this change for a while so that we can see if other workloads actually create more useful mappings on soft, copy-on-write faults?

Peter, could you please test this patch? It is actually a diagnostic patch, not a fix, so I'm interested in seeing the output from "sysctl debug.counters" at the end of your tests.

I'll put this on my workstation tomorrow morning.

After ~20 minutes of browsing and doing some work in a few terminals (but no builds), I see:

debug.counters.negmap: 6688
debug.counters.nonneg: 7201693
debug.counters.negati: 1053344

After running a "make index" in the ports tree (which spawns many long pipelines), I see:

debug.counters.negmap: 6985
debug.counters.nonneg: 10848239
debug.counters.negati: 68424347

I ran tests on two different hosts for 11 hours and got:

Mon Oct  1 17:17:03 CEST 2018
debug.counters.negmap: 11038668
debug.counters.nonneg: 3034528887
debug.counters.negati: 10814460156
vm.pmap.pde.promotions: 4172630
vm.pmap.pde.p_failures: 5155243
vm.pmap.pde.mappings: 15384
vm.pmap.pde.demotions: 40830
vm.reserv.reclaimed: 66557
vm.reserv.partpopq:
DOMAIN    LEVEL     SIZE  NUMBER

     0,      -1, 101648K,     50
     1,      -1,  93696K,     46

vm.reserv.fullpop: 198
vm.reserv.freed: 287519860
vm.reserv.broken: 66559
[pho@flix1a ~]$

Mon Oct  1 17:17:55 CEST 2018
debug.counters.negmap: 58054
debug.counters.nonneg: 742065807
debug.counters.negati: 1039443536
vm.pmap.pde.promotions: 2761682
vm.pmap.pde.p_failures: 16059695
vm.pmap.pde.mappings: 911081
vm.pmap.pde.demotions: 35005
vm.reserv.reclaimed: 48387
vm.reserv.partpopq:
DOMAIN    LEVEL     SIZE  NUMBER

     0,      -1,  24204K,     14

vm.reserv.fullpop: 2289
vm.reserv.freed: 29833419
vm.reserv.broken: 48388
[pho@mercat1 ~]$
alc marked 3 inline comments as done. Oct 21 2018, 7:15 PM

I *really* wish that vm_fault() knew whether the copy-on-write fault was caused by a protection violation or an invalid mapping. I think that a better heuristic for determining whether to call vm_fault_prefault() would be based on that knowledge. Specifically, I would call vm_fault_prefault() when the mapping was invalid.

Currently, I restrict calls to vm_fault_prefault() to hard faults, but hard faults are only a subset of the copy-on-write faults that occur on previously invalid mappings. For example, in the case of Mark's web browsing, I suspect that one Firefox process takes a hard copy-on-write fault on some memory-mapped file, and then another Firefox process takes a soft copy-on-write fault on that same page. I think that we should be calling vm_fault_prefault() on such soft copy-on-write faults, because the neighboring mappings are most likely invalid as well.
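
(As a hypothetical illustration of the hint being asked for here, and not any committed interface: on x86, bit 0 of the page-fault error code, PGEX_P, is set for a protection violation and clear when the translation was simply not present, so trap_pfault() could forward it along these lines.)

#include <sys/types.h>
#include <machine/trap.h>	/* PGEX_P: error-code bit 0 */

/*
 * Hypothetical helper: true if the hardware reported a protection
 * violation (PTE present), false if the mapping was simply invalid.
 * As noted below, spurious faults can make this bit unreliable.
 */
static bool
pf_was_protection_violation(u_long error_code)
{

	return ((error_code & PGEX_P) != 0);
}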

vm/vm_fault.c
1184–1185

To be clear, here is what I intend to commit.

Index: vm/vm_fault.c
===================================================================
--- vm/vm_fault.c       (revision 339445)
+++ vm/vm_fault.c       (working copy)
@@ -1181,7 +1181,17 @@ readrest:
                         */
                        vm_object_pip_wakeup(fs.object);
                        VM_OBJECT_WUNLOCK(fs.object);
+
                        /*
+                        * We only try to prefault read-only mappings to the
+                        * neighboring pages when this copy-on-write fault is
+                        * a hard fault.  In other cases, trying to prefault
+                        * is typically wasted effort.
+                        */
+                       if (faultcount == 0)
+                               faultcount = 1;
+
+                       /*
                         * Only use the new page below...
                         */
                        fs.object = fs.first_object;
In D17367#376752, @alc wrote:

I *really* wish that vm_fault() knew whether the copy-on-write fault was caused by a protection violation or an invalid mapping. I think that a better heuristic for determining whether to call vm_fault_prefault() would be based on that knowledge. Specifically, I would call vm_fault_prefault() when the mapping was invalid.

We can try to approximately compute this information, but it is not reliable even on x86. For instance, we get a nonsensical error code for spurious page faults. Still, translating the bits from the x86 exception error code into some additional information for vm_fault() is easy.

It might be that vm_fault() should be prepared to not get the hint about the fault cause at all.

In D17367#376853, @kib wrote:
In D17367#376752, @alc wrote:

I *really* wish that vm_fault() knew whether the copy-on-write fault was caused by a protection violation or an invalid mapping. I think that a better heuristic for determining whether to call vm_fault_prefault() would be based on that knowledge. Specifically, I would call vm_fault_prefault() when the mapping was invalid.

We can try to approximately compute this information, but it is not reliable even on x86. For instance, we get a nonsensical error code for spurious page faults. Still, translating the bits from the x86 exception error code into some additional information for vm_fault() is easy.

It might be that vm_fault() should be prepared to not get the hint about the fault cause at all.

Yes, it might be interesting to pass along PGEX_P to vm_fault(). Can you please elaborate on what error codes you've seen from spurious page faults?

This revision was automatically updated to reflect the committed changes.

Unfortunately, this patch destabilized the arm kernel.
The bad thing is that my Tegra does not panic; it just hangs hard, and I can't even enter the JTAG debugger [1].
Moreover, this issue is very rare; I can finish a full buildkernel in most cases. The only known test case is a full build of qt5-webengine (~16k C++ files, 10+ hours of compilation time).

I'm pretty sure that this change provoked some latent bug in the arm pmap code, but due to the nature of this bug, I really need a pointer on where to start.
Can you please give me some hints about what exactly this change does from the pmap point of view?

[1] On Tegra, the same hard hang can be caused, for example, by an access to an unimplemented register in device MMIO, or by an attempt to access a device which is in reset or has a gated clock.

Simply put, I have no idea where to start debugging this issue.

Thanks,
Michal

In D17367#391740, @mmel wrote:

Unfortunately, this patch destabilized the arm kernel.
The bad thing is that my Tegra does not panic; it just hangs hard, and I can't even enter the JTAG debugger [1].
Moreover, this issue is very rare; I can finish a full buildkernel in most cases. The only known test case is a full build of qt5-webengine (~16k C++ files, 10+ hours of compilation time).

I'm pretty sure that this change provoked some latent bug in the arm pmap code, but due to the nature of this bug, I really need a pointer on where to start.
Can you please give me some hints about what exactly this change does from the pmap point of view?

[1] On Tegra, the same hard hang can be caused, for example, by an access to an unimplemented register in device MMIO, or by an attempt to access a device which is in reset or has a gated clock.

Simply put, I have no idea where to start debugging this issue.

The patch causes somewhat fewer calls to pmap_enter_quick(), and even that is only relevant to the user maps.
Since it reduces the exposure of pages to userspace, I doubt that your description of the hang causes is telling.

Are you sure that this is a consequence of the patch, and not some other environmental or toolchain effect? E.g., it might change the kernel text layout and trigger something.
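
(As a concrete starting point, here is a heavily simplified, from-memory sketch of the loop in question; the real vm_fault_prefault() also walks shadow object chains, skips busy or partially valid pages, and takes the object locks.)

#include <sys/param.h>
#include <sys/systm.h>
#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_map.h>
#include <vm/vm_object.h>
#include <vm/vm_page.h>

/*
 * Hypothetical, heavily simplified sketch of vm_fault_prefault():
 * for each virtual page in a window around the faulting address,
 * if the backing page is resident and fully valid, speculatively
 * enter a read-only mapping with pmap_enter_quick().  The change
 * under discussion only makes this run less often, so the pmap
 * sees fewer pmap_enter_quick() calls, not different ones.
 */
static void
prefault_sketch(pmap_t pmap, vm_offset_t addra, vm_map_entry_t entry,
    vm_object_t object, int backward, int forward)
{
	vm_offset_t addr, starta;
	vm_page_t m;
	int i;

	starta = addra - ptoa(backward);
	if (starta < entry->start)
		starta = entry->start;
	for (i = 0; i < 2 * imax(backward, forward); i++) {
		addr = starta + ptoa(i);
		if (addr >= entry->end || addr == addra)
			continue;
		m = vm_page_lookup(object,
		    OFF_TO_IDX(entry->offset + addr - entry->start));
		if (m != NULL && m->valid == VM_PAGE_BITS_ALL)
			pmap_enter_quick(pmap, addr, m,
			    entry->protection & ~VM_PROT_WRITE);
	}
}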

Only the kernel was changed during bisecting, nothing else, and all kernels were cross-compiled by the same toolchain. The kernel before r339819 works, r339819 hangs, and fresh CURRENT with r339819 reverted also works. Also, during bisecting, any kernel after r339819 hangs.
So I think that r339819 is the culprit.

Please don't take the list of hang causes as complete; the JTAG debugger can halt the CPU only on an instruction boundary (or on exception entry). It's unclear whether there is any reason other than a bus hang why the CPU cannot finish an instruction.

But I understand that this is strange; that's why I'm asking for help :)