
Move vmmeter atomic counters into dedicated cache lines
Closed, Public

Authored by mjg on Sep 8 2017, 9:50 PM.

Details

Summary

The current vmmeter struct is subject to extreme false sharing because several atomic counters share cache lines. This severely reduces performance on fork/exec-heavy workloads on high-core-count machines.
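For illustration, the troublesome pattern looks roughly like the sketch below. The struct, field names, and helpers here are invented for the example and are not the actual vmmeter layout; the point is only that two independent atomic counters land on one cache line, so every update bounces that line between CPUs.

#include <sys/types.h>
#include <machine/atomic.h>

/* Hypothetical sketch, not the real struct vmmeter. */
struct hot_counters {
        u_int   v_foo;  /* bumped on every fork */
        u_int   v_bar;  /* bumped on every exec */
};

static struct hot_counters cnt;

/*
 * v_foo and v_bar almost certainly share one cache line, so each
 * atomic increment invalidates that line on every other CPU, even
 * though the counters are logically independent.
 */
static void
count_fork(void)
{
        atomic_add_int(&cnt.v_foo, 1);
}

static void
count_exec(void)
{
        atomic_add_int(&cnt.v_bar, 1);
}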

pig1 (Westmere, 4 sockets * 10 cores * 2 threads) doing -j 80 buildkernel on tmpfs goes from:
3599.82s user 1313.64s system 5485% cpu 1:29.58 total

to:
3613.17s user 1124.48s system 5500% cpu 1:26.13 total

i.e. about 3 seconds less real time, with system time dropping from ~1300s to ~1100s.

The patch takes the most trivial approach: it moves the relevant counters out of the struct and annotates them with __exclusive_cache_line.
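As a rough sketch of the shape of the change (vm_wire_count is used here as a representative example; the exact set of moved counters and their final names are in the diff):

#include <sys/param.h>
#include <machine/atomic.h>

/*
 * Formerly vm_cnt.v_wire_count; now a standalone global which owns its
 * cache line, so atomic updates no longer false-share with neighboring
 * vmmeter fields.
 */
u_int vm_wire_count __exclusive_cache_line;

static void
wire_one_page(void)
{
        atomic_add_int(&vm_wire_count, 1);
}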

I'm not attached to the new name prefix (vm_) or to the names of the padding fields; I would just like to see the split (or an equivalent) go in.

Test Plan

The patch has already been running for months on an 80-way box doing buildkernel, buildworld, and poudriere runs.

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

Why can't you annotate the fields with __aligned(CACHE_LINE_SIZE), like:

#ifdef _KERNEL
#define VM_METER_ALIGNED __aligned(CACHE_LINE_SIZE)
#else
#define VM_METER_ALIGNED
#endif

struct vmmeter {
        ...
        u_int v_wire_count VM_METER_ALIGNED;
        u_int v_free_count VM_METER_ALIGNED;
        ...
};

struct vmmeter is not part of the userland ABI, and this way you don't have to touch every bit of code that modifies each of the counters in question.

I don't really like the approach in the patch because there's nothing preventing code from continuing to increment vm_cnt.v_foo even after you moved it into a separate variable.

Annotating mid-struct has two problems:

  • It creates unnecessary holes between the fields before and after the counter, i.e. the cache line preceding the padded field can be left with free space that is now unused. As an extreme case in point, v_inactive_target gets sandwiched between two annotated fields (see below).
  • The annotation sets the alignment of the field itself, but does not prevent subsequent fields from being placed immediately after it. Once more, v_inactive_target is a case in point.

I.e., a vmmeter with:

u_int v_wire_count VM_METER_ALIGNED;    /* (a) pages wired down */
u_int v_active_count VM_METER_ALIGNED;  /* (q) pages active */
u_int v_inactive_target; /* (c) pages desired inactive */
u_int v_inactive_count VM_METER_ALIGNED;        /* (q) pages inactive */
u_int v_laundry_count VM_METER_ALIGNED; /* (q) pages eligible for laundering */

ends up with v_inactive_target at offset 0xc4, false sharing with v_active_count and losing the benefit of being packed with the rest of the fields.

If these counters have to remain in the struct, they can be moved to the very end. The __exclusive_cache_line annotation will ensure that nothing follows the vm_cnt object either, i.e. the last counter is also guaranteed to reside alone.
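A sketch of that alternative, with an illustrative subset of fields (the macro name matches the one used in the diff snippets later in this review):

#ifdef _KERNEL
#define VMMETER_ALIGNED __aligned(CACHE_LINE_SIZE)
#else
#define VMMETER_ALIGNED
#endif

struct vmmeter {
        /* ... read-mostly fields stay up front, unpadded ... */
        u_int v_free_severe;    /* (c) severe page depletion point */
        /* Hot atomic counters moved to the tail, one cache line each. */
        u_int v_wire_count VMMETER_ALIGNED;     /* (a) pages wired down */
        u_int v_active_count VMMETER_ALIGNED;   /* (q) pages active */
        u_int v_inactive_count VMMETER_ALIGNED; /* (q) pages inactive */
};

/* Annotating the object itself guarantees nothing is placed after the
 * last counter either. */
struct vmmeter vm_cnt __exclusive_cache_line;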

I don't really like the approach in the patch because there's nothing preventing code from continuing to increment vm_cnt.v_foo even after you moved it into a separate variable.

The original field names are gone, so compilation ought to fail. As for code which won't be recompiled, your approach suffers the same problem by definition: since the layout of the struct has changed, the old offsets no longer fit. Short of proper ABI compatibility detection on module load, there is nothing that can be done here.

tl;dr: I can move the counters to the very end if that's preferable; leaving them mid-struct is problematic. I don't remember why, but I was confident the offsets of the remaining fields must not change. If they can change, moving the fields to the very end and annotating vm_cnt is a few lines. If they can't change for userspace, a little bit of ifdef work will also do the trick (see the sketch below). I can update this review with this approach if it seems acceptable.
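Purely as a hypothetical sketch of the ifdef variant (this is not what the diff does), the idea would be to keep the historical layout for userspace while the kernel sees the counter padded at the tail:

struct vmmeter {
        /* ... common fields at their historical offsets ... */
        u_int v_inactive_target;        /* (c) pages desired inactive */
#ifndef _KERNEL
        /* Userspace keeps the counter at its historical offset. */
        u_int v_wire_count;             /* (a) pages wired down */
#endif
        /* ... remaining common fields ... */
#ifdef _KERNEL
        /* Kernel-only: the counter lives at the tail, padded. */
        u_int v_wire_count VMMETER_ALIGNED;     /* (a) pages wired down */
#endif
};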

Complete separation, while it means more sed churn, seems clearer to me, as it splits lock-protected global state from what could later become per-CPU.

No strong opinions one way or the other, apart from annotating fields in-place.

In D12281#255012, @mjg wrote:

The original field names are gone, so compilation ought to fail. As for code which won't be recompiled, your approach suffers the same problem by definition: since the layout of the struct has changed, the old offsets no longer fit. Short of proper ABI compatibility detection on module load, there is nothing that can be done here.

Sorry, I somehow misread that part of the diff. I was referring to compile-time compatibility though.

tl;dr: I can move the counters to the very end if that's preferable; leaving them mid-struct is problematic. I don't remember why, but I was confident the offsets of the remaining fields must not change. If they can change, moving the fields to the very end and annotating vm_cnt is a few lines. If they can't change for userspace, a little bit of ifdef work will also do the trick. I can update this review with this approach if it seems acceptable.

I don't think there's any problem with reordering the fields any more than there's a problem with introducing padding within struct vmmeter, especially now that struct vmmeter is no longer embedded in struct pcpu. I believe such a change would require a __FreeBSD_version bump.

Complete separation, while it means more sed churn, seems clearer to me, as it splits lock-protected global state from what could later become per-CPU.

No strong opinions one way or the other, apart from annotating fields in-place.

Updated the diff to only move the counters to the end and pad them.

Sanity-checked that performance did not change compared to the original patch.

alc added inline comments.
sys/sys/vmmeter.h
63 (On Diff #32882)

Bruce will complain about this not being written as: #ifdef _KERNEL

:-)

This revision is now accepted and ready to land. Sep 10 2017, 6:00 PM

Have you evaluated the effects of isolating the free count? The fields surrounding it are read-only.

-       u_int v_free_count;     /* (f) pages free */
        u_int v_inactive_target; /* (c) pages desired inactive */
        u_int v_pageout_free_min;   /* (c) min pages reserved for kernel */
        u_int v_interrupt_free_min; /* (c) reserved pages for int code */
        u_int v_free_severe;    /* (c) severe page depletion point */
+       u_int v_free_count VMMETER_ALIGNED;     /* (f) pages free */
        u_int v_wire_count VMMETER_ALIGNED; /* (a) pages wired down */

It made no difference in the buildkernel test, even with kib's page-batching patch. Something of the sort will definitely have to be done later anyway and probably helps a little elsewhere, so I can include this bit in the patch if you want.

In D12281#255294, @mjg wrote:
-       u_int v_free_count;     /* (f) pages free */
        u_int v_inactive_target; /* (c) pages desired inactive */
        u_int v_pageout_free_min;   /* (c) min pages reserved for kernel */
        u_int v_interrupt_free_min; /* (c) reserved pages for int code */
        u_int v_free_severe;    /* (c) severe page depletion point */
+       u_int v_free_count VMMETER_ALIGNED;     /* (f) pages free */
        u_int v_wire_count VMMETER_ALIGNED; /* (a) pages wired down */

It made no difference in the buildkernel test, even with kib's page-batching patch. Something of the sort will definitely have to be done later anyway and probably helps a little elsewhere, so I can include this bit in the patch if you want.

No, I was just curious. I'd rather wait until we have some experimental results that argue for the change.

I think it'd be logical to move v_free_count as well, even if it doesn't make a difference now. The fields around it are read-mostly and are generally accessed only during memory shortages, but that might not be true in the future.

This revision was automatically updated to reflect the committed changes.