Eliminate false sharing in malloc due to statistic collection
Closed, Public

Authored by mjg on Sep 22 2018, 1:00 PM.

Details

Summary

Currently stats are collected in a MAXCPU-sized array which is not cache-line aligned and therefore suffers from severe false sharing. Fix the problem by using per-CPU allocation.
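To make the alignment problem concrete, here is a minimal userland sketch (not part of the patch). It assumes a 64-byte cache line and uses a hypothetical `struct cpu_stats` as a stand-in for the real per-CPU record; the helper names are made up for illustration. It checks whether the records of two CPUs in a contiguous array end up touching the same cache line for a given base address.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumed cache line size in bytes */

/* Hypothetical 64-byte per-CPU stats record; the layout is illustrative. */
struct cpu_stats {
	uint64_t counters[5];	/* live statistics */
	uint64_t pad[3];	/* padding up to 64 bytes */
};

_Static_assert(sizeof(struct cpu_stats) == CACHE_LINE,
    "record is expected to be exactly one cache line in size");

/* Index of the first cache line touched by the record of `cpu`. */
static uintptr_t
first_line(uintptr_t base, unsigned cpu)
{
	return ((base + cpu * sizeof(struct cpu_stats)) / CACHE_LINE);
}

/* Index of the last cache line touched by the record of `cpu`. */
static uintptr_t
last_line(uintptr_t base, unsigned cpu)
{
	return ((base + (cpu + 1) * sizeof(struct cpu_stats) - 1) / CACHE_LINE);
}

/* Nonzero if the records of two different CPUs overlap in some cache line. */
static int
records_share_line(uintptr_t base, unsigned a, unsigned b)
{
	assert(a != b);
	return (first_line(base, a) <= last_line(base, b) &&
	    first_line(base, b) <= last_line(base, a));
}
```

With a base address of 8, the records of CPU 0 and CPU 1 both touch cache line 1, so writes from one CPU invalidate the line for the other even though no field is logically shared. With a 64-byte-aligned base, each record gets a line of its own.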

The counter(9) API is not used here: it is too incomplete and provides no win over a per-CPU zone sized for the malloc stats struct. In particular, stats are reported for each CPU separately by simply copying what would otherwise be the array element for the given CPU.
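The reporting scheme described above can be sketched in userland as follows. This is only an analogy under stated assumptions: `pcpu_alloc`, `pcpu_slot`, and `pcpu_report` are hypothetical helpers, not kernel interfaces, and the 64-byte cache line is assumed. Each CPU gets its own cache-line-aligned slot, and reporting copies one slot per CPU into a plain array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64	/* assumed cache line size in bytes */

/* Hypothetical per-CPU stats record. */
struct cpu_stats {
	uint64_t numallocs;
	uint64_t numfrees;
};

/* Round a record size up to a multiple of the cache line size. */
static size_t
pcpu_stride(size_t size)
{
	return ((size + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE);
}

/* Allocate zeroed storage with one cache-line-aligned slot per CPU. */
static void *
pcpu_alloc(size_t size, int ncpu)
{
	size_t total;
	void *base;

	assert(ncpu > 0);
	total = pcpu_stride(size) * (size_t)ncpu;
	base = aligned_alloc(CACHE_LINE, total);
	if (base != NULL)
		memset(base, 0, total);
	return (base);
}

/* Address of the slot belonging to `cpu`. */
static void *
pcpu_slot(void *base, size_t size, int cpu)
{
	return ((char *)base + pcpu_stride(size) * (size_t)cpu);
}

/* Report by copying each CPU's slot into out[cpu], one element per CPU. */
static void
pcpu_report(void *base, int ncpu, struct cpu_stats *out)
{
	for (int cpu = 0; cpu < ncpu; cpu++)
		memcpy(&out[cpu], pcpu_slot(base, sizeof(*out), cpu),
		    sizeof(*out));
}
```

A consumer of `pcpu_report` sees the same shape of data as it would from a flat array, which is why no aggregation interface such as counter(9) is needed for this use.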

malloc_type_stats has three uint64_t-sized fields for padding (against other CPUs: the struct is 64 bytes in size, but the array consisting of these structs is not aligned), which I left in place to simplify stat reporting. I could instead declare malloc_type_stats_export or similar, populate it appropriately, and export that; the waste could then be removed.

Note the patch already provides savings: there are mp_maxid + 1 elements, not MAXCPU. Stat collection for UMA zones still suffers from the same problem and will require a different fix.
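For reference, the malloc_type_stats_export idea mentioned above might look roughly like the sketch below. None of this is in the patch: the struct names, field names, and `mts_fill_export` are hypothetical, chosen only to show how a compact internal record could be converted into a padded 64-byte record at export time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compact internal layout: five counters, no padding. */
struct mts_internal {
	uint64_t memalloced;
	uint64_t memfreed;
	uint64_t numallocs;
	uint64_t numfrees;
	uint64_t size;
};

/* Hypothetical export layout: same counters plus three reserved words. */
struct mts_export {
	uint64_t memalloced;
	uint64_t memfreed;
	uint64_t numallocs;
	uint64_t numfrees;
	uint64_t size;
	uint64_t reserved[3];
};

_Static_assert(sizeof(struct mts_internal) == 40, "internal record is 40 bytes");
_Static_assert(sizeof(struct mts_export) == 64, "export record stays 64 bytes");

/* Populate an export record from an internal one; reserved words are zeroed. */
static void
mts_fill_export(const struct mts_internal *in, struct mts_export *out)
{
	assert(in != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	out->memalloced = in->memalloced;
	out->memfreed = in->memfreed;
	out->numallocs = in->numallocs;
	out->numfrees = in->numfrees;
	out->size = in->size;
}
```

The trade-off is an extra copy per CPU at report time in exchange for 24 fewer bytes per CPU per malloc type in the steady state.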

The sharing is very visible on Skylake.

lock1 benchmark (48-way) of the will-it-scale suite:

min:164096 max:1070674 total:30436206
min:180144 max:1081428 total:30055726
min:156210 max:1043012 total:29958074
min:126894 max:1080280 total:30590810
min:121228 max:1089982 total:30995994
min:130596 max:1125478 total:31431682


%SAMP IMAGE      FUNCTION             CALLERS
 21.6 kernel     lf_advlockasync      lf_advlock
 21.2 kernel     malloc               lf_advlockasync
  9.5 kernel     lf_free_lock         lf_advlockasync:5.5 lf_activate_lock:4.0
  8.1 kernel     lock_delay           _sx_xlock_hard
  7.2 kernel     uma_zalloc_arg       malloc
  6.1 kernel     uma_zfree_arg        free
  5.7 kernel     _sx_xlock_hard       lf_advlockasync:2.9 lf_free_lock:2.8
  5.0 kernel     free                 lf_free_lock
  1.9 kernel     kern_fcntl           kern_fcntl_freebsd
  1.8 kernel     Xfast_syscall
  1.7 kernel     fget_unlocked        kern_fcntl
  1.4 kernel     copyin_smap_erms     kern_fcntl_freebsd
  1.4 kernel     amd64_syscall
  1.0 kernel     kern_fcntl_freebsd   amd64_syscall
  1.0 kernel     cpu_set_syscall_retv amd64_syscall
  0.8 libc.so.7  0x12fb4a             testcase
  0.5 kernel     sleepq_lock          wakeup
  0.5 kernel     VOP_ADVLOCK_APV      kern_fcntl

patched:

min:189106 max:1738522 total:38471938
min:214540 max:1766486 total:38174320
min:198314 max:1737370 total:38208790
min:187510 max:1721256 total:37609386
min:206284 max:1719908 total:37766580
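
To put a single number on the difference, the sketch below averages the `total` column of the unpatched and patched runs listed in this summary and takes the ratio. The helper names are made up for illustration; the figures are copied from the runs above.

```c
#include <assert.h>
#include <stddef.h>

/* Totals copied from the unpatched runs. */
static const double base_totals[] = {
	30436206, 30055726, 29958074, 30590810, 30995994, 31431682
};

/* Totals copied from the patched runs. */
static const double patched_totals[] = {
	38471938, 38174320, 38208790, 37609386, 37766580
};

/* Arithmetic mean of n values. */
static double
mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return (sum / (double)n);
}

/* Ratio of the mean patched total to the mean unpatched total. */
static double
speedup(void)
{
	return (mean(patched_totals,
	    sizeof(patched_totals) / sizeof(patched_totals[0])) /
	    mean(base_totals, sizeof(base_totals) / sizeof(base_totals[0])));
}
```

The mean total goes from about 30.6M to about 38.0M operations, roughly a 24% improvement.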

%SAMP IMAGE      FUNCTION             CALLERS
 25.9 kernel     lf_advlockasync      lf_advlock
 11.1 kernel     lock_delay           _sx_xlock_hard
 10.1 kernel     uma_zalloc_arg       malloc
  9.5 kernel     lf_free_lock         lf_advlockasync:5.0 lf_activate_lock:4.4
  9.1 kernel     uma_zfree_arg        free
  4.5 kernel     _sx_xlock_hard       lf_advlockasync:2.5 lf_free_lock:2.0
  3.3 kernel     Xfast_syscall
  3.2 kernel     kern_fcntl           kern_fcntl_freebsd
  2.9 kernel     fget_unlocked        kern_fcntl
  2.6 kernel     copyin_smap_erms     kern_fcntl_freebsd
  2.3 kernel     amd64_syscall
  2.1 kernel     free                 lf_free_lock
  1.9 kernel     malloc               lf_advlockasync
  1.4 libc.so.7  getdiskbyname        _init
  1.2 kernel     kern_fcntl_freebsd   amd64_syscall
  0.9 kernel     VOP_ADVLOCK_APV      kern_fcntl
  0.8 kernel     cpu_fetch_syscall_ar amd64_syscall
  0.7 kernel     sleepq_lock          wakeup
  0.6 kernel     cpu_set_syscall_retv amd64_syscall
  0.6 liblzma.so lzma_filter_flags_en _init
  0.6 kernel     lf_activate_lock     lf_advlockasync

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

mjg edited the summary of this revision.
markj added inline comments.
sys/kern/kern_malloc.c
67 ↗(On Diff #48353)

Style: this should be sorted.

958 ↗(On Diff #48353)

It'd be nice to use a consistent zone name, e.g., "mt_stats_zone". Or rename "mt_zone" to "mt" and call the new one "mt_stats".

1002 ↗(On Diff #48353)

Style: missing spaces around '|'.

1049 ↗(On Diff #48353)

This brace is on the wrong line.

1131 ↗(On Diff #48353)

"CPUs" should be capitalized like in the comment above. There should be a newline before the comment.

This revision is now accepted and ready to land. Sep 22 2018, 4:01 PM
This revision was automatically updated to reflect the committed changes.