vfs cache: describe various optimization ideas
While here, report a sample result from a run on Sapphire Rapids:
An access(2) loop slapped into will-it-scale, like so:
	while (1) {
		int error = access(tmpfile, R_OK);
		assert(error == 0);

		(*iterations)++;
	}

.. operating on /usr/obj/usr/src/amd64.amd64/sys/GENERIC/vnode_if.c
In operations per second:
lockless: 3462164
locked: 1362376
While over 3.4 million ops/s may seem like a big number, a critical look
shows it should be significantly higher.
A poor man's profiler, counting how many times a given routine was sampled:
dtrace -w -n 'profile:::profile-4999 /execname == "a.out"/ {
@[sym(arg0)] = count(); } tick-5s { system("clear"); trunc(@, 40);
printa("%40a %@16d\n", @); clear(@); }'
[snip]
kernel`kern_accessat                 231
kernel`cpu_fetch_syscall_args        324
kernel`cache_fplookup_cross_mount    340
kernel`namei                         346
kernel`amd64_syscall                 352
kernel`tmpfs_fplookup_vexec          388
kernel`vput                          467
kernel`vget_finish                   499
kernel`lockmgr_unlock                529
kernel`lockmgr_slock                 558
kernel`vget_prep_smr                 571
kernel`vput_final                    578
kernel`vdropl                       1070
kernel`memcmp                       1174
kernel`0xffffffff80                 2080
0x0                                 2231
kernel`copyinstr_smap               2492
kernel`cache_fplookup               9246