The current scheme of calling VOP_GETATTR adds avoidable overhead.
On Cascade Lake (not affected by Meltdown, MDS and similar problems), Linux gets the following on tmpfs:
fstat1_processes -t 1 -s 10: average:9299441
fstat1_threads -t 1 -s 10: average:8291153
In contrast, FreeBSD on the same hardware:
fstat1_processes -t 1 -s 10: average:7488958
Patched:
fstat1_processes -t 1 -s 10: average:7913833
[there is no difference for the threaded case]
This mostly catches up to the threaded case. The remaining overhead comes from copying more data and from avoidable atomics.
I'll patch the other filesystems (ZFS and UFS) later.