Currently the VFS layer calls the ->vfs_root method of file systems very often, which causes significant lock contention on machines with many mount points and many cores.
Since it is late in the release cycle and I think this should be addressed, I went for a trivial hack: store the root vnode in the mount point and protect access to it with rmlocks. The vnode is cleared on unmount and never repopulated. A better-quality, long-term fix will appear later. I'm not at all fond of this code, and if you have a better idea (which can land in 12.0), I'm happy to drop this patch.
This significantly reduces system time and real time of -j 128 buildkernel on EPYC (AMD EPYC 7601 32-Core Processor, 2 package(s) x 4 groups x 2 cache groups x 4 core(s) x 2 hardware threads) with the default partition setup (pasted at the end): system time drops by about 47% and real time by about 17%, while user time rises by about 15%.
Before:
make -s -j 128 buildkernel 2778.09s user 3319.45s system 8370% cpu 1:12.85 total
After:
make -s -j 128 buildkernel 3199.57s user 1772.78s system 8232% cpu 1:00.40 total
Lock profile (total wait time) before:
5808005606 (sx:zfsvfs->z_hold_mtx[i])
363396408 (lockmgr:zfs)
261479733 (rw:vm object)
203987276 (sx:dd->dd_lock)
114277109 (spin mutex:turnstile chain)
71652435 (sx:vm map (user))
56534812 (spin mutex:sleepq chain)
44072442 (sx:dn->dn_mtx)
39450115 (sleep mutex:zio_write_issue)
28089280 (sleep mutex:vnode interlock)
Lock profile (total wait time) after:
411035342 (lockmgr:zfs)
348574707 (sleep mutex:struct mount mtx)
285654874 (rw:vm object)
183426131 (sx:dd->dd_lock)
111464752 (spin mutex:turnstile chain)
110790531 (sleep mutex:vnode interlock)
74655012 (sx:vm map (user))
72533018 (spin mutex:sleepq chain)
46086874 (sx:dn->dn_mtx)
45494908 (sx:zp->z_acl_lock)
Partition setup:
zroot/ROOT/default on / (zfs, local, noatime, nfsv4acls)
devfs on /dev (devfs, local, multilabel)
zroot/tmp on /tmp (zfs, local, noatime, nosuid, nfsv4acls)
zroot/usr/home on /usr/home (zfs, local, noatime, nfsv4acls)
zroot/usr/ports on /usr/ports (zfs, local, noatime, nosuid, nfsv4acls)
zroot/usr/src on /usr/src (zfs, local, noatime, nfsv4acls)
zroot/var/audit on /var/audit (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/var/crash on /var/crash (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/var/log on /var/log (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/var/mail on /var/mail (zfs, local, nfsv4acls)
zroot/var/tmp on /var/tmp (zfs, local, noatime, nosuid, nfsv4acls)
zroot on /zroot (zfs, local, noatime, nfsv4acls)