Switch the VFS cache to use an rmlock for locking. In a simple
test using buildworld/buildkernel on a 12-core Haswell Xeon system,
this led to an approximately 10% performance increase:
x orig.log
+ rmlock.log
[ministat distribution plot; summary statistics below]
    N           Min           Max        Median           Avg        Stddev
x   6       2710.31       2821.35       2816.75     2798.0617     43.324817
+   5       2488.25       2500.25       2498.04      2495.756     5.0494782
Difference at 95.0% confidence
        -302.306 +/- 44.4709
        -10.8041% +/- 1.58935%
        (Student's t, pooled s = 32.4674)
More important is a correctness issue: I recently hit a case on a
single-core VM that was overloaded by user-priority threads consuming
the CPU. A real-time priority process got blocked waiting to acquire
a write lock on the VFS cache and was starved for over 30 seconds
because of a priority inversion: a reader held the VFS cache lock but
was preempted by the CPU hogs. The rmlock propagates priority to
readers as well and therefore avoids the problem.
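For reference, a minimal sketch of the rmlock(9) read/write pattern
(the lock and function names here are hypothetical, not the actual
cache code). Each reader passes an rm_priotracker, which is the
per-reader state that allows the lock to propagate priority to
readers:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

/* Hypothetical lock, for illustration only. */
static struct rmlock example_lock;

static void
example_init(void)
{
	rm_init(&example_lock, "example rmlock");
}

static void
example_read(void)
{
	/* Per-reader tracker; enables priority propagation to readers. */
	struct rm_priotracker tracker;

	rm_rlock(&example_lock, &tracker);
	/* Read shared data. */
	rm_runlock(&example_lock, &tracker);
}

static void
example_write(void)
{
	rm_wlock(&example_lock);
	/* Modify shared data. */
	rm_wunlock(&example_lock);
}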