If we release the last usecount we take ownership of the hold count, which means the vnode will remain allocated until we vdrop it. If someone else vrefs in the meantime, they will find no usecount and will proceed to add their own hold count. Together these two facts mean we can use a single fetchadd instead of a cmpset loop.
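To illustrate the difference, here is a minimal userspace sketch using C11 atomics. It is not the code from the diff: `struct toy_vnode`, `toy_unref_cmpset` and `toy_unref_fetchadd` are hypothetical names, and the kernel uses its own atomic primitives rather than `<stdatomic.h>`. The cmpset variant refuses to drop the last reference and leaves that to a slow path, while the fetchadd variant always decrements and reports whether the caller released the last usecount and therefore now owns the hold count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two counters discussed; not struct vnode. */
struct toy_vnode {
	atomic_int usecount;
	atomic_int holdcnt;
};

/*
 * cmpset loop: decrement usecount only if we are not the last user.
 * Returns true if the count was dropped, false if the caller has to
 * fall back to a slow path (e.g. take the interlock) to drop the last one.
 */
static bool
toy_unref_cmpset(struct toy_vnode *vp)
{
	int old = atomic_load(&vp->usecount);

	while (old > 1) {
		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&vp->usecount, &old, old - 1))
			return (true);
	}
	return (false);
}

/*
 * fetchadd: decrement unconditionally with one atomic operation.
 * Returns true if this call released the last usecount, in which case
 * the caller is assumed to own the hold count and must drop it later.
 */
static bool
toy_unref_fetchadd(struct toy_vnode *vp)
{
	int old = atomic_fetch_sub(&vp->usecount, 1);

	assert(old > 0);	/* releasing a count we do not have is a bug */
	return (old == 1);
}
```

The fetchadd variant never retries, so under contention each release costs one atomic operation instead of a load plus an unbounded number of compare-and-swap attempts.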
In a trivial microbenchmark on tmpfs (104 threads, each doing stat on a different file, with full path lookup to /tmp/file*) this gives me a bump from ~178k ops/s to ~213k ops/s. Note that a simple benchmark of this sort instantly runs into contention on the vnode list due to vnodes constantly moving between the active and free lists. I circumvented that by running tail -f tmp/*, which activated all leaf vnodes. This problem will be addressed elsewhere.
The diff was generated on top of D21525 for simplicity, but the idea does not depend on it and I can rewrite it against stock -current. None of the code acting on a usecount of 0 seems to mind the value transitioning from non-zero to 0 while the interlock is held.