The patch below avoids modifying holdcnt if usecount is already > 0.
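To illustrate the idea, here is a minimal userspace sketch using C11 atomics. It is not the patch itself: `struct toy_vnode`, `toy_acquire_if_positive` and `toy_vget_prelock` are hypothetical names, and the relationship between the two counters in the real kernel is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two counters; not the kernel's struct vnode. */
struct toy_vnode {
	atomic_uint usecount;
	atomic_uint holdcnt;
};

/* Tells the caller which kind of reference it ended up with. */
enum toy_ref { TOY_GOT_USE, TOY_GOT_HOLD };

/* Bump *cnt only if it is already > 0. Returns true if the bump happened. */
static bool
toy_acquire_if_positive(atomic_uint *cnt)
{
	unsigned int old = atomic_load(cnt);

	while (old > 0) {
		/* On failure the CAS reloads 'old', so the loop re-checks it. */
		if (atomic_compare_exchange_weak(cnt, &old, old + 1))
			return (true);
	}
	return (false);
}

/*
 * Pre-lock step: take usecount if it is already > 0 and leave holdcnt
 * alone, otherwise fall back to a plain hold.
 */
static enum toy_ref
toy_vget_prelock(struct toy_vnode *vp)
{
	assert(vp != NULL);
	if (toy_acquire_if_positive(&vp->usecount))
		return (TOY_GOT_USE);
	atomic_fetch_add(&vp->holdcnt, 1);
	return (TOY_GOT_HOLD);
}
```

The return value stands in for the information which has to be passed along to the code running after the vnode is locked.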
Note that the current vget only bumps usecount after locking the vnode. Preserving this behavior means the caller may find itself with either usecount or only holdcnt, and this information has to be conveyed. To that end I added another flag to lockmgr. Perhaps it would be nicer to just give vget a new argument instead. Also note that the old idiom of calling vget with the interlock held is actually harmful for many consumers. For example, the namecache can most of the time vhold directories without taking the interlock (the usecount bump which follows, and which may need the interlock, is done much later, after locking the vnode).
Another caveat is that if 2 or more threads vget the same vnode concurrently, it may be that each of them only bumps holdcnt, which means all but one of them will have to backpedal on it.
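A rough sketch of what the backpedal could look like, in userspace C11. This rests on an assumption not spelled out above: that the first usecount reference keeps one hold to back it, so any later thread which arrived with only a hold has to give its hold back. `struct toy_vnode` and `toy_vget_finish` are hypothetical names, and the vnode lock which would serialize this against concurrent drops is imagined rather than modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the two counters; not the kernel's struct vnode. */
struct toy_vnode {
	atomic_uint usecount;
	atomic_uint holdcnt;
};

/*
 * Post-lock step for a thread which only managed to bump holdcnt.
 * Assumption: the hold of the first user is kept to back usecount,
 * everyone else drops theirs.
 */
static void
toy_vget_finish(struct toy_vnode *vp)
{
	assert(vp != NULL);
	assert(atomic_load(&vp->holdcnt) > 0);

	if (atomic_fetch_add(&vp->usecount, 1) == 0) {
		/* First user: our hold now backs usecount, keep it. */
		return;
	}
	/* Someone else's hold already backs usecount: backpedal on ours. */
	atomic_fetch_sub(&vp->holdcnt, 1);
}
```

With two threads that both took only a hold (holdcnt 2, usecount 0), running this step twice leaves usecount at 2 and holdcnt at 1.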
Finally, vputx has to unlock the vnode early so that the unlock does not happen after dropping usecount. Unlocking after the drop could race with someone else dropping holdcnt, which would violate the invariant that all VOPs execute with the vnode at least held. I can add a comment.
Longer term it would probably be faster in multithreaded workloads to bump usecount prior to locking the vnode and provide some form of a barrier. This barrier would allow the vnode to settle (e.g. finish inactive processing). I think these considerations are for later.
I'm not strongly attached to names here -- this is roughly a wip in terms of committability, but works fine.
Sample benchmark results will be provided later.