FreeBSD

vfs: introduce basic smr-protected path lookup
Needs Review · Public

Authored by mjg on Mar 1 2020, 11:55 PM.



The following was separated from a patchset which performs a fully SMR-protected path lookup that dirties only the terminal vnode. This one only stops taking bucket locks for forward lookup. Even so, it already shows a significant speedup during concurrent stat calls at smaller scales.

The patch below will be split in two, with the namecache handled separately. It introduces SMR protection for vnodes and namecache entries.

The crux of the change is vhold_smr. An argument could be made that a flag should be set while dooming the vnode, but postponing this makes the code simpler, is harmless, and opens up the possibility of safely doing fetchadd and backpedaling from it in vhold_smr.
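A minimal sketch of the scheme described above, using C11 atomics (the flag value, field layout, and helper names are illustrative, not the committed kernel code):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit set in the hold count by the freeing side. */
#define VHOLD_NO_SMR	0x40000000

struct vnode {
	atomic_int v_holdcnt;
};

/*
 * Lockless side: gain a hold unless the vnode is being freed.
 * Callers run inside an SMR read section, so the vnode memory
 * itself is guaranteed not to be recycled underneath us.
 */
static bool
vhold_smr(struct vnode *vp)
{
	int count = atomic_load_explicit(&vp->v_holdcnt, memory_order_relaxed);

	for (;;) {
		if (count & VHOLD_NO_SMR)
			return (false);	/* freeing in progress; back off */
		if (atomic_compare_exchange_weak(&vp->v_holdcnt, &count,
		    count + 1))
			return (true);
	}
}

/*
 * Freeing side: claim the vnode for freeing iff the count dropped
 * to zero, by installing the flag in the same cmpset.
 */
static bool
vdrop_smr_claim(struct vnode *vp)
{
	int zero = 0;

	return (atomic_compare_exchange_strong(&vp->v_holdcnt, &zero,
	    VHOLD_NO_SMR));
}
```

The point of keeping the flag out of the common path is that regular (locked) consumers never observe it, which is what makes the invariant assertable.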

It's unclear to me what should be done here with SMR accessors. It may be workable to make the hash use them, but some form of _raw accessors (if any) looks prudent here. Right now the code just asserts. I think the way to go for now is to hide all of this behind dedicated macros which happen to be mere atomic loads/stores and the like.
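What "dedicated macros which happen to be mere atomic load/store" might look like, purely as an illustration (NC_LOAD/NC_STORE and the slot type are invented names, not the real vfs_smr macros):

```c
#include <stdatomic.h>
#include <stddef.h>

struct namecache {
	int nc_dummy;	/* stand-in for the real entry contents */
};

/* A hash bucket slot stored as an atomic pointer. */
typedef _Atomic(struct namecache *) nc_slot_t;

/* Readers inside an SMR section load with acquire semantics... */
#define NC_LOAD(slot)		\
	atomic_load_explicit((slot), memory_order_acquire)

/* ...writers holding the bucket lock publish with a release store. */
#define NC_STORE(slot, val)	\
	atomic_store_explicit((slot), (val), memory_order_release)
```

Hiding the raw atomics this way lets the ordering policy be changed in one place later without touching every lookup site.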

Diff Detail

Lint Skipped
Unit Tests Skipped

Event Timeline

mjg created this revision. Mar 1 2020, 11:55 PM
mjg edited the summary of this revision. (Show Details) Mar 1 2020, 11:56 PM
mjg updated this revision to Diff 69254. Mar 6 2020, 7:32 AM
  • hide vfs_smr behind macros
  • pretty print hold count flags

Pending testing by pho, I think this is committable.

mjg retitled this revision from vfs: introduce smr-protected path lookup to vfs: introduce basic smr-protected path lookup. Mar 6 2020, 7:33 AM
jeff added a comment. Mar 6 2020, 7:40 AM

Would you not prefer to use an invalid value rather than a bit to hold the free state? I.e., VHOLD_NO_SMR would be something like VHOLD_DEAD 0xffffffff. Some of your asserts would have to be adjusted. The first ref should swap the dead value with 1.
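The sentinel alternative being suggested could be sketched roughly as follows (VHOLD_DEAD's value is from the comment above; the helper names are hypothetical):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Sentinel meaning "freed": no valid hold count ever equals this. */
#define VHOLD_DEAD	0xffffffffu

struct vnode {
	atomic_uint v_holdcnt;
};

/* Lockless side: fail if the vnode carries the freed sentinel. */
static bool
vhold_smr_dead(struct vnode *vp)
{
	unsigned count = atomic_load(&vp->v_holdcnt);

	for (;;) {
		if (count == VHOLD_DEAD)
			return (false);
		if (atomic_compare_exchange_weak(&vp->v_holdcnt, &count,
		    count + 1))
			return (true);
	}
}

/* First reference after (re)allocation swaps the sentinel for 1. */
static void
vhold_first(struct vnode *vp)
{
	unsigned dead = VHOLD_DEAD;

	atomic_compare_exchange_strong(&vp->v_holdcnt, &dead, 1);
}
```

The trade-off versus a flag bit: no count bit is sacrificed, but an unconditional fetchadd can no longer be used, since adding to the sentinel would produce a small, valid-looking count.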


If you have ordering issues in later patches you can just manually smr_create() first thing and pass it around everywhere even here.


I had to solve this same problem for page busy. I did it slightly differently, but it's related. This is VPB_FREED.

The notion is that the owner doing the freeing has destroyed the page identity, and sets busy to VPB_FREED. The trybusy on the lockless side may succeed right before the owner sets FREED. The lockless lookup revalidates the page identity and unlocks if it races. The unlock is special because it handles FREED. We know that if we successfully acquire it, it cannot be freed.
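A rough model of the FREED-tolerant busy protocol described above (simplified to a bare shared-busy counter; the real vm_page busy state encodes more than this):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Sentinel installed by the freeing owner once page identity is gone. */
#define VPB_FREED	0xffffffffu

struct page {
	atomic_uint busy;	/* 0 = unbusied, N = shared holders */
};

/* Lockless trybusy: may succeed just before the owner sets FREED. */
static bool
page_trybusy(struct page *p)
{
	unsigned v = atomic_load(&p->busy);

	for (;;) {
		if (v == VPB_FREED)
			return (false);
		if (atomic_compare_exchange_weak(&p->busy, &v, v + 1))
			return (true);
	}
}

/*
 * Unbusy must tolerate FREED appearing underneath it: if the owner
 * already took over and destroyed the identity, there is nothing
 * left for the loser of the race to release.
 */
static void
page_unbusy(struct page *p)
{
	unsigned v = atomic_load(&p->busy);

	for (;;) {
		if (v == VPB_FREED)
			return;
		if (atomic_compare_exchange_weak(&p->busy, &v, v - 1))
			return;
	}
}
```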

mjg added inline comments. Mar 6 2020, 8:03 AM

I know, but I have not found a good place to do it yet. It's plausible I'll be able to reorder the problematic consumer.

Another option is that perhaps there should be a well-defined "allocate global smrs here" place so that other work does not run into equivalent woes.


I noted what I think is an equivalent in the description.

  1. the code as implemented is imo very easy to understand.
  2. it maintains an assertable invariant that non-SMR consumers never legally see the flag (modulo vdropl, of course).
  3. it keeps open the possibility of still doing fetchadd.

To elaborate on the last point: when profiling things like the -j 104 bzImage incremental build, I kept seeing vhold_smr in the profile (despite it being done only on the terminal vnode). Note its performance is hindered by the fact that this is a cmpset loop in the first place. After the dust settles we can look at changing it into fetchadd, but this is only possible if fetchadding cannot legally alter any flags.
The basic idea is that if we fetchadd and find the flag set, we can pretend we never saw the object in the first place. This means that after vdrop wins the cmpset race it will still free the vnode, so the counter may be modified even after the call to uma_zfree_smr. This is a little hairy, and I have ideas for how to avoid touching the hold count at all in the common case (if we can avoid it).
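The fetchadd-and-backpedal idea could look roughly like this (assuming, per the text above, that adding 1 can never carry into the flag bit, and that a stray post-free modification of the counter is tolerated by the SMR-deferred allocator):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit; the value is illustrative. */
#define VHOLD_NO_SMR	0x40000000u

struct vnode {
	atomic_uint v_holdcnt;
};

/*
 * Unconditional fetchadd variant: one atomic op on the fast path
 * instead of a cmpset loop.  If the flag turns out to be set, we
 * pretend we never saw the vnode; the stray increment is harmless
 * because fetchadd cannot alter the flag and the freeing side no
 * longer consults the counter after winning the race.
 */
static bool
vhold_smr_fetchadd(struct vnode *vp)
{
	unsigned old = atomic_fetch_add(&vp->v_holdcnt, 1);

	if (old & VHOLD_NO_SMR)
		return (false);	/* being freed; back off */
	return (true);
}
```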

mjg added a comment. Mar 7 2020, 9:45 PM

Tested by pho. He ran into a ZFS hang which seems unrelated (threads stuck waiting on I/O completion while the thread which should deliver it just sleeps in its main loop).