Page MenuHomeFreeBSD

vfs: fix incorrect hold in vtryrecycle
Changes PlannedPublic

Authored by mjg on Sep 15 2019, 8:02 PM.

Details

Reviewers
kib
Summary

This is a follow up to the e-mail thread about vtryrecycle.

It incorrectly holds a vnode and allows the usecount to grow and if it happens it aborts recycle leaving the other thread with an incorrectly held vnode. The fix is to do everything with the vnode interlock held. For this to happen the following provisions were added:

  • VOP_LOCK can now ignore the interlock
  • vn_start_write_nb is added. it guarantees no locks will be taken. Note this effectively depends on the switch to atomics (or per-cpu) as trylocking on the mnt interlock would be very likely to fail due to contention. This also changes semantics a little bit in that non-suspendable mount points get returned with a reference held. This can be augmented later to avoid refing them in the first place OR dropping the opt out altogether, now that the bottleneck is fixed. Semantics change is needed since by the time we determine the mount point is not suspendable per-cpu operation can be disabled and dropping the ref would require taking the lock.

The same treatment can be made in vlrureclaim to avoid the need to re-check anything.

Also note the treatment in unionfs already looks incorrect due to the missing interlock in the common case. For nonblocking case we go in with the interlock (can add pre/post assert routines for it).

Test Plan

will ask pho, passes suj11 for me

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

mjg created this revision.Sep 15 2019, 8:02 PM
mjg edited the summary of this revision. (Show Details)Sep 15 2019, 8:03 PM
mjg edited the summary of this revision. (Show Details)
mjg edited the summary of this revision. (Show Details)
mjg updated this revision to Diff 62132.Sep 15 2019, 8:11 PM
  • vdropl before vn_finished_write
kib added a comment.Sep 17 2019, 8:54 AM

Why do you need vn_start_write_nb() when vn_start_write() already takes the flag argument ?

mjg updated this revision to Diff 62201.Sep 17 2019, 9:12 AM
  • rebase
mjg updated this revision to Diff 62202.Sep 17 2019, 9:19 AM
  • plug a blatant mp ref leak
mjg added a comment.Sep 17 2019, 9:20 AM

I think it's cleaner to handle this separately mostly because this function cannot do any clean up so it behaves quite differently compared to vn_start_write. If you insist I can fold it in.

kib added a comment.Sep 17 2019, 2:27 PM
In D21667#472964, @mjg wrote:

I think it's cleaner to handle this separately mostly because this function cannot do any clean up so it behaves quite differently compared to vn_start_write. If you insist I can fold it in.

I highly dislike growing the VFS interface in ah-hoc manner with the only cause of fixing some self-inflicted bug. Please use a flag to select this behavior, the flag might as well just call into the static vn_start_write_nb().

mjg updated this revision to Diff 62214.Sep 17 2019, 3:24 PM
  • make vn_start_write_nb static
  • add a comment explaining error handling
  • assert LK_NOWAIT passed if LK_IGNORE_INTERLOCK is used
kib accepted this revision.Sep 17 2019, 8:19 PM
kib added inline comments.
sys/kern/vfs_subr.c
1441

There should be no space before label.

sys/kern/vfs_vnops.c
1672

However

1678

!= 0 there and line below.

This revision is now accepted and ready to land.Sep 17 2019, 8:19 PM
mjg planned changes to this revision.EditedOct 6 2019, 10:40 PM

Turns out there is a bug with in handling of nullfs and unionfs. Their routines may end up taking the interlock. Taking care of nullfs is easy, but unionfs requires a little more work. Found by pho.