Page MenuHomeFreeBSD

vfs: fix incorrect hold in vtryrecycle
AbandonedPublic

Authored by mjg on Sep 15 2019, 8:02 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 8, 12:07 PM
Unknown Object (File)
Wed, Jan 1, 10:58 AM
Unknown Object (File)
Tue, Dec 31, 12:42 PM
Unknown Object (File)
Mon, Dec 30, 11:26 AM
Unknown Object (File)
Sun, Dec 29, 11:05 AM
Unknown Object (File)
Sat, Dec 28, 12:16 PM
Unknown Object (File)
Fri, Dec 27, 8:34 AM
Unknown Object (File)
Wed, Dec 25, 11:37 PM
Subscribers

Details

Reviewers
kib
Summary

This is a follow up to the e-mail thread about vtryrecycle.

It incorrectly holds a vnode and allows the usecount to grow and if it happens it aborts recycle leaving the other thread with an incorrectly held vnode. The fix is to do everything with the vnode interlock held. For this to happen the following provisions were added:

  • VOP_LOCK can now ignore the interlock
  • vn_start_write_nb is added. it guarantees no locks will be taken. Note this effectively depends on the switch to atomics (or per-cpu) as trylocking on the mnt interlock would be very likely to fail due to contention. This also changes semantics a little bit in that non-suspendable mount points get returned with a reference held. This can be augmented later to avoid refing them in the first place OR dropping the opt out altogether, now that the bottleneck is fixed. Semantics change is needed since by the time we determine the mount point is not suspendable per-cpu operation can be disabled and dropping the ref would require taking the lock.

The same treatment can be made in vlrureclaim to avoid the need to re-check anything.

Also note the treatment in unionfs already looks incorrect due to the missing interlock in the common case. For nonblocking case we go in with the interlock (can add pre/post assert routines for it).

Test Plan

will ask pho, passes suj11 for me

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

mjg edited the summary of this revision. (Show Details)
mjg edited the summary of this revision. (Show Details)
  • vdropl before vn_finished_write

Why do you need vn_start_write_nb() when vn_start_write() already takes the flag argument ?

  • plug a blatant mp ref leak

I think it's cleaner to handle this separately mostly because this function cannot do any clean up so it behaves quite differently compared to vn_start_write. If you insist I can fold it in.

In D21667#472964, @mjg wrote:

I think it's cleaner to handle this separately mostly because this function cannot do any clean up so it behaves quite differently compared to vn_start_write. If you insist I can fold it in.

I highly dislike growing the VFS interface in ah-hoc manner with the only cause of fixing some self-inflicted bug. Please use a flag to select this behavior, the flag might as well just call into the static vn_start_write_nb().

  • make vn_start_write_nb static
  • add a comment explaining error handling
  • assert LK_NOWAIT passed if LK_IGNORE_INTERLOCK is used
kib added inline comments.
sys/kern/vfs_subr.c
1441

There should be no space before label.

sys/kern/vfs_vnops.c
1672

However

1678

!= 0 there and line below.

This revision is now accepted and ready to land.Sep 17 2019, 8:19 PM
mjg planned changes to this revision.EditedOct 6 2019, 10:40 PM

Turns out there is a bug with in handling of nullfs and unionfs. Their routines may end up taking the interlock. Taking care of nullfs is easy, but unionfs requires a little more work. Found by pho.

Fixed as a side effect of D22997