Page MenuHomeFreeBSD

Make reference counting more robust
AbandonedPublic

Authored by imp on May 16 2018, 3:50 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Apr 20, 7:55 PM
Unknown Object (File)
Mar 4 2024, 2:12 PM
Unknown Object (File)
Dec 23 2023, 10:06 AM
Unknown Object (File)
Nov 15 2023, 10:08 PM
Unknown Object (File)
Nov 12 2023, 12:52 AM
Unknown Object (File)
Nov 4 2023, 5:43 AM
Unknown Object (File)
Oct 14 2023, 9:09 PM
Unknown Object (File)
Oct 10 2023, 11:51 PM
Subscribers
None

Details

Reviewers
chs
scottl
ken
mav
Summary

Currently, there's two races with the reference counting. One is a
race on the submission thread side. Prior to r333648 the refcount
protected against the completion from the xpt_action routine
triggering a wither and daclose before we could finish submitting and
reaquire the lock. This commit puts that back in place. However,
r333648 protects against another race that sometimes happens due to
bugs in GEOM where a request is submitted to the hardware and while
it's processing geom withers the device and calls daclose. There are
holes in GEOM's reference counting that allow this when I/O comes in
via geom_dev instead of geom_vfs (another commit will fix the known
hole where BIO_DELETEs are queued from ffs_blkfree via geom_dev
instead of geom_vfs which is how we saw this). While GEOM should be
perfect and not do this, at least one more hole still exists in
BIO_ZONE handling. To fail safe against this, and future GEOM bugs,
take out a second reference and release it in dadone like we did in
r333648.

Sponsored by: Netflix

Test Plan

similar changes would be needed in ada, nda and mda.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 16626
Build 16531: arc lint + arc unit

Event Timeline

imp added reviewers: chs, scottl, ken, mav.

It is OK to me. I still don't think protection against GEOM bugs belongs here, but so be it, since we any way have and acquire lock to protect that.

I would just not enumerate GEOM bugs in the commit message, if they are going to be fixed just by immediately following commits.

And may be reordered comments for refcount++ lines.

This revision is now accepted and ready to land.May 17 2018, 12:31 AM

OK. So it looked easy...

But if you look more closely, we're not really protecting things like we think and this is too racy. Opt for a different approach as done in D15517 where we keep outstanding counts reliably, but only detect issues in daclose rather than try to fix geom bugs. We'd have to do stupid locking to make second guessing GEOM's protection robust. mav was right about that all along.