There are a few spots where unionfs_lookup() accesses unionfs vnode
private data without holding the corresponding vnode lock or interlock.
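To make the bug class concrete, here is a minimal userspace sketch of the safe pattern: take the vnode lock and revalidate (the vnode may have been doomed by a concurrent forced unmount) before dereferencing filesystem-private data. The struct layout, names, and pthread stand-ins are all invented for illustration; they are not the real FreeBSD definitions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy stand-ins for a vnode and its filesystem-private data. */
struct unionfs_node { int uppervp_valid; };

struct vnode {
    pthread_mutex_t v_lock;   /* stand-in for the vnode lock */
    int             v_doomed; /* set by a concurrent forced unmount */
    void           *v_data;   /* filesystem-private data */
};

/*
 * Safe pattern: acquire the vnode lock, then re-check that the vnode
 * has not been doomed before dereferencing the private data.
 */
static struct unionfs_node *
node_get_locked(struct vnode *vp)
{
    pthread_mutex_lock(&vp->v_lock);
    if (vp->v_doomed) {
        pthread_mutex_unlock(&vp->v_lock);
        return NULL;          /* caller must handle the dead vnode */
    }
    return vp->v_data;        /* safe: lock held, vnode not doomed */
}

static void
node_put(struct vnode *vp)
{
    pthread_mutex_unlock(&vp->v_lock);
}
```

The unsafe variant this change fixes is the same access with neither the lock held nor the doomed check, which is only harmful when a forced unmount wins the race.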
Diff Detail
- Repository
- rG FreeBSD src repository
Event Timeline
These changes indeed plug the holes.
Side note: it's likely that I'll rewrite the whole lookup code at some point, so I'll have to test again for races. The problem with this kind of bug is that it is triggered only by rare races. We already have stress2, which is great, but it also relies on "chance". This makes me think that perhaps we could have a more systematic framework that triggers vnode dooming at, say, unlock. I'll probably explore that at some point.
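A toy userspace model of what such a harness could look like (hypothetical; no such FreeBSD facility exists, and every name here is invented): instrument the unlock path so that it always behaves as if a forced unmount won the race. Any lookup path that relocks and then touches private data without revalidating then fails deterministically on the first run, instead of only under a rare race.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy vnode, as in the earlier sketch. */
struct vnode {
    pthread_mutex_t v_lock;
    int             v_doomed;
    void           *v_data;
};

static void
vn_lock(struct vnode *vp)
{
    pthread_mutex_lock(&vp->v_lock);
}

/* Instrumented unlock: act as if a forced unmount won the race. */
static void
vn_unlock_dooming(struct vnode *vp)
{
    vp->v_doomed = 1;
    vp->v_data = NULL;        /* private data is reclaimed */
    pthread_mutex_unlock(&vp->v_lock);
}

/*
 * A lookup step written correctly: after reacquiring the lock it
 * revalidates the vnode before using v_data.  Returns 0 on success,
 * -1 if the vnode was doomed during the unlocked window.
 */
static int
lookup_step(struct vnode *vp)
{
    vn_lock(vp);
    vn_unlock_dooming(vp);    /* simulates dropping the lock mid-lookup */
    vn_lock(vp);
    if (vp->v_doomed || vp->v_data == NULL) {
        pthread_mutex_unlock(&vp->v_lock);
        return -1;            /* caller must restart or bail out */
    }
    pthread_mutex_unlock(&vp->v_lock);
    return 0;
}
```

A real kernel harness would presumably gate the dooming behind a sysctl or a probability knob so that normal stress runs still exercise the success paths.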
I've wished for that multiple times as well; something like a "VFSan", although I do wonder whether some of this functionality could be integrated into an existing sanitizer like KMSAN.
In a perfect (imaginary) world, the ideal fix would be to simplify the VFS so that filesystems don't have to do so much extra work to handle a concurrent forced unmount in the first place, but I'm not sure how that would be done. It brings to mind something like SMR or epoch(9): vnode dispatch would enter a "VFS epoch", and forcible unmount would wait for the epoch to drain. The mechanism would have to be nestable, sleepable, and still allow forced unmount to complete in a reasonable amount of time, which makes it sound roughly like magic.
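Purely to make the "VFS epoch" idea concrete, here is a toy userspace model (all names invented; this is not epoch(9), which avoids a shared counter entirely): vnode dispatch enters the epoch, and forced unmount drains it, sleeping until all readers have left. A plain counter plus condvar is used only to pin down the semantics, not as a plausible implementation.

```c
#include <assert.h>
#include <pthread.h>

/* Toy "VFS epoch": a sleepable drain gate for in-flight dispatch. */
struct vfs_epoch {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    int             readers;  /* vnode dispatches currently in flight */
};

static void
vfs_epoch_enter(struct vfs_epoch *e)
{
    pthread_mutex_lock(&e->mtx);
    e->readers++;
    pthread_mutex_unlock(&e->mtx);
}

static void
vfs_epoch_exit(struct vfs_epoch *e)
{
    pthread_mutex_lock(&e->mtx);
    if (--e->readers == 0)
        pthread_cond_broadcast(&e->cv);
    pthread_mutex_unlock(&e->mtx);
}

/* Forced unmount: sleep until no dispatch remains in the epoch. */
static void
vfs_epoch_drain(struct vfs_epoch *e)
{
    pthread_mutex_lock(&e->mtx);
    while (e->readers > 0)
        pthread_cond_wait(&e->cv, &e->mtx);
    pthread_mutex_unlock(&e->mtx);
}
```

The hard parts the comment alludes to are exactly what this toy elides: making entry nestable without deadlocking against drain, keeping the read side cheap enough for the dispatch fast path, and bounding how long a drain can take.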