Details

Reviewers

kib
olce

Commits

rGfe7fe3b175b6: rangelock: Fix handling of trylocks

Summary

When inserting a queue entry, i.e., locking a range, there are two
points where a trylock operation may fail, one before the new entry is
inserted, one after. In the latter case, rl_(r|w)_validate() would mark
the entry and rangelock_lock_int() would free it. However, this is of
course incorrect, since the entry is visible to other threads, which
will eventually attempt to remove it and free it again.

Factor out conflict handling in rl_(r|w)_validate() to a common function
as they are functionally the same. Then, introduce a new result which
indicates that a trylock failed but that the queue entry must not be
cleaned up.

While here, assert that a conflicting range isn't owned by the current
thread, as that would indicate a bug in the consumer.

Reported by: syzkaller
Fixes: 9ef425e560a9 ("rangelocks: add fast cheating mode")

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

markj created this revision.Fri, Mar 21, 2:09 PM

Herald added a subscriber: imp. · View Herald TranscriptFri, Mar 21, 2:09 PM

markj requested review of this revision.Fri, Mar 21, 2:09 PM

Harbormaster completed remote builds in B63024: Diff 152496.Fri, Mar 21, 2:09 PM

kib accepted this revision.Fri, Mar 21, 11:53 PM

kib added inline comments.

sys/kern/kern_rangelock.c
702–703	Can't this block (for the case r == 0) also use rl_conflict()?

This revision is now accepted and ready to land.Fri, Mar 21, 11:53 PM

Use rl_conflict() in the case where a conflict is found before insertion.

This revision now requires review to proceed.Sat, Mar 22, 1:22 AM

Harbormaster completed remote builds in B63044: Diff 152523.Sat, Mar 22, 1:22 AM

kib accepted this revision.Sat, Mar 22, 1:27 AM

This revision is now accepted and ready to land.Sat, Mar 22, 1:27 AM

olce added inline comments.Mon, Mar 24, 11:14 AM

sys/kern/kern_rangelock.c
538–539	`cur` must be tested for validity here. `e` can never be marked.
542–544	I would split this assertion, as part of what it covers is in fact not a rangelock-internal programming error. If `inserted` is true, then indeed the proposition `cur->rl_q_owner != curthread` is true necessarily, as the current thread wouldn't have inserted the new entry in the first place since it would have detected a conflicting entry from itself before. However, if `inserted` is false, the proposition can be false if the same thread issues conflicting requests. This either indicates an (external) programming error or a limitation of the interface, and in this case we should maybe prefer returning some particular value so that, in the end, the external caller gets `NULL` (indicating failure), or else call a plain `panic()`.

markj marked an inline comment as done.Mon, Mar 24, 1:09 PM

markj added inline comments.

sys/kern/kern_rangelock.c
538–539	Thank you.
542–544	In the case of an external programming error, we can't necessarily signal an error to the external caller: some rangelock operations don't fail. We could pass NULL back nonetheless, but that would require some extra complexity and isn't very satisfying. We could explicitly panic. With this patch, in a non-INVARIANTS kernel we would go to sleep forever instead, a less destructive reaction. It's not really clear to me that panicking is preferable. I agree with your observation, but I prefer to keep using KASSERT() here. It feels odd to split one assertion into two which effectively check the same condition. A gdb backtrace would also reveal which case we're dealing with; maybe a comment would be sufficient?
702–703	It requires some tweaking since here `e` is not yet present in the queue, but this is probably a good idea.

Fix the check for a conflict with a marked entry.

This revision now requires review to proceed.Mon, Mar 24, 1:09 PM

Harbormaster completed remote builds in B63095: Diff 152615.Mon, Mar 24, 1:09 PM

kib accepted this revision.Mon, Mar 24, 1:28 PM

This revision is now accepted and ready to land.Mon, Mar 24, 1:28 PM

olce added inline comments.Mon, Mar 24, 1:31 PM

sys/kern/kern_rangelock.c
542–544	We could pass NULL back nonetheless, but that would require some extra complexity and isn't very satisfying. We could pass `NULL` as `rl_insert()` is only called by the `rangelock_lock()` functions, which all return a cookie (`void `), and `rl_conflict()` is only called by `rl_insert()` (directly or indirectly). That said, I agree it doesn't seem useful at this point. What would be the callers of the non-try versions supposed to do then? If this is re-trying, then the callee should do it itself anyway, so perhaps the most logical "solution" would be to wait forever (but see addition below). We could explicitly panic. With this patch, in a non-INVARIANTS kernel we would go to sleep forever instead, a less destructive reaction. It's not really clear to me that panicking is preferable. In hindsight, I agree that panicking is probably too extreme here, and waiting forever quite logical. That said, if we know we are going to sleep forever, I think printing so to the console (with minimal details, such as the blocked thread ID) would still be useful to immediately be informed that there is a problem in the non-INVARIANTS case. I agree with your observation, but I prefer to keep using KASSERT() here. It feels odd to split one assertion into two which effectively check the same condition. A gdb backtrace would also reveal which case we're dealing with; maybe a comment would be sufficient? A comment would be great, as well as some printing that the thread is blocked forever in the non-INVARIANTS case.

(reformulated) The logic of this patch seems sound to me, so validating it. IMO, it would be improved by following the above suggestions.

Add a comment to explain the assertion a bit better.

I prefer not to preemptively add code which tries to "help" with such bugs in
production kernels: in general I don't know of any real precedent in FreeBSD
for things like that (aside from explicit calls to panic() which usually arise
when there's no other sensible action to take), and this is a somewhat
obscure case. There is only one place where we acquire multiple range locks
from a single thread.

If this bug was actually arising in practice, I would feel differently. So
far I found one bug where the assertion is violated, but it would not be
caught by the print because it is a trylock.

This revision now requires review to proceed.Wed, Mar 26, 12:28 AM

Harbormaster completed remote builds in B63128: Diff 152680.Wed, Mar 26, 12:28 AM

This revision was not accepted when it landed; it landed in state Needs Review.Mon, Mar 31, 9:06 AM

Closed by commit rGfe7fe3b175b6: rangelock: Fix handling of trylocks (authored by markj). · Explain Why

This revision was automatically updated to reflect the committed changes.

markj added a commit: rGfe7fe3b175b6: rangelock: Fix handling of trylocks.

rangelock: Fix handling of trylocks
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 152890

sys/kern/kern_rangelock.c

rangelock: Fix handling of trylocksClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 152890

sys/kern/kern_rangelock.c

rangelock: Fix handling of trylocks
ClosedPublic
Actions

Revision Contents
Changeset List