Add witness(4) warnings around potentially blocking requests for random
AbandonedPublic
Actions

Authored by cem on Apr 18 2019, 12:21 AM.

Details

Reviewers

jhb
emaste
imp
delphij

Summary

Uses a td_pflag to allow pre-checking outside a locked region without
triggering a false positive in the locked region, since seeded state is a
one-time transition.

Suggested by: John Baldwin

Test Plan

Is something like this what you had in mind? Do you object to the thread pflag
approach to limiting false positives?

x86 VM victims:

Apr 17 17:24:41 testvm kernel: #1 0xffffffff80c15265 at witness_warn+0x285
Apr 17 17:24:41 testvm kernel: #2 0xffffffff80c93371 at arc4rand+0x41
Apr 17 17:24:41 testvm kernel: #3 0xffffffff80c93538 at arc4random+0x18
Apr 17 17:24:41 testvm kernel: #4 0xffffffff80d3dc4d at in_pcb_lport+0x2bd
Apr 17 17:24:41 testvm kernel: #5 0xffffffff80e0ed2e at in6_pcbsetport+0x9e
...
(Many other stacks that arrive in in_pcb_lport)
...
Apr 17 17:24:41 testvm kernel: #3 0xffffffff80c93538 at arc4random+0x18
Apr 17 17:24:41 testvm kernel: #4 0xffffffff80e108b5 at initid+0x25
Apr 17 17:24:41 testvm kernel: #5 0xffffffff80e10b68 at randomid+0xd8
Apr 17 17:24:41 testvm kernel: #6 0xffffffff80e10bc0 at ip6_randomflowlabel+0x10
...
(and many duplicates)
...

I didn't see any other callers (but WITNESS_WARN is very spammy).

Diff Detail

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 23729
Build 22690: arc lint + arc unit

Event Timeline

cem created this revision.Apr 18 2019, 12:21 AM

Harbormaster completed remote builds in B23728: Diff 56321.Apr 18 2019, 12:21 AM

Yeah, moving the check into arc4rand helps.

Harbormaster completed remote builds in B23729: Diff 56322.Apr 18 2019, 12:26 AM

cem edited the test plan for this revision. (Show Details)Apr 18 2019, 12:30 AM

cem added a parent revision: D19944: random(4): Restore availability tradeoff prior to r346250.

cem added reviewers: emaste, imp, delphij.Apr 18 2019, 12:32 AM

cem added a child revision: D19972: in_pcb_lport: Don't attempt to randomize if random(4) is unavailable.Apr 19 2019, 7:32 PM

cem added a child revision: D19973: ip6_id: Avoid blocking if random(4) is not available.Apr 19 2019, 7:57 PM

I'm not sure a per-thread flag is quite right. For example, a crypto driver might check the flag and reject attempts to create sessions in its crypto_newsession callback, but later uses of arc4random to generate IV's will occur in different threads (e.g. from a TCP timer that transmits a packet via IPsec).

Let me make sure I understand the concern correctly.

In D19948#429342, @jhb wrote:

I'm not sure a per-thread flag is quite right. For example, a crypto driver might check the flag

As in, is_random_seeded(), or specifically, TDP_RNG_SEEDED? The latter would be a programming error.

and reject attempts to create sessions in its crypto_newsession callback, but later uses of arc4random to generate IV's will occur in different threads (e.g. from a TCP timer that transmits a packet via IPsec).

I'm not sure I understand the problem. If a TCP timer invokes arc4random in a non-sleepable context, that's a valid WITNESS_WARN unless we know random is seeded. It is certainly possible that the thread that scheduled the timer knew random was seeded and therefore the warning would be a false positive.

Basically, the goal of the per-thread flag is to eliminate some class of false positives. But it does not eliminate all false positives. Is that the concern? Or are you describing a false negative scenario? If so, I don't follow.

I don't see any great solution for these kinds of false positives. We could programmatically forward seeded state from seeded threads to any callout wheels they register on or TASKs they invoke. This gets pretty messy, but might be needed given how spammy these reports are.

cem added a child revision: D20011: ip6_randomflowlabel: Avoid blocking if random(4) is not available.Apr 22 2019, 4:19 PM

cem mentioned this in D20172: tcp_syncache: Mute a false positive random-may-block warning.May 6 2019, 6:12 PM

cem added a child revision: D20172: tcp_syncache: Mute a false positive random-may-block warning.May 6 2019, 6:14 PM

In D19948#429398, @cem wrote:

Let me make sure I understand the concern correctly.

In D19948#429342, @jhb wrote:

I'm not sure a per-thread flag is quite right. For example, a crypto driver might check the flag

As in, is_random_seeded(), or specifically, TDP_RNG_SEEDED? The latter would be a programming error.

The former.

and reject attempts to create sessions in its crypto_newsession callback, but later uses of arc4random to generate IV's will occur in different threads (e.g. from a TCP timer that transmits a packet via IPsec).

I'm not sure I understand the problem. If a TCP timer invokes arc4random in a non-sleepable context, that's a valid WITNESS_WARN unless we know random is seeded. It is certainly possible that the thread that scheduled the timer knew random was seeded and therefore the warning would be a false positive.

Correct, in this case the crypto driver would know to not create the session unless it was seeded, so its associated code that invokes arc4random will only be run if the PRNG is seeded, but the check and the arc4random use are in different threads, so TDP_RNG_SEEDED doesn't help.

Basically, the goal of the per-thread flag is to eliminate some class of false positives. But it does not eliminate all false positives. Is that the concern? Or are you describing a false negative scenario? If so, I don't follow.

I don't see any great solution for these kinds of false positives. We could programmatically forward seeded state from seeded threads to any callout wheels they register on or TASKs they invoke. This gets pretty messy, but might be needed given how spammy these reports are.

The separate API I described earlier allows the programmer to manage these cases explicitly by calling the non-blocking variant, probably with an assertion that they never fail.

In D19948#434568, @jhb wrote:

Correct, in this case the crypto driver would know to not create the session unless it was seeded, so its associated code that invokes arc4random will only be run if the PRNG is seeded, but the check and the arc4random use are in different threads, so TDP_RNG_SEEDED doesn't help.

Right:

Basically, the goal of the per-thread flag is to eliminate some class of false positives. But it does not eliminate all false positives.

So D20172 is basically an example of the same problem — false positive because another thread accesses the state after a different thread performed the initial check.

One solution is to assert is_random_seeded in those cases, like D20172. That will set the bit on the current thread as a side effect, and mask the false positive warning.

The separate API I described earlier allows the programmer to manage these cases explicitly by calling the non-blocking variant, probably with an assertion that they never fail.

I don't follow. They can already assert non-blocking behavior without a new API, like D20172.