
random(4): More thoroughly attempt to ensure seeding during priming
Needs Revision · Public

Authored by cem on Apr 16 2019, 5:39 PM.

Details

Summary

random_harvestq_prime() loads any saved early entropy, if available, but
this may not always be available on e.g., embedded systems, installers,
or systems that do not use loader(8) and cannot access the /boot filesystem
during SI_SUB_RANDOM.

Fast random sources, like RDRAND on x86 and DARN on Power, are loaded as
modules earlier in mi_startup (SI_SUB_KLD). If we did not find enough saved
early entropy to unblock the random device but we do happen to have fast
random sources available, use them to provide sufficient early seeding.

If no entropy and no such sources are available, we now produce an explicit
warning to that effect.
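
The decision described in this summary can be sketched roughly as follows. This is a minimal standalone model, not the code in the diff: the enum, the function name prime_decide() and its parameters are hypothetical and exist only to illustrate the three outcomes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome labels; these are not FreeBSD identifiers. */
enum prime_result {
	PRIME_SEEDED_FROM_SAVED,	/* saved early entropy was sufficient */
	PRIME_SEEDED_FROM_FAST,		/* topped up from a fast source */
	PRIME_UNSEEDED_WARN		/* nothing available: emit a warning */
};

/*
 * saved_bytes:  early entropy recovered at boot, in bytes
 * needed_bytes: amount assumed necessary to unblock the device
 * have_fast:    whether a fast source (e.g. RDRAND, DARN) is registered
 */
enum prime_result
prime_decide(size_t saved_bytes, size_t needed_bytes, bool have_fast)
{
	assert(needed_bytes > 0);

	if (saved_bytes >= needed_bytes)
		return (PRIME_SEEDED_FROM_SAVED);
	if (have_fast)
		return (PRIME_SEEDED_FROM_FAST);
	return (PRIME_UNSEEDED_WARN);
}
```

In this model the fast source is only consulted when saved entropy falls short, which matches the ordering described above.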

Test Plan

Tested in Bhyve by dropping to single user and deleting /boot/entropy.

Loading kernel...
/boot/test.GENERIC/kernel text=0x167d6a4 data=0x1fff04+0x61b9c4 syms=[0x8+0x1a80f8+0x8+0x169696]
Loading configured modules...
can't find '/boot/entropy'
...
  Features2=0xfed83203<...,RDRAND,...>
...
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
...
( no warning from __stack_chk_init )

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 23704
Build 22673: arc lint + arc unit

Event Timeline

cem created this revision. Apr 16 2019, 5:39 PM
cem updated this revision to Diff 56262. Apr 16 2019, 6:37 PM

Shuffle ordering of pure random source registration from late in autoconfig
(SI_SUB_DRIVERS) up to SI_SUB_RANDOM, and correctly invoke ra_pre_read in order
to actually transition an algorithm from !seeded to seeded (otherwise, all the
entropy in the world will not change that status).
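
The point about ra_pre_read can be illustrated with a toy model in which feeding entropy never changes the seeded flag by itself. The struct and function names below are hypothetical stand-ins and do not reflect the actual random(4) data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; field and function names are hypothetical. */
struct toy_alg {
	size_t	pool_bytes;	/* entropy queued in the pool */
	size_t	min_bytes;	/* threshold assumed for a first reseed */
	bool	seeded;
};

/* Feeding entropy only grows the pool; it never touches 'seeded'. */
void
toy_process_event(struct toy_alg *a, size_t nbytes)
{
	assert(a != NULL);
	a->pool_bytes += nbytes;
}

/* Stand-in for the pre-read hook: the only place 'seeded' can flip. */
void
toy_pre_read(struct toy_alg *a)
{
	assert(a != NULL);
	if (!a->seeded && a->pool_bytes >= a->min_bytes) {
		a->seeded = true;
		a->pool_bytes = 0;	/* pool consumed by the reseed */
	}
}
```

Under these assumptions, any amount of toy_process_event() calls leaves the algorithm unseeded until toy_pre_read() runs once.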

cem edited the test plan for this revision.
delphij requested changes to this revision. Apr 16 2019, 8:37 PM
delphij added inline comments.
sys/dev/random/random_harvestq.c
502

I think we should also issue a warning that the entropy device is seeded purely with fast sources here too (if we never had any cached entropy available before reaching this do block), and the fast sources are not necessarily trustworthy.

arc4rand() is called fairly early and that would make Chacha20 seeded with whatever we have derived from the pool at the time for ~5 minutes or 64KB (whichever comes first).
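
A "whichever comes first" reseed rule of this kind can be sketched as below. The limits are taken from the figures in this comment, not from kernel headers, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative limits from the comment above; not kernel constants. */
#define	TOY_RESEED_SECONDS	(5 * 60)
#define	TOY_RESEED_BYTES	(64 * 1024)

/* Hypothetical bookkeeping for one keystream instance. */
struct toy_keystream {
	uint64_t	secs_since_seed;	/* age of the current key */
	uint64_t	bytes_since_seed;	/* output produced under it */
};

/* Reseed when either limit is reached, whichever comes first. */
bool
toy_needs_reseed(const struct toy_keystream *k)
{
	assert(k != NULL);
	return (k->secs_since_seed >= TOY_RESEED_SECONDS ||
	    k->bytes_since_seed >= TOY_RESEED_BYTES);
}
```

The concern raised here is about the window before the first such reseed, during which output depends on whatever the pool held at initial seeding.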

This revision now requires changes to proceed. Apr 16 2019, 8:37 PM
cem added a comment. Apr 16 2019, 9:06 PM

Thanks for taking a look!

sys/dev/random/random_harvestq.c
502

I'll add the warning about fast-only source seeding. My only concerns are: it might not be especially meaningful considering we should incorporate additional entropy by the time an administrator can actually do anything about it; and it may be misleading, considering most people likely do not have good intuitions about forward secrecy and how Fortuna achieves it.

I don't understand the ~5 minutes or 64 kB comment, can you elaborate?

imp requested changes to this revision. Apr 17 2019, 5:38 PM

I like that we can have a fallback seeding to RNG devices, but Netflix requires a system to be bootable to userland no matter what goes on with the randomness in early boot. As such, we'd like a mode that can report that something screwed up in the early random stuff so we can decide if we care or not about that. And then take appropriate action based on our level of caring and concern. We have boxes all over the world and a hung boot would mean an RMA which is quite costly. We'd rather the system come up at least enough to report broken randomness to our backplane, or to take remediation steps when, for example, our entropy file is corrupted on a crash by recreating it and rebooting.

sys/dev/random/random_harvestq.c
490

Can we make this warning optional? Our application simply doesn't care.
We'd prefer to see this also noted as a sysctl that we returned randomness that wasn't fully seeded and let early programs that run in /etc/rc decide what to do about that. This will help the 'broken entropy file' issue as well as issues surrounding whether or not we care about one scenario vs another given our deployment model. Some uses of crappy random are bad in some situations. Sometimes it's OK. We'd like to be in control of that by having a knob that we can turn to 'boot no matter what and don't bother with warnings' and then rely on the sysctl-reported state of the early randomness so we can make an automated response. With 10k machines in the field, we can't drop to single user, or have machines just fail to boot that are located literally in the middle of the Amazon rainforest with no remote hands to resolve the situation.

502

I'd prefer a way to disable the warning (our application simply doesn't care, and new warnings produce lots of issues internally). I'd like this to be reported to userland via a sysctl so that our automation can decide what to do: accept the results, or take countermeasures and reboot. I don't want the kernel making this choice in an environment where booting and not blocking the boot is way more important than the quality of the randomness that we booted under.
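
The userland side of this request might look something like the following policy function. It is only a sketch: the struct fields stand in for state that would be read from a sysctl, and every name here (seed_report, decide_action, the enum values) is hypothetical rather than an existing interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical actions an early rc script or agent could take. */
enum boot_action {
	ACTION_ACCEPT,		/* carry on booting, nothing to do */
	ACTION_REPORT,		/* boot, but flag the host to the backplane */
	ACTION_REMEDIATE	/* recreate the entropy file and reboot */
};

/* Assumed snapshot of kernel-reported state; field names are made up. */
struct seed_report {
	bool	bypassed_before_seeding;	/* randomness returned unseeded */
	bool	entropy_file_ok;		/* saved entropy file looked sane */
};

/* The deployment, not the kernel, decides how much it cares. */
enum boot_action
decide_action(const struct seed_report *r, bool app_cares)
{
	assert(r != NULL);

	if (!r->entropy_file_ok)
		return (ACTION_REMEDIATE);
	if (r->bypassed_before_seeding && app_cares)
		return (ACTION_REPORT);
	return (ACTION_ACCEPT);
}
```

The design point is that the kernel only reports; whether to accept, report, or reboot is left to automation that knows the deployment model.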

cem added a comment. Apr 17 2019, 6:16 PM
In D19928#428534, @imp wrote:

I like that we can have a fallback seeding to RNG devices, but Netflix requires a system to be bootable to userland no matter what goes on with the randomness in early boot. As such, we'd like a mode that can report that something screwed up in the early random stuff so we can decide if we care or not about that.

Yes. I discussed this publicly in the thread with John yesterday and agreed to add such a knob. You will definitely be looped in. I plan to add that separately from this change, which only ensures initial early seeding on devices with fast HW random sources available.

I'm also on board with programmatic exposure to tooling of how bad our initial seeding was so your control plane can choose to take action on it, or not. I don't think it's any worse an information leak than, e.g., the warnings we're adding.

sys/dev/random/random_harvestq.c
490

Does the warning hurt if your application doesn't care?

I plan to add the don't-care knob in a separate differential and you will certainly be CC'd.

502

Sure, a sysctl report would be reasonable.

delphij added inline comments. Apr 17 2019, 11:29 PM
sys/dev/random/random_harvestq.c
502

The ~5 minutes or 64 KiB figure was the Chacha20 reseed interval.

But thinking it through more, I think the most worrying part is that we could end up treating the device as seeded when it is only seeded with entropy from "fast" sources, which would allow these sources to leave the system with weak entropy for some time, until the system boots up and collects enough additional entropy from somewhere else?

cem added inline comments. Apr 18 2019, 12:01 AM
sys/dev/random/random_harvestq.c
502

Two of the nice properties of Fortuna are forward secrecy and eventual recovery from totally compromised state, as long as new entropy enters the system. Your "weak entropy due to fast random sources" concern seems like a light version of the compromised state property (zero entropy). Fortuna does recover from this, eventually.
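
The recovery property comes from the pool schedule in the published Fortuna design: pool i contributes to reseed number r only when 2^i divides r, so higher pools are drained rarely and keep accumulating. A minimal sketch of that schedule follows, assuming 32 pools as in the original description; the function name is hypothetical and FreeBSD's implementation may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

#define	TOY_NPOOLS	32	/* pool count assumed from the Fortuna design */

/*
 * Return a bitmask of the pools that feed reseed number 'reseed_count'
 * (counting from 1).  Bit i is set when 2^i divides reseed_count.
 */
uint32_t
toy_pools_for_reseed(uint32_t reseed_count)
{
	uint32_t mask = 0;
	unsigned int i;

	assert(reseed_count > 0);
	for (i = 0; i < TOY_NPOOLS; i++) {
		if (reseed_count % (UINT32_C(1) << i) != 0)
			break;
		mask |= UINT32_C(1) << i;
	}
	return (mask);
}
```

For example, reseed 1 uses only pool 0 (mask 0x1), reseed 6 uses pools 0 and 1 (mask 0x3), and reseed 8 uses pools 0 through 3 (mask 0xf).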

So, I'm not sure that's especially worrying. I'm more worried we'll end up seeding arc4rand and/or stack cookies with get_cyclecount() due to concerns raised by many about availability; both of which are almost certainly worse than even the worst fast random source. A CPU attacker who can backdoor their HWRNG might as well backdoor any other component of the CPU.

cem added inline comments. Apr 20 2019, 12:21 AM
sys/dev/random/random_harvestq.c
490

Warner, I'll plan to mask the warning if the random_bypass_disable_warnings knob is set. It's not an exact match for this print, but I think the intent for the knob more or less lines up with intent to hide these warnings. Does that sound ok?

imp added a comment. Edited Apr 20 2019, 2:51 AM

(sorry, last comment was stale)

sys/dev/random/random_harvestq.c
490

works for us. thanks!

delphij added inline comments. Apr 21 2019, 7:08 AM
sys/dev/random/random_harvestq.c
502

Yes, I agree that Fortuna would eventually recover from this once additional entropy is fed into it.

It is also beyond our threat model (exactly to your point: a CPU attacker can do anything already) that a CPU is deliberately hacked to allow certain attacks.

I think the main concern here is that we would be relying on only one source when calling the entropy device seeded, which matters if the HWRNG's output is flawed. At early boot, we would have the entropy device consider itself as seeded, but practically we might have it seeded in only a few limited ways. For seeding the stack protector, it would be good enough, but for other purposes it could be a bad choice.

cem added inline comments. Apr 21 2019, 6:20 PM
sys/dev/random/random_harvestq.c
502

I'm having trouble seeing even (Fortuna seeded with) flawed HWRNG as a worse choice than cyclecount or similar which might be used as a fallback.

I think it would be fine to provide users a paranoia knob to skip fast HWRNG initial seeding, but I assert that the knob should default to off. I.e., we trust our current set of HWRNGs enough to provide something like 2-4 "bits/byte of entropy," which is what we need for full strength seeding.

cem added a comment. May 7 2019, 10:04 PM

Now that some of the dust has settled around random, I'd like to work on getting this in.

@delphij, would an acceptable solution for the concern that fast HWRNG might be flawed be to leave a knob for more paranoid users? Setting the knob would skip this force-fast-seeding-with-single-hwrng-source. I think a reasonable default would be for the knob to be off, i.e., seed with fast random source if one is available and /boot/entropy was insufficient to unblock Fortuna.
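
The proposed knob and its default could be modelled as follows. This is a sketch of the semantics only; the config struct, the skip_fast_seeding field and the function name are hypothetical, and no tunable of that name is implied to exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical configuration; the field name is illustrative only. */
struct toy_seed_cfg {
	bool	skip_fast_seeding;	/* "paranoia" knob */
};

/* Proposed default: knob off, so fast seeding is allowed. */
static const struct toy_seed_cfg toy_default_cfg = {
	.skip_fast_seeding = false
};

/*
 * Fast seeding happens only if saved entropy did not already seed the
 * device, the knob is off, and a fast source is actually available.
 */
bool
toy_should_fast_seed(const struct toy_seed_cfg *cfg, bool already_seeded,
    bool have_fast_source)
{
	assert(cfg != NULL);

	if (already_seeded || cfg->skip_fast_seeding)
		return (false);
	return (have_fast_source);
}
```

With the default configuration, a system lacking sufficient /boot/entropy but equipped with a fast source would be seeded from it; setting the knob restores the previous behaviour.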

Other folks, does that seem like a reasonable compromise and default for the knob? Thanks!

markm accepted this revision. May 12 2019, 9:10 AM

Looks OK to me!