random/ivy: Provide mechanism to read independent seed values from rdrand
Closed, Public

Authored by cem on Nov 20 2019, 2:40 AM.

Details

Reviewers

delphij
markm
jmg

Group Reviewers

csprng
O3: Kernel Random Numbers Generator	(Owns No Changed Paths)

Commits

rS355014: random/ivy: Provide mechanism to read independent seed values from rdrand

Summary

On x86 platforms with the intrinsic, rdrand is a deterministic bit generator
(AES-CTR) seeded from an entropic source. On x86 platforms with rdseed, it
is something closer to the upstream entropic source. (There is more nuance;
a block diagram is provided in [1].)

On devices with rdrand and without rdseed, there is no good intrinsic for
accessing the good entropic source directly. However, the DRBG is guaranteed
to reseed every 8 kB on these platforms. As a conservative option, on such
hardware we can just read and discard an extra 7.99 kB of output every time
we want a sample from an independent seed.

Because there is some performance penalty to this more conservative option,
a knob is provided to disable (and enable) the change. The change does not
affect platforms with RDSEED.
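
A minimal sketch of the conservative read described above, not the code in
this diff. The names `rng_word_fn`, `read_independent_seed`,
`RESEED_INTERVAL_BYTES` and `counting_source` are hypothetical, and the 8 KiB
interval is an assumed reading of "8 kB". The generator is passed in as a
callback so the logic can be exercised without RDRAND hardware; a real caller
would supply a wrapper around the rdrand intrinsic instead of the counting
stand-in.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the DRBG reseeds after at most this many output bytes. */
#define RESEED_INTERVAL_BYTES 8192

/* Hypothetical callback: stores one 64-bit word, returns false on failure. */
typedef bool (*rng_word_fn)(uint64_t *out);

/*
 * Take one 8-byte sample, then read and discard the rest of the reseed
 * interval so that the next sample is drawn after a fresh reseed.
 */
static bool
read_independent_seed(rng_word_fn next_word, uint64_t *out)
{
	uint64_t sample, scratch;
	size_t consumed;

	if (!next_word(&sample))
		return (false);

	/* Burn the remaining ~7.99 kB of generator output. */
	for (consumed = sizeof(sample); consumed < RESEED_INTERVAL_BYTES;
	    consumed += sizeof(scratch)) {
		if (!next_word(&scratch))
			return (false);
	}

	*out = sample;
	return (true);
}

/* Stand-in generator for demonstration only: returns a running counter. */
static uint64_t counting_source_calls;

static bool
counting_source(uint64_t *out)
{
	*out = counting_source_calls++;
	return (true);
}
```

With the counting stand-in, each call to `read_independent_seed()` consumes
1024 words (8192 bytes) of generator output, which is where the performance
penalty comes from.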

[1]: https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide#inpage-nav-4-2

Test Plan

rdrand microbench here, if someone wants to replicate or check my work: https://reviews.freebsd.org/P336

Diff Detail

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 27684
Build 25884: arc lint + arc unit

Event Timeline

cem created this revision. Nov 20 2019, 2:40 AM

Harbormaster completed remote builds in B27655: Diff 64604. Nov 20 2019, 2:40 AM

cem added a parent revision: D22454: random/ivy: Trivial refactoring. Nov 20 2019, 2:41 AM

cem added a reviewer: csprng.

cem added inline comments. Nov 20 2019, 2:48 AM

sys/dev/random/ivy.c
65	I am open to defaulting to the opposite (status quo). I don't know how bad the performance penalty will be. It seems like the worst case might be something like 114 microseconds additional per sample.

No objection to the logic, just the user-facing wording needs to be "de-geeked" a bit! :-)

sys/dev/random/ivy.c
65	A few microbenchmarks will help here.
69	This string may be confusing to the reader. It will either need to be clarified in a man page, or get a bit more explicit about the option turning on potentially expensive reseed delays.
84	OUCH :-)

In D22455#491164, @markm wrote:

No objection to the logic, just the user-facing wording needs to be "de-geeked" a bit! :-)

Sure! I struggled to find good wording for this.

sys/dev/random/ivy.c
65	I get about 25 MB/s out of RDRAND on AMD Zen, single-threaded. 8 KiB works out to about 300 microseconds per sample here. (But this machine supports RDSEED, so it wouldn't need this workaround anyway.) The other option from §5.2.5 is: "Iteratively execute 32 RDRAND invocations with a 10 us wait period per iteration." That would have less CPU overhead, but similar or worse latency, depending on platform RDRAND speed. I'll plan to default it off (the status quo) until we can better explain why it might be preferable and better understand the performance impact on real systems without RDSEED.
69	I agree. I'm not sure how to bridge the gap, though.
84	Yes, adding `RDSEED` was good (for the concerns that this change would reflect).

cem edited the test plan for this revision. Nov 20 2019, 8:41 PM

markm added inline comments. Nov 20 2019, 9:28 PM

sys/dev/random/ivy.c
69	Maybe something like "If non-zero, use more expensive and slow, but safer, seeded samples where RDSEED is not present"?

Update default and sysctl language.

Harbormaster completed remote builds in B27684: Diff 64680. Nov 21 2019, 5:53 PM

cem added inline comments. Nov 21 2019, 5:53 PM

sys/dev/random/ivy.c
65	We try to harvest 4-32 bytes from each pure entropy source per pool. At 32 pools, that's 128-1024 bytes. That's 5-40 milliseconds of CPU time per `random_sources_feed()` on my platform, if I'm doing the math right. That's probably too expensive to default "on." For comparison, on a Haswell-era Intel I have lying around (which actually lacks RDSEED), RDRAND achieves more like 173 MB/s, reducing these costs somewhat: 43 us per sample, 700 us - 6 ms per feed. Still too high. I suppose we could also implement the sleep-based approach, although latency might still be a problem. One other amusing approach might be to take advantage of the independent generators on each CPU in SMP systems: you could pull 8 bytes per core, then run all of the 32x10 us sleeps in parallel, then pull another 8 bytes per core. I think this approach is way too complicated and not worth it.
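
The cost estimate above can be reproduced with a small calculation. This is a
sketch under stated assumptions: the helper names `drain_cost_us` and
`feed_cost_us` are hypothetical, each harvested sample is assumed to be one
8-byte RDRAND word, and every sample is assumed to pay for draining a full
8 KiB reseed interval.

```c
#include <stddef.h>

/* Microseconds needed to pull `bytes` from a generator at `bytes_per_sec`. */
static double
drain_cost_us(double bytes, double bytes_per_sec)
{
	return (bytes / bytes_per_sec * 1e6);
}

/*
 * Microseconds for one feed: each sample of `sample_bytes` harvested costs
 * one full drain of `reseed_bytes`. Partial samples are rounded up.
 */
static double
feed_cost_us(size_t harvest_bytes, size_t sample_bytes, double reseed_bytes,
    double bytes_per_sec)
{
	size_t samples = (harvest_bytes + sample_bytes - 1) / sample_bytes;

	return ((double)samples * drain_cost_us(reseed_bytes, bytes_per_sec));
}
```

For example, `drain_cost_us(8192, 25e6)` gives about 328 us per sample, and
`feed_cost_us(128, 8, 8192, 25e6)` through `feed_cost_us(1024, 8, 8192, 25e6)`
gives roughly 5.2 ms to 42 ms per feed, in line with the rough 5-40 ms figure
for the 25 MB/s machine.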