
random/ivy: Provide mechanism to read independent seed values from rdrand
Closed, Public

Authored by cem on Wed, Nov 20, 2:40 AM.

Details

Summary

On x86 platforms with the intrinsic, rdrand is a deterministic random bit
generator (DRBG, AES-CTR) seeded from an entropic source. On x86 platforms
with rdseed, that instruction provides something closer to the upstream
entropic source. (There is more nuance; a block diagram is provided in [1].)

On devices with rdrand and without rdseed, there is no good intrinsic for
accessing the good entropic source directly. However, the DRBG is guaranteed
to reseed every 8 kB on these platforms. As a conservative option, on such
hardware we can just read and discard an extra 7.99 kB of output every time
we want a sample from an independent seed.

Because there is some performance penalty to this more conservative option,
a knob is provided to disable (and enable) the change. The change does not
affect platforms with RDSEED.
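
A minimal sketch of the discard-then-sample idea described above, not the code in this diff. The names read_independent_sample, rng_read64_fn, RESEED_INTERVAL_BYTES and the counter stub are all hypothetical; the stub stands in for RDRAND so the logic can be exercised without the instruction. It assumes a reseed at least once per 8192 bytes of output and 64-bit reads.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: the DRBG reseeds at least once per this many output bytes. */
#define RESEED_INTERVAL_BYTES 8192u

/* Hypothetical callback standing in for one 64-bit RDRAND read. */
typedef uint64_t (*rng_read64_fn)(void *ctx);

/*
 * Return one 64-bit sample. With independent_seed set, first read and
 * discard (interval - one sample) bytes, so that each returned sample is a
 * full reseed interval away from the previously returned one.
 */
static uint64_t
read_independent_sample(rng_read64_fn read64, void *ctx, bool independent_seed)
{
	if (independent_seed) {
		size_t discard = RESEED_INTERVAL_BYTES / sizeof(uint64_t) - 1;

		for (size_t i = 0; i < discard; i++)
			(void)read64(ctx);
	}
	return (read64(ctx));
}

/* Test stand-in only: a counter that reports how many reads were issued. */
struct counter {
	uint64_t next;
};

static uint64_t
counter_read64(void *ctx)
{
	struct counter *c = ctx;

	return (c->next++);
}
```

With the knob off, each call costs one read; with it on, each call costs 1024 reads, which is where the performance penalty comes from.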

[1]: https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide#inpage-nav-4-2

Test Plan

rdrand microbench here, if someone wants to replicate or check my work: https://reviews.freebsd.org/P336

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

cem created this revision. Wed, Nov 20, 2:40 AM
cem added inline comments. Wed, Nov 20, 2:48 AM
sys/dev/random/ivy.c
65 ↗(On Diff #64604)

I am open to defaulting to the opposite (status quo). I don't know how bad the performance penalty will be. It seems like the worst case might be something like 114 microseconds additional per sample.

markm added a subscriber: markm. Wed, Nov 20, 8:36 AM

No objection to the logic, just the user-facing wording needs to be "de-geeked" a bit! :-)

sys/dev/random/ivy.c
65 ↗(On Diff #64604)

A few microbenchmarks will help here.

69 ↗(On Diff #64604)

This string may be confusing to the reader. It will either need to be clarified in a man page, or get a bit more explicit about the option turning on potentially expensive reseed delays.

84 ↗(On Diff #64604)

*OUCH* :-)

cem added a comment. Wed, Nov 20, 8:39 PM

> No objection to the logic, just the user-facing wording needs to be "de-geeked" a bit! :-)

Sure! I struggled to find good wording for this.

sys/dev/random/ivy.c
65 ↗(On Diff #64604)

I get about 25 MB/s out of RDRAND on a single AMD Zen thread. 8 KiB works out to about 300 microseconds per sample here. (But this machine supports RDSEED, so it wouldn't need this workaround either.)

The other option from §5.2.5 is:

> Iteratively execute 32 RDRAND invocations with a 10 us wait period per iteration.

That would have less CPU overhead, but similar or worse latency, depending on platform RDRAND speed.

I'll plan to default it off, status quo, until we can better explain why it might be preferable and better understand the performance impact to real systems without rdseed.
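
The latency comparison above can be checked with a few lines of arithmetic. This is a sketch only; drain_latency_us and sleep_floor_us are hypothetical helper names, and MB/s is taken to mean 10^6 bytes per second.

```c
/*
 * Microseconds needed to pull `bytes` of RDRAND output at `mb_per_sec`.
 * bytes / (mb_per_sec * 1e6) seconds, times 1e6 us/s, is bytes / mb_per_sec.
 */
static double
drain_latency_us(double bytes, double mb_per_sec)
{
	return (bytes / mb_per_sec);
}

/*
 * Lower bound on the latency of the sleep-based option: `iterations` waits
 * of `wait_us` each, ignoring the cost of the RDRAND calls themselves.
 */
static double
sleep_floor_us(unsigned iterations, double wait_us)
{
	return (iterations * wait_us);
}
```

For example, drain_latency_us(8192, 25) is about 328 us, in line with the "about 300 microseconds" figure, and sleep_floor_us(32, 10) is 320 us, which is why the sleep-based option has similar latency at this RDRAND speed and worse latency on faster hardware.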

69 ↗(On Diff #64604)

I agree. I'm not sure how to bridge the gap, though.

84 ↗(On Diff #64604)

Yes, adding RDSEED was good (for the concerns that this change would reflect).

cem edited the test plan for this revision. Wed, Nov 20, 8:41 PM
markm added inline comments. Wed, Nov 20, 9:28 PM
sys/dev/random/ivy.c
69 ↗(On Diff #64604)

Maybe something like "If non-zero, use more expensive and slow, but safer, seeded samples where RDSEED is not present"?

cem updated this revision to Diff 64680. Thu, Nov 21, 5:53 PM
cem marked 2 inline comments as done.

Update default and sysctl language.

cem added inline comments. Thu, Nov 21, 5:53 PM
sys/dev/random/ivy.c
65 ↗(On Diff #64604)

We try to harvest 4-32 bytes from each pure entropy source per pool. At 32 pools, that's 128-1024 bytes. That's 5-40 milliseconds of CPU time per random_sources_feed() on my platform, if I'm doing the math right. That's probably too expensive to default "on."

For comparison, on a Haswell-era Intel I have lying around (which actually lacks RDSEED), RDRAND achieves more like 173 MB/s, reducing these costs somewhat: 43 us per sample, 700 us to 6 ms per feed. Still too high. I suppose we could also implement the sleep-based approach, although latency might still be a problem.

One other amusing approach might be to take advantage of the independent generators on each CPU in SMP systems: you could pull 8 bytes per core, then run all of the 32x10us sleeps in parallel, then pull another 8 bytes per core. I think this approach is way too complicated and not worth it.
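
The per-feed estimates in this comment can be reproduced with the sketch below. feed_cost_us is a hypothetical helper, and it assumes each independent sample yields 8 bytes (one 64-bit RDRAND result) and that per-sample cost is the only cost that matters.

```c
/*
 * Estimated microseconds spent in one feed: total bytes harvested across
 * all pools, rounded up to whole samples, times the per-sample cost.
 */
static double
feed_cost_us(unsigned bytes_per_pool, unsigned pools, unsigned sample_bytes,
    double per_sample_us)
{
	unsigned total = bytes_per_pool * pools;
	unsigned samples = (total + sample_bytes - 1) / sample_bytes;

	return (samples * per_sample_us);
}
```

With 32 pools and roughly 328 us per sample (8192 bytes at 25 MB/s), 4 bytes per pool gives 16 samples and about 5.2 ms, and 32 bytes per pool gives 128 samples and about 42 ms. At 43 us per sample the same inputs give about 0.7 ms and 5.5 ms, matching the ranges quoted above.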

delphij accepted this revision. Thu, Nov 21, 11:40 PM
This revision is now accepted and ready to land. Thu, Nov 21, 11:40 PM
markm accepted this revision. Fri, Nov 22, 8:03 AM