Page MenuHomeFreeBSD

reduce overhead of entropy collection
ClosedPublic

Authored by mmacy on May 23 2018, 12:45 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Mar 22, 4:31 PM
Unknown Object (File)
Feb 13 2024, 12:23 AM
Unknown Object (File)
Jan 3 2024, 7:29 AM
Unknown Object (File)
Dec 20 2023, 4:24 AM
Unknown Object (File)
Dec 4 2023, 12:09 AM
Unknown Object (File)
Nov 12 2023, 5:47 AM
Unknown Object (File)
Nov 7 2023, 11:23 PM
Unknown Object (File)
Nov 7 2023, 6:04 PM

Details

Summary
  • move mask to hot primarily read only cache line
  • check mask first to avoid overhead of a function call
  • don't collect entropy on ethernet by default

This is rx of ~2.9Mpps on a 2x8x2 SKL
https://people.freebsd.org/~mmacy/2018.05.11/udprx1.svg

This is rx of ~4Mpps
https://people.freebsd.org/~mmacy/2018.05.11/udprx_norandom.svg

We're suffering major performance degradation doing this.

However, there's more to unpack here:

random_harvest_queue(m, sizeof(*m), 2, RANDOM_NET_ETHER);

We're sampling harvest size (8 bytes) from the start of the mbuf. So we're literally only ever getting the m_next pointer (the first field of the mbuf). Assuming this were actually set this would be a poor source of entropy. There are only a few thousands of mbufs allocated at low activity. However, unless you're running with jumbo frames, m_next will _always_ be null on ether_input. And, anyone running in production using jumbo frames will have turned this off. So, in the entire time since this change went in we've never gotten any entropy from this source.

Correction: it will be non-NULL on LRO packets. Nonetheless, the point holds.

More generally we need to see a lot more rigor in the justifications surrounding random's design, entropy collection needs, and entropy collection points.

  • How much entropy does random actually need?
  • How much is it actually getting from a given source?
  • Why aren't we using hardware entropy sources? When can we?

There are statistical tests to qualify these things. I think it's quite reasonable that we require a more substantive justification for the overhead than any that have been given to date. We're allowing very poor optics for FreeBSD on the basis of what appears to at best be conjecture.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 16938

Event Timeline

sys/dev/random/random_harvestq.c
490–494

KASSERT references removed parameter

sys/sys/random.h
119

I don't think the predict_false's make sense here.

158

comment mismatch

We only need to generate about 8 bytes of entropy per second total for the spiking of the algorithms...

Pending trivial cleanups pointed out by @cem , approved.

This revision is now accepted and ready to land.May 23 2018, 11:15 PM

Update with cem's comments
Will commit 5/31 barring further comments.

This revision now requires review to proceed.May 30 2018, 9:13 PM
sys/sys/random.h
119

why do you think it does not make sense? it helps the compiler *avoid* a forward jump in the common case of the feature not being enabled - instead it will be more likely to insert a jmp to a section which calls random_harvest_queue_ and then jumps back. a case which arguably is already heavily pessimized. so i would argue it makes excellent sense.

This revision is now accepted and ready to land.May 30 2018, 9:19 PM

If people want to use ethernet entropy harvesting, can we do something to make it more useful in a separate change? I was going to initially suggest the ether dst addr, but that's not very random either.

In any cases, I think these changes are a huge win. Especially moving the collection mask outside a cacheline that is written in the hot path.

LGTM modulo nit about only disabling on amd64.

sys/sys/random.h
102–105

This special casing is a little dubious — e.g., PPC64 has TRNG since POWER7 and is more than capable of driving the same high-throughput NICs as amd64.

I'm leaning towards just disable it for all arch. *Maybe* add a runtime test to detect other sources of entropy and enable it if there's nothing else. Then again, the data is not very random (or not at all random for non-jumbo frames? Not sure we ever determined that), so maybe it's useless there too.

119

Why do you think that features being disabled is the common case? E.g., I would expect pure sources to be both enabled and sampled reasonably often. The __predict_false will mispredict those.

Additionally, CPU branch predictors have been smart enough to predict the branch correctly for disabled sources since the 90s. I think predict_false/true should only be used when we're actually going to be smarter than the runtime branch predictor.

  • disable ethernet entropy collection by default
This revision now requires review to proceed.May 30 2018, 10:55 PM

Given that there is trivially little if any entropy coming from mbufs is there a reason we're leaving this callsite at all? has anyone from secteam commented?

In D15526#330095, @jeff wrote:

Given that there is trivially little if any entropy coming from mbufs is there a reason we're leaving this callsite at all? has anyone from secteam commented?

That might be the right next step, or maybe it can be improved to be useful. I don't think we need to go that far in the first change.

sys/sys/random.h
159

comment mismatch cropped up again

delphij requested changes to this revision.May 31 2018, 2:08 AM
delphij added a subscriber: delphij.

The change LGTM in principal but I'd like to request a few minor (cosmetic) change.

  1. Please add RANDOM_ENABLE_ETHER to sys/conf/NOTES; this makes sure we have build code either way in tinderbox builds.
  1. Please fix the RANDOM_ENABLE_ETHER / RANDOM_ENABLE_UMA mismatch in sys/sys/random.h, as pointed out by @cem .

Optionally, if I was you I'd probably define random_harvest_queue_ether differently by omitting the RANDOM_NET_ETHER parameter to the macro (since the the macro's name already give a clear hint that it's for ethernet entropy, and the sole purpose of its existence is to provide a way to fully bypass the collection code).

This revision now requires changes to proceed.May 31 2018, 2:08 AM

Accept as secteam@ -- talked with so@ and we have no objection to the change in principal.

Further clarify the intent of RANDOM_ENABLE_ETHER

This revision is now accepted and ready to land.May 31 2018, 4:25 AM
This revision was automatically updated to reflect the committed changes.

If people want to use ethernet entropy harvesting, can we do something to make it more useful in a separate change? I was going to initially suggest the ether dst addr, but that's not very random either.

In any cases, I think these changes are a huge win. Especially moving the collection mask outside a cacheline that is written in the hot path.

as I have suggested, doing a sampling is a good choice for those that want/need it.. i.e. use 10 packets/second (or less). Use the previous second's pps to calculate the n for selecting every nth packet to use such that it comes out to something like 10pps. Then you use the entire packet, not just the header w/o packet data. Gives us good entropy, makes minimal impact on system overhead, and a huge improvement.

In D15526#330700, @jmg wrote:

as I have suggested, doing a sampling is a good choice for those that want/need it.. i.e. use 10 packets/second (or less). Use the previous second's pps to calculate the n for selecting every nth packet to use such that it comes out to something like 10pps.

Sure, this seems plausible. The existing pps primitive we have probably doesn't work well on many-core systems — I think it would ping pong around with many RX queues — but one can easily imagine enhancing it to be suitable for this goal.

Then you use the entire packet, not just the header w/o packet data.

random_harvest_queue() truncates to the first 8 bytes of input due to constraint on the size of a hc_entropy_ring entry (harvest_event::he_entropy[HARVESTSIZE]). We would have to change that behavior (with potentially wide-reaching implications), or add a new primitive that does not truncate, or pre-mix packet bytes together (simple xor probably fine).

And after all that work, I'm still not sure packet contents are an especially good source of random data. It's harmless to mix in crappy random data after Fortuna is seeded, but spamming the entropy pool with crappy entropy pre-seed may unblock random before it is truly unpredictable, since our unblock heuristic is basically just a byte count of seed data.

While certainly better than what we were doing, global pps accounting would not be efficient multi-core. I'd much rather have it collect the first 10 packets or so every second.

While certainly better than what we were doing, global pps accounting would not be efficient multi-core. I'd much rather have it collect the first 10 packets or so every second.

That is an implementation detail that I did not specify. If you have a 8 queue nic, maybe pick 1 or 2 packets per queue, or something.. or look at traffic per nic and balance it between each, etc. There are MANY solutions that will get you close to 10pps (or maybe even less) that are perfectly workable for efficient multi-core machines.

Collecting the first 10 packets or so every second is not a great idea. With things like tcp off-load which bursts packets, it's common for bursts of packets to be correlated.

Collecting the first 10 packets or so every second is not a great idea. With things like tcp off-load which bursts packets, it's common for bursts of packets to be correlated.

Yes. True. Nonetheless, I'm much more interested in the questions I raised in the review. All of these implementation details are secondary to those.

Collecting the first 10 packets or so every second is not a great idea. With things like tcp off-load which bursts packets, it's common for bursts of packets to be correlated.

Yes. True. Nonetheless, I'm much more interested in the questions I raised in the review. All of these implementation details are secondary to those.

I've written answers to most of those, or at least attempted to discuss it before. But short answers are:

  • How much entropy does random actually need?

"None". Once a 256-bit seed has been generated at install time and no further entropy is needed. I'm fine w/ small amounts of additional entropy to prevent evil maid compromises (e.g. machine is powered off, maid clones hard drive, next boot, rng is "predictable"). Reference: https://blog.cr.yp.to/20140205-entropy.html

  • How much is it actually getting from a given source?

Due to above, it doesn't/shouldn't mater. Depending upon kernel sources other than the keyboard for generating initial seed is likely to be very broken.

  • Why aren't we using hardware entropy sources? When can we?

We are in a few cases. But there is also the case where people confuses PRNGs with TRNGs. Only the later are useful. RDRAND is now being used, I am working on setting up a lab (for everyone to access) that has various hardware so that drivers can be more easily be written and tested (no ETA though).

Oh, note that the above are my opinions, and they differ significantly from most of the rest of the FreeBSD team. Mainly why I haven't spent much time working on the rng system,

In D15526#330724, @jmg wrote:

Oh, note that the above are my opinions, and they differ significantly from most of the rest of the FreeBSD team. Mainly why I haven't spent much time working on the rng system,

That mostly looks reasonable to me — the big caveat is around initial seeding. Especially on embedded systems with readonly media, but also "evil maid" scenarios, as you point out. I'm not part of secteam, though.