Replace the kernel RC4 algorithm with Chacha20.
Keep the API, though, as that is what the other *BSD's have done.
Use boot-time entropy cache to bootstrap the generator at
first (re)seed.
Differential D10048
Replace the kernel RC4 with Chacha20. markm on Mar 18 2017, 6:55 PM. Authored by Tags None Referenced Files
Details
Replace the kernel RC4 algorithm with Chacha20. Use boot-time entropy cache to bootstrap the generator at Lots of printf's in the code, not visible here.
Diff Detail
Event TimelineThere are a very large number of changes, so older changes are hidden. Show Older Changes Comment Actions Reply to reviewer.
Comment Actions LGTM overall but please consider using explicit_bzero when removing sensitive data from memory. BTW. do you think it's worth to implement a wrapper around arc4rand() to emulate arc4random_buf? (no action requested here)
Comment Actions This will cause issues on platforms that do not use loader. We do not require loader on all of our platforms, and those that don't will have issues w/ the way chacha is started. As there is not an error (continues), this creates divergent behavior. We should ensure that chacha is seeded from the random infrastructure so that people who work to ensure that it is properly seeded will gain the benefits instead of creating this hidden and undocumented dependency on /boot/entropy. Also, /boot/entropy is not documented in random(4). Comment Actions Incorrect. If the preload_search_by_type() cannot find a preloaded chunk (perhaps because there was no loader to load one), then it will simply return NULL, and no harm is done. This fall back to the way arc4random has been working until now.
Its not a dependency; it is an attempt to improve on the /status quo/ in a safe, backwards compatible way. The current situation is to grasp at straws; this fix gives it a chance IF there is a stash of random gunk given. If there is none, no change (apart from the RNG algorithm). "Best effort" is currently being done, this is an improvement.
True, thanks. I will fix this in a separate commit. Comment Actions I can do a pkg exp-run with this patch on HardenedBSD's infrastructure tomorrow if desired. Comment Actions markm, I've pointed out where the issue is.
Comment Actions Could you please take a look at the arc4rand() portion of my comment?
Comment Actions Reply to reviewers.
Comment Actions Hi, Could you please also address https://reviews.freebsd.org/D10048?id=26438#inline-59236 ? Could you also make the existing type ('/boot/entropy') the fallback type when boot_entropy_cache do not exist?
Comment Actions Respond to reviewer.
Comment Actions I'm not comfortable with critical sections for now. Back to mutexes Comment Actions Ok let's take that at a later time. Could you please fix arc4rand() so that the while() loop in line 179 will restir as needed though? Comment Actions Just a few quick comments:
Comment Actions pkg exp-run completed successfully with an earlier version of the patch on HardenedBSD. It did take longer to perform the pkg build: 40 hours versus an average 32 hours. Comment Actions I can fix that!
I'm concerned about cpu migration. Mutexes don't guarantee that a thread will stay on the same cpu, right?
Sounds nifty! Where can I read about this? Comment Actions This is correct: you must make sure that you continue to access state on the CPU for which you acquired a mutex -- e.g., by caching a pointer to the per-CPU state you are accessing, in case migration takes place.
I'm not sure it has a man page, unfortunately -- and it should do. But take a look at use of the DPCPU* macros in src/sys/kern. E.g., in kern_switch.c for per-CPU schedule statistics. Comment Actions But that is racey. Preemption can in theory occur straight after I have verified that it hasn't. Looks like I need to use critical regions for now. I can live with that if you can?
I'm bailing out of this, sorry. The original intent of my commit is in danger of being swamped by side-issues. They may be important, but I feel that is a reason to return, not encumber this job further. Comment Actions
If using mutexes to protect access, then there is no race -- preemption and migration while holding the mutex is fine, as other attempts to access the same data will wait until the first thread is done, regardless of migration. This would make CPU indexing of the data structure a strong affinity rather than a strict requirement, while allowing migration to take place if the scheduler feels it is important (i.e., if preemption is preferable to continuing to run on the current CPU, and migration is better than waiting for the CPU to become available again).
Your commit introduces per-CPU memory allocation. My comment is that if you're going to do that, you should use the proper infrastructure we provide for the purpose, which was designed to solve the same problem that you are trying to solve, but does it better. Comment Actions Here, was my reading:
I.e., you allocate memory intended to be used per-CPU. DPCPU is a facility to support that more efficiently that avoids NUMA problems, etc. I'm currently working on a man page since, embarrassingly, there isn't one. But it is quite easy to use, especially when usage is limited to a single .c file. Comment Actions RWatson: Not to get picky or anything, but there was already a malloc() in that place.
Comment Actions I've misread the patch; I'm happy for this to be fixed in a separate commit. I'll continue to sort out a DPCPU man page, however. Comment Actions FYI, I have now committed a man page for DPCPU(9) in r316003. It includes some (safe?) synchronisation patterns in its example code. Comment Actions Ping? :) If this needs additional time, would you mind if I integrate part of your change (arc4random_buf) for now? Comment Actions Please allow me some time to commit my Chacha20 implementation first so we can use that instead of the legally dubious version which is included in this patch. I hit a snag that I haven't had time to debug, but I'm hoping to have it done by Tuesday. Comment Actions Turns out the snag was that I was loading the wrong version of the module. I have committed it now (r316982). If anyone is interested, I have a version that includes test vectors and runs self-tests when loaded, but I removed them from the final version as they are about six times larger than the actual code. Comment Actions Next time, give me more time than the few hours that you did, and give me a chance to review your code, which for a couple of reasons is unsuitable. This can be fixed, but your "under-my-feet" commit was MOST unwelcome. |