Page MenuHomeFreeBSD

random(4): Block read_random(9) on initial seeding
ClosedPublic

Authored by cem on Mar 29 2019, 3:38 AM.
Tags
None
Referenced Files
F108099253: D19744.id56224.diff
Tue, Jan 21, 9:19 AM
Unknown Object (File)
Sat, Jan 18, 9:44 PM
Unknown Object (File)
Fri, Jan 17, 1:57 PM
Unknown Object (File)
Dec 8 2024, 11:03 AM
Unknown Object (File)
Dec 6 2024, 2:12 PM
Unknown Object (File)
Nov 27 2024, 1:53 AM
Unknown Object (File)
Nov 1 2024, 11:27 PM
Unknown Object (File)
Oct 23 2024, 10:24 PM

Details

Summary

read_random() is used, mostly without error checking, in a lot of very
sensitive places in the kernel -- including seeding the widely used
arc4random(9)!

Most uses, especially arc4random(9), should block until the device is seeded
rather than proceeding with a bogus or empty seed. I did not spy any
obvious kernel consumers where blocking would be inappropriate. In some
instances, I replaced raw read_random(9) invocations with arc4random_buf(9)
or similar.

One caveat of this change is that the kern.arandom sysctl no longer returns
zero bytes immediately if the random device is not seeded. This means that
FreeBSD-specific userspace applications which attempted to handle an
unseeded random device may be broken by this change. If such behavior is
needed, it can be replaced by the more portable getrandom(2) GRND_NONBLOCK
option.

On any typical FreeBSD system, entropy is persisted on read/write media and
used to seed the random device very early in boot, and blocking is never a
problem anyway.

This change primarily impacts the behavior of /dev/random on embedded
systems with read-only media. We toggle the default from 'charge on blindly
with no entropy' to 'block indefinitely.' This default is the safer, but
less may cause frustration. Embedded system designers using FreeBSD who
cannot provide a writable media for entropy persistence and re-seeding are
encouraged to design their own initial seeding and/or key generation delay
scheme to fit their specific needs. Early entropy can be fed from any
loader or by early userspace writing to the /dev/random device (as root).
(For those who prefer a weak system random device, it can be fake-started by
dding /dev/zero into /dev/random. This is strongly discouraged.)

I attempted to document this change in random.4 and random.9 and ran into a
bunch of out-of-date or irrelevant or inaccurate content and ended up
rototilling those documents more than I intended to. Sorry. I think
they're in a better state now.

PR: 230875

Test Plan

Tinderbox seems to build (kernels only). One topic suggested in the PR was a
tunable or whatever knob to block initial seeding to validate that early boot
did not depend on the seeded random device (for embedded systems). I have not
done that yet.

This will need secteam approval to commit, but as they are usually busy, I'll
wait for some positive feedback from this batch of reviewers before requesting
their approval.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

+CC kp for sanity check on pf changes and ae for the same on ipsec.

(Docs, you were CC'd for the manual page changes, of course. igor report is clean.)

I think you should separate most of the read_random() -> arc4random_buf() change out because in most cases the code should use the latter exclusively, especially places where the return value of read_random() were not tested because it would be a strict improvement to the status quo.

(Note that after a quick glance, I don't particularly like the approach you have took in sys/libkern/arc4random.c, as it would make arc4random_* blocking (through new read_random semantics), which is something that we tried to avoid for good reasons, but I need some time to think this through and I'm busy in the upcoming week or two; ping me if there is no movement here in a week).

I think you should separate most of the read_random() -> arc4random_buf() change out because in most cases the code should use the latter exclusively, especially places where the return value of read_random() were not tested because it would be a strict improvement to the status quo.

Ok.

(Note that after a quick glance, I don't particularly like the approach you have took in sys/libkern/arc4random.c, as it would make arc4random_* blocking (through new read_random semantics), which is something that we tried to avoid for good reasons, but I need some time to think this through and I'm busy in the upcoming week or two; ping me if there is no movement here in a week).

I don't think there is any good reason to unblock arc4random APIs (which cannot report unseeded state anyway) when we know we aren't seeded. If you're concerned about the locking / deadlock semantics for consumers, perhaps we could expose a KPI to check if the random device is seeded or not? It is relatively easy to shim read_random_uio to suit this purpose:

static bool
is_random_seeded(void)
{
    struct uio uio = { 0 };
    int rc;

    rc = read_random_uio(&uio, true);
    return (rc == 0);
}

(Of course, this has the ugly side effect of malloc/freeing a page of memory M_WAITOK when it is completely unneeded, and is an abuse of the interface. So my preference would be to add a new interface, if one is needed.)

Split out read_random -> arc4random consumer changes to a separate revision

cem removed reviewers: ae, kp.

(Dropping reviewers for the other review's changes.)

@delphij, have you had any more time to think about this since last week? Thanks!

I'm happy with this, in principle. I defer on the kernel innards, but the general engineering looks sound.

delphij requested changes to this revision.Apr 6 2019, 7:36 PM

Noticed a few minor issues, please see comment inline. Overall I think the change is good.

sys/dev/random/randomdev.c
128 ↗(On Diff #55614)

Instead of allowing the caller to specify unsupported flags and assert it later, it's better to just make this 'bool interruptable' and translate it to the only supported flag instead. Additionally you can probably name this something like: randomdev_waitseed_with_interruptable()?

132 ↗(On Diff #55614)

int sleep_flag = (interruptable)? PCATCH: 0;

143 ↗(On Diff #55614)

sleep_flag

146 ↗(On Diff #55614)

KASSERT(interruptable, ...

179 ↗(On Diff #55614)

(true)

234 ↗(On Diff #55614)

I'd probably rename this as "tail_buf" and keep the read_len name, see below.

248 ↗(On Diff #55614)

So basically this code block does was to read whole RANDOM_BLOCKSIZE's into the supplied buffer, and if the supplied 'len' is not multiple, the tail is filled with bytes from the on-stack buffer. This is a good change because it reduced the possibility that the data gets exposed through stack buffer, which is probably worth mentioning in the comments, by the way.

If I was you, I'd probably keep the existing read_len naming because it could be somewhat confusing what 'direct_len' means, though.

sys/sys/random.h
53 ↗(On Diff #55614)

Probably keep the return instead of panicking?

If the goal is to intentionally break !DEV_RANDOM case, you can remove this symbol or use #error here, it's better because it would become immediately visible for the user who builds it that way at compile time.

Personally, I think it's totally fine if they choose to not have any randomness, though, and if a case is intentionally broken, the opposite should be the default and option should be removed.

This revision now requires changes to proceed.Apr 6 2019, 7:36 PM
cem planned changes to this revision.Apr 6 2019, 8:08 PM
cem marked 4 inline comments as done.
cem added inline comments.
sys/dev/random/randomdev.c
128 ↗(On Diff #55614)

Instead of allowing the caller to specify unsupported flags and assert it later, it's better to just make this 'bool

I'm not sure that's generally true. It may be the case that this is the only flag we ever anticipate, and in that case, bool would make sense.

In general, sometimes we end up adding additional flags to functions and it is useful to have the spare bits in an existing parameter. In the case where only some bits are defined, I think asserting valid input is a good thing.

I'm open to the argument that we'll never add additional flags to this function, and if you want to make that argument, I'm happy to convert to bool parameter. But I don't see that case made.

132 ↗(On Diff #55614)

Sure; this is just the obvious way to implement the same behavior with a bool parameter instead of flags. I understand the idea and how it would be done; what's missing is the rationale. :-)

248 ↗(On Diff #55614)

This is a good change because it reduced the possibility that the data gets exposed through stack buffer

The previous scheme also had the issue where a caller could induce kstack overflow by specifying arbitrarily large len. And this change avoids the extra copy as well. I can mention it in the commit message.

If stack exposure is still a (reduced) concern, perhaps we should explicit_bzero local_buf before returning?

I'd probably keep the existing read_len naming because it could be somewhat confusing what 'direct_len' means, though.

How about read_directly_len?

sys/sys/random.h
53 ↗(On Diff #55614)

While I don't understand the rationale for !DEV_RANDOM, I'm not intending to remove it without that understanding why it exists and who needs it.

In such a build, read_random() (with new API specification) should not be invoked at all. One alternative to immediate panic would be to block forever (since random is never seeded). That doesn't seem like a better option to me.

cem marked 3 inline comments as done.

Clarify some of the variables names in READ_RANDOM, including reducing the
scope of the extra block and clearing it safely, if needed. Update the commit
message as well (not reflected in Phabricator).

Hi @delphij ,

I know you're busy, but any more thoughts? If you would strongly prefer to use bool instead of flags, I am happy to make that change. Are there any remaining issues you have with the change that I missed? I guess the other open question is what to do for !DEV_RANDOM. If you would prefer to leave that as-is, I can remove that portion of the change.

Thanks! Please just let me know your thoughts when you can.

Best,
Conrad

sys/dev/random/randomdev.c
248 ↗(On Diff #55614)

Sounds good, thanks!

sys/sys/random.h
53 ↗(On Diff #55614)

There is no existing mechanism in the kernel that prevents calling of read_random(), which is called by arc4random() and is used widely in various parts of kernel, including the TCP/IP stack and several file systems. By removing the symbol you would at least save other developers some time because that would give them a link time error vs seeing the system crash at random times.

The option was introduced in rS286839 by the way, I doubt it's actually being used by anyone in practice, and since your change would break it it's probably time to just go ahead and remove it altogether. @markm can you confirm?

Against my better judgement, I kept the !DEV_RANDOM case as some folks insisted on being able to use preferred tools (ssh) even on insecure embedded hardware. I'm happy to se you fix it, if it means I don't get the flak ]:->

M

cem planned changes to this revision.Apr 11 2019, 3:38 PM

The option was introduced in rS286839 by the way, I doubt it's actually being used by anyone in practice, and since your change would break it it's probably time to just go ahead and remove it altogether.

Against my better judgement, I kept the !DEV_RANDOM case as some folks insisted on being able to use preferred tools (ssh) even on insecure embedded hardware. I'm happy to se you fix it, if it means I don't get the flak ]:->

Ok, let's plan on that course of action as a next step after this change. I think we all concur and I'm happy to redirect blame from markm for removing it. I'll drop the !DEV_RANDOM changes here, and write a 2nd patch. Thanks!

cem marked 3 inline comments as done.
  • Replace slpflags parameter to wait_until_seeded with boolean parameter. I still prefer a named parameter at consumers, so I made some file-local defines for true and false that show intent more clearly (I hope).
  • Dropped panic() from !DEV_RANDOM empty read_random implementation. We will address !DEV_RANDOM separately.
  • Added sysctl/tunable to enable testing of unavailability of random device throughout early boot, and documented it in random.4.
  • Bumped .Dd on manual pages.

The code changes looks good to me -- and thanks for working on this!

The inline comments about documentation are optional.

share/man/man4/random.4
139 ↗(On Diff #56205)

Note that there is no mention of random(4) in drand48, nor it supports something like srandomdev, is it still worth mentioning here now that the reference is gone?

142 ↗(On Diff #56205)

This and RAND_bytes didn't really mention random(4), and they can probably be removed because the manual page is no longer talking about their implementation details.

This revision is now accepted and ready to land.Apr 15 2019, 7:09 AM
share/man/man4/random.4
139 ↗(On Diff #56205)

Good catch, thank you. I will fix before commit.

142 ↗(On Diff #56205)

Another good catch, thanks. I will also fix it. Thanks!

This revision was automatically updated to reflect the committed changes.