
random(4): Squash non-error timeout codes from tsleep(9)
Closed, Public

Authored by cem on Sep 5 2018, 3:33 PM.

Details

Summary

As reported on -fs@ and -current@ by lev@, mdmfs(8) -> newfs(8) was aborting
due to an arc4random(3) sanity-check failure when a diskless FreeBSD system
transitioned from the unseeded to the seeded randomdev state.

Context: userspace arc4random(3), now based on the Chacha stream cipher
(thanks delphij@), seeds itself by requesting a small amount of random data
with the getentropy(3) API, a wrapper around getrandom(2). getrandom(2) is
itself a wrapper around READ_RANDOM_UIO(9), the same routine backing read(2)
of the /dev/random device.
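A minimal sketch of the userspace end of this chain follows, assuming a system where getentropy(3) is declared in <unistd.h> (FreeBSD 12 and later, glibc 2.25 and later). The 40-byte request (a 32-byte ChaCha key plus an 8-byte IV) is an assumption modeled on the OpenBSD-derived arc4random(3); it matches the 40-byte getrandom(2) call in the truss output of the test plan. The helpers fetch_seed() and seed_error_is_documented() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>	/* getentropy(): FreeBSD 12+, glibc 2.25+ */

/* Assumed seed layout: 32-byte ChaCha key + 8-byte IV = 40 bytes. */
#define SEED_KEYSZ	32
#define SEED_IVSZ	8
#define SEED_LEN	(SEED_KEYSZ + SEED_IVSZ)

/*
 * Hypothetical helper: fetch arc4random-style seed material.
 * Returns 0 on success, or the errno value getentropy(3) reported.
 */
static int
fetch_seed(unsigned char *seed, size_t len)
{
	assert(seed != NULL);
	if (getentropy(seed, len) == -1)
		return (errno);
	return (0);
}

/*
 * getentropy(3) may only fail with EFAULT or EIO, so any other value
 * (such as an EWOULDBLOCK leaking out of the kernel) points at a bug in
 * a lower layer rather than a condition callers are expected to handle.
 */
static int
seed_error_is_documented(int err)
{
	return (err == EFAULT || err == EIO);
}
```

In the bug described here, the EWOULDBLOCK (errno 35 on FreeBSD) returned by getrandom(2) is exactly the kind of undocumented value this check would flag, and arc4random(3) treats any seeding failure as fatal.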

The root of the problem observed by lev@ was that the tsleep(9) call in
READ_RANDOM_UIO returned EWOULDBLOCK when its timeout expired; that value was
stored in error and short-circuited the rest of the operation. This bubbled
up to userspace getentropy(3), which, perhaps mistakenly (getentropy(3) is
only allowed to return EIO or EFAULT), passed the error on to arc4random(3),
which then killed the process.
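The fix named in the title can be modeled in userspace as below. This is a sketch, not the committed kernel change: wait_until_seeded(), the fake_dev scaffolding, and the function-pointer stand-ins for tsleep(9) and the "seeded yet?" check are all hypothetical. The assumption it encodes is the tsleep(9) return convention: 0 on wakeup, EWOULDBLOCK when the timeout expires, and EINTR or ERESTART when a signal interrupts a PCATCH sleep. Only the signal codes should end the wait; a timeout just means "poll again".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef ERESTART
#define ERESTART	(-1)	/* kernel-internal code on FreeBSD; stand-in here */
#endif

/*
 * Hypothetical model of the unseeded-read wait.  do_sleep() stands in
 * for tsleep(9) with PCATCH and a timeout; seeded() stands in for the
 * algorithm's "are we seeded yet?" check.
 */
static int
wait_until_seeded(bool (*seeded)(void *), int (*do_sleep)(void *), void *ctx)
{
	int error = 0;

	while (!seeded(ctx)) {
		error = do_sleep(ctx);
		if (error == EINTR || error == ERESTART)
			break;		/* a signal really should end the read */
		if (error == EWOULDBLOCK)
			error = 0;	/* timeout: not an error, poll again */
		assert(error == 0);
	}
	return (error);
}

/* Test scaffolding: a fake source that seeds after a few timed-out sleeps. */
struct fake_dev {
	int	timeouts_left;	/* sleeps that time out before seeding */
	int	signal_code;	/* if nonzero, every sleep returns this instead */
};

static bool
fake_seeded(void *ctx)
{
	return (((struct fake_dev *)ctx)->timeouts_left <= 0);
}

static int
fake_sleep(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->signal_code != 0)
		return (d->signal_code);
	d->timeouts_left--;
	return (EWOULDBLOCK);
}
```

Before the fix, the equivalent loop returned the EWOULDBLOCK from the first timed-out sleep, which is what surfaced in userspace as ERR#35.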

delphij@ spotted a similar bug in READ_RANDOM_UIO's logic for checking
pending signals. This one is perhaps harder to hit by accident, because it
requires requesting more than 16 MiB from /dev/random or getrandom(2) in a
single read, which is atypical.
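The second bug follows the same pattern. Below is a minimal model, assuming (as the proof of concept in the test plan suggests) that a single read polls for pending signals once every SIGCHK_PERIOD bytes (16 MiB) using a short timed sleep whose timeout result is EWOULDBLOCK. read_random_model() and the two check callbacks are hypothetical; the period is a parameter so the behavior can be exercised with small sizes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SIGCHK_PERIOD	(16 * 1024 * 1024)	/* implied by the PoC buffer size */

/*
 * Hypothetical model of a large read: "produce" len bytes in pieces of at
 * most chunk bytes and, before continuing past each period bytes, poll for
 * pending signals.  check_signal() follows tsleep(9) conventions, so
 * EWOULDBLOCK means "timed out, no signal pending".  *done receives the
 * number of bytes produced.
 */
static int
read_random_model(size_t len, size_t chunk, size_t period,
    int (*check_signal)(void *), void *ctx, size_t *done)
{
	size_t n, since_check = 0;
	int error = 0;

	assert(chunk > 0 && period > 0);
	*done = 0;
	while (*done < len) {
		if (since_check >= period) {
			since_check = 0;
			error = check_signal(ctx);
			if (error == EWOULDBLOCK)
				error = 0;	/* no signal pending: keep going */
			if (error != 0)
				break;		/* e.g. EINTR: stop the read early */
		}
		n = len - *done < chunk ? len - *done : chunk;
		*done += n;
		since_check += n;
	}
	return (error);
}

/* Test scaffolding for check_signal(). */
static int
no_signal_pending(void *ctx)
{
	(void)ctx;
	return (EWOULDBLOCK);
}

static int
signal_pending(void *ctx)
{
	(void)ctx;
	return (EINTR);
}
```

A read of exactly period bytes never reaches the check, which is why more than 16 MiB had to be requested. Without the EWOULDBLOCK squash, a read of period + 1 bytes would stop after period bytes and return EWOULDBLOCK, matching the "err 35" printed by the PoC.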

PR: 231181

Test Plan

First bug (the fix for unblocking random):

# Block (D17047):
$ sudo sysctl debug.fail_point.random_fortuna_pre_read='return(1)'
debug.fail_point.random_fortuna_pre_read: off -> return(1)
$ sudo sysctl debug.fail_point.random_fortuna_seeded='return(1)'
debug.fail_point.random_fortuna_seeded: off -> return(1)

# Start a minimal test program that uses arc4random(3) and prints the result:
$ [truss] ./blocked_random_poc
...

# Unblock:
$ sudo sysctl debug.fail_point.random_fortuna_pre_read='off'
debug.fail_point.random_fortuna_pre_read: return(1) -> off
$ sudo sysctl debug.fail_point.random_fortuna_seeded=off
debug.fail_point.random_fortuna_seeded: return(1) -> off

# Correct (with this fix)
...
abcdef12
(exit 0)

# Incorrect (without this fix)
...
getrandom(0x7fffffffd340,40,0)                   ERR#35 'Resource temporarily unavailable'
thr_self(0x7fffffffd310)                         = 0 (0x0)
thr_kill(100609,SIGKILL)                         = 0 (0x0)
SIGNAL 9 (SIGKILL) code=SI_NOINFO

The second, similar bug, pointed out by Xin Li (delphij@), is easy to reproduce once you know to look for it:

$ cat ../large_devrandom_poc.c
#include <sys/random.h>
...
static char buf[16*1024*1024 + 1];  // SIGCHK_PERIOD + 1

int
main(int argc, char **argv)
{
        uint64_t t;
        int rc;

        rc = getrandom(buf, sizeof(buf), 0);
        if (rc < 0)
                printf("getrandom: err %d\n", errno);
        else {
                memcpy(&t, buf, sizeof(t));
                printf("%lx\n", t);
        }
        return (0);
}
$ cc ...
$ ../large_devrandom_poc
getrandom: err 35

Diff Detail

Lint: Lint OK
Unit: No Unit Test Coverage
Build Status: Buildable 19419
Build 19016: arc lint + arc unit

Event Timeline

cem created this revision (Sep 5 2018, 3:33 PM).
cem edited the test plan for this revision (Sep 5 2018, 3:44 PM).
vangyzen accepted this revision (Sep 5 2018, 3:58 PM).
This revision is now accepted and ready to land (Sep 5 2018, 3:58 PM).
delphij accepted this revision.

LGTM, thanks!

cem edited the summary of this revision (Sep 5 2018, 4:59 PM).
cem added a reviewer: releng.
lev added a comment (Sep 5 2018, 5:31 PM).

This change helps me on the real hardware where I've had this problem.
Thank you!

markm accepted this revision (Sep 7 2018, 6:08 PM).
This revision was automatically updated to reflect the committed changes.