random(4): Squash non-error timeout codes from tsleep(9)
Closed, Public

Authored by cem on Sep 5 2018, 3:33 PM.

Details

Summary

As reported on -fs@ and -current@ by lev@, mdmfs(8) -> newfs(8) was aborting
due to arc4random(3) sanity check failure when a diskless FreeBSD system
transitioned from unseeded to seeded randomdev state.

Context: userspace arc4random(3), now based on the ChaCha stream cipher
(thanks delphij@), seeds itself by requesting a small amount of random data
with the getentropy(3) API, a wrapper around getrandom(2). getrandom(2) is
itself a wrapper around READ_RANDOM_UIO(9), the same routine backing read(2)
of the /dev/random device.
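
As a rough illustration of the userspace end of that chain, here is a minimal sketch of a consumer requesting seed material through getentropy(3). The helper name sketch_fetch_seed is hypothetical and this is not libc's arc4random(3) code; the 40-byte request size in the usage note is taken from the truss output in the test plan.

```c
#define _DEFAULT_SOURCE /* assumption: some libcs need this to expose getentropy() */
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: fetch `len` bytes of seed
 * material. Returns 0 on success, or the errno value that getentropy(3)
 * reported on failure.
 */
int
sketch_fetch_seed(unsigned char *key, size_t len)
{
	if (getentropy(key, len) == -1)
		return (errno);
	return (0);
}
```

A caller would use something like `unsigned char key[40]; error = sketch_fetch_seed(key, sizeof(key));` and has to decide what to do when error is nonzero.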

The root of the problem observed by lev@ was that the tsleep() in
READ_RANDOM_UIO set error to EWOULDBLOCK on timeout, which short-circuited
the rest of the read even though the device had become seeded. The error
bubbled up to userspace getentropy(3), which (perhaps mistakenly, since
getentropy(3) is only allowed to return EIO or EFAULT) passed it on to
arc4random(3), which then killed the process.
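
The squashing idea can be modeled in userspace roughly as follows. This is a sketch under stated assumptions, not the committed kernel change: every sketch_* name is hypothetical, sketch_tsleep() merely imitates a timed sleep that returns EWOULDBLOCK on timeout, and SKETCH_ERESTART stands in for the kernel's ERESTART.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel-internal ERESTART code (assumption). */
#define SKETCH_ERESTART (-1)

/* Hypothetical model of the random device's state. */
struct sketch_dev {
	int polls_until_seeded;	/* timeouts left before the device is seeded */
	int polls_until_signal;	/* timeouts left before a signal; < 0 = never */
};

bool
sketch_seeded(const struct sketch_dev *d)
{
	return (d->polls_until_seeded <= 0);
}

/* Imitates a timed sleep: EINTR on a signal, EWOULDBLOCK on timeout. */
int
sketch_tsleep(struct sketch_dev *d)
{
	if (d->polls_until_signal == 0)
		return (EINTR);
	if (d->polls_until_signal > 0)
		d->polls_until_signal--;
	d->polls_until_seeded--;
	return (EWOULDBLOCK);
}

/* Wait for seeding; only a genuine interruption is reported as an error. */
int
sketch_wait_seeded(struct sketch_dev *d)
{
	int error = 0;

	while (!sketch_seeded(d)) {
		error = sketch_tsleep(d);
		if (error == EINTR || error == SKETCH_ERESTART)
			break;		/* interrupted: propagate */
		error = 0;		/* timeout is not an error: squash it */
	}
	return (error);
}
```

Without the `error = 0` line, the loop would exit with EWOULDBLOCK still set once the device became seeded, which is the failure mode described above.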

delphij@ spotted a similar bug in READ_RANDOM_UIO's logic to check for
pending signals. This one is perhaps harder to hit by accident because it
requires requesting over 16 MiB from /dev/random or getrandom(2) in a single
read, which is atypical.
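
A simplified userspace model of such a periodic signal check is sketched below. All sketch_* names are hypothetical, the check function only imitates a very short timed sleep, and the period is a parameter here (the PoC in the test plan indicates 16 MiB for the real SIGCHK_PERIOD). The sketch assumes chunk and period are nonzero and that period is a multiple of chunk.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical signal state for the model. */
struct sketch_sigstate {
	int pending;	/* nonzero if a signal is waiting */
};

/* Imitates a near-zero timed sleep used only to poll for signals. */
int
sketch_sigcheck(const struct sketch_sigstate *s)
{
	return (s->pending ? EINTR : EWOULDBLOCK);
}

/*
 * Model of a chunked read of `resid` bytes that polls for signals every
 * `period` bytes. Stores the number of bytes "read" in *donep.
 */
int
sketch_read_loop(size_t resid, size_t chunk, size_t period,
    const struct sketch_sigstate *s, size_t *donep)
{
	size_t total = 0;
	int error = 0;

	while (resid > 0 && error == 0) {
		size_t n = resid < chunk ? resid : chunk;

		/* A real implementation would generate and copy out n bytes. */
		total += n;
		resid -= n;
		if (resid > 0 && total % period == 0) {
			error = sketch_sigcheck(s);
			if (error == EWOULDBLOCK)
				error = 0;	/* timeout: keep reading */
		}
	}
	*donep = total;
	return (error);
}
```

In this model a request no larger than one period never reaches the check, which is consistent with the bug only appearing for reads larger than the period.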

PR: 231181

Test Plan

First bug: unblocking random fix:

# Block (D17047):
$ sudo sysctl debug.fail_point.random_fortuna_pre_read='return(1)'
debug.fail_point.random_fortuna_pre_read: off -> return(1)
$ sudo sysctl debug.fail_point.random_fortuna_seeded='return(1)'
debug.fail_point.random_fortuna_seeded: off -> return(1)

# Start minimal test program that used arc4random(3) and printed result:
$ [truss] ./blocked_random_poc
...

# Unblock:
$ sudo sysctl debug.fail_point.random_fortuna_pre_read='off'
debug.fail_point.random_fortuna_pre_read: return(1) -> off
$ sudo sysctl debug.fail_point.random_fortuna_seeded=off
debug.fail_point.random_fortuna_seeded: return(1) -> off

# Correct
...
abcdef12
(exit 0)

# Incorrect
...
getrandom(0x7fffffffd340,40,0)                   ERR#35 'Resource temporarily unavailable'
thr_self(0x7fffffffd310)                         = 0 (0x0)
thr_kill(100609,SIGKILL)                         = 0 (0x0)
SIGNAL 9 (SIGKILL) code=SI_NOINFO

The second, similar bug pointed out by Xin Li can be reproduced quite easily once you know to look for it:

$ cat ../large_devrandom_poc.c
#include <sys/random.h>
...
static char buf[16*1024*1024 + 1];  // SIGCHK_PERIOD + 1

int
main(int argc, char **argv)
{
        uint64_t t;
        int rc;

        rc = getrandom(buf, sizeof(buf), 0);
        if (rc < 0)
                printf("getrandom: err %d\n", errno);
        else {
                memcpy(&t, buf, sizeof(t));
                printf("%lx\n", t);
        }
        return (0);
}
$ cc ...
$ ../large_devrandom_poc
getrandom: err 35

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land. Sep 5 2018, 3:58 PM
delphij added a reviewer: secteam.

LGTM, thanks!

cem added a reviewer: releng.

This change helps me on real hardware where I've had this problem.
Thank you!

This revision was automatically updated to reflect the committed changes.