Page MenuHomeFreeBSD

virtio_random(4): Fix random(4) integration
ClosedPublic

Authored by cem on May 26 2019, 11:39 PM.

Details

Summary

random(4) masks unregistered entropy sources. Prior to this revision,
virtio_random(4) did not correctly register a random_source and did not
function as a source of entropy.

Random source registration for loadable pure sources requires registering a
poll callback, which is invoked periodically by random(4)'s harvestq
kthread. The periodic poll makes virtio_random(4)'s periodic entropy
collection redundant, so this revision removes the callout.

Add locking around concurrent access to the device; the virtqueue use
depends on the assumption that only a single reader is accessing the device
at a time.

Test Plan
testvm# sysctl kern.random.
kern.random.harvest.mask_symbolic: PURE_VIRTIO,PURE_RDRAND,[UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,[NET_ETHER],NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
                                   ^^^^^^^^^^^^
kern.random.harvest.mask_bin: 000010010000000111011111
kern.random.harvest.mask: 590303
kern.random.random_sources: 'VirtIO Entropy Adapter','Intel Secure Key RNG'
                            ^^^^^^^^^^^^^^^^^^^^^^^^^
...

(These are absent prior to this change.)

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

cem created this revision.May 26 2019, 11:39 PM
markm accepted this revision.May 28 2019, 7:18 AM

Good catch!

This revision is now accepted and ready to land.May 28 2019, 7:18 AM

It has been a long while since I wrote this driver but the callout pattern was certainly cribbed from another driver in the tree. Has this API changed over the years?

sys/dev/virtio/random/virtio_random.c
135 ↗(On Diff #57929)

While the virtqueue still uses the *mb(), I think there is a preference to use atomic_thread_fence_* instead.

When would g_vtrnd_softc not be null here? I guess related when would g_vtrnd_softc == sc be false in detach (expect when the attached failed path)

202 ↗(On Diff #57929)

Is this at risk of blowing the kernel stack?

215 ↗(On Diff #57929)

Lock is leaked here

cem added a comment.May 28 2019, 3:42 PM

It has been a long while since I wrote this driver but the callout pattern was certainly cribbed from another driver in the tree. Has this API changed over the years?

It's quite possible, yes. 2013 predates some API changes to random in ~2015 and some changes in 2018, although I think the masking for periodic-gather of pure sources was broken in 2014 or earlier (that's when hc_source_mask was introduced, at least in one incarnation). There was a big cleanup in 2015 (r284959) and 2014 (r273872) although so much changed, it's difficult for me to tell just poking at svnweb.

Alternatively, the callout pattern may have been for some "environmental" rather than "pure" entropy source, which would have always worked for it (but not virtio as a pure source).

sys/dev/virtio/random/virtio_random.c
135 ↗(On Diff #57929)

atomic_thread_fence_ is fine with me, I just have to figure out which one means "wmb".

The conditions are both guarding against the host exposing 2 or more virtio-random devices. The driver will happily attach to both, but we want to reference only the first one from the global pointer and take care to only remove it when the same device is removed / detached (possible with devctl(8)).

202 ↗(On Diff #57929)

No. In practice HARVESTSIZE == 2 and sz == 8.

In a future revision, I plan to remove value entirely, shift the alignment responsibility into the random(4) code, and output directly into buf. But anything that touches random(4) has to go through secteam, so I wanted to keep that change separate and as small as possible.

215 ↗(On Diff #57929)

Good catch, thanks!

cem updated this revision to Diff 58000.May 28 2019, 4:04 PM
cem marked 3 inline comments as done.
  • Use atomic_thread_fence_rel instead of wmb
  • Fix lock leak

thanks bryanv@!

This revision now requires review to proceed.May 28 2019, 4:04 PM

@markm: Can you explain the difference in the prior version of this driver that is callout based and this change? It seems there is at least one driver in the tree that uses the callout pattern.

sys/dev/virtio/random/virtio_random.c
135 ↗(On Diff #57929)

Ah, I have it in my mind that I wrote code to avoid multiple virtio_random devices from attaching.

In the context of this change I think it would more sense to actually add that: why allow the attachment devices that will not be used? And avoid potential unexpected behavior: if the active one is detached, the remaining attached devices won't take over.

249 ↗(On Diff #58000)

While you're here can you handle a short read? I wrote this driver against the VirtIO v0.95 spec which only states:

When the driver requires random bytes, it places the descriptor of one or more buffers in the queue. It will be completely filled by random data by the device.

However, the v1.0 spec contains the previous quote but then added:

The driver MUST examine the length written by the device to determine how many random bytes were received.
...
The device MUST place one or more random bytes into the buffer, but it MAY use less than the entire buffer length.

I'm not quite sure how to parse this but handling short reads seems like a safe thing to do.

cem planned changes to this revision.May 28 2019, 9:06 PM

Hey Bryan,

I really appreciate the feedback from someone so familiar with virtio!

It seems there is at least one driver in the tree that uses the callout pattern.

Can you point me at the other driver? I'd be interested in fixing it, if it is also broken. If it isn't broken, I can probably explain why not. Thanks!

sys/dev/virtio/random/virtio_random.c
135 ↗(On Diff #57929)

It doesn't fit the usual driver pattern to make policy decisions about limiting multiple instances of the same driver from attaching, if the backing devices are present. Sure, attaching a 2nd random device doesn't do much today (other than show up in devinfo); there's also little cost (IMO).

I think the unexpected behavior scenario above is so unlikely as to not be worth worrying about. The first unlikely thing that has to happen is for a hypervisor to expose >1 virtio-random devices to the guest. The second unlikely thing that has to happen is for the guest to manually detach one, but not all, of the virtio-rand devices. And the 3rd perhaps unlikely step is that a user who has created such a scenario would expect the remaining instances to take over random source responsibility.

I have some thoughts about how we might improve the random source API for modules such that they don't have to worry about these details, one way or the other. I'd prefer to leave that concern/policy to random(4), instead of spreading it across N drivers. In particular, it'd be nice if it was actually safe to unregister a random source at runtime.

249 ↗(On Diff #58000)

I'm unfamiliar with the virtqueue API; is the right way to determine read length from the device something like:

uint32_t rdlen;

virtqueue_poll(vq, &rdlen);

Thanks! I'll go ahead and work on an updated draft of this patch.

cem updated this revision to Diff 58014.May 28 2019, 10:03 PM
cem marked an inline comment as done.
  • Handle short reads.
bryanv added a comment.Jun 1 2019, 6:49 PM

Other drivers following the callout pattern seem to be: bcm2835_rng.c, octeon_rnd.c, hifn7751.c, glxsb.c, tpm20.c although perhaps there is a desired functional difference that I am not aware of.

sys/dev/virtio/random/virtio_random.c
135 ↗(On Diff #57929)

I think a reasonable expected behavior would be all attached devices are used as sources. I agree that detaching one is unlikely but I guess I'd prefer to be cautious around anything security'ish related.

cem marked an inline comment as done.Jun 1 2019, 10:02 PM

Other drivers following the callout pattern seem to be: bcm2835_rng.c, octeon_rnd.c, hifn7751.c, glxsb.c, tpm20.c although perhaps there is a desired functional difference that I am not aware of.

I don't think you're missing anything; they appear to all be broken :-(.

sys/dev/virtio/random/virtio_random.c
135 ↗(On Diff #57929)

I think a reasonable expected behavior would be all attached devices are used as sources.

I agree in general, with some nuance. In this particular case, where the device only exists to provide entropy, I totally agree and will plan to make the change you proposed earlier.

The nuance bit is: I think that we might reasonably fail to meet this expectation for some devices under the current source-registration model. There exist drivers which are both a random source and provide some other functionality. For those specific drivers, it may make sense to attach even if the 2nd device will not be able to provide entropy to random(4) today.

A good redesign of the entropy registration model would allow all devices of the same "source" to input entropy sanely, and come and go without impacting other instances of the same device/source.

bryanv accepted this revision.Jun 2 2019, 2:48 AM
This revision is now accepted and ready to land.Jun 2 2019, 2:48 AM
cem updated this revision to Diff 58182.Jun 2 2019, 10:45 PM
cem marked an inline comment as done.
  • Only allow first virtio_random device to attach
This revision now requires review to proceed.Jun 2 2019, 10:45 PM
cem updated this revision to Diff 58183.Jun 2 2019, 10:47 PM
cem marked an inline comment as done.
  • Drop unneeded headers
markm accepted this revision.Jun 3 2019, 5:29 PM
This revision is now accepted and ready to land.Jun 3 2019, 5:29 PM
This revision was automatically updated to reflect the committed changes.