Page MenuHomeFreeBSD

bhyve nvme: Fix out-of-bound IOV array access
ClosedPublic

Authored by chuck on May 26 2022, 5:57 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Apr 20, 5:53 AM
Unknown Object (File)
Mar 21 2024, 2:46 PM
Unknown Object (File)
Mar 19 2024, 8:54 PM
Unknown Object (File)
Feb 11 2024, 6:00 AM
Unknown Object (File)
Jan 12 2024, 7:18 AM
Unknown Object (File)
Dec 20 2023, 7:32 AM
Unknown Object (File)
Dec 12 2023, 3:37 PM
Unknown Object (File)
Dec 5 2023, 8:31 PM

Details

Summary

NVMe operations indicate the memory region(s) associated with a command
via physical region pages (PRPs). Since each PRP has a fixed size,
contiguous memory regions larger than the PRP size require multiple PRP
entries.

Instead of issuing a blockif call for each PRP, the NVMe emulation
concatenates multiple contiguous PRP entries into a single blockif
request. The test for contiguous regions has a bug such that it
mistakenly treats an initial address of zero as a contiguous range and
concatenates it with the previous. But because there is no previous IOV,
the concatenation code corrupts the IO request structure and leads to a
segmentation fault when the blockif request completes.

Fix is to test for the existence of a previous range before trying to
concatenate the current range with the previous one.

PR: 264177
Reported by: Robert Morris <rtm@lcs.mit.edu>
MFC after: 2 weeks

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

chuck requested review of this revision.May 26 2022, 5:57 PM
  1. I think patch is good, however I have questions (not directly related to the patch) for lba naming. LBA - logical block address - it should be enumerated with blocks, i.e. 1,.... However pci_nvme_append_iov_req() is called from nvme_write_read_blockif() and offset is passed as lba.

Do you think it is worth renaming lba to offset?

  1. And one more suggestion, what if code will be:
	if (req->io_req.br_iovcnt > 0 && (req->prev_gpaddr + req->prev_size) == gpaddr) {
  1. Question related to description.
 The test for contiguous regions has a bug such that it
mistakenly treats an initial address of zero as a contiguous range and
concatenates it with the previous. But because there is no previous IOV,
the concatenation code corrupts the IO request structure and leads to a
segmentation fault when the blockif request completes.

As I understood the first call of pci_nvme_append_iov_req() is always with req->io_req.br_iovcnt == 0, so this concatenation should be involved only for the subsequent calls of pci_nvme_append_iov_req() and it would good to point to it.

  1. I think patch is good, however I have questions (not directly related to the patch) for lba naming. LBA - logical block address - it should be enumerated with blocks, i.e. 1,.... However pci_nvme_append_iov_req() is called from nvme_write_read_blockif() and offset is passed as lba.

Do you think it is worth renaming lba to offset?

Agreed, that is a change worth making

  1. And one more suggestion, what if code will be:
	if (req->io_req.br_iovcnt > 0 && (req->prev_gpaddr + req->prev_size) == gpaddr) {

What you posted is very close to the original change I made. My hope was the intent of this version was clearer and compiled down to roughly the same assembly code.

  1. Question related to description.
 The test for contiguous regions has a bug such that it
mistakenly treats an initial address of zero as a contiguous range and
concatenates it with the previous. But because there is no previous IOV,
the concatenation code corrupts the IO request structure and leads to a
segmentation fault when the blockif request completes.

As I understood the first call of pci_nvme_append_iov_req() is always with req->io_req.br_iovcnt == 0, so this concatenation should be involved only for the subsequent calls of pci_nvme_append_iov_req() and it would good to point to it.

I had hoped the last sentence covered this, but it sounds like it missed. Is

Fix is to always skip concatenation for the initial address range.

any better? If not, I'm open to rewording :)

Implement suggested rename of variable

I had read 'initial address of zero' as meaning that the first iovec had iov_base of 0. Saying something like "the first vector" probably is a bit less ambiguous.

This revision is now accepted and ready to land.May 27 2022, 4:59 PM

What you posted is very close to the original change I made. My hope was the intent of this version was clearer and compiled down to roughly the same assembly code.

Yes, it has the same result. However splitting condition with range_is_contiguous makes perception that case with br_iovcnt == 0 is something different and is not related, whereas using

(req->io_req.br_iovcnt > 0 && (req->prev_gpaddr + req->prev_size) == gpaddr)

produces right understanding the second sub-condition is valid only then first sub-condition is right.

However to make it more clear:

if (req->prev_size != 0 && (req->prev_gpaddr + req->prev_size) == gpaddr)

It means if prev_size is zero, that hasn't been used yet.

In D35328#801068, @jhb wrote:

I had read 'initial address of zero' as meaning that the first iovec had iov_base of 0. Saying something like "the first vector" probably is a bit less ambiguous.

No, you read it correctly. When the iov_base is 0, it erroneously matches the previous address (0) and size (also 0).

This change looks good to me. Is there anything stopping this being merged?
(I'm doing another sync to illumos at the end of the month and I'd like to include this if possible)

This change looks good to me. Is there anything stopping this being merged?
(I'm doing another sync to illumos at the end of the month and I'd like to include this if possible)

I would remove "range_is_contiguous" usage and unite all in one condition as "contiguous" case is only if both conditions are true:

(req->prev_size != 0 && (req->prev_gpaddr + req->prev_size) == gpaddr)

This change looks good to me. Is there anything stopping this being merged?
(I'm doing another sync to illumos at the end of the month and I'd like to include this if possible)

There are a batch of these NVMe changes in the queue, but my testing has tripped over a failure causing a bhyve core dump. I don't believe the changes are the cause, but they seem to exacerbate whatever this issue is.

I'll try to focus on the bug to get this change into your sync window.

! In D35328#811978, @gusev.vitaliy_gmail.com wrote:

I would remove "range_is_contiguous" usage and unite all in one condition as "contiguous" case is only if both conditions are true:

(req->prev_size != 0 && (req->prev_gpaddr + req->prev_size) == gpaddr)

We have a difference in opinion here that I'm not sure how to reconcile. The checks are equivalent, but I feel a) the additional variable name makes the intent obvious and b) splitting the check makes the condition that caused the bug stand out. Your suggestion is more concise, but does it have other advantages perhaps I'm missing?

This revision was automatically updated to reflect the committed changes.