Page MenuHomeFreeBSD

bhyve: implement NVMe deallocate command
ClosedPublic

Authored by chuck on Sep 29 2019, 3:29 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 9, 3:52 AM
Unknown Object (File)
Dec 4 2024, 2:59 AM
Unknown Object (File)
Nov 28 2024, 7:49 PM
Unknown Object (File)
Nov 22 2024, 11:00 PM
Unknown Object (File)
Nov 20 2024, 9:00 AM
Unknown Object (File)
Nov 18 2024, 4:58 PM
Unknown Object (File)
Nov 3 2024, 7:29 PM
Unknown Object (File)
Oct 28 2024, 10:59 AM

Details

Summary

This adds support for the Dataset Management (DSM) command to the NVMe
emulation in general, and more specifically, for the deallocate
attribute (a.k.a. trim in the ATA protocol). If the backing storage for
the namespace supports delete (i.e. deallocate), setting the deallocate
attribute in a DSM will trim/delete the requested LBA ranges in the
underlying storage.

Stacked with changes in D21837 and D21838

Test Plan

Tested on Alpine Linux guest:

  • mkfs.xfs /dev/nvme0n1
  • nvme dsm /dev/nvme0n1 --ad --slbs=0,4 --blocks=1,1

mkfs only tests a single range at a time but tests large ranges (i.e. length in logical blocks)
The nvme-cli test exercises the multiple ranges aspect of the command.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I can't speak to the NVMe bits, but from what I can understand on the bhyve side it looks ok to me.

usr.sbin/bhyve/pci_nvme.c
1401 ↗(On Diff #62710)

So this uses a length of 4096, would it make sense to static_assert(256 * sizeof(struct nvme_dsm_range) == 4096)? It looks kind of odd to use two different constants for the length.

1420 ↗(On Diff #62710)

You can just free() unconditionally as free(NULL) is well-defined to be a nop.

This revision is now accepted and ready to land.Oct 10 2019, 4:35 PM
chuck added inline comments.
usr.sbin/bhyve/pci_nvme.c
1401 ↗(On Diff #62710)

You're right, that does look weird. After poking around a bit, I see that the NVMe driver defines NVME_MAX_DSM_TRIM as 4096. So I'll change the code to:

/* copy locally because a range entry could straddle PRPs */
data = calloc(1, NVME_MAX_DSM_TRIM);
if (data == NULL) {
	pci_nvme_status_genc(&status, NVME_SC_INTERNAL_DEVICE_ERROR);
	goto out;
}
nvme_prp_memcpy(sc->nsc_pi->pi_vmctx, cmd->prp1, cmd->prp2,
    data, NVME_MAX_DSM_TRIM, NVME_COPY_FROM_PRP);
1420 ↗(On Diff #62710)

Oops. Forgot that FreeBSD is sane. Fixed

chuck edited the test plan for this revision. (Show Details)

Version 2 of this patch addresses review feedback plus fixes a bug discovered during testing a Linux guest.

The main change is converting the command processing from blocking in the command processing loop to working asynchronously (similar to Read and Write).

This revision now requires review to proceed.Mar 15 2020, 11:35 PM
khng added inline comments.
usr.sbin/bhyve/pci_nvme.c
1417 ↗(On Diff #69556)

When testing with FreeBSD 12.1 guests a lot of error messages (indicating INVALID_FIELD) were emitted when plain file was used instead of geom disk.
It seems better to either not present DSM support in the first place, or make it an no-op according to the following statements:
"The values read from a deallocated LBAand its metadata (excluding protection information)shall be all zeros, all ones, or the last data written to the associated LBAand its metadata."
in https://nvmexpress.org/wp-content/uploads/NVM_Express_1_2_Gold_20141209.pdf

usr.sbin/bhyve/pci_nvme.c
1417 ↗(On Diff #69556)

It seems better to either not present DSM support in the first place if blockif indicates that

blockif_candelete()

is false, or make it an no-op according to the following statements: *

usr.sbin/bhyve/pci_nvme.c
1417 ↗(On Diff #69556)

Let me check from a compliance point of view what is correct and then decide what to do. Not advertising DSM might be a good option, but DSM is more than just deallocate.

Version 3 of patch adds the following:

  • by default the Controller will advertise DSM support only if the backing storage also supports delete/trim
  • new command line nvme option "dsm" which takes values "auto" (default behavior), "enable" (unconditionally advertise DSM), and "disable" (never advertise DSM)
  • if the controller advertises DSM but the backing storage does not support delete/trim, follow the standards group recommendation to return "Success" to all DSM commands
This revision is now accepted and ready to land.Mar 24 2020, 7:42 PM
This revision was automatically updated to reflect the committed changes.