This adds support for the Dataset Management (DSM) command to the NVMe
emulation in general, and more specifically, for the deallocate
attribute (a.k.a. trim in the ATA protocol). If the backing storage for
the namespace supports delete (i.e. deallocate), setting the deallocate
attribute in a DSM will trim/delete the requested LBA ranges in the
underlying storage.
Details
Tested on Alpine Linux guest:
- mkfs.xfs /dev/nvme0n1
- nvme dsm /dev/nvme0n1 --ad --slbs=0,4 --blocks=1,1
mkfs only tests a single range at a time but tests large ranges (i.e. length in logical blocks)
The nvme-cli test exercises the multiple ranges aspect of the command.
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
I can't speak to the NVMe bits, but from what I can understand on the bhyve side it looks ok to me.
usr.sbin/bhyve/pci_nvme.c | ||
---|---|---|
1401 ↗ | (On Diff #62710) | So this uses a length of 4096, would it make sense to static_assert(256 * sizeof(struct nvme_dsm_range) == 4096)? It looks kind of odd to use two different constants for the length. |
1420 ↗ | (On Diff #62710) | You can just free() unconditionally as free(NULL) is well-defined to be a nop. |
usr.sbin/bhyve/pci_nvme.c | ||
---|---|---|
1401 ↗ | (On Diff #62710) | You're right, that does look weird. After poking around a bit, I see that the NVMe driver defines NVME_MAX_DSM_TRIM as 4096. So I'll change the code to: /* copy locally because a range entry could straddle PRPs */ data = calloc(1, NVME_MAX_DSM_TRIM); if (data == NULL) { pci_nvme_status_genc(&status, NVME_SC_INTERNAL_DEVICE_ERROR); goto out; } nvme_prp_memcpy(sc->nsc_pi->pi_vmctx, cmd->prp1, cmd->prp2, data, NVME_MAX_DSM_TRIM, NVME_COPY_FROM_PRP); |
1420 ↗ | (On Diff #62710) | Oops. Forgot that FreeBSD is sane. Fixed |
Version 2 of this patch addresses review feedback plus fixes a bug discovered during testing a Linux guest.
The main change is converting the command processing from blocking in the command processing loop to working asynchronously (similar to Read and Write).
usr.sbin/bhyve/pci_nvme.c | ||
---|---|---|
1417 ↗ | (On Diff #69556) | When testing with FreeBSD 12.1 guests a lot of error messages (indicating INVALID_FIELD) were emitted when plain file was used instead of geom disk. |
usr.sbin/bhyve/pci_nvme.c | ||
---|---|---|
1417 ↗ | (On Diff #69556) | It seems better to either not present DSM support in the first place if blockif indicates that blockif_candelete() is false, or make it an no-op according to the following statements: * |
usr.sbin/bhyve/pci_nvme.c | ||
---|---|---|
1417 ↗ | (On Diff #69556) | Let me check from a compliance point of view what is correct and then decide what to do. Not advertising DSM might be a good option, but DSM is more than just deallocate. |
Version 3 of patch adds the following:
- by default the Controller will advertise DSM support only if the backing storage also supports delete/trim
- new command line nvme option "dsm" which takes values "auto" (default behavior), "enable" (unconditionally advertise DSM), and "disable" (never advertise DSM)
- if the controller advertises DSM but the backing storage does not support delete/trim, follow the standards group recommendation to return "Success" to all DSM commands