Changeset View
Standalone View
sys/dev/nvme/nvme_qpair.c
Show First 20 Lines • Show All 577 Lines • ▼ Show 20 Lines | if (qpair->cq_head == qpair->num_entries) { | ||||
*/ | */ | ||||
cpl = qpair->cpl[qpair->num_entries - 1]; | cpl = qpair->cpl[qpair->num_entries - 1]; | ||||
nvme_completion_swapbytes(&cpl); | nvme_completion_swapbytes(&cpl); | ||||
qpair->phase = !NVME_STATUS_GET_P(cpl.status); | qpair->phase = !NVME_STATUS_GET_P(cpl.status); | ||||
} | } | ||||
} | } | ||||
while (1) { | while (1) { | ||||
cpl = qpair->cpl[qpair->cq_head]; | cpl = qpair->cpl[qpair->cq_head]; | ||||
imp: This code seems to assume a copy from low address to high address. The phase in the | |||||
impUnsubmitted Not Done Inline ActionsThis comment was accidentally left. I typed it up and realized in the middle it was bogus and thought I'd hit cancel. imp: This comment was accidentally left. I typed it up and realized in the middle it was bogus and… | |||||
/* Convert to host endian */ | /* Convert to host endian */ | ||||
nvme_completion_swapbytes(&cpl); | nvme_completion_swapbytes(&cpl); | ||||
if (NVME_STATUS_GET_P(cpl.status) != qpair->phase) | if (NVME_STATUS_GET_P(cpl.status) != qpair->phase) | ||||
break; | break; | ||||
/* | |||||
* Re-sync and re-read now we know the completion has the right | |||||
* phase, as we need to ensure our reads of cid and sqhd happen | |||||
* after the read of status in order to not see stale values | |||||
* from before the completion queue entry was filled. | |||||
*/ | |||||
bus_dmamap_sync(qpair->dma_tag, qpair->queuemem_map, | |||||
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); | |||||
impUnsubmitted Not Done Inline ActionsWhy wouldn't it suffice to do the sync before the first read? imp: Why wouldn't it suffice to do the sync before the first read? | |||||
jrtc27AuthorUnsubmitted Done Inline ActionsThere are two cases:
Pictorially (ignoring fields other than CID+STATUS): Host Drive Write A.CID | V Write A.STATUS | V +--- Send IRQ | | Receive IRQ <--+ | | | V | Sync | | | V | Read A.CID | | | V | Read A.STATUS | | | V | Process | | | V | Read B.CID | | V | Write B.CID | | | V | Write B.STATUS V Read B.STATUS is currently possible due to the lack of ordering between status and cid reads. jrtc27: There are two cases:
1. If you bounce, then the sync is not guaranteed to see a consistent… | |||||
impUnsubmitted Not Done Inline ActionsOh, the race here is the device overtaking the host in updating the memory. So host reads the old record for a few bytes. The device starts to update the record and writes the status before the host reads it. The host sees the new status in the right phase and bogusly uses the old values. This code ensures that won't happen. Why does this happen on riscv but not x86? imp: Oh, the race here is the device overtaking the host in updating the memory.
So host reads the… | |||||
jrtc27AuthorUnsubmitted Done Inline ActionsOn x86/arm/arm64 both GCC and Clang are happy using unaligned loads to copy packed structs (which all the nvme ones are), but on riscv it will use byte loads as unaligned loads, whilst supported, may be emulated by firmware, causing performance to tank. Since cid and status are in the same 32-bit word, and in reality the struct _is_ aligned, this means it happens to always be atomic on x86/arm/arm64 so you never see cid and status be inconsistent (e.g. disassembling a 12.2-RELEASE-p1 amd64 kernel I can see it just doing to movq loads and two movq stores). jrtc27: On x86/arm/arm64 both GCC and Clang are happy using unaligned loads to copy packed structs… | |||||
impUnsubmitted Not Done Inline ActionsWay back in the armv4 days, we added a bunch of aligned(4) or aligned(8) to in-memory host structures to get it to load things with the right instructions. What happens if you try that on the completion structure? I had half a mind to check the status only, and if it's good, then read the whole CPL record too, but I worry about cache lines in ways my mind is too tired to grok... Checking the standard, the completion queue itself is page (4k) aligned. And each queue entry is 16 byte aligned, so __aligned(16) may be appropriate (unless the arg is power of 2) imp: Way back in the armv4 days, we added a bunch of __aligned(4) or __aligned(8) to in-memory host… | |||||
jrtc27AuthorUnsubmitted Done Inline ActionsYou'd be able to get away with it on 64-bit architectures, but it's not just cid we need, we also need sqhd, which is in a different 32-bit word, though the same 64-bit word. Marking it as aligned is a good idea anyway to avoid the horrendous codegen currently seen on riscv, but it's not a complete fix. If this weren't using busdma I'd just use an atomic_load_acq_16 on status and that'd be that... jrtc27: You'd be able to get away with it on 64-bit architectures, but it's not just cid we need, we… | |||||
cpl = qpair->cpl[qpair->cq_head]; | |||||
/* Convert to host endian */ | |||||
nvme_completion_swapbytes(&cpl); | |||||
tr = qpair->act_tr[cpl.cid]; | tr = qpair->act_tr[cpl.cid]; | ||||
if (tr != NULL) { | if (tr != NULL) { | ||||
nvme_qpair_complete_tracker(tr, &cpl, ERROR_PRINT_ALL); | nvme_qpair_complete_tracker(tr, &cpl, ERROR_PRINT_ALL); | ||||
qpair->sq_head = cpl.sqhd; | qpair->sq_head = cpl.sqhd; | ||||
done++; | done++; | ||||
} else if (!in_panic) { | } else if (!in_panic) { | ||||
/* | /* | ||||
▲ Show 20 Lines • Show All 693 Lines • Show Last 20 Lines |
This code seems to assume a copy from low address to high address. The phase in the