Remove __packed from nvme_command, nvme_completion and
nvme_dsm_trim. Add super-alignment to nvme_completion since it's always
at least that aligned in hardware (and in our existing uses of it
embedded in structures). It generates better code in
nvme_qpair_process_completions on riscv, and the same on all other
platforms.
Details
This came from discussions with Jessica about issues on riscv, with a proposed fix in D31002 (Jessica also has one that has more background, which I've linked from that mine)
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 40235 Build 37124: arc lint + arc unit
Event Timeline
I agree that __packed changes nothing there and is not needed. But what is the point of adding explicit alignment? I've quickly looked through the use of the structures and found nothing that this would change.
I was thinking for the nvme_completion we could read the completion record more efficiently, though maybe aligned(8) is sufficient for it to always use a movq. Lemme check code generation, if there's no advantage, I'll remove it.
For nvme_dsm_range, I have no good argument for that, so I'll remove it.
Just to document, there was a similar set of proposed changes for bhyve in D29126
I had a concern about the change, but @jhb explained __packed was more accurately "alignment of 1" and not really what the code wanted. He also mentioned that the sizeof asserts should catch any compiler shenanigans.
I agree with @mav that the __aligned(16) probably doesn't do anything especially since CQ base addresses must MPS aligned.
I've looked at the generated code for an artificial example.
#include <sys/cdefs.h> #include <stdint.h> struct gerbil { uint32_t w1; uint32_t w2; uint32_t w3; uint32_t w4; }; struct hamster { uint32_t w1; uint32_t w2; uint32_t w3; uint32_t w4; } __aligned(8); extern void doit(void *); void copy_g(struct gerbil *g) { struct gerbil gg = *g; if (gg.w4) doit(&gg); } void copy_h(struct hamster *h) { struct hamster hh = *h; if (hh.w4) doit(&hh); }
For i386, aarch64, amd64, powerpc, and powerpc64 the code for copy_h and copy_g are identical. They don't care about the actual alignment, at least with the latest clang in the tree. I didn't try mips* or 32-bit arm because we don't support nvme there at all.
For riscv64, copy_h has significantly better code because it assumes super alignment (8), while copy_g does not. Changing it to alignment(4) causes the code generated to be the same, I presume because of the riscv64 ABI dictating this alignment.
I selected the above code because it's the bit of code racing in another commit, abstracted down to something easy to examine.
What about nvme_registers? I don't think we actually use it for anything where it'd matter (the regs field is only ever assigned to, the actual reads/writes use offsetot and bus_space) but for completeness we probably should change that too.
jrtc27 pointed out we don't need nvme_registers __packed either.
It matters a lot less, but there's no readson to do it.