Page MenuHomeFreeBSD

nvme: Add ability to override ioq to put the request on
ClosedPublic

Authored by imp on Wed, Jan 14, 4:06 PM.
Tags
None
Referenced Files
F142041259: D54714.id169707.diff
Thu, Jan 15, 8:22 AM
F142038819: D54714.id.diff
Thu, Jan 15, 7:25 AM
F142022486: D54714.id169735.diff
Thu, Jan 15, 1:43 AM
F142017591: D54714.id169735.diff
Wed, Jan 14, 11:47 PM
F142017548: D54714.id.diff
Wed, Jan 14, 11:46 PM
F142014959: D54714.diff
Wed, Jan 14, 11:00 PM
Unknown Object (File)
Wed, Jan 14, 4:11 PM
Subscribers

Details

Summary

Sometimes the client device needs to manage the IOQ the request goes
to. Expand the interface we have for the request to allow it to be set
for this special use case.

Suggested by: jhb
Sponsored by: Netflix

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

imp requested review of this revision.Wed, Jan 14, 4:06 PM

Note: we could compress retries to 16 bits and have this be 16 bits too since the number of queues is limited to 64k if this busts the cache-line sizes of this struct.

Hmm, today on amd64 nvme_request is 128 bytes, so packing the ioq is probably worth doing. Note that you could just add the new field as a uint16_t in place of the two spare bools.

(gdb) ptype /o struct nvme_request
/* offset      |    size */  type = struct nvme_request {
/*      0      |      64 */    struct nvme_command {
/*      0      |       1 */        uint8_t opc;
/*      1      |       1 */        uint8_t fuse;
/*      2      |       2 */        uint16_t cid;
/*      4      |       4 */        uint32_t nsid;
/*      8      |       4 */        uint32_t rsvd2;
/*     12      |       4 */        uint32_t rsvd3;
/*     16      |       8 */        uint64_t mptr;
/*     24      |      16 */        union {
/*                    16 */            struct {
/*     24      |       8 */                uint64_t prp1;
/*     32      |       8 */                uint64_t prp2;

                                           /* total size (bytes):   16 */
                                       };
/*                    16 */            struct nvme_sgl_descriptor {
/*     24      |       8 */                uint64_t address;
/*     32      |       4 */                uint32_t length;
/*     36      |       3 */                uint8_t reserved[3];
/*     39      |       1 */                uint8_t type;

                                           /* total size (bytes):   16 */
                                       } sgl;
                                       /* total size (bytes):   16 */
                                   };
/*     40      |       4 */        uint32_t cdw10;
/*     44      |       4 */        uint32_t cdw11;
/*     48      |       4 */        uint32_t cdw12;
/*     52      |       4 */        uint32_t cdw13;
/*     56      |       4 */        uint32_t cdw14;
/*     60      |       4 */        uint32_t cdw15;

                                   /* total size (bytes):   64 */
                               } cmd;
/*     64      |       8 */    struct nvme_qpair *qpair;
/*     72      |      24 */    struct memdesc {
/*     72      |       8 */        union {
/*                     8 */            void *md_vaddr;
/*                     8 */            vm_paddr_t md_paddr;
/*                     8 */            struct bus_dma_segment *md_list;
/*                     8 */            struct uio *md_uio;
/*                     8 */            struct mbuf *md_mbuf;
/*                     8 */            struct vm_page **md_ma;

                                       /* total size (bytes):    8 */
                                   } u;
/*     80      |       8 */        union {
/*                     8 */            size_t md_len;
/*                     4 */            int md_nseg;

                                       /* total size (bytes):    8 */
                                   };
/*     88      |       4 */        union {
/*                     4 */            uint32_t md_offset;

                                       /* total size (bytes):    4 */
                                   };
/*     92      |       4 */        uint32_t md_type;

                                   /* total size (bytes):   24 */
                               } payload;
/*     96      |       8 */    nvme_cb_fn_t cb_fn;
/*    104      |       8 */    void *cb_arg;
/*    112      |       4 */    int32_t retries;
/*    116      |       1 */    bool payload_valid;
/*    117      |       1 */    bool timeout;
/*    118      |       2 */    bool spare[2];
/*    120      |       8 */    struct {
/*    120      |       8 */        struct nvme_request *stqe_next;

                                   /* total size (bytes):    8 */
                               } stailq;

                               /* total size (bytes):  128 */
                             }

(I would probably put the uint16_t before the two existing bool fields, so that the fields are somewhat sorted by size, but that doesn't really matter)

In D54714#1250118, @jhb wrote:

(I would probably put the uint16_t before the two existing bool fields, so that the fields are somewhat sorted by size, but that doesn't really matter)

I think I'll make retries int16_t at the same time. There's no need for more than even 10 retries most of the time, and more than 100 or 1000 is crazypants.

*but* it needs to be uint16_t for ioq and we need to snag one of those bools for 'use default ioq' so we can encode the whole range. Though maybe I'm worrying bogusly over an edge case. Yea, there is up to 2^16 (zero based) slots, so 0xffff is a valid slot number, but maybe I just document it because the largest number of submission queues I've seen in the wild is 512

Tweaks from john: sizing variables and noting we can only do up to 2^16-1
queues, not the full 2^16. That's sane for the far forseeable future.

Yes, the queue field is only 16 bits and queue 0 is always reserved for the admin queue, so I/O queues are 1...65535 meaning num_io_queues will always be <= 65535, so the ioq range for the new function can only be 0..65534.

This revision is now accepted and ready to land.Thu, Jan 15, 1:49 AM