Page MenuHomeFreeBSD

cam: use small CCBs for SCSI and ATA IO
ClosedPublic

Authored by trasz on Feb 15 2021, 12:31 AM.

Details

Summary

This is a working prototype, or perhaps a Request For Comments. It's implemented for da(4) and ada(4), but should be applicable for other protocols and periphs.

The end result is that CCBs issued via da(4) take ~512B (size of ccb_scsiio) instead of the usual 2kB (size of union ccb, ~1.5kB, rounded up by malloc(9)). We waste less memory, we avoid zeroing the unused 1kB, and it should be easier to allocate those CCBs in low memory conditions.

Note that this does not change the size, or the layout, of CCBs as such. CCBs get allocated in various different ways, in particular on the stack, and I don't want to redo all that. Instead, this provides an opt-in mechanism for the periph to declare "my start() callback is fine with receiving a CCB allocated from this UMA zone", and makes dastart(9) use it. In other words, most of the code works exactly as it used to; the change only happens to IOs issued by xpt_run_allockq(), which is - conveniently - pretty much all that matters for performance. In case of dastart(), the routine only ever casts the received CCB pointer to ccb_scsiio, so it doesn't require any special changes to make it work; I believe most periphs follow this pattern.

The reason for doing it this way is that it's pretty small, localized change, and can be implemented gradually and iteratively: take a periph, make sure its start() callback only casts the CCBs it takes to a particular type of CCB, add UMA zone for that size, and declare it safe to XPT. Because it's UMA, there's no alignment overhead, and it makes it possible to use uma_zone_reserve(9) to improve behaviour in low memory conditions even further.

I've considered making the UMA zone internal to periphs, and making xpt_run_allocq() pass NULL to the start() routine instead. The reason I hadn't done that is that I don't understand the interaction between xpt_run_allocq(), priorities, and requeueing to the ccb_list.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

sys/cam/cam_ccb.h
737

changing this structure requires, iirc, compatibility code since this is one of the ABI interfaces cam is trying to maintain.

Btw, all the memset(3) calls are to make xpt_assert_ccb_is_large() work; they are not required for the "core change" to work.

sys/cam/cam_ccb.h
737

Thanks. Those changes are to get its size down to 512 bytes; I think the layout change, and compat shims, could go in later. I'm most interested in the general approach; if it's good, I could then proceed with doing the same for ada(4) and nda(4).

sys/cam/cam_ccb.h
737

On further consideration, perhaps the periph drivers should simply provide a callback to allocate a CCB, instead of indicating its size? This way we wouldn't need to touch ccb_scsiio layout, and would also avoid even more overhead for other transports (ccb_nvmeio is ~300 bytes).

Second attempt. This one uses periph-provided UMA zone, and thus doesn't
need to change the CCB layout.

sys/cam/cam_periph.h
152

The concept of per-periph CCB type/size looks weird to me. What type would you use for pass periph for ATAPI drive, supporting ATA and SCSI commands?

Practically speaking I probably wouldn't touch that periph - the functionality is opt-in, and makes sense for periphs that do lot of IO - but it's a good question. How does it work now? Looking at scsi/scsi_cd.c, it only uses ccb_scsiio, how does it work with ATA?

sys/cam/cam_ccb.h
737

The main need for compat is with camcontrol and other userland programs doing passthru until they can be recompiled.

As for callbacks, how do you see that working?

sys/cam/cam_xpt.c
4697

Don't we have a flag we can more directly tag the allocation so the free is unambiguous rather than guessing?

oops, looks like a stale comment got added. now that the layout is the same, no need to comment on it.

sys/cam/cam_xpt.c
4697

We do have the ccb_h.flags, but it gets overwritten by xpt_setup_ccb(). Also, explicitly setting the len helps in debugging. But yeah, would be nice to have a better mechanism for this.

trasz edited the summary of this revision. (Show Details)
  • Add kern.cam.da.enable_small_ccbs sysctl.
  • More assertions.
  • Convert ada(4) to use small CCBs too.
trasz retitled this revision from cam: use small CCBs for SCSI IO to cam: use small CCBs for SCSI and ATA IO.Mar 29 2021, 3:27 PM
trasz edited the summary of this revision. (Show Details)
sys/cam/cam_xpt.c
4698

Again, I'd prefer we have a flag that tells us it was allocated from UMA rather than malloc. This breaks if we're not using small allocations per the sysctl you added in this review iteration.

sys/cam/cam_xpt.c
4698

I've noticed some drivers overwrite the flags:

dev/ahci/ahci.c: ccb->ccb_h.flags = CAM_DIR_IN

How about a separate flag just for the backing storage, like in the updated version?

Some renaming, mostly replacing 'small' with UMA; separate alloc_flags field;
cleaned up assertions; change the default to disabled.

saw a couple of things, will comment again tomorrow or Wednesday once I'm thinking straight again

sys/cam/ata/ata_xpt.c
1798

I'd test this for the CAM_CCB_FROM_UMA bit being clear directly here, or you'll play whack-a-mole.

sys/cam/cam_ccb.h
350–356

This is only backwards compatible on little endian machines.

sys/cam/cam_ccb.h
350–356

but a ifdef to order the fields this way for LE and the opposite way for BE wouldn't be bad.

I've looked over this again... I like it, apart from the two quibbles to date: binary compat for both big/little endian (just a simple ifdef I think in the header) and testing for the flag bit == 0 rather than the whole word (though it could go in w/o it, I think it may be confusing in the future if it does and someone else adds a flag)...
Thanks again for working on this

Other than the weird ada race, I think we're good. We have enough other races that I'm nervous about allowing another known one if if we can avoid it.

sys/cam/ata/ata_da.c
1868

Can you get a traceback of where it's called from before adainit()? I'd think it was only enterable after atainit() completes, but maybe the 'master callback' races this and we should do all the init prior to establishing that?

Fix initialization of CCB zone for ada(4).

trasz added inline comments.
sys/cam/ata/ata_da.c
1868

It is; the problem was, adainit was called before adaregister, as expected, but it didn't finish until after adaregister() was already running. Moving the UMA zone initialization to run earlier fixes the problem.

This is looking really good. I'm down to complaining about KASSERTS because that's all I see left :)

sys/cam/ata/ata_da.c
1869

You only need to kassert this when ada_enable_uma_ccbs is true.

sys/cam/scsi/scsi_da.c
2863

why no similar KASSERT here to the ada case?

Looks good! Thanks for all the changes.

This revision is now accepted and ready to land.May 13 2021, 1:56 PM