
iSCSI: Add support for segmentation offload for hardware offloads.
ClosedPublic

Authored by jhb on Jul 19 2021, 10:21 PM.

Details

Summary

Similar to TSO, iSCSI segmentation offload permits the upper layers to
submit a "large" virtual PDU which is split up into multiple segments
(PDUs) on the wire. Similar to how the TCP/IP headers are used as
templates for TSO, the BHS at the start of a large PDU is used as a
template to construct the specific BHS at the start of each PDU. In
particular, the DataSN is incremented for each subsequent PDU, and the
'F' flag is only set on the last PDU.
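The BHS-as-template scheme described above can be sketched as follows. This is a userland illustration with made-up names (struct fake_bhs, split_iso, BHS_F_FLAG are all hypothetical stand-ins, not the actual icl/cxgbei code): the template BHS is copied into each wire segment, DataSN is incremented per segment, and the 'F' (final) bit is set only on the last one.

```c
#include <assert.h>
#include <stdint.h>

#define NSEGS_MAX	16
#define BHS_F_FLAG	0x80	/* illustrative stand-in for the 'F' (final) bit */

/* Simplified stand-in for the Data-In BHS fields that change per segment. */
struct fake_bhs {
	uint8_t  flags;
	uint32_t datasn;
};

/*
 * Split a large virtual PDU into per-wire BHSes: copy the template,
 * increment DataSN for each subsequent PDU, and set 'F' only on the
 * last one (clearing it on all earlier ones).  Returns the number of
 * wire PDUs, assuming each carries at most max_dsl bytes of data.
 */
static int
split_iso(const struct fake_bhs *tmpl, uint32_t data_len, uint32_t max_dsl,
    struct fake_bhs segs[NSEGS_MAX])
{
	int npdu = (data_len + max_dsl - 1) / max_dsl;

	for (int i = 0; i < npdu; i++) {
		segs[i] = *tmpl;
		segs[i].datasn = tmpl->datasn + i;
		if (i == npdu - 1)
			segs[i].flags |= BHS_F_FLAG;
		else
			segs[i].flags &= ~BHS_F_FLAG;
	}
	return (npdu);
}
```

In the real offload the splitting happens in the driver or NIC; the sketch only shows the per-segment BHS edits the summary describes.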

struct icl_conn has a new 'ic_hw_isomax' field which defaults to 0,
but can be set to the largest virtual PDU a backend supports. If this
value is non-zero, the iSCSI target and initiator use this size
instead of 'ic_max_send_data_segment_length' to determine the maximum
size for SCSI Data-In and SCSI Data-Out PDUs. Note that since PDUs
can be constructed from multiple buffers before being dispatched, the
target and initiator must wait for the PDU to be fully constructed
before determining the number of DataSN values consumed (and thus
updating the per-transfer DataSN value used for the start of the next
PDU).
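The DataSN accounting in the paragraph above amounts to the following arithmetic. This is a userland sketch with an illustrative name (iso_npdus is hypothetical, not the actual icl code), and it assumes the backend splits the large PDU so that every wire PDU except the last carries exactly the maximum data segment length:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of DataSN values a fully constructed large PDU consumes,
 * assuming every wire PDU except the last carries exactly max_dsl
 * bytes of data.  Treating a zero-length PDU as consuming one DataSN
 * is an assumption of this sketch.
 */
static uint32_t
iso_npdus(uint32_t ip_data_len, uint32_t max_dsl)
{
	if (ip_data_len == 0)
		return (1);
	return ((ip_data_len + max_dsl - 1) / max_dsl);
}
```

Once the PDU is fully constructed, the per-transfer counter would be advanced by this amount in one step, rather than by one each time a 'struct icl_pdu' is created.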

The target generates large PDUs for SCSI Data-In PDUs in
cfiscsi_datamove_in(). The initiator generates large PDUs for SCSI
Data-Out PDUs generated in response to an R2T.

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

jhb requested review of this revision. Jul 19 2021, 10:21 PM

It looks good to me. I was thinking about the possibility of something like this, just wondering how to get back the number of PDUs sent by the driver. But if we assume the sent PDUs are always ic_max_send_data_segment_length bytes, that certainly simplifies the logic. However, I am missing the "the target and initiator must wait for the PDU to be fully constructed before determining the number of DataSN values consumed" part. Are there cases where ip_data_len may differ from the sum of the len values passed to icl_pdu_append_data()? Or can the driver report the segmentation result some other way?

In D31222#707529, @mav wrote:

It looks good to me. I was thinking about the possibility of something like this, just wondering how to get back the number of PDUs sent by the driver. But if we assume the sent PDUs are always ic_max_send_data_segment_length bytes, that certainly simplifies the logic. However, I am missing the "the target and initiator must wait for the PDU to be fully constructed before determining the number of DataSN values consumed" part. Are there cases where ip_data_len may differ from the sum of the len values passed to icl_pdu_append_data()? Or can the driver report the segmentation result some other way?

So I thought about having an explicit MSS value like we do for TCP, but for that we'd need to add a new field to 'struct icl_pdu' perhaps. However, it would always be set to 'ic_max_send_data_segment_length' anyway, so in the driver I just assume that the initial PDUs are all 'ic_max_send_data_segment_length' long and the last PDU holds the rest. If you'd prefer it be more explicit I can add a new field to 'struct icl_pdu' that gets set instead, something like 'ip_segsz' (which is close I think to the name of the field in 'struct mbuf' used for TSO).

sys/cam/ctl/ctl_frontend_iscsi.c
2623

I meant this change as the "initiator and target must wait to update DataSN" part. Here we bump the DataSN value by however many PDUs are used once the PDU is fully constructed, instead of assuming that each 'struct icl_pdu' consumes a single DataSN when it is first created, as the old code did.

sys/dev/iscsi/iscsi.c
1253

Similarly here for the initiator.

In D31222#707794, @jhb wrote:

If you'd prefer it be more explicit I can add a new field to 'struct icl_pdu' that gets set instead, something like 'ip_segsz' (which is close I think to the name of the field in 'struct mbuf' used for TSO).

I am not sure what additional information about the PDU size (MSS) the initiator or target could have that is not already available to the driver via the connection parameters. I was actually thinking the other way around: the driver may have its own preferences for segmentation if the announced maximum size is somehow not the most efficient under specific conditions. But maybe that is too flexible if we later decide to implement retransmission or otherwise need to know which data went in which PDU.

In D31222#707798, @mav wrote:
In D31222#707794, @jhb wrote:

If you'd prefer it be more explicit I can add a new field to 'struct icl_pdu' that gets set instead, something like 'ip_segsz' (which is close I think to the name of the field in 'struct mbuf' used for TSO).

I am not sure what additional information about the PDU size (MSS) the initiator or target could have that is not already available to the driver via the connection parameters. I was actually thinking the other way around: the driver may have its own preferences for segmentation if the announced maximum size is somehow not the most efficient under specific conditions. But maybe that is too flexible if we later decide to implement retransmission or otherwise need to know which data went in which PDU.

Hmmm, TCP has some notion of this for TSO due to working around quirks in broken Intel NICs, but I'd rather avoid that until a use case comes up that actually needs it. cxgbei(4) is happy to use the negotiated Max DSL as the PDU size already, so I think it's simplest to just stick with that. In terms of the driver communicating preferences, cxgbei already does this today in the form of setting the limits on the Max Send/Recv DSL via the icl limits callback.

So to be clear, @mav are you ok with this change as-is given our discussion?

This revision is now accepted and ready to land. Aug 6 2021, 12:37 AM