Page MenuHomeFreeBSD

iscsi: Teach the iSCSI stack about "large" PDUs.
ClosedPublic

Authored by jhb on Aug 17 2021, 12:32 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 18, 4:14 AM
Unknown Object (File)
Sat, Jan 18, 3:30 AM
Unknown Object (File)
Fri, Jan 17, 4:08 AM
Unknown Object (File)
Thu, Jan 16, 4:48 AM
Unknown Object (File)
Thu, Jan 16, 4:43 AM
Unknown Object (File)
Thu, Jan 16, 4:40 AM
Unknown Object (File)
Thu, Jan 16, 1:17 AM
Unknown Object (File)
Nov 27 2024, 3:26 PM
Subscribers

Details

Summary

When using iSCSI PDU offload (cxgbei) on T6 adapters, a burst of
received PDUs can be reported via a single message to the driver.

Previously the driver passed these multi-PDU bursts up to the iSCSI
stack up as a single "large" PDU by rewriting the buffer offset, data
segment length, and DataSN fields in the iSCSI header. While this
worked, the forged DataSN values did not match the ExpDataSN value in
the subsequent SCSI Response PDU. The initiator does not currently
verify this value, but the forged DataSN values prevent adding a
check.

To avoid this, allow a logical iSCSI PDU (struct icl_pdu) to describe
a burst of PDUs via a new 'ip_additional_pdus' field. Normally this
field is set to zero when 'struct icl_pdu' represents a single PDU.
If logical PDU represents a burst of on-the-wire PDUs, then 'ip_npdus'
contains the count of additional on-the-wire PDUs. The header of this
"large" PDU is still modified, but the DataSN field now contains the
DataSN value of the first on-the-wire PDU in the burst.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 41107
Build 37996: arc lint + arc unit

Event Timeline

jhb requested review of this revision.Aug 17 2021, 12:32 AM

This is still a bit of a RFC / WIP as I have some questions.

  1. The new field could perhaps be a count of "additional PDUs" (so for a burst of 4 PDUs it would be 3 instead of 4). This would simplify the logic in cfiscsi_pdu_handle_data_out().
  1. The DataSN field in the BHS could be the last DataSN in the burst rather than the first. This more closely matches what the T6 does, but it seems really odd to me as it would make the BHS DataSN and data segment length not be in sync.
  1. Aside from the DataSN check in cfiscsi_pdu_handle_data_out(), I couldn't find any other places that check the DataSN field. In particular, I couldn't find any places that would use the received DataSN value in a reply message (which was the reason @mav asked for me to fix this). Perhaps I missed something? (Note that I didn't find anything in the initiator that checked the DataSN values btw as my testing version of this originally left the DataSN as the last DataSN instead of the first to help catch places that were checking for mismatches and the initiator didn't trip any checks when run in this mode.)
  1. "additional PDUs", i.e. "3" sounds better for me from compatibility point.
  2. I am not getting "the BHS DataSN and data segment length not be in sync" meaning. I have no strong opinion on this point. I can propose arguments for either way.
  3. Since we do not implement error recovery, any PDU corruption should kill the connection, so there should be no lost PDUs, that is why I suppose nobody bothered with the check. But yes, initiator should better check the DataSN in incoming Data-In PDUs and ExpDataSN in SCSI Response PDU.
In D31577#711861, @mav wrote:
  1. "additional PDUs", i.e. "3" sounds better for me from compatibility point.

Ok. I will probably rename it then to something like 'ip_additional_pdus' or 'ip_extra_pdus'.

  1. I am not getting "the BHS DataSN and data segment length not be in sync" meaning. I have no strong opinion on this point. I can propose arguments for either way.

For example, suppose we get a burst of 4 PDUs, each 4k long. The original BHS I get in the driver is for the last PDU (so buffer offset of 12k, length of 4k, and DataSN of 3). Right now that BHS gets rewritten in the driver to have a buffer offset of 0, length of 16k, and DataSN of 0 which is what you would get if you happened to receive a single 16K PDU on the wire instead. What 2) is asking is if I were to leave the DataSN value alone (so 3 instead of 0). I would still have to rewrite the buffer offset and data segment length though, so you'd end up with a BHS that has buffer offset of 0, data segment length of 16k, but a DataSN of 3. That I think looks rather weird and would have the DataSN "out of sync" with the other fields.

Another alternative perhaps would be to leave the BHS unchanged entirely and instead add another field (ip_additional_data or the like) that was a count of the data received in the prior PDUs. For the example, that would mean you'd have the existing BHS unmodified (buffet offset of 12k, length of 4k, DataSN of 3) with 'ip_additional_pdus' set to 3, and 'ip_additional_data' set to 12k. I worry however that dealing with 'ip_additional_data' would be much more invasive and that is is probably simpler to have the driver rewrite the headers to look like one large PDU instead.

  1. Since we do not implement error recovery, any PDU corruption should kill the connection, so there should be no lost PDUs, that is why I suppose nobody bothered with the check. But yes, initiator should better check the DataSN in incoming Data-In PDUs and ExpDataSN in SCSI Response PDU.

Ahh, so you mean that the initiator should be checking the ExpDataSN in iscsi_pdu_handle_scsi_response() and verifying it matched the last received DataSN (or maybe DataSN + 1)?

Ah, and iscsi_pdu_handle_data_in() in the initiator has XXX comments about checking DataSN and the F flag. *sigh*

In D31577#712072, @jhb wrote:

Another alternative perhaps would be to leave the BHS unchanged entirely and instead add another field (ip_additional_data or the like) that was a count of the data received in the prior PDUs. For the example, that would mean you'd have the existing BHS unmodified (buffet offset of 12k, length of 4k, DataSN of 3) with 'ip_additional_pdus' set to 3, and 'ip_additional_data' set to 12k. I worry however that dealing with 'ip_additional_data' would be much more invasive and that is is probably simpler to have the driver rewrite the headers to look like one large PDU instead.

To correctly aggregate the PDUs hardware must know all about the iSCSI, so I am not sure passing only the last raw PDU would buy us much of potential future compatibility or something. I don't feel too bad about rewriting. From that perspective rewriting to the first DataSN value would indeed look a bit more consistent.

Ahh, so you mean that the initiator should be checking the ExpDataSN in iscsi_pdu_handle_scsi_response() and verifying it matched the last received DataSN (or maybe DataSN + 1)?

Yes, DataSN + 1. It is a protection against data underruns in case of last Data-In PDU(s) loss.

In D31577#712086, @mav wrote:
In D31577#712072, @jhb wrote:

Another alternative perhaps would be to leave the BHS unchanged entirely and instead add another field (ip_additional_data or the like) that was a count of the data received in the prior PDUs. For the example, that would mean you'd have the existing BHS unmodified (buffet offset of 12k, length of 4k, DataSN of 3) with 'ip_additional_pdus' set to 3, and 'ip_additional_data' set to 12k. I worry however that dealing with 'ip_additional_data' would be much more invasive and that is is probably simpler to have the driver rewrite the headers to look like one large PDU instead.

To correctly aggregate the PDUs hardware must know all about the iSCSI, so I am not sure passing only the last raw PDU would buy us much of potential future compatibility or something. I don't feel too bad about rewriting. From that perspective rewriting to the first DataSN value would indeed look a bit more consistent.

Ok.

Ahh, so you mean that the initiator should be checking the ExpDataSN in iscsi_pdu_handle_scsi_response() and verifying it matched the last received DataSN (or maybe DataSN + 1)?

Yes, DataSN + 1. It is a protection against data underruns in case of last Data-In PDU(s) loss.

I have a WIP patch for this that I will test and post for review once it works for me (both with icl_soft.c and cxgbei). I don't think I have a good way to test iSER. We should also probably be checking for 'F' which I might do as a second followup (ensuring we get one and exactly one 'F' at the end of the PDU sequence, but only if we got any Data-In PDUs, it looks like the target side does do this check already).

  • Replace ip_npdus with ip_additional_pdus.
In D31577#712088, @jhb wrote:

We should also probably be checking for 'F' which I might do as a second followup (ensuring we get one and exactly one 'F' at the end of the PDU sequence, but only if we got any Data-In PDUs, it looks like the target side does do this check already).

RFC 7143 tells about the F bit:

For incoming data, this bit is 1 for the last input (read) data PDU
of a sequence. Input can be split into several sequences, each
having its own F bit. Splitting the data stream into sequences does
not affect DataSN counting on Data-In PDUs.

...

DataSegmentLength MUST NOT exceed MaxRecvDataSegmentLength for the
direction it is sent, and the total of all the DataSegmentLength of
all PDUs in a sequence MUST NOT exceed MaxBurstLength (or
FirstBurstLength for unsolicited data).

So, the way I read it there can (and in some cases should) be many sequences and so F bits set per request. So you can probably only check that the last PDU has F bit set.

Actually, it seems cfiscsi_datamove_in() should set more F bits than it does now, since I don't see burst size used there.

This revision is now accepted and ready to land.Aug 17 2021, 10:49 PM