
vmxnet3: skip zero-length descriptor in the middle of a packet
ClosedPublic

Authored by avg on Nov 30 2021, 1:56 PM.

Details

Summary

Passing such descriptors up to iflib is obviously wasteful.
But the main concern is that we may overrun the iri_frags array because of
them. That has been observed in practice.

Also, assert that the number of fragments / descriptors / segments is
less than IFLIB_MAX_RX_SEGS.

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

avg requested review of this revision. Nov 30 2021, 1:56 PM
avg added inline comments.
sys/dev/vmware/vmxnet3/if_vmx.c
1616

I am not sure if this really needs to be accounted separately, but I wanted additional visibility while testing the change.
Also, not sure about the name...

gallatin added a subscriber: gallatin.
gallatin added inline comments.
sys/dev/vmware/vmxnet3/if_vmx.c
1607

I assume rxcd->len != 0 is by far the common case, so I'd annotate with __predict_true() to help the compiler (and the reader).

This revision is now accepted and ready to land. Nov 30 2021, 2:10 PM

add __predict_true for the normal case

This revision now requires review to proceed. Nov 30 2021, 2:17 PM
avg marked an inline comment as done. Nov 30 2021, 2:18 PM
avg added inline comments.
sys/dev/vmware/vmxnet3/if_vmx.c
1607

Thank you for the suggestion!

This revision is now accepted and ready to land. Nov 30 2021, 5:00 PM

We can only guess at what is really going on in the virtual device, but this seems like a correct defensive approach to what has been observed.

Given the data at https://people.freebsd.org/~avg/vmxnet3-fragment-overrun.txt (thanks for collecting this!), I was concerned that the virtual device consuming free list entries for the zero-length fragments, while we hide that information from iflib, might break free list maintenance. I reviewed those paths, and it's fine: iflib tracks and replaces packet mbufs sent to the stack via a bitmap, provides the corresponding free list indexes to the driver during refill, and during refill the vmx driver correctly identifies and handles gaps in those provided indexes.

With respect to nrxsg_max, with this fix we are still hoping that the virtual device will not give us more than that many non-zero length fragments as part of a single packet (the one thing this episode seems to indicate is that the virtual device is willing to consume more than nrxsg_max free list entries for a single packet, which was not believed to be possible before). The additional KASSERT now checks this for INVARIANTS builds. If we were willing to pay the runtime cost of checking nfrags against IFLIB_MAX_RX_SEGS in non-INVARIANTS builds, that error condition could be handled by setting rxcd->error and breaking out of the loop before incrementing nfrags past IFLIB_MAX_RX_SEGS. Not sure it's worth it without first seeing proof of such a broken virtual device implementation.

I think the name for the new counter (and its corresponding sysctl), "vxcr_zero_length_after_sop", could be better, as it can be read to exclude a zero-length fragment that has sop set, even though it does count those. Calling it vxcr_zero_length_frag might be better. Then we'd have vxcr_zero_length counting zero-length packets and vxcr_zero_length_frag counting zero-length fragments within non-zero-length packets. The extent to which that holds strictly true depends on some assumptions the driver still makes about the fragment structure of packets. Not going to hold up the fix over this.

There remain assumptions about the fragment structure of packets in this driver. vmxnet3_isc_rxd_available() essentially assumes that the queue will never fill with packets that end in a zero-length fragment (if that happened, receive processing would stall), and vmxnet3_isc_rxd_pkt_get() effectively assumes that a packet starting with a zero-length fragment will consist of only one fragment.

avg marked an inline comment as done. Dec 1 2021, 8:02 AM

@pkelsey , thank you very much for the very detailed review and feedback.
It will certainly be useful if another problem (or a variation of the current one) strikes us.

I thought a bit about gracefully handling too many (non-empty) fragments for a packet.
My only concern is what happens if we abort after seeing SOP and before reaching EOP.
Might that confuse the next call to vmxnet3_isc_rxd_available / vmxnet3_isc_rxd_pkt_get?

better name for the new counter and its sysctl

This revision now requires review to proceed. Dec 1 2021, 8:07 AM
In D33189#750472, @avg wrote:

I thought a bit about gracefully handling too many (non-empty) fragments for a packet.
My only concern is what happens if we abort after seeing SOP and before reaching EOP.
Might that confuse the next call to vmxnet3_isc_rxd_available / vmxnet3_isc_rxd_pkt_get?

You are right, I overlooked the need to still advance cqidx to eop in that case (the new-packet detection code does not scan for sop; it assumes the next descriptor is always sop when it begins its work). I was focused on the free list management implications for fragments that had already been identified and recorded before IFLIB_MAX_RX_SEGS was reached, as well as for fragments that would not be recorded because an error handling path skipped them. I think those observations are still sound: following the rxcd->error processing for the fragments recorded to that point would be fine, as would skipping over any other fragments not yet recorded.

This revision is now accepted and ready to land. Dec 1 2021, 8:35 PM