Page MenuHomeFreeBSD

Fix iflib freelist state corruption
ClosedPublic

Authored by pkelsey on Mar 3 2020, 3:36 PM.
Tags
None
Referenced Files
F133298647: D23943.id69513.diff
Fri, Oct 24, 6:14 PM
Unknown Object (File)
Wed, Oct 22, 11:19 PM
Unknown Object (File)
Wed, Oct 22, 3:18 AM
Unknown Object (File)
Mon, Oct 13, 9:33 PM
Unknown Object (File)
Sun, Oct 12, 11:36 AM
Unknown Object (File)
Tue, Oct 7, 12:25 AM
Unknown Object (File)
Thu, Oct 2, 8:06 AM
Unknown Object (File)
Thu, Oct 2, 6:44 AM
Subscribers

Details

Summary

The very short summary is that the existing iflib freelist management contains a bug that breaks the correspondence between freelist indexes and driver ring slots, which is a design assumption on which both iflib and all iflib driver implementations depend. This affects all iflib drivers, although the severity and visibility of the effects vary depending on the traffic load and the driver implementation details.

This issue is part of the root cause of vmx driver issues reported in Bug 243126, Bug 243392, and Bug 240608.

The much longer (but still not comprehensive) discussion is:

Some drivers interact with the iflib freelists as simply a pool of
buffers, and use the iflib-supplied producer index to track the point
in the driver's descriptor ring at which a given refill operation
begins, and the iflib-supplied consumer index to report the buffer
indexes used by received packets. iflib then uses the driver-supplied
buffer index to mark availability in the freelist prior to issuing the
next refill. Such drivers don't deal directly with the freelist
buffer indexes, and rely only on the iflib-supplied producer and
consumer indexes in order to locate the origin of operations in their
rings. iflib relies on the ability to properly track its freelist
buffer indexes via the producer indexes it supplies to the driver
during refill and the consumer indexes reported by the driver during
packet reception. The underlying assumption is that the the buffer
indexes in the iflib freelist remain in a fixed correspondence with
the driver's ring slots.

In addition to the freelist producer index, iflib also maintains a
'frag_idx' parameter which tracks the freelist index to start the next
search for an available buffer at during the next refill operation.

A problem arises from the combination of the following:

  • At the end of a refill operation, the frag_idx parameter is left pointing to the last freelist index allocated during that refill.
  • Freelist buffers are marked available during the batch receive operation, which is then followed by a refill operation.

All of the above admits the following scenario:

  1. A receive ring is in its initial state.
  1. The receive ring is initialized with a limited refill (e.g., 16 slots out of 32 total rings slots are filled by the refill). The freelist frag_idx is left pointing at index 15, the last ring slot to be refilled, and the freelist producer index is left point at slot 16.
  1. Packets are received that consume the buffers in receive ring slots [0, 15].
  1. The iflib batch-receive operation processes those ring slots (provided to iflib as 'fragments'), removing the buffers (to be passed to the stack attached to mbufs), and marking the corresponding indexes in the free list as available for allocation. The indexes in the freelist that are marked available are identified by the index values returned by the driver, which in turn are based on the consumer index value that iflib supplied with the receive operation. As this is the initial receive, iflib supplies the driver a consumer index value of 0, and the driver thus marks the fragments with indexes [0,15]. At the end of the receive operation, the freelist consumer index will have advanced to 16.
  1. A refill operation is then performed, which begins the search at frag_idx 15, which will be found to be available for allocation, having just been marked as such during the receive operation above. For simplicity, let's assume that two buffers are provided by the refill. The first buffer (freelist index 15) will be provided to the driver's refill routine at producer index 16, and the driver will thus make it available to the hardware via its ring slot 16. The second buffer (freelist index 16) will then go in driver ring slot 17. The freelist frag_idx is now 16.
  1. One more packet now arrives and the hardware places it in the buffer pointed to by the driver's ring slot 16.
  1. In response to the RX interrupt, iflib performs a batch receive operation during which the driver returns this single packet/fragment (from driver ring slot 16) at the index that iflib supplied for the receive operation, which is the current freelist consumer index of 16. iflib then arranges for the buffer at freelist index 16 to be sent to the stack attached to an mbuf, and marks index 16 in the freelist as available for allocation. Recall, however, that the buffer that was put into driver ring slot 16 in (5) was the buffer at freelist index 15. iflib has just taken a buffer that is still owned by the hardware (at driver ring slot 17) and passed it into the stack, then marked its freelist entry available for allocation. The freelist state is now out of sync with the driver ring state, a packet has been lost, and a buffer that the hardware might write to at any point is being processed by the stack. The buffer that the packet was actually received into (freelist index 15) does not get processed or marked available for allocation.

One driver that is subject to the above scenario is the ixl driver.

This patch fixes the issue by advancing the freelist frag_idx at the
end of a refill operation to one past the last index allocated during
that refill. This ensures that the refill operations always advance
through the ring and correspondence between freelist buffer indexes
and driver ring slots is maintained.

Note that it is not required to update the logic in the refill loop to
maintain this 'one past' state behavior because the loop is marking
freelist indexes as in-use as it goes. While changing the
loop-internal logic to one-past would save the pointless examination
of a single bitmap bit on each loop pass, it would also require adding
code to handle wrap-around for frag_idx (which is already handled via
the bitmap routines), and has been judged to be not worth it.

This coherence between freelist buffer indexes and driver ring slots
is also essential for drivers that do not use the iflib-provided
producer and consumer indexes, but deal directly with the freelist
buffer indexes that iflib supplies as part of a refill operation. An
example of such a driver is the vmx driver, which operates directly in
terms of the refill buffer indexes in order to efficiently detect and
handle the case where the device skips descriptors in the driver ring.

Test Plan

This has been tested on a 12.1 system (with r356932 applied) with em and vmx devices.

This builds for 13-current, but testing on 13-current is waiting for the package system to get fixed.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

We came to the same conclusion at Panzura and used a similar fix.

This revision is now accepted and ready to land.Mar 3 2020, 6:24 PM

One driver that is subject to the above scenario is the ixl driver.

Does that mean you also tested this patch with the ixl driver? I didn't see it mentioned in the "Test Plan" notes.

In D23943#526402, @erj wrote:

One driver that is subject to the above scenario is the ixl driver.

Does that mean you also tested this patch with the ixl driver? I didn't see it mentioned in the "Test Plan" notes.

I have not tested it with the ixl driver. It was noted in the description as an example of an iflib driver that is affected by the specific failure scenario that is presented because it happens to be top-of-the-stack for me for Intel driver code to refer to. All of the Intel drivers interact with the iflib freelists using the iflib-supplied producer and consumer indexes (and for such drivers, there's only one way to do it), so I consider showing that there's no surprises from this patch for em as providing high confidence that there are no surprises for igb, or ixgbe, or ixl...

This revision was automatically updated to reflect the committed changes.