Page MenuHomeFreeBSD

e1000: Fix races in the AIM implementation
ClosedPublic

Authored by markj on Mon, May 19, 1:27 PM.
Tags
None
Referenced Files
F119739896: D50416.id155694.diff
Wed, Jun 11, 5:55 PM
Unknown Object (File)
Mon, Jun 9, 8:07 AM
Unknown Object (File)
Sun, Jun 8, 5:31 PM
Unknown Object (File)
Wed, Jun 4, 3:36 PM
Unknown Object (File)
Wed, May 28, 10:06 AM
Unknown Object (File)
Tue, May 27, 6:42 PM
Unknown Object (File)
Sat, May 24, 5:41 AM
Unknown Object (File)
Sat, May 24, 12:20 AM
Subscribers

Details

Summary
- Make sure to load per-ring packet and byte counts using atomic(9), as
  they might be concurrently modified by transmit/receive tasks, and
  there is no mutex synchronizing them with the interrupt handler.
- Rename a variable so that its value is more clear.
- Make sure that bytes and bytes_per_packet are always initialized.
  It's possible to have tx_bytes != 0 && tx_packets == 0, or rx_bytes !=
  0 && rx_bytes == 0, due to the race mentioned above, in which case the
  state machine would test an uninitialized variable.

PR: 286819

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 64289
Build 61173: arc lint + arc unit

Event Timeline

Owners added a reviewer: Restricted Owners Package.Mon, May 19, 1:27 PM
markj requested review of this revision.Mon, May 19, 1:27 PM
sys/dev/e1000/if_em.c
1652

I would like to understand how you are seeing or detecting the race?

We should have interrupts disabled in the legacy case.

For msi-x, I thought iflib modulated the queue through the IFDI.

What if we move the em_new_itr() call and /* Reset state */ stuff out of the em_msix_que() and into xxx_queue_intr_enable() before rearming IMS/EMIS?

sys/dev/e1000/if_em.c
1652

Linux has a slightly different organization where the msi-x routine writes the previous lagging ITR, so the msi-x routine is only concerned with two values on the queue: itr_val and set_itr (a flag). The value that ends up in itr_val would be done by the equivalent of our interrupt filter. I don't know that this is better than the above suggestion (for one, why lag by one period?).

The meta point is I would like to avoid the long atomics or it starts eroding the benefits this is supposed to have. I think relying on the interrupt modulation for mutual exclusion and a compiler barrier for data coherence should be feasible, if you want to try to arrange it a particular way go ahead or if you want me to try, I'd just need to know how you are tracking the race so I know if I've fixed it.

sys/dev/e1000/if_em.c
1652

atomic_load_long() has no special cost--it's a plain memory access. Its use here is to give read-once semantics to the access and signal to the reader that there might be some concurrent writes to that memory.

I claim that there's a race just be looking at where those fields are written. For instance, at the end of em_isc_txd_encap(), which is called in a threaded context, we have:

/* Sent data accounting for AIM */
txr->tx_bytes += pi->ipi_len;
++txr->tx_packets;

so what happens if those two fields are initially zero and this handler interrupts the thread before the increment of txr->tx_packets? The condition txr->tx_bytes && txr->tx_packets in the old code will be false, and bytes may be left uninitialized. In particular, the most important line in the diff is the addition of the assignment bytes = bytes_per_packet = 0;.

I don't think the new version of the code will be any more expensive, the rest of the diff just tries to clarify what's going on (at least to my taste). According to the bug report, it should be possible to reproduce this simply by booting a GENERIC-KMSAN kernel.

If you'd like to solve this a different way, please go ahead. The bug report came up because I asked the QAT driver developers to test with KMSAN and they hit this unrelated bug, so I wanted to unblock them.

@markj go ahead and commit the initialization and style changes. I need a moment to think about the threading, so I would ask that goes in a separate commit, which you are free to do as well. Is there any reason not to use _int variant over long? This should MFC back to 32 bit platforms which seem to generate different code vs x86 for whatever it's worth. The driver is also "upstream" for RTEMS and other smaller systems.

Also, will you do the same for igc(4)?

@markj go ahead and commit the initialization and style changes. I need a moment to think about the threading, so I would ask that goes in a separate commit, which you are free to do as well. Is there any reason not to use _int variant over long? This should MFC back to 32 bit platforms which seem to generate different code vs x86 for whatever it's worth. The driver is also "upstream" for RTEMS and other smaller systems.

I used "long" because that's the underlying type for those fields, and there's no clear reason to me to truncate them to 32 bits on all platforms.

Note that on 32-bit platforms, "int" and "long" are identical.

I'm happy to hold off on committing the change if you'd like more time.

Also, will you do the same for igc(4)?

I can, but I have no way to test the patch--would you be able to?

This revision was not accepted when it landed; it landed in state Needs Review.Tue, May 27, 5:54 PM
This revision was automatically updated to reflect the committed changes.