e1000: Fix races in the AIM implementation
ClosedPublic
Actions

Authored by markj on Mon, May 19, 1:27 PM.

Details

Reviewers

erj
kbowling

Group Reviewers

Restricted Owners Package

(Owns No Changed Paths)

Commits

rG731c145612dd: e1000: Fix some issues in em_newitr()
rGe0267657f396: e1000: Initialize helper variables in em_newitr()

Summary

- Make sure to load per-ring packet and byte counts using atomic(9), as
  they might be concurrently modified by transmit/receive tasks, and
  there is no mutex synchronizing them with the interrupt handler.
- Rename a variable so that its value is more clear.
- Make sure that bytes and bytes_per_packet are always initialized.
  It's possible to have tx_bytes != 0 && tx_packets == 0, or rx_bytes !=
  0 && rx_bytes == 0, due to the race mentioned above, in which case the
  state machine would test an uninitialized variable.

PR: 286819

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 64289
Build 61173: arc lint + arc unit

Event Timeline

markj created this revision.Mon, May 19, 1:27 PM

Owners added a reviewer: Restricted Owners Package.Mon, May 19, 1:27 PM

Herald added a subscriber: imp. · View Herald TranscriptMon, May 19, 1:27 PM

markj requested review of this revision.Mon, May 19, 1:27 PM

Harbormaster completed remote builds in B64289: Diff 155694.Mon, May 19, 1:27 PM

kbowling added inline comments.Tue, May 20, 2:00 AM

sys/dev/e1000/if_em.c
1652	I would like to understand how you are seeing or detecting the race? We should have interrupts disabled in the legacy case. For msi-x, I thought iflib modulated the queue through the IFDI. What if we move the em_new_itr() call and /* Reset state */ stuff out of the em_msix_que() and into xxx_queue_intr_enable() before rearming IMS/EMIS?

kbowling added inline comments.Tue, May 20, 4:07 AM

sys/dev/e1000/if_em.c
1652	Linux has a slightly different organization where the msi-x routine writes the previous lagging ITR, so the msi-x routine is only concerned with two values on the queue: itr_val and set_itr (a flag). The value that ends up in itr_val would be done by the equivalent of our interrupt filter. I don't know that this is better than the above suggestion (for one, why lag by one period?). The meta point is I would like to avoid the long atomics or it starts eroding the benefits this is supposed to have. I think relying on the interrupt modulation for mutual exclusion and a compiler barrier for data coherence should be feasible, if you want to try to arrange it a particular way go ahead or if you want me to try, I'd just need to know how you are tracking the race so I know if I've fixed it.

markj added inline comments.Tue, May 20, 1:23 PM

sys/dev/e1000/if_em.c
1652	atomic_load_long() has no special cost--it's a plain memory access. Its use here is to give read-once semantics to the access and signal to the reader that there might be some concurrent writes to that memory. I claim that there's a race just be looking at where those fields are written. For instance, at the end of em_isc_txd_encap(), which is called in a threaded context, we have: /* Sent data accounting for AIM */ txr->tx_bytes += pi->ipi_len; ++txr->tx_packets; so what happens if those two fields are initially zero and this handler interrupts the thread before the increment of `txr->tx_packets`? The condition `txr->tx_bytes && txr->tx_packets` in the old code will be false, and `bytes` may be left uninitialized. In particular, the most important line in the diff is the addition of the assignment `bytes = bytes_per_packet = 0;`. I don't think the new version of the code will be any more expensive, the rest of the diff just tries to clarify what's going on (at least to my taste). According to the bug report, it should be possible to reproduce this simply by booting a GENERIC-KMSAN kernel. If you'd like to solve this a different way, please go ahead. The bug report came up because I asked the QAT driver developers to test with KMSAN and they hit this unrelated bug, so I wanted to unblock them.

@markj go ahead and commit the initialization and style changes. I need a moment to think about the threading, so I would ask that goes in a separate commit, which you are free to do as well. Is there any reason not to use _int variant over long? This should MFC back to 32 bit platforms which seem to generate different code vs x86 for whatever it's worth. The driver is also "upstream" for RTEMS and other smaller systems.

Also, will you do the same for igc(4)?

In D50416#1151089, @kbowling wrote:

@markj go ahead and commit the initialization and style changes. I need a moment to think about the threading, so I would ask that goes in a separate commit, which you are free to do as well. Is there any reason not to use _int variant over long? This should MFC back to 32 bit platforms which seem to generate different code vs x86 for whatever it's worth. The driver is also "upstream" for RTEMS and other smaller systems.

I used "long" because that's the underlying type for those fields, and there's no clear reason to me to truncate them to 32 bits on all platforms.

Note that on 32-bit platforms, "int" and "long" are identical.

I'm happy to hold off on committing the change if you'd like more time.