Page MenuHomeFreeBSD

Fix ioat(4) hw reset race, and pull in two prior patches needed for the fix
ClosedPublic

Authored by cem on Jul 4 2016, 4:17 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Oct 21, 9:18 AM
Unknown Object (File)
Oct 17 2024, 1:20 PM
Unknown Object (File)
Oct 5 2024, 8:24 PM
Unknown Object (File)
Oct 5 2024, 8:13 PM
Unknown Object (File)
Sep 21 2024, 2:08 AM
Unknown Object (File)
Sep 20 2024, 7:01 PM
Unknown Object (File)
Sep 16 2024, 11:39 PM
Unknown Object (File)
Sep 15 2024, 5:14 PM
Subscribers

Details

Summary

I have some other ioat(4) work to upstream that was waiting for 11 to branch,
but I don't think this fix should wait.

ioat(4): Split timer into poll and shrink functions

Poll should happen quickly, while shrink should happen infrequently.

Protect is_completion_pending with submit_lock.

Sponsored by: EMC / Isilon Storage Division

ioat(4): Serialize ioat_reset_hw invocations

Sponsored by: EMC / Isilon Storage Division

ioat(4): Block asynchronous work during HW reset

Fix the race between ioat_reset_hw and ioat_process_events.

HW reset isn't protected by a lock because it can sleep for a long time
(40.1 ms). This resulted in a race where we would process bogus parts
of the descriptor ring as if it had completed. This looked like
duplicate completions on old events, if your ring had looped at least
once.

Block callout and interrupt work while reset runs so the completion end
of things does not observe indeterminate state and process invalid parts
of the ring.

Start the channel with a manually implemented ioat_null() to keep other
submitters quiesced while we wait for the channel to start (100 us).

r295605 may have made the race between ioat_reset_hw and
ioat_process_events wider, but I believe it already existed before that
revision. ioat_process_events can be invoked by two asynchronous
sources: callout (softclock) and device interrupt. Those could race
each other, to the same effect.

Sponsored by: EMC / Isilon Storage Division

Test Plan
  1. 8 concurrent threads submitting ioat copy requests
  2. 1 thread running 'while true; do sysctl dev.ioat.0.hammer.force_hw_reset=1; done'

Before: Panics quickly (less than 2 seconds)
After: Doesn't panic immediately

Use an INVARIANTS kernel.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

cem retitled this revision from to Fix ioat(4) hw reset race, and pull in two prior patches needed for the fix.
cem updated this object.
cem edited the test plan for this revision. (Show Details)
cem added reviewers: bdrewery, markj.
sys/dev/ioat/ioat.c
381 ↗(On Diff #18127)

ioat_null() doesn't seem to be used anymore, FWIW.

673 ↗(On Diff #18127)

I don't really see why we have the resetting and resetting_cleanup flags. resetting_cleanup is only ever set when resetting is true, and the resetting and quiescing flags seem to be very similar.

1853 ↗(On Diff #18127)

I don't see anything that sleeps on this?

sys/dev/ioat/ioat.c
381 ↗(On Diff #18127)

It's a public KPI/KBI. I suppose it's pretty useless and we could probably yank it despite the KBI freeze.

389 ↗(On Diff #18127)

This should be ->head. The ring should be empty here and they should be identical, but semantically we submit at the head.

673 ↗(On Diff #18127)

I didn't want to overload quiescing more than it already is. Also, we need to quiesce submissions before we block cleanup. The two resetting variables are protected by different locks. It means we can avoid taking the head lock in the cleanup (tail) thread.

1853 ↗(On Diff #18127)

Nothing does. But a spurious wakeup call doesn't hurt anything and is sort of consistent with our other state flags.

I'm preparing and testing a few changes.

The spurious wakeup is dropped. The s/head/tail/ typo is fixed.

There is a race at the end of reset where the resetting_cleanup needs to be unblocked before the channel is started again, or it's possible to block future refcount drains because nothing completes that ring entry.

sys/dev/ioat/ioat.c
381 ↗(On Diff #18127)

Yeah. I wouldn't use the KBI freeze as a justification for keeping it, but it's up to you.

673 ↗(On Diff #18127)

Hm. I just noticed that ioat_process_events() is called from an interrupt filter, which means that it runs in hard interrupt context and thus isn't allowed to acquire regular mutexes at all - only spin mutexes.

I thought witness would catch that though, so I might be missing something.

1853 ↗(On Diff #18127)

Ok.

sys/dev/ioat/ioat.c
673 ↗(On Diff #18127)

Oh, never mind. I mixed the ithread and filter arguments to bus_setup_intr().

markj edited edge metadata.
This revision is now accepted and ready to land.Jul 5 2016, 6:10 PM
cem edited edge metadata.

Fix reset_hw <-> process_events race while starting channel after reset
finished. Allow process_events to resume before we queue the first work item,
but be sure to clear the cached completion status to match last_seen.

Drop spurious wakeup and fix head/tail typo.

I am leaving ioat_null in place; I believe it's documented in the manual page
and I don't want to mess with that now.

This revision now requires review to proceed.Jul 5 2016, 7:10 PM
cem marked 2 inline comments as done.Jul 5 2016, 7:10 PM
markj edited edge metadata.
This revision is now accepted and ready to land.Jul 5 2016, 7:20 PM
This revision was automatically updated to reflect the committed changes.