Page MenuHomeFreeBSD

iflib: Fix handling of mbuf cluster allocation failures.
ClosedPublic

Authored by markj on Jun 27 2020, 3:36 PM.
Tags
None
Referenced Files
Unknown Object (File)
Jan 17 2024, 8:46 AM
Unknown Object (File)
Dec 30 2023, 8:12 PM
Unknown Object (File)
Dec 25 2023, 1:17 PM
Unknown Object (File)
Dec 23 2023, 12:23 PM
Unknown Object (File)
Dec 17 2023, 4:03 AM
Unknown Object (File)
Dec 14 2023, 6:15 AM
Unknown Object (File)
Nov 2 2023, 10:50 PM
Unknown Object (File)
Sep 23 2023, 10:02 AM

Details

Summary


- When refilling an rx freelist, make sure we only update the hw
producer index if at least one cluster was allocated. Also make sure
that we don't update the fragment index cursor if the last allocation
attempt didn't succeed. For Intel drivers, iflib basically assumes
that the consumer index and fragment index cursor stay in lock step,
but this assumption was getting violated, resulting in use-after-frees
and NULL pointer dereferences.

Test Plan

Peter Holm was reporting occasional mbuf cluster use-after-frees
that were tracked back to these bugs. He verified that they are no
longer reproducible with this patch.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 32090
Build 29609: arc lint + arc unit

Event Timeline

markj requested review of this revision.Jun 27 2020, 3:36 PM
markj edited the test plan for this revision. (Show Details)

Don't attempt to handle m_gethdr() failures. Once clusters are placed
in the ring, we don't need to update the bitmap right away, so there's
no harm in simply leaving it there.

I added Hans, as he's also got an attempt to fix the same problem (https://reviews.freebsd.org/D24236)

sys/net/iflib.c
1986–1987

while you're adding __predict_false(), this seems like a good place for one..

This revision is now accepted and ready to land.Jun 30 2020, 5:01 PM

Hi,

I think a custom test to provoke the failure and make sure it is really gone is required. I simply added a sysctl which triggered allocation failures at a random point in the loop you are patching.

--HPS

Hi,

I think a custom test to provoke the failure and make sure it is really gone is required. I simply added a sysctl which triggered allocation failures at a random point in the loop you are patching.

--HPS

@pho has been able to trigger this bug reliably with stress2 since we discovered that it was triggered by cluster allocation failures. I used fail(9) when working on the patch but do not want to leave it in since this is a performance-sensitive path.

Sprinkle more __predict_false().

This revision now requires review to proceed.Jul 1 2020, 2:02 PM
markj marked an inline comment as done.Jul 1 2020, 2:02 PM

Any objection to committing this? It has been tested extensively by Peter.

No objections from me. I'll give it a spin at Mellanox once it hits the tree.

This revision is now accepted and ready to land.Jul 6 2020, 2:22 PM