iflib: Fix handling of mbuf cluster allocation failures.
Closed, Public

Authored by markj on Jun 27 2020, 3:36 PM.

Details

Reviewers

shurd
hselasky
gallatin

Group Reviewers

iflib

Commits

rS362962: iflib: Fix handling of mbuf cluster allocation failures.

Summary

- When refilling an rx freelist, make sure we only update the hw
producer index if at least one cluster was allocated. Also make sure
that we don't update the fragment index cursor if the last allocation
attempt didn't succeed. For Intel drivers, iflib basically assumes
that the consumer index and fragment index cursor stay in lock step,
but this assumption was getting violated, resulting in use-after-frees
and NULL pointer dereferences.
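
The two rules above can be illustrated with a small userspace sketch. This is not the code in sys/net/iflib.c: `struct fl`, `fl_refill()`, `alloc_fn`, and `budget_alloc()` are hypothetical names invented for the illustration, and the ring is reduced to a plain array. It only shows the invariant, assuming that the cursor and the hardware index must both stay put when nothing was allocated and must stop at the first slot that was not filled.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical, simplified free list; not the real iflib free-list layout. */
struct fl {
	void	*bufs[RING_SIZE];	/* cluster in each slot, NULL if empty */
	size_t	 fragidx;		/* software cursor: next slot to fill */
	size_t	 hw_pidx;		/* producer index last given to hardware */
	size_t	 credits;		/* number of filled slots */
	size_t	 hw_updates;		/* times the hardware index was written */
};

/* Hypothetical allocator hook so that a caller can inject failures. */
typedef void *(*alloc_fn)(void);

/* Test allocator: succeeds alloc_budget times, then returns NULL. */
static int alloc_budget;
static char fake_cluster;

static void *
budget_alloc(void)
{
	if (alloc_budget <= 0)
		return (NULL);
	alloc_budget--;
	return (&fake_cluster);
}

/* Try to fill up to `count` empty slots; return how many were filled. */
static size_t
fl_refill(struct fl *fl, size_t count, alloc_fn alloc)
{
	size_t filled = 0;
	size_t idx = fl->fragidx;

	assert(count <= RING_SIZE - fl->credits);
	while (filled < count) {
		void *cl = alloc();

		if (cl == NULL)
			break;	/* idx still names the slot that was NOT filled */
		fl->bufs[idx] = cl;
		idx = (idx + 1) % RING_SIZE;
		filled++;
	}
	if (filled == 0)
		return (0);	/* nothing allocated: leave cursor and hardware alone */

	fl->fragidx = idx;	/* advance only past slots that were really filled */
	fl->credits += filled;
	fl->hw_pidx = idx;	/* publish to hardware only after real progress */
	fl->hw_updates++;
	return (filled);
}
```

With `alloc_budget = 3`, a call to `fl_refill(&fl, 4, budget_alloc)` on an empty ring fills three slots and leaves the cursor on the fourth, still-empty slot; with `alloc_budget = 0` it changes nothing.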

Test Plan

Peter Holm was reporting occasional mbuf cluster use-after-frees
that were tracked back to these bugs. He verified that they are no
longer reproducible with this patch.

Diff Detail

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 32012
Build 29546: arc lint + arc unit

Event Timeline

markj created this revision. Jun 27 2020, 3:36 PM

Herald added a reviewer: shurd. Jun 27 2020, 3:36 PM

Herald added a reviewer: iflib.

Herald added subscribers: melifaro, ae.

markj requested review of this revision. Jun 27 2020, 3:36 PM

Harbormaster completed remote builds in B32012: Diff 73772. Jun 27 2020, 3:36 PM

markj edited the test plan for this revision. Jun 27 2020, 3:45 PM

markj edited the test plan for this revision.

Don't attempt to handle m_gethdr() failures. Once clusters are placed
in the ring, we don't need to update the bitmap right away, so there's
no harm in simply leaving them there.

Harbormaster completed remote builds in B32025: Diff 73817. Jun 28 2020, 7:27 PM

markj edited the summary of this revision. Jun 28 2020, 7:28 PM

markj edited the test plan for this revision. Jun 30 2020, 3:15 PM

gallatin added reviewers: hselasky, gallatin. Jun 30 2020, 4:58 PM

I added Hans, as he's also got an attempt to fix the same problem (https://reviews.freebsd.org/D24236)

sys/net/iflib.c
1987–1988	While you're adding __predict_false(), this seems like a good place for one.
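
For readers unfamiliar with the hint: FreeBSD's __predict_false() comes from <sys/cdefs.h> and wraps the compiler's `__builtin_expect()`. The sketch below uses a local stand-in macro so it builds outside the kernel with GCC or Clang; `PREDICT_FALSE` and `store_cluster()` are hypothetical names and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's __predict_false() (GCC/Clang builtin). */
#define PREDICT_FALSE(exp)	__builtin_expect((exp), 0)

/* Hypothetical helper: returns -1 on the rare allocation-failure path. */
static int
store_cluster(void **slot, void *cl)
{
	if (PREDICT_FALSE(cl == NULL))
		return (-1);	/* unlikely branch, laid out off the hot path */
	*slot = cl;
	return (0);
}
```

The hint changes only how the compiler lays out the branches, not the result of the test.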

This revision is now accepted and ready to land. Jun 30 2020, 5:01 PM

Hi,

I think a custom test to provoke the failure and make sure it is really gone is required. I simply added a sysctl which triggered allocation failures at a random point in the loop you are patching.

--HPS

In D25489#564279, @hselasky wrote:

Hi,

I think a custom test to provoke the failure and make sure it is really gone is required. I simply added a sysctl which triggered allocation failures at a random point in the loop you are patching.

--HPS

@pho has been able to trigger this bug reliably with stress2 since we discovered that it was triggered by cluster allocation failures. I used fail(9) when working on the patch but do not want to leave it in since this is a performance-sensitive path.
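
As a rough userspace illustration of the kind of failure injection discussed in this thread, the sketch below fails one allocation at a chosen point in a loop. It is neither the fail(9) code used while developing the patch nor the sysctl from D24236; `inject_fail_after`, `cluster_alloc_injected()`, and `alloc_run()` are hypothetical names, and the knob is a plain variable standing in for whatever control the real test used.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical knob standing in for a sysctl or fail point:
 * 0 disables injection; N > 0 makes the Nth call from now fail once.
 */
static unsigned int inject_fail_after;

static char injected_cluster;

/* Allocator wrapper that returns NULL when the injection counter expires. */
static void *
cluster_alloc_injected(void)
{
	if (inject_fail_after != 0 && --inject_fail_after == 0)
		return (NULL);	/* simulated cluster allocation failure */
	return (&injected_cluster);
}

/* Count how many of `want` allocations succeed before the first failure. */
static size_t
alloc_run(size_t want)
{
	size_t got = 0;

	while (got < want && cluster_alloc_injected() != NULL)
		got++;
	return (got);
}
```

Setting `inject_fail_after = 3` makes the third call fail, so `alloc_run(5)` stops after two successes; the knob then resets itself to 0 and later runs succeed. A check like this on every allocation is the sort of cost that argues against leaving injection in a performance-sensitive path.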