Page MenuHomeFreeBSD

iflib: drain enqueued tasks before detaching from taskqgroup
ClosedPublic

Authored by krzysztof.galazka_intel.com on Oct 4 2018, 12:32 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Dec 30, 5:55 AM
Unknown Object (File)
Mon, Dec 30, 4:43 AM
Unknown Object (File)
Mon, Dec 30, 4:38 AM
Unknown Object (File)
Fri, Dec 20, 11:32 PM
Unknown Object (File)
Fri, Dec 20, 10:41 PM
Unknown Object (File)
Fri, Dec 20, 7:55 PM
Unknown Object (File)
Tue, Dec 17, 12:43 AM
Unknown Object (File)
Sun, Dec 15, 12:57 PM

Details

Summary

The taskqgroup_detach function does not check if task is already enqueued when detaching it. This may lead to kernel panic if enqueued task starts after context state lock is destroyed. Ensure that the already enqueued admin tasks are executed before detaching them.

The issue was discovered during validation of D16429. Unloading of if_ixlv followed by immediate removal of VFs with iovctl -D may lead to panic on NODEBUG kernel.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Is there a reason not to do this in taskqgroup_detach()?

I don't see any obvious thing which prevents something being enqueued between the gtaskqueue_drain() and taskqgroup_detach() calls.

Is there a reason not to do this in taskqgroup_detach()?

I don't see any obvious thing which prevents something being enqueued between the gtaskqueue_drain() and taskqgroup_detach() calls.

For the latter statement, should iflib_admin_intr_deferred() then add a check to see if iflib is in detach before calling GROUPTASK_ENQUEUE()? I don't see other places where it's enqueued.

In D16428, I added a check to _task_fn_admin() to see if iflib was in detach and then return, but that won't prevent STATE_LOCK() from potentially locking around a freed lock.

This revision was not accepted when it landed; it landed in state Needs Review.Oct 5 2018, 8:16 PM
This revision was automatically updated to reflect the committed changes.

rS339207 has the wrong Differential Revision, I think

erj requested changes to this revision.Oct 5 2018, 8:31 PM

You're apparently not allowed to delete changesets, so this'll have to be updated with the old diff.

This revision now requires changes to proceed.Oct 5 2018, 8:31 PM

Bah, sorry... looks like I just grabbed whatever URL I had open at the time. :(

In D17404#371990, @erj wrote:

Is there a reason not to do this in taskqgroup_detach()?

I don't see any obvious thing which prevents something being enqueued between the gtaskqueue_drain() and taskqgroup_detach() calls.

For the latter statement, should iflib_admin_intr_deferred() then add a check to see if iflib is in detach before calling GROUPTASK_ENQUEUE()? I don't see other places where it's enqueued.

Wouldn't iflib_iov_intr_deferred() need it too?

Wouldn't iflib_iov_intr_deferred() need it too?

Yeah, it would need a check, too.

Hrm, I don't think we can obtain CTX_LOCK() in those functions.

Hrm, I don't think we can obtain CTX_LOCK() in those functions.

In D16428, I changed the in_detach field from a value modified under CTX_LOCK in detach to a flag under STATE_LOCK(), could those functions use STATE_LOCK() instead? That's a mutex vs an sx lock.

Is there a reason not to do this in taskqgroup_detach()?

I thought about doing it in the tackqgroup_detach but beside iflib the taskqgrups are used by linux_tasklet and I wasn't sure what impact it would have there.

I don't see any obvious thing which prevents something being enqueued between the gtaskqueue_drain() and taskqgroup_detach() calls.

At this point interrupts are disabled, timers are stopped and ifp is detached although it may be re-enqueued by itself. I think I could check igtask->gt_taskqueue in iflib_admin_intr_deferred - it's set to NULL in taskqgroup_detach. It would not require using CTX_LOCK.

Does not apply against lastest HEAD:

root@s254:/vcs/git/freebsd/head # git branch

  • master

root@s254:/vcs/git/freebsd/head # git pull origin
Already up to date.
root@s254:/vcs/git/freebsd/head # arc patch D17404
Created and checked out branch arcpatch-D17404.

This diff is against commit 339206, but the commit is nowhere in the
working copy. Try to apply it against the current working copy state?
(d5fe04e8d736359a48afb53ee8c440bc84a209d5) [Y/n] y

Checking patch head/sys/dev/e1000/igb_txrx.c...
error: head/sys/dev/e1000/igb_txrx.c: does not exist in index

Patch Failed!
Usage Exception: Unable to apply patch!

  • Calling gtaskqueue_drain moved to taskqgroup_detach
  • Added condition to iflib_admin_intr_deferred to prevent rescheduling task during detach

Adding dependency on D16428 didn't worked as planned :/ Changes are included in diff. I'll try to fix that later.

iflib_admin_intr_deferred() called with STATE_LOCK() in a few places.

sys/net/iflib.c
2266 ↗(On Diff #49007)

Called with STATE_LOCK()

2789 ↗(On Diff #49007)

Called with STATE_LOCK()

  • STATE_UNLOCK moved to address Stephen's comment.

iflib_admin_intr_deferred() called with STATE_LOCK() in a few places.

I've moved STATE_UNLOCK before iflib_admin_intr_defereed call although releasing lock just to immediately grab it again makes me a bit uneasy. Maybe it would be better idea to use mtx_owned in iflib_in_detach to check if this lock is already aquired?

iflib_admin_intr_deferred() called with STATE_LOCK() in a few places.

I've moved STATE_UNLOCK before iflib_admin_intr_defereed call although releasing lock just to immediately grab it again makes me a bit uneasy. Maybe it would be better idea to use mtx_owned in iflib_in_detach to check if this lock is already aquired?

Now that you bring up the back-to-back lock/unlock, I think that's worth considering. My qualm was that it doesn't seem like mtx_owned() is used for that kind of thing in very many places in the kernel, and iflib_in_detach() was initially meant to be just used outside of iflib in drivers, but I don't think checking for mtx_owned() and leaving the iflib_admin_intr_deferred() calls in place is actually technically wrong.

My opinion is that mtx_owned() would just make this all more complex. Releasing a lock for a short period isn't a problem at all.

The normal solution is to extract the bit in the lock and call it iflib_in_detach_locked() or whatever which is called from iflib_in_detach() inside the lock.

My opinion is that mtx_owned() would just make this all more complex. Releasing a lock for a short period isn't a problem at all.

In that case our validation team is going to test this version of the patch. I don't plan any more changes unless they discover some issues.

This patch fixes a kernel panic we are seeing when unloading iavf while passing traffic.

sys/net/iflib.c
5987 ↗(On Diff #49206)

Weren't we going to add the iflib_in_detach() check here as well? Did we decide it wasn't needed?

  • Add iflib_in_detach check to iflib_intr_iov_deferred as sugested by Stephen.

I forgot about this. Thanks!

Looks like it does what it's supposed to.

This revision is now accepted and ready to land.Oct 19 2018, 4:50 PM
This revision was automatically updated to reflect the committed changes.