Page MenuHomeFreeBSD

snd_hda: Fix a sporadic panic during kldunload
ClosedPublic

Authored by tijl on May 17 2025, 12:33 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Oct 13, 7:24 AM
Unknown Object (File)
Sun, Oct 5, 4:41 AM
Unknown Object (File)
Mon, Sep 29, 6:25 PM
Unknown Object (File)
Mon, Sep 29, 6:25 PM
Unknown Object (File)
Mon, Sep 29, 6:25 PM
Unknown Object (File)
Mon, Sep 29, 6:24 PM
Unknown Object (File)
Sun, Sep 28, 12:47 PM
Unknown Object (File)
Wed, Sep 24, 4:58 AM
Subscribers

Details

Summary

The interrupt handler releases the device lock in hdaa_stream_intr to
avoid a lock order reversal. This allows child devices to be detached
and destroyed and then the interrupt handler panics.

In the module Makefile sort hdac.c after hdacc.c so hdac_detach is
called first. Let hdac_detach take down the interrupt handler before
detaching child devices.

PR: 286385

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

tijl requested review of this revision.May 17 2025, 12:33 PM

I didn't know about the Makefile thing. Does the sorting really affect the order in which the detach routines will be called?

Yes, when unloading a kernel module the drivers defined with DRIVER_MODULE appear to be deleted in reverse order.

Yes, when unloading a kernel module the drivers defined with DRIVER_MODULE appear to be deleted in reverse order.

Order, though, is really undefined... Drivers have to be prepared for any order, so this patch is, at best, a bandaid that respect. The draining before the detach is good, however.

I haven't tested this for a few days, but I remember that even without the Makefile change, things worked as expected. Is this not the case?

I haven't tested this for a few days, but I remember that even without the Makefile change, things worked as expected. Is this not the case?

There are 2 different panics:

  1. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=286385#c12
  2. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=286385#c26

The first is fixed without the Makefile change. It happens when an interrupt occurs after the child hdacc device has been deleted by the bus_generic_detach call in hdac_detach.

The second happens when an interrupt occurs earlier, during hdaa_pcm_detach. The detach thread holds the channel lock and destroys it in chn_kill while the interrupt thread tries to acquire that lock in chn_intr. I can't reproduce this panic but afaict the Makefile change is needed so hdac_detach is called before hdaa_pcm_detach so the draining happens first.

I haven't tested this for a few days, but I remember that even without the Makefile change, things worked as expected. Is this not the case?

There are 2 different panics:

  1. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=286385#c12
  2. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=286385#c26

The first is fixed without the Makefile change. It happens when an interrupt occurs after the child hdacc device has been deleted by the bus_generic_detach call in hdac_detach.

The second happens when an interrupt occurs earlier, during hdaa_pcm_detach. The detach thread holds the channel lock and destroys it in chn_kill while the interrupt thread tries to acquire that lock in chn_intr. I can't reproduce this panic but afaict the Makefile change is needed so hdac_detach is called before hdaa_pcm_detach so the draining happens first.

For me the panics are fixed with this patch, but I'll wait for @imp and the others reviewers with regards to the Makefile change.

Is there any way to make both detach orders work?

The makefile change is dubious at best. It's a poor substitute for using DRIVER_MODULE_ORDERED which will be deterministic and not depend on details of how the static linker (ld.bfd or ld.lld) work.

The other change also seems a bit dubious. Usually you want to detach child devices first, and child device routines should fully clean themselves up from the parent. This sounds like a bug in the parent driver. When the child driver is detached and tears down its interrupt handler, the parent needs to safely handle that case of an interrupt occurring without an attached child. You can get into this today if you just use devctl to detach the child device and leave the parent device around for example. I haven't looked at how the parent/child interrupt thing is working here, but if the parent bus has a custom bus_setup_intr/bus_teardown_intr that attaches the child's interrupt handler to the interrupt handler for the "real" interrupt from the child, then it sounds like some missing synchronization between the "real" interrupt handler in the parent and the custom bus_teardown_intr method. I think if you fix that bug you won't need the DRIVER_MODULE_ORDERED change either.

@tijl Do you want/can you to take this?

Use DRIVER_MODULE_ORDERED

Let hdac_detach take down the interrupt handler before detaching child devices and order hdac_driver so hdac_detach is called first.

Remove duplicate hdac_if.h from the module Makefile.

A snd_hda driver is split into multiple child devices but there's only one PCI device with one interrupt handler. The child devices are just the way the driver organizes itself internally but they aren't real devices if you ask me. They all share the same lock for example. The bus_*_intr interface isn't used.

LGTM. I tried hot-unloading and I haven't come across any panic so far.

Since the Makefile paragraph in the commit message can be removed now.

This revision is now accepted and ready to land.Aug 31 2025, 12:56 PM
This revision was automatically updated to reflect the committed changes.