Page MenuHomeFreeBSD

bus: Take a step towards providing a proper topology lock
Needs ReviewPublic

Authored by markj on Oct 27 2025, 9:51 PM.
Tags
None
Referenced Files
Unknown Object (File)
Dec 10 2025, 7:46 PM
Unknown Object (File)
Dec 8 2025, 9:48 PM
Unknown Object (File)
Dec 5 2025, 6:52 AM
Unknown Object (File)
Nov 11 2025, 1:10 PM
Unknown Object (File)
Nov 1 2025, 5:17 PM
Unknown Object (File)
Oct 31 2025, 5:34 PM
Unknown Object (File)
Oct 31 2025, 4:38 PM
Unknown Object (File)
Oct 31 2025, 4:37 PM
Subscribers

Details

Reviewers
imp
jhb
andrew
manu
Summary

Giant is implicitly released when the owning thread goes to sleep, which
makes it unsuitable as a general purpose lock. For instance, if two
threads race to call BUS_RESCAN()->pci_reset_method() on the same bus,
and one of them sleeps while invoking device_detach(), double frees
are possible. It's quite ordinary for a DEVICE_DETACH implementation to
sleep and thus release Giant, e.g., by calling device_destroy(), which
can sleep for several reasons.

Use a second sleepable lock to cover this case. Acquire it when
invoking module load handlers or SYSINITs, since many consumers require
that.

With this patch I see a LOR report from witness, between bus_topo_sx and
"USB config SX lock", which occurs during USB bus enumeration, but I
believe this is a false positive. Nested usbd_enum_lock() calls can
occur when enumerating ports on a hub; in each case the USB device is
different, but since all of the config locks belong to a single class,
witness thinks that both lock orders are possible.

This is a WIP and is not intended to be committed as-is; at the very
list, all platforms need to be updated to acquire the lock when invoking
root_bus_configure().

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 68166
Build 65049: arc lint + arc unit

Event Timeline

markj requested review of this revision.Oct 27 2025, 9:51 PM
sys/dev/acpica/acpi_thermal.c
975 ↗(On Diff #165169)

why does this need to be locked?

sys/kern/kern_linker.c
233

Is this too high in the stack here and in kern_module? I'd think we'd have the busses lock when changing the topology. I'm surprised there aren't more LORs reported.

Also, why also acquire Giant here. Can we get rid of it? I had a branch a while ago that removed the giant locking here. IIRC I did have to add back a couple of giant locks lower in the stack to cope with some code that needed Giant, but didn't have it.

sys/kern/kern_module.c
99

same question: isn't this too high in the stack?

sys/kern/subr_bus.c
424

I'd remove GIANT_REQUIRED here. Is is that the evnetual goal? I had a branch ages ago that did that. We should strive to *NOT* have giant locked while doing bus things.

438–439

This can be removed. You don't need to lock giant here.
And likely need to cope with the clients of bus_topo_mtx(). It looks like there's only one, the PCIe hotplug lock. That likely needs some attention.

Also, I suspect that the usb topo-like lock can switch to using the global topo lock. hps did it because he didn't want to rewrite newbus to be giant free.

sys/dev/acpica/acpi_thermal.c
975 ↗(On Diff #165169)

I added a bus_topo_assert() in devclass_find_internal(), which should probably be a separate change. But I think a lock is indeed needed for the devclass linked list traversal.

sys/kern/kern_linker.c
233

Some SYSINITs will call into newbus without acquiring the topology lock, I guess because SYSINITs are locked by Giant.

An example: linux_compat_init()->class_register->devclass_create(). I guess code like that should be acquiring the topology lock locally, but that'll take some time.

sys/kern/kern_module.c
99

More or less the same problem as above: module load handlers are invoked with Giant held, which doubles as the topology lock before this patch, so problems with missing locking are hidden.

sys/kern/subr_bus.c
424

That is the eventual goal. My goal in this patch is more modest: I want to fix the existing locking, which is broken e.g., if multiple threads race to call BUS_RESCAN()->pci_rescan_method(). It's broken because Giant is released when going to sleep.

So, adding another lock here fixes the race (because it doesn't have special properties) and it helps expose missing locking elsewhere.

438–439

I think I'm not quite ready to remove Giant from here.

Thanks, I will remove bus_topo_mtx(), since it doesn't really work anymore. I think the PCIe hotplug lock could just be replaced with explicit calls to bus_topo_lock()...?

In D53386#1219249, @imp wrote:

Also, I suspect that the usb topo-like lock can switch to using the global topo lock. hps did it because he didn't want to rewrite newbus to be giant free.

Hmm, ok, I will look into that. There is no global lock in the USB stack AFAICS, so it's hard to know whether it's safe to replace. The USB enumeration lock (enum_sx) is per-device.

  • Drop the assertion in devclass_find_internal().
  • Fix up users of bus_topo_mtx().