Page MenuHomeFreeBSD

newbus: Ensure that we hold topo lock for calls to devclass_find_internal
AcceptedPublic

Authored by imp on Oct 29 2025, 6:08 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 25, 11:10 PM
Unknown Object (File)
Wed, Nov 12, 8:11 PM
Unknown Object (File)
Mon, Nov 3, 12:08 AM
Unknown Object (File)
Fri, Oct 31, 2:23 PM
Unknown Object (File)
Fri, Oct 31, 2:19 PM
Unknown Object (File)
Fri, Oct 31, 2:18 PM
Unknown Object (File)
Fri, Oct 31, 12:55 AM
Unknown Object (File)
Fri, Oct 31, 12:55 AM
Subscribers

Details

Reviewers
markj
jhb
Summary

devclass_find_internal traverses and modifies the devclass list, so we
need to hold the lock for that operation. Most callers of
devclass_create() won't be holding topo_lock, but arguably should since
that could be a write operation. devclass_find() is called in lots of
places for many different purposes, and it's a 'read only' operation so
the logical case for forcing one hold the topo_lock for the
implementation detail that the devclass is stored in a list is less
clear. Since we never delete a devclass, we don't need a reference
class. And it lokely could be protected by its own lock in the future
(for now it's the topo lock to avoid LoR until that work is done).

Sponsored by: Netflix

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 68239
Build 65122: arc lint + arc unit

Event Timeline

imp requested review of this revision.Oct 29 2025, 6:08 PM

Since devclass_find_internal looks at and modifies the devclass
structures, it needs to hold the topo lock when called.

But presumably the callers of devclass_create() and devclass_find() are doing so as well, since those functions return a devclass pointer.

It looks like devclasses are never freed, so I think it's ok to lock devclass_find() and devclass_create() this way, but other devclass accessors need to be externally locked. For instance, you can't really apply this approach to devclass_get_devices(), since it returns objects that are protected only by the topo lock.

I'd change the message to note that the devclass list is the main thing being synchronized here.

This revision is now accepted and ready to land.Oct 29 2025, 7:35 PM

Since devclass_find_internal looks at and modifies the devclass
structures, it needs to hold the topo lock when called.

But presumably the callers of devclass_create() and devclass_find() are doing so as well, since those functions return a devclass pointer.

There's 6 devclass_create() calls in the tree, mostly in the older recesses of driver support. There's boatloads of devclass_find().

All the devclass_create() (except the couple in the linuxkpi code) are in MOD_LOAD callbacks which John wants me to call unlocked. We'd need another round of topo locking before we could turn it into an assert, which might not be a bad idea... I have no idea how that will affect the linuxkpi code though. I think, but am not completely sure, that we could make it an assert, and since it is a create, logically it makes sense you'd need the lock to do that. So I'll hold off for a bit doing more than this.

devclass_find() is called all over the place. The locking is completely hazardous. There's some abuse of it to find drivers that have attached to other types of devices (but we have no better way to do that).

It looks like devclasses are never freed, so I think it's ok to lock devclass_find() and devclass_create() this way, but other devclass accessors need to be externally locked. For instance, you can't really apply this approach to devclass_get_devices(), since it returns objects that are protected only by the topo lock.

Yes, but that protection is insufficient. You need to ref()/deref() device_t's when you cache them after finding them with devclass_get_devices() or really any other tree walking means. That's dangerous today, but most of the time it doesn't matter since there's some other protocol in place to keep them alive.

I'd change the message to note that the devclass list is the main thing being synchronized here.

I'll update the commit message. That's a good idea. That's the only reason I took the lock in these cases.