Allow run-time delayed attachment of devices
AcceptedPublic
Actions

Authored by nwhitehorn on Apr 30 2018, 12:56 AM.

Details

Reviewers

jhb
jhibbits
bcr
ray

Group Reviewers

manpages

Summary

It fairly frequently happens that some newbus devices have dependencies on others. When the ordering is global, especially when they are neighbors on the same bus level, multipass can set the appropriate attach ordering. However, there are other cases when the dependency is known only at runtime (specified through a device tree, for example) or where intermediate levels of the tree do not have guaranteed order with respect to others in complicated bus topologies.

This patch introduced a new device method (DEVICE_DELAY_ATTACH()) that can be implemented by devices with these kinds of requirements. At each bus pass, the function is called and, if it returns TRUE, the device is left in DS_ALIVE (probed and claimed) but attach is not attempted. In the event of forward progress in reducing the number of pending attachments, this will be iteratively retried.

A modification to the smu(4) driver with a trivial illustration of this mechanism is included. It depends on a GPIO which, on most systems, most of the time, attaches before smu(4) as a result of happenstance in attach ordering. When using FDTs instead of Open Firmware, the node order is inverted for some reason, leading to panics without this patch.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 16726

Event Timeline

nwhitehorn created this revision.Apr 30 2018, 12:56 AM

Herald added subscribers: loos, imp. · View Herald TranscriptApr 30 2018, 12:56 AM

Harbormaster completed remote builds in B16343: Diff 41979.Apr 30 2018, 12:56 AM

imp added inline comments.Apr 30 2018, 2:35 AM

sys/kern/device_if.m
51	int returns FALSE? This is a bad interface. It's supposed to return an ERRNO to detect when it's not implemented. Instead, have it return 0 for no delay or EAGAIN for a delay.
199–203	we usually don't have this doc for each function. I'd omit it.
sys/kern/subr_bus.c
2911	!= 0, per my errno recommendation. EAGAIN today, but ENXIO might also be an option in the future for a 'never mind' situation. Also, without a softc, how's this code supposed to know what's going on? Softc created in probe is freed after the probe.
3990	where is attachments_pending initialized?
3993	Recommend either going to && or using { } around the body here even though it's not strictly required.
3998	where to do we try this again?

Simplify logic dealing with pending attachments.

Harbormaster completed remote builds in B16358: Diff 42006.Apr 30 2018, 3:34 PM

nwhitehorn added inline comments.Apr 30 2018, 3:35 PM

sys/kern/device_if.m
51	The ideal interface would be to trigger this when DEVICE_ATTACH() returns EAGAIN. However, device_attach() does enough setup that that is non-trivial and I don't see a way around some function like this. In general, living in this world, I don't care about the interface. For this suggestion in particular, since DEVICE_ATTACH() should keep all the error detection/reporting logic, I'm a bit hesitant to give this shim function more options than a boolean and do not see the advantage of 0/EAGAIN vs. TRUE/FALSE (the updated patch changes it to a bool). Is there one?
199–203	Every other function in this file has it. It seemed weird, but I wanted to match existing style.
sys/kern/subr_bus.c
2911	Why wouldn't you want "never mind" to be returned by device_attach()? All this function is meant to do is to gate "you can proceed now to success or failure and everything is out of limbo". The function is meant to detect global conditions (have all the partner devices I need been attached, in particular), so there should not be any softc yet, just ivars. The softc (and devclass, etc.) does exist already -- this is called in exactly the same context as DEVICE_ATTACH().

(For future reference, a diff with context is nicer.)

(BTW, while reviewing this I found the device_register() "gem" which is a bit of a hack that snuck in with iflib and should be replaced with an iflib_if.m and a dedicated IFLIB_REGISTER rather than looking at magic numbers in the first 4 bytes of a pointer, etc. That's just ridiculous.)

I assume you're waiting on docs until the design is ok'd (which is sensible), but once that is settled DEVICE_ATTACH.9 probably needs extending to document this.

sys/kern/device_if.m
51	I'm a bit torn on this one. The kobj interface assumes that kobj methods always return an int (errno) value, and the default global fallback if a method isn't found is to fail with ENXIO. (kobj_error_method) For that reason, I usually prefer to always return error values from kobj methods in such a way that something kind of sane happens if we ever fallback to kobj_error_method due to programmer error, etc. (The fact that bus_alloc_resource() returns a pointer instead of an error I consider to be a bug btw, but one we can't easily fix at this point.) That said, that would basically require treating ENXIO as 'no method installed so go ahead and call DEVICE_ATTACH' which is somewhat odd. Also, a default implementation really should ensure that kobj_error_method is never called (you have to really try hard to have a kernel that would have the 'cookie' symbol used to identify the method but not have the default method since both things live in the same auto-generated device_if.c). I think the fact that in this case any real error handling can (and should) be done in DEVICE_ATTACH probably does make it ok to just do a bool (bool uses 'true/false' and not 'TRUE/FALSE' btw).
199–203	It is also wrong, but it does seem consistently wrong in the file. A separate commit to fix these to use DEVMETHOD instead would be good.

In D15229#321549, @jhb wrote:

(For future reference, a diff with context is nicer.)

(BTW, while reviewing this I found the device_register() "gem" which is a bit of a hack that snuck in with iflib and should be replaced with an iflib_if.m and a dedicated IFLIB_REGISTER rather than looking at magic numbers in the first 4 bytes of a pointer, etc. That's just ridiculous.)

I assume you're waiting on docs until the design is ok'd (which is sensible), but once that is settled DEVICE_ATTACH.9 probably needs extending to document this.

Yes, I was. If you think it is in that situation now, I'm happy to write the docs and refresh the patch. I'm torn on the return value for DEVICE_DELAY_ATTACH() as well, and that's the part I'd really like to nail down since it is the entirety of the API defined by this patch.

sys/kern/device_if.m
51	I don't have any strong opinions here so long as any actual error reporting is deferred to DEVICE_ATTACH().
199–203	OK, I'd like to do this after this commit though.
sys/kern/subr_bus.c
3998	The idea is to loop if you resolve the dependency[ies] of at least one device, it is worth iterating through the others again in this pass -- one of the others might have depended on a just-attached device.

Any further comments? I'd really like to get this, or something like it, in.

I think I'm ok with the interface. The question is more of an implementation question.

sys/kern/subr_bus.c
3998	So I think one question is what happens if you kldload a driver and it always returns true for DELAY_ATTACH because it's dependency isn't present. Will it eventually retry when you later kldload the driver for the dependency? It's not clear (a bit harder to tell where this bit of code lives without a context diff). I think it won't which is what Warner is asking (I think). You might have to keep some kind of global list of pending children (so device_attach() would place a device on this list when DELAY_ATTACH returns true) and you'd have to walk that list with a loop like this retrying attaching any devices in the list as long as you are making progress, but do that 1) at the end of the boot-time probe, and 2) in the driver module handler that driver kld's use. You'd also have to be sure to take into account pending devices in device_delete() by removing the device_t from the pending list if necessary.

Thanks -- I'll tweak the implementation to handle KLDs properly and write a man page, then update this diff.

sys/kern/subr_bus.c
3998	At present, delayed attachment is handled at the tail end of each bus pass, which means the tree can only fully converge at the following pass. This does not work too well with klds. We could, instead, have a global function (bus_process_pending_attachments() or the like) that is called after each pass and after each kld is loaded that either iterates through a global list of pending attachments or walks the tree and tries to attach things. I kind of like the tree-walk approach since it removes some global state accounting and the performance impact is minimal.

Probe for pending devices at the end of each pass and after modules loaded. Add documentation.

Herald added a reviewer: manpages. · View Herald TranscriptMay 20 2018, 7:24 PM

Harbormaster completed remote builds in B16723: Diff 42773.May 20 2018, 7:24 PM

I suggest running textproc/igor on the man pages. It will give you a few tips on how to correct some errors.
Thanks for writing the man page.

share/man/man9/DEVICE_ATTACH.9
67	You need to start a new sentence on a new line after the sentence stop. Run textproc/igor and it will tell you those.
share/man/man9/DEVICE_DELAY_ATTACH.9
56	Same here for the line break after the sentence stop.

Fix style and spelling issues in man pages.

Harbormaster completed remote builds in B16726: Diff 42777.May 20 2018, 9:05 PM

OK from manpages.

This revision is now accepted and ready to land.May 21 2018, 10:38 AM

Any other thoughts? I'd like to get this in soon since it is blocking other things.

It's been another couple weeks. Any objections or approvals?

An alternative in this case would be https://reviews.freebsd.org/D15759 which defers establishing the interrupt until both devices are present, regardless of the order.

Another alternative would be to attach the devices, but defer attaching the children and/or configuring the interrupts until interrupts are enabled. This is another way to ensure doorbell is present and shown in https://reviews.freebsd.org/D15760

The final option would be to do a variation on the prior option, but institutionalize it in DEVICE_ATTACH_LATE. It would be called just after interrupts were enabled for deferred work like this. It would cover the vast majority of driver independencies by giving a 'last' point. At that point, you return an error if you still weren't ready, and we'd loop while the number of devices returning an error was declining. It could largely replace config_intrhook.

In D15229#333038, @imp wrote:

The final option would be to do a variation on the prior option, but institutionalize it in DEVICE_ATTACH_LATE. It would be called just after interrupts were enabled for deferred work like this. It would cover the vast majority of driver independencies by giving a 'last' point. At that point, you return an error if you still weren't ready, and we'd loop while the number of devices returning an error was declining. It could largely replace config_intrhook.

I think this would work reasonably well for the general case here; the other suggestions would solve the case of smu(4), but there are other, simpler, ways to do that.

That said, I'm not sure what the advantage of a DEVICE_ATTACH_LATE() is over the mechanism originally proposed here or something like it except that it, as a side effect, delays until after interrupts -- which might not even be wanted, for example if the device in question is an interrupt controller.

In D15229#333057, @nwhitehorn wrote:

In D15229#333038, @imp wrote:

The final option would be to do a variation on the prior option, but institutionalize it in DEVICE_ATTACH_LATE. It would be called just after interrupts were enabled for deferred work like this. It would cover the vast majority of driver independencies by giving a 'last' point. At that point, you return an error if you still weren't ready, and we'd loop while the number of devices returning an error was declining. It could largely replace config_intrhook.

I think this would work reasonably well for the general case here; the other suggestions would solve the case of smu(4), but there are other, simpler, ways to do that.

That said, I'm not sure what the advantage of a DEVICE_ATTACH_LATE() is over the mechanism originally proposed here or something like it except that it, as a side effect, delays until after interrupts -- which might not even be wanted, for example if the device in question is an interrupt controller.

If the device is an interrupt controller, you need to put it on a different pass. There's no way around that. But this isn't an interrupt controller, and interrupt controllers are special beasts that already have special handling in many places. Having a delayed attach for them is likely completely unacceptable so none of the proposals is appropriate for them.

DEVICE_ATTACH_LATE would have an advantage over config_intrhook in that it could walk the tree multiple times. DEVICE_ATTACH_LATE would return 0 or an error. It would walk the tree as long as the number of errors is decreasing each pass. this would allow for an arbitrary level of nesting for more 'ordinary' situations where one can't do all the work one would like because the resources to do that work aren't yet available.

In D15229#333064, @imp wrote:

In D15229#333057, @nwhitehorn wrote:

In D15229#333038, @imp wrote:

The final option would be to do a variation on the prior option, but institutionalize it in DEVICE_ATTACH_LATE. It would be called just after interrupts were enabled for deferred work like this. It would cover the vast majority of driver independencies by giving a 'last' point. At that point, you return an error if you still weren't ready, and we'd loop while the number of devices returning an error was declining. It could largely replace config_intrhook.

I think this would work reasonably well for the general case here; the other suggestions would solve the case of smu(4), but there are other, simpler, ways to do that.

That said, I'm not sure what the advantage of a DEVICE_ATTACH_LATE() is over the mechanism originally proposed here or something like it except that it, as a side effect, delays until after interrupts -- which might not even be wanted, for example if the device in question is an interrupt controller.

If the device is an interrupt controller, you need to put it on a different pass. There's no way around that. But this isn't an interrupt controller, and interrupt controllers are special beasts that already have special handling in many places. Having a delayed attach for them is likely completely unacceptable so none of the proposals is appropriate for them.

Delayed attach of interrupt controllers is a real case that we have to deal with on PPC, but you are right that it probably needs special handling (which is already there).

DEVICE_ATTACH_LATE would have an advantage over config_intrhook in that it could walk the tree multiple times. DEVICE_ATTACH_LATE would return 0 or an error. It would walk the tree as long as the number of errors is decreasing each pass. this would allow for an arbitrary level of nesting for more 'ordinary' situations where one can't do all the work one would like because the resources to do that work aren't yet available.

Yes, this seems fine to me. I'm not quite sure what the advantage is relative to DEVICE_DELAY_ATTACH() -- the two seem pretty interchangeable -- but I would be fine with either.

Nice to have feature, to allow to wait for required drivers to attach.

Herald added a subscriber: bdragon. · View Herald TranscriptAug 21 2019, 10:21 PM

If we wanted to avoid the race between device_delayed_attach() and device_attach, we could have device_attach return EAGAIN, per the changes I suggested.

sys/kern/subr_bus.c
2994–2999	delete these
3006	Insert if (error != EAGAIN)) { here
3014	and a } else if (errno == EAGAIN) { device_sysctl_fini(dev); dev->state = DS_ALIVE; return EAGAIN; }
sys/powerpc/powermac/smu.c
297	This would transition into if (smu_doorbell == NULL) return EAGAIN;

Some functionality like this is needed, since we need to have some way to try again later. It isn't like the interrupt delay we need for things like disk drives because there's no signal to try again like there is when we turn on the interrupts...

imp mentioned this in D21465: Create new wrapper function: bus_delayed_attach_children().Aug 29 2019, 3:46 PM