Paths

Table of Contentst

ofwbus: remove handling of resources from ofwbus
ClosedPublic
Actions

Authored by ehem_freebsd_m5p.com on May 30 2021, 5:00 PM.

Details

Reviewers

manu
jhibbits
andrew

Group Reviewers

PowerPC

Commits

rG9cb88cc937ec: ofwbus: remove handling of resources from ofwbus
rGf9bdaab95ec4: ofwbus: remove handling of resources from ofwbus

Summary

The architecture nexus should handle allocation and release of memory and
interrupts. These need to be visible to the rest of the kernel and not
hidden in the OpenFirmware code.

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 39573
Build 36462: arc lint + arc unit

Event Timeline

ehem_freebsd_m5p.com created this revision.May 30 2021, 5:00 PM

Herald added a subscriber: imp. · View Herald TranscriptMay 30 2021, 5:01 PM

ehem_freebsd_m5p.com requested review of this revision.May 30 2021, 5:01 PM

ehem_freebsd_m5p.com edited subscribers, added: mhorne; removed: manu.May 30 2021, 5:01 PM

Harbormaster completed remote builds in B39573: Diff 90132.May 30 2021, 5:01 PM

This is a very early tentative proposal. The issue is without this resource allocations which get to the nexus may be given ranges overlapping with what the OpenFirmware code has. Two drivers trying to use the same physical address range will be Bad(tm).

Looks like every variant of nexus.c except for PPC is already prepared to handle memory/interrupt allocations and therefore the correct solution is to simply forward all allocations to the nexus. There is likely a need to introduce similar code in sys/powerpc/powerpc/nexus.c.
Trick would then be every variant of sys/*/*/nexus.c would have mostly identical code for handling these allocations. Time to introduce sys/dev/bus/root.c? Difficulty here is there is no architecture-independent function for retrieving the highest physical address (for mem_rman.rm_end).

This is needed for the attempt at getting Xen operational on ARM. When running on Xen there is a need to allocate physical addresses and tell the hypervisor to map virtual devices there. If both nexus.c and ofwbus.c only have part of the story of which physical addresses are used, trouble will ensue.

To state the problem as I understand it: ofwbus fails to reserve or report its resource usage back to nexus, violating an assumption of newbus. This is of no consequence when all devices allocate resources through ofwbus, but becomes an issue when a child of nexus needs to allocate an unused resource region (this will be the case for Xen's PV bus on arm64).

It looks like the current behaviour comes from R10:a8126ae500bf and R10:65d08437ef51, where portions of powerpc's nexus were generalized into ofw_nexus.c (then renamed to ofwbus.c).

As you point out, simply removing these routines is an insufficient solution on powerpc; instead, they should probably be moved back into powerpc's nexus.c. Other architectures are set up to handle allocations from nexus already, but we would need verify that things still behave properly on each platform if this change is applied.

Tagging @nwhitehorn, who might have input on this.

I *think* this will be fine if the equivalent code moves to PPC nexus (where it came from in the first place long ago).

@mhorne, you got it exactly right. Issue is ofwbus is keeping everything to itself, which is fine if *everything* goes through ofwbus, but a major problem for anything independent of ofwbus (such as Xen PV).

I hadn't tracked down when it happened, so that is interesting to see it originated on PPC. Thing is what was being done was almost right. Quite a bit of sys/*/*/nexus.c is common code which should be merged together, but it needs to move to a nexus_common.c or root.c which is $somewhere (sys/dev? sys/kern?).

Updating with a potential move to sys/powerpc/powerpc/nexus.c. I've tried to move this a bit closer to the other nexuses. Most others use static variables in the file, rather than the softc. I've also opted to limit the regions for the resource managers since the resource manager code actually breaks when you use the 0 to ~0 range.

Harbormaster completed remote builds in B39943: Diff 90989.Jun 17 2021, 12:54 AM

Problem with D30554 is I'm only able to test the removal. I have no PPC machine(s) to test, so I cannot say anything besides it compiles. I'm also wondering whether I should actually be removing even more from sys/dev/ofw/ofwbus.c.

ehem_freebsd_m5p.com added a child revision: D30950: xen/arm64: enable arm64 Xen configuration.Jun 30 2021, 1:28 AM

mhorne mentioned this in D32357: arm: add an interrupt rman to nexus.Oct 7 2021, 5:58 PM

Okay, I was able to boot this on a few devices for arm, arm64, and riscv.

For arm, I needed to add an intr_rman and its handling to its nexus bus (see D32357).

Otherwise, devices attached fine and I was able to see that the expected changes in the output of devinfo -u; that is, the same resources allocated from ofwbus are now allocated from nexus. Check out P517 and P518.

I was unable to test mips, mainly since the only platform I have access to (MALTA64 in QEMU) does not use FDT. Given the state of the port and its planned removal from CURRENT, this shouldn't be a blocker.

Someone with PowerPC hardware will need to do the same testing, ideally on both 32 and 64-bit.

sys/powerpc/powerpc/nexus.c
149–150 ↗	(On Diff #90989)	Please do not change the region boundaries, we want to maintain the same behaviour before and after moving the code around.
360 ↗	(On Diff #90989)	Moving this out of ofwbus means that arm, arm64, mips, and riscv are now without a `bus_adjust_resource` method. I think this is okay, it is not used very widely and arm64 with ACPI has always been without it. Still, it will be worth noting in the final commit message, as this could be a regression.

I've run into some an additional oddity with the Open Firmware core. Only allocated resources are visible in devinfo.

The device-tree provided by Xen (see: D29875) provides an interrupt and a memory range. The memory range is completely empty, but is provided as a suggested range for mapping the Xen grant table. This range is completely invisible unless Xen's device-tree code allocates it.

sys/powerpc/powerpc/nexus.c
149–150 ↗	(On Diff #90989)	Each time I run across the default range I hear klaxons screaming "uninitialized! uninitialized! uninitialized!" (new form of Dalek?). I really dislike leaving this alone, even if that ends up being a separate diff.
360 ↗	(On Diff #90989)	Huh. I hadn't noticed any issues, but that is certainly worthy of mention.

mhorne added inline comments.Oct 8 2021, 2:46 PM

sys/powerpc/powerpc/nexus.c
149–150 ↗	(On Diff #90989)	Well, they are not uninitialized, but initialized to the default/maximum values. Now that the overflow in `rman_reserve_resource_bound()` has been fixed we shouldn't fear using this range, it is meant to be supported.

I've tested these changes on a PPC64 VM. With them, the virtual SCSI device fails to attach, making it impossible to mount root.
Below are the relevant dmesg messages:

ofwbus0: <Open Firmware Device Tree> on nexus0                                 
xicp0: <External Interrupt Presentation Controller> on ofwbus0
xicp0: Handling CPUs 0-31                                                      
vdevice0: <POWER Hypervisor Virtual Device Root> on ofwbus0                    
vscsi0: <POWER Hypervisor Virtual SCSI Bus> irq 16781571 on vdevice0
vscsi0: Could not allocate IRQ         
device_attach: vscsi0 attach returned 6

The errors are similar on real hardware, where it fails to attach aacraid and xhci:

aacraid0: <Adaptec RAID Controller> mem 0x80000000-0x800fffff,0x80180000-0x801803ff irq 1038328 at device 0.0 numa-domain 0 on pci5
aacraid0: Async. mode not supported by current driver, sync. mode enforced.
Please update driver to get full performance.
aacraid0: Enable Raw I/O
aacraid0: Enable 64-bit array
aacraid0: using line interrupts
aacraid0: can't allocate interrupt
device_attach: aacraid0 attach returned 22
xhci0: <XHCI (generic) USB 3.0 controller> mem 0x80000000-0x8000ffff,0x80010000-0x80011fff irq 1036280 at device 0.0 numa-domain 0 on pci7
xhci0: 64 bytes context size, 64-bit DMA
xhci0: Could not allocate IRQ
device_attach: xhci0 attach returned 6

The issue can be reproduced with a non-accelerated VM too, on non-powerpc hardware, with a command like:

qemu-system-ppc64 -cpu power8 -machine pseries,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken -m 1G -drive file=disk.qcow2,media=disk \
-netdev user,id=lan0,ipv6=off -device e1000,netdev=lan0,mac=52:54:98:13:64:01 -nographic -vga none -serial mon:stdio -nodefaults

The disk image can be obtained from FreeBSD CI: https://artifact.ci.freebsd.org/snapshot/14.0-CURRENT/latest/powerpc/powerpc64/disk.qcow2.zst.
Then only the kernel would have to be replaced, by the one to be tested.

Just note that CI image expects virtio disks instead of vscsi ones, so, even the default image stops at mountroot, but in this case the disks are detected and root can be found at ufs:da0s3.
BTW, virtio driver works fine even with this patch (it can be tested by adding ",if=virtio" to QEMU's -drive argument).

In D30554#734629, @luporl wrote:
I've tested these changes on a PPC64 VM. With them, the virtual SCSI device fails to attach, making it impossible to mount root.
Below are the relevant dmesg messages:
ofwbus0: <Open Firmware Device Tree> on nexus0                                 
xicp0: <External Interrupt Presentation Controller> on ofwbus0
xicp0: Handling CPUs 0-31                                                      
vdevice0: <POWER Hypervisor Virtual Device Root> on ofwbus0                    
vscsi0: <POWER Hypervisor Virtual SCSI Bus> irq 16781571 on vdevice0
vscsi0: Could not allocate IRQ         
device_attach: vscsi0 attach returned 6

Looks like it fails to attach since 16781571 > (1 << 16). @ehem_freebsd_m5p.com please relax the rman boundary constraints.

Returning things to almost how they were before.

Harbormaster completed remote builds in B42244: Diff 97087.Oct 19 2021, 12:21 AM

In D30554#734639, @mhorne wrote:

Looks like it fails to attach since 16781571 > (1 << 16). @ehem_freebsd_m5p.com please relax the rman boundary constraints.

That I cannot dispute, so seems I have to let things be exactly the way they were before despite really not liking that.

Sparse interrupt numbering? That certainly explains why the PowerPC interrupt code is written the way it was.

Thanks for the test report @luporl, such was badly needed.

ehem_freebsd_m5p.com added a parent revision: D32357: arm: add an interrupt rman to nexus.Oct 19 2021, 1:03 AM

ehem_freebsd_m5p.com added a child revision: D31956: RELNOTES: add entry for Xen/ARM64.Oct 19 2021, 1:07 AM

In D30554#734742, @ehem_freebsd_m5p.com wrote:

In D30554#734639, @mhorne wrote:

Looks like it fails to attach since 16781571 > (1 << 16). @ehem_freebsd_m5p.com please relax the rman boundary constraints.

That I cannot dispute, so seems I have to let things be exactly the way they were before despite really not liking that.

Sparse interrupt numbering? That certainly explains why the PowerPC interrupt code is written the way it was.

Yes. The IBM series of hardware (pseries, powernv) use a 24-bit integer for the IRQ, which is a special routing encoding for the interrupt controller, which encodes priority, source, and other information. Not all powerpc hardware is this way, but all the newer (power8 and later? Maybe power7?) IBM hardware is.

Before these changes, ofw_bus was a simplebus that also did a bit of resource management. With the rman stuff gone, is there any difference at all between ofw_bus and simplebus?

Don't take this as me requiring that ofw_bus be removed as part of making these changes. But if the answer is that there's no difference anymore, we should think about consolidating and cleaning things up as a followup change.

In D30554#734857, @jhibbits wrote:

In D30554#734742, @ehem_freebsd_m5p.com wrote:

Sparse interrupt numbering? That certainly explains why the PowerPC interrupt code is written the way it was.

Yes. The IBM series of hardware (pseries, powernv) use a 24-bit integer for the IRQ, which is a special routing encoding for the interrupt controller, which encodes priority, source, and other information. Not all powerpc hardware is this way, but all the newer (power8 and later? Maybe power7?) IBM hardware is.

Interesting approach.

I think there are 3 features INTRNG would need for an attempt at taking over as the FreeBSD interrupt framework:

Support for interrupt numbers being assigned outside of INTRNG (x86)
Support for allocation of interrupt ranges (x86, used by Xen code, used for MSI interrupts)
Support for sparse interrupt numbering (PowerPC)

I suspect multiple pieces of x86 code assume interrupt numbers can be mapped to an interrupt structure quickly, thus #3 would likely mean hash table or tree. Simply pondering what would be needed for INTRNG to take over...

This revision was not accepted when it landed; it landed in state Needs Review.Feb 8 2023, 8:52 PM

Closed by commit rGf9bdaab95ec4: ofwbus: remove handling of resources from ofwbus (authored by ehem_freebsd_m5p.com, committed by mhorne). · Explain Why

This revision was automatically updated to reflect the committed changes.

mhorne added a commit: rGf9bdaab95ec4: ofwbus: remove handling of resources from ofwbus.

Herald added a subscriber: emaste. · View Herald TranscriptFeb 8 2023, 8:52 PM

I went ahead and pushed this through, after testing on each of the affected platforms, and making some required changes (the resource list handling needs to stay in ofwbus). If there are problems, they should show up somewhat obviously, in the form of devices failing to attach/acquire resources.

mhorne added a commit: rG9cb88cc937ec: ofwbus: remove handling of resources from ofwbus.Apr 17 2023, 5:04 PM