Page MenuHomeFreeBSD

kern/rman: warn/KASSERT() on garbled reservations
ClosedPublic

Authored by emaste on Jul 26 2021, 8:30 PM.
Tags
None
Referenced Files
F81635864: D31312.id92979.diff
Fri, Apr 19, 7:25 AM
Unknown Object (File)
Mar 6 2024, 11:01 AM
Unknown Object (File)
Feb 21 2024, 8:44 AM
Unknown Object (File)
Jan 27 2024, 3:08 PM
Unknown Object (File)
Jan 27 2024, 5:42 AM
Unknown Object (File)
Jan 16 2024, 12:33 AM
Unknown Object (File)
Jan 14 2024, 1:47 PM
Unknown Object (File)
Jan 13 2024, 4:43 AM

Details

Summary

A count of 0 or managing the default range suggests incorrect use of the
rman code. Triggering these likely points to a bus being lazy. Hint
to several buses to fix their use of the resource manager code. Various
nexus implementations also need fixing.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 40749
Build 37638: arc lint + arc unit

Event Timeline

This is the other side of what got brought up in D30280. Assert on zero count, warn about trying to manage the default range. Passing a count of zero is suspicious, so an assert seems appropriate. Unfortunately the default range is in common use in many places, so merely add a warning for that.

Notably the ARM*, MIPS, PowerPC (OFW) and RISC-V nexuses all initialize a mem_rman with the range [0, ~0]. This is definitely wrong since none of the architectures with 64 address bits can actually physically address 2^64 bytes. As a result their nexuses will return success while providing resources with invalid addresses (D30297 fixes this for ARM64, other architectures need similar fixes).

I'm unsure what severity of warning should be generated for that range, I also wonder whether this should be limited to debug kernels. This is a widespread issue, but may not trigger breakage that often, since few devices need to allocate unused physical addresses (Xen needs this, other hypervisors may desire similar functionality).

sys/kern/subr_rman.c
163

I don't know if we are ready for this yet. Keep in mind this is PA's, not VA's. x86's nexus queries the max PA supported for its mem_rman which is probably what other platforms should do. IRQ rmans though can effectively be unbounded as they are just cookie values and not associated with a hardware resource.

455

This is good.

Adjusting error message, include rm->rm_descr since that is likely to be helpful.

ehem_freebsd_m5p.com added inline comments.
sys/kern/subr_rman.c
163

I disagree with not being ready for this. This is merely a warning (even though it says "ERROR") in debug builds. The issue is right now the x86 nexus is the only one which actually bothers to loads its rman_mem with the actual range of valid physical addresses.

My goal is to generate a scary error message to hopefully push the users of the resource manager code to fix their uses of resource managers.

I've got mixed feelings on the interrupt resource managers. True, they're pretty much cookie values, but leaving them unbounded feels like missing a chance for an early warning of something going wrong.

One other note here. I'm wondering about the conditional above. Does it really make sense to return EINVAL if start or end exceed the rm_start .. rm_end range? Might it make more sense to clamp start and end to the rm_start .. rm_end range?

sys/kern/subr_rman.c
163

Some platforms don't have a standardized way to query the maximum physical memory address (RISC-V is an example of this), and thus no path forward for "fixing" their use of nexus rmans. Without one, this printf can only create unactionable noise on these platforms, not the useful warning you are hoping for.

455–456

I would suggest this be squashed into D30280 when it is committed.

sys/kern/subr_rman.c
163

That must be rather interesting to work with. Sounds like that could end up a major issue for the RISC-V port of Xen. Guess that suggests this may need to be marked abandoned if action is going to remain impossible for the time being.

455–456

Fine by me.

Can't actually mark this as closed, can only mark as abandoned. I suspect a commit message can close multiple reviews by listing multiple differentials.

emaste reclaimed this revision.
emaste accepted this revision.
emaste added a reviewer: ehem_freebsd_m5p.com.
This revision is now accepted and ready to land.Jun 29 2022, 1:58 AM

Looks like it can only be closed if it is in "accepted" state.