On some architectures, u_long isn't large enough for resource definitions. Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but type `long' is only 32-bit. This extends rman's resources to uintmax_t.
Details
Tested on a PowerPC p5020-based board, which places all resources in 0xfxxxxxxxx, and has 8GB RAM.
Regression tested on qemu-system-i386
Regression tested on qemu-system-mips (malta profile)
Tested PAE and devinfo on virtualbox (live CD)
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Buildable 2764 Build 2785: arc lint + arc unit
Event Timeline
Could you, please, provide some reasoning behind the way the change is done. As I understand, the intent is to support full range of the resources values for PAE-like configuration, where native platform long type range does not cover e.g. the bus addresses. But, is it intended to provide 64bit resource ranges for e.g. arm or non-PAE i386 ? At least, I did not found any handling of possible mis-allocation of too-high resources. Or, is it not important ?
sys/sys/types.h | ||
---|---|---|
253 ↗ | (On Diff #11210) | Do you see a need for the type outside the kernel ? |
That may have been an oversight. It's not intended to provide resources which aren't accessible by the architecture. armv7 (more recent cortex's, I think?), i386-PAE, and powerpc (some, specifically mpc7450, and e500v2/e500mc, perhaps others) all support 36-bit addresses. But where it isn't supported, bounds checking should probably be performed (it may already be the case, only later at access time, rather than at allocation time).
Reviewing sys/x86/x86/nexus.c, for example, mem_rman.rm_end could be set to BUS_SPACE_MAXADDR instead of ~0, and that should be sufficient, and a similar change would be possible in the other archs. Thoughts?
sys/sys/types.h | ||
---|---|---|
253 ↗ | (On Diff #11210) | I don't think so. Are resources or resource ranges ever exported to user space? |
There is also a case where we need to handle larger intermediate values that will be resolved to a 32-bit value in a parent handler. This has been shown recently with the Armada 38x support, they use a 64-bit value as a key for the parent ranges. The parent then uses this to find the real physical address (or another key for this devices parent). When we get to nexus this should be a 32-bit value, however it may pay to check this.
I think that this is enough, thanks for the analysis.
sys/sys/types.h | ||
---|---|---|
253 ↗ | (On Diff #11210) | May be by devinfo(3) ? At least man page and devinfo.h use unsigned long for start/size/end. In the comment above, I mean, shouldn't the rman_res_t braced with #ifdef _KERNEL ? |
Add range checking in nexus. It's assumed anything underneath the nexus does range checking within its own bounds.
I still have to fix the man pages for rman(9), and bus_*_resource(9), coming soon. Looking for additional comments on the code. I'm also building tinderbox with a s/__rman_res_t/rman_res_t/, to make it look nicer.
sys/sys/types.h | ||
---|---|---|
253 ↗ | (On Diff #11210) | Hmm, you're right. I hadn't even looked at userspace usage. Right now, with the patch, devinfo would be broken, to a degree. Since I don't know the prevalence of this, I don't know how broken it actually is (how many devinfo consumers are there, and right now who beyond powerpc would be broken by this?) Would it be acceptable to tackle that in a separate commit, or should it come as a package deal? |
POSIX says that everything in the namespace which ends in '_t' suffix is reserved, so I think it is even acceptable to provide rman_res_t for the userspace. Esp. since there are some userspace consumers.
We have at least devinfo(8) utility, which is rarely used, but is absolutely required when it is needed. Its interface definitely falls into the 'kernel/management' category, so the change due to rman_res_t is acceptable on HEAD and for next stable.
Other than that, I do not see much reasons to sit on the patch any more.
Update to include man pages and expose rman_res_t to user space.
libdevinfo hasn't been updated to expose the rman_res_t to its clients, internally it still converts the rman_res_t values to unsigned long. This will be fixed in a later diff.
sys/dev/ntb/ntb_hw/ntb_hw.c | ||
---|---|---|
750–752 ↗ | (On Diff #11860) | This change seems gratuitous (and wrong). Also, this diff doesn't appear to have any context? Next time use 'arc diff' or 'diff -U99999' please. |
~0 is 0xffffffff on AMD64, not 0xffffffffffffffff. Every replacement of ~0ul by ~0 seems wrong.
On PowerPC, ~0 cast to uintmax_t resolves to 0xffffffffffffffff. I just tested on freefall, too, the following program:
#include <stdio.h>
#include <stdint.h>
void
do_print_um(uintmax_t m)
{
printf("%jx\n", m);
}
int main(int argc, const char **argv)
{
do_print_um(~0);
}
And it printed: ffffffffffffffff
So your assertion is incorrect.
sys/dev/ntb/ntb_hw/ntb_hw.c | ||
---|---|---|
750–752 ↗ | (On Diff #11860) | Regarding the diff: I'll remember to use 'diff -U99999' in the future. I created this diff on a machine in the cluster that doesn't have arc installed, and didn't regenerate it locally before posting. Regarding the change: It should be perfectly valid. It may be a little gratuitous, given that ntb is amd64 only (though, i386 should run on hardware with it), but it's consistent with all other changes. |
sys/dev/ntb/ntb_hw/ntb_hw.c | ||
---|---|---|
750–752 ↗ | (On Diff #11860) | It's gratuitous because it's a comment :). |
sys/dev/ntb/ntb_hw/ntb_hw.c | ||
---|---|---|
750–752 ↗ | (On Diff #11860) | D'oh, yes. I can remove it if you'd like. |
~0 cast to uintmax_t and ~0 are different values. ~0 has type int.
Values of type int may be implicitly converted to values of type uintmax_t such as by passing in a function argument. C99 (n1256.pdf) 6.3.1.3 says:
"""Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type."""
~0 (as type int) is -1, in our twos-complement implementation. -1 is not representable in the unsigned type, so adding 2**64 to get 0xffffffffffffffff is defined behavior, but relies on some somewhat obscure language in the language specification. It seems like much better style (to me) to directly write the intended value, as Conrad indicated.
I think that the ~0 is correct and acceptable. You do need to know C rules about promotion to understand it, but I do not see why should we code to satisfy somebody who did not learned.
Anyway, we already use the trick, look, for instance, at the following definitions in sys/x86/include/bus.h:
#define BUS_SPACE_INVALID_DATA (~0) #define BUS_SPACE_UNRESTRICTED (~0)
The file is bi-arch (i386 and amd64) and the definitions utilize the same trick.There are much more, mostly in the related busdma tag specs, grep for it.
I strongly prefer the ~0 over the useless pass over the source code.
UPDATING | ||
---|---|---|
64 ↗ | (On Diff #11860) | This is a merge-o. |
sys/arm/arm/nexus.c | ||
348 ↗ | (On Diff #11860) | This would seem to be better handled by a proper rm_end in the rman instead when it is created (similar to what kib@ noted for x86). |
sys/dev/arcmsr/arcmsr.c | ||
4326 ↗ | (On Diff #11860) | This should just be using bus_alloc_resource_any() (as the rest of this driver probably should be), but that is a separate matter. |
sys/dev/ata/ata-isa.c | ||
85 ↗ | (On Diff #11860) | Why RM_MAX_END here instead of ~0? |
sys/dev/cardbus/cardbus_cis.c | ||
62 | This isn't a rman_res_t value here, so this should probably be in a separate change. Other "magic" pointers use -1 instead (e.g. MAP_FAILED for a failed mmap()). | |
sys/dev/fdt/simplebus.c | ||
338 | Use ~0 here? (You have mostly used ~0 rather than RM_MAX_END in the rest of the code.) | |
sys/dev/mii/mii.c | ||
404 ↗ | (On Diff #11860) | Agreed, both changes in this file are unrelated. |
sys/dev/ntb/ntb_hw/ntb_hw.c | ||
750–752 ↗ | (On Diff #11860) | OTOH, if "0, ~0" is now the "correct" way to call bus_alloc_resource(), I wouldn't want incorrect examples in comments to throw off future readers. |
sys/dev/ofw/ofwbus.c | ||
159 | Given the other changes, not sure why this is using RM_MAX_END instead of ~0? (Also lower down in this same file.) | |
sys/dev/pccard/pccard.c | ||
535 | Since you've changed the 'cardaddr', 'length', and 'hostaddr' fields in these structures (pccard_ce_iospace and pccard_ce_memspace) to rman_res_t, these casts seem redundant? Perhaps these were from an earlier version before the structures were changed? | |
sys/dev/pci/pci.c | ||
3009 | I thought we had some sort of code somewhere to special case firmware-assigned BAR values that were "out of range", but I can't find it now. Ah, it is this code in pci_add_map(): if ((u_long)base != base) { device_printf(bus, "pci%d:%d:%d:%d bar %#x too many address bits", pci_get_domain(dev), pci_get_bus(dev), pci_get_slot(dev), pci_get_function(dev), reg); return (barlen); } This check can be removed. | |
3642 | The braces here seem to be a gratuitous change? | |
sys/dev/pci/vga_pci.c | ||
168 ↗ | (On Diff #11860) | ~0 instead of RM_MAX_END? |
sys/kern/subr_rman.c | ||
478 | I think this should be RMAN_MAX_END, not BUS_SPACE_MAXADDR. BUS_SPACE_MAXADDR should (IMO) only be used by consumers to set the bounds on various rmans. The rman code itself should only operate on opaque rman_res_t values. | |
497 | Here as well. | |
sys/mips/mips/nexus.c | ||
391 | As with arm, I think the proper place to fix this is when initializing the rman and/or the regions it covers, not to check here. | |
sys/powerpc/powerpc/nexus.c | ||
196 | This looks like another place where the rman needs to be initialized properly to not cover addresses above BUS_SPACE_MAXADDR. | |
sys/sparc64/sparc64/nexus.c | ||
431 ↗ | (On Diff #11860) | Fix the rman init. :) |
sys/sys/types.h | ||
253 ↗ | (On Diff #11860) | I would do it as a package deal. I don't know if libdevinfo is used in ports at all. Hmm, the sysctls that back it will bail if the version is bumped, but they just fail outright. This means you could do a library bump of libdevinfo (assuming it doesn't have versioned symbols) and the result is that old programs on a new kernel will just not work. The interface to the kernel doesn't seem to (easily) permit adding backwards compat either. I think the KBI change in HEAD means that this isn't going into a stable branch anyway, but I don't know if we care about old programs (or 10.x jails) using libdevinfo. Maybe do a ports exp-run with a hack patch to unhook devinfo/libdevinfo from the build just to hide libdevinfo and see how many ports use it? |
sys/x86/x86/nexus.c | ||
452 ↗ | (On Diff #11860) | Again, please fix this when the rman is inited, not as a check here. |
sys/x86/x86/nexus.c | ||
---|---|---|
591 ↗ | (On Diff #11860) | Not sure what this is for, but if it is to fix ram0, then please put the changes back in. In ram_attach you should probably change the first one to still DTRT when booting a non-PAE kernel on a machine with RAM above 4G by having it do this: #ifdef __i386__ && !defined(PAE) /* * Address ranges above 4GB can't be accessed with * standard page tables. */ if (smap->base > BUS_SPACE_MAXADDR) continue; #endif However, for the non-SMAP case the check can be removed as it only covers addresses that are valid in a PAE kernel but just don't fit in a long. |
I was not attempting to support churn; I thought there was a comment that a version was introducing churn by changing from (~0ul) to (~0), but only read the notification mail and not the full change.
sys/dev/arcmsr/arcmsr.c | ||
---|---|---|
4326 ↗ | (On Diff #11860) | Deferring this. |
sys/dev/mii/mii.c | ||
404 ↗ | (On Diff #11860) | Thought I removed these from my last diff. Guess I hadn't cleaned up everything. |
sys/dev/pci/pci.c | ||
3642 | Leftover from other debugging. | |
sys/powerpc/powerpc/nexus.c | ||
196 | This is actually redundant, since the SYS_RES_MEMORY rman only manages 0->BUS_SPACE_MAXADDR (see sys/dev/ofw/ofwbus.c). | |
sys/x86/x86/nexus.c | ||
591 ↗ | (On Diff #11860) | Since I don't have a machine to test with, I removed these changes from my diff until someone is able to test. Better to err on the side of not breaking. |
Address jhb's comments, and add libdevinfo/devinfo to the mix.
arc diff --update' and svn diff ... -U99999' both failed to upload, with "413 Request Entity Too Large". Apparently phabricator doesn't like 6MB diffs, so I had to settle on 50 lines of context, and a 1.2MB diff.
sys/dev/pci/pci.c | ||
---|---|---|
3009 | This is still not done? | |
sys/x86/x86/nexus.c | ||
591 ↗ | (On Diff #12001) | Just boot a PAE i386 kernel in a bhyve VM. PAE works fine in bhyve. You can create a bhyve VM with 4G of RAM and boot a PAE kernel and check that the resource ranges for ram0 include all of RAM via devinfo -r. This is one of the very few places that need a larger rman type. |
Only a few nits still.
However, reviewing this I think it would be best to split this up a bit when committing. For one, it would be nice to allow the new API to work in older stable branches to make MFCs of future driver changes easier. I also think that some of the real fixes (such as ata using 'end' instead of 'count') should be separate commits. I can help with doing this.
I would suggest doing an initial commit that maps __rman_res_t to unsigned long and includes the sweep changing all the variables types and printf modifiers. This is a change that can be merged to older branches to improve API compatibility.
Next, I would suggest committing the change to convert 0ul and ~0ul to just 0 and ~0.
Once those are done the remaining diff should be much more managable and we can see if there are any other fixes (like ata) that should be committed separately before throwing the switch.
This approach will also make it easier for people doing testing on other platforms as the functional change to revert if something breaks is smaller.
sys/arm/arm/nexus.c | ||
---|---|---|
165 ↗ | (On Diff #12118) | This rman_manage_region needs to be updated to use BUS_SPACE_MAXADDR or it will fail. It is actually more important for rman_manage_region() to be correct as that determines the "available" address space. |
348 ↗ | (On Diff #12118) | This check is still here? I think it should be unnecessary after the nexus changes. |
sys/arm64/arm64/nexus.c | ||
314 ↗ | (On Diff #12118) | The rman should have BUS_SPACE_MAXADDR set for the end of its region in nexus_attach() and this check should be removed. |
Reduce the diff to printf changes and flipping the switch.
Passes tinderbox. I run a subset on my P5020 board, too.
The only objection I have to that is cluttering up the printf()s with (uintmax_t) casts, or PRI* format specifiers (just makes things uglier). I also thought about making it a potentially-MD type, and casting all the printf()s, but that would make resource_list_print_type() calls rather ugly, unless I force it to always print as a specific type.
It's a tradeoff either way.
I have only tested parts of this on arm* and amd64 but I am in general favour as it allows me to boot certain systems in a simulator with a lot less hakery and trouble (which means it boots rather than crashes) ;-)