pci_user: Report NUMA domain A PCI device's NUMA domain is now accessible via the pd_numa_domain member of struct pci_conf when using the PCIOCGETCONF ioctl. A new ioctl number has been assigned to PCIOCGETCONF to preserve compatibility with binaries compiled on FreeBSD versions 7 through 14. Such binaries can continue to use the PCIOCGETCONF ioctl number that they were compiled with and experience no ABI repercussions.
Details
- Reviewers
markj imp - Commits
- rG9404c479946c: pci_user: Report NUMA domain
Compiled the following program on FreeBSD 13, 14, 15:
https://reviews.freebsd.org/P644
The FreeBSD 13 and 14 versions had the pd_numa_domain bit commented out, since it is unavailable.
[root@freebsd ~/tests/pci]$ file pci1* pci13: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 13.2, FreeBSD-style, with debug_info, not stripped pci13.out: ASCII text pci14: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 14.1, FreeBSD-style, with debug_info, not stripped pci14.out: ASCII text pci15: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 15.0 (1500018), FreeBSD-style, with debug_info, not stripped pci15.out: ASCII text [root@freebsd ~/tests/pci]$ ./pci13 > pci13.out [root@freebsd ~/tests/pci]$ ./pci14 > pci14.out [root@freebsd ~/tests/pci]$ ./pci15 > pci15.out [root@freebsd ~/tests/pci]$ diff pci13.out pci15.out 13a14,15 > NUMA node: 0 > 26a29,30 > NUMA node: 0 > 39a44,45 > NUMA node: 0 > 52a59,60 > NUMA node: 0 > 65a74,75 > NUMA node: 0 > [root@freebsd ~/tests/pci]$ diff pci14.out pci15.out 13a14,15 > NUMA node: 0 > 26a29,30 > NUMA node: 0 > 39a44,45 > NUMA node: 0 > 52a59,60 > NUMA node: 0 > 65a74,75 > NUMA node: 0 >
The above output shows all binaries running successfully on FreeBSD 15. There was no difference between the 13,14 output and the 15 output besides numa domain.
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
These changes aren't backward compatible. You likely need to keep the old struct to reply to the old ioctl...
And you can't change the current old structure. Old binaries will have buffer overflows
Do not update pci_conf_old and pci_conf_old32 structures. These
should stay static for compatibility with binaries that were
compiled to use the PCIOCGETCONF_OLD and PCIOCGETCONF_OLD32 ioctls.
Thanks for removing this from all the old versions...
However, we still have some compatibility issue. I think we need to rename all the _old names to _freebsd7 to make it comply with more-modern conventions. This can likely be a separate commit.
Then, we need to take the current structures and make _freebsd14 compat versions for them. Since the layout is the same, we'd likely be able to do this by just not setting pd_numa_domain if the IOCTL (which encodes the lenght) is the _freebsd14 version I'd like you to create. It sounds like a lot, but I think most of it is cut and paste and isn't that bad since it's an extra case statement and an extra if for the normal and compat32 branches.
So something like https://reviews.freebsd.org/D45867 for the rename (I can commit this if you'd like)
And something like https://reviews.freebsd.org/D45868 for the compat stuff... It's a little different than I originally thought, so there will need to be a new ioctl to cover this case since the sizes stay the same, we have to change the number so we don't overflow the receiver's conf buffer if it's run against a new kernel. I suspect the new ioctl# and the freebsd14 ioctl parts from that differential belong here, and then you can take mine for compat, make sure it works and then commit it maybe (or if you want me to, let me know)
| sys/sys/pciio.h | ||
|---|---|---|
| 80 | Does it make sense to add some spare fields here, so that you can add additional fields later without needing compat shims? | |
| sys/sys/pciio.h | ||
|---|---|---|
| 80 |
I was thinking the same thing - this is a lot of compat cruft for a single field. It would be unfortunate if we needed to do it all again for another field in the next major version. I do not know how this should be handled. Maybe some simple char padding would do? | |
| sys/sys/pciio.h | ||
|---|---|---|
| 80 | For most 'compat' items, the length is just appended and it's just an extra line or two. | |
Use memcpy(3) to copy pci_conf data into compat structures instead of assigning
members one by one.
I think this is ok (though please add the padding), but let's give Warner another chance to look since he knows this code better than me.
| sys/sys/pciio.h | ||
|---|---|---|
| 80 | To add padding for next time, I'd just include a uint64_t pd_spare[8] field at the end. | |
So I have two things to say.
As is, this is good. The other is about extra fields: how do clients know they are valid?
| sys/sys/pciio.h | ||
|---|---|---|
| 80 | So the problem with doing this is 'how do clients know they can use these values'. While it nicely solves the size of data problem, we'd be left with an interpretation of data problem, unless the new data published defaults to 0 or 0 means unreported (like if it were a nvme namespace, 0 is illegal, so one could infer it was not reported). CAM does weird bitmasks (that are non-uniform for historical reasons). Others have a length of actual data reported. Without some care, we don't have this here. We'd need to add a field before this one that's the length of additional valid data. That may keep us from writing yet more compat routines. The cost, though, is we still might if pcisel every changes. This suggests that the 'domain' is better off as a fixed datatype rather than an int. uint32_t. All the other fields are this way, and it means the 32-bit and 64-bit structures should be the same size. | |