Page MenuHomeFreeBSD

pci_user: Report NUMA domain
ClosedPublic

Authored by jfree on Mar 10 2024, 3:00 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Oct 22, 10:09 AM
Unknown Object (File)
Mon, Oct 20, 2:46 AM
Unknown Object (File)
Sun, Oct 19, 8:08 AM
Unknown Object (File)
Sat, Oct 11, 6:34 AM
Unknown Object (File)
Sat, Oct 11, 6:34 AM
Unknown Object (File)
Sat, Oct 11, 6:34 AM
Unknown Object (File)
Sat, Oct 11, 6:34 AM
Unknown Object (File)
Sat, Oct 11, 6:34 AM

Details

Summary
pci_user: Report NUMA domain

A PCI device's NUMA domain is now accessible via the pd_numa_domain
member of struct pci_conf when using the PCIOCGETCONF ioctl.

A new ioctl number has been assigned to PCIOCGETCONF to preserve
compatibility with binaries compiled on FreeBSD versions 7 through 14.
Such binaries can continue to use the PCIOCGETCONF ioctl number
that they were compiled with and experience no ABI repercussions.
Test Plan

Compiled the following program on FreeBSD 13, 14, 15:
https://reviews.freebsd.org/P644

The FreeBSD 13 and 14 versions had the pd_numa_domain bit commented out, since it is unavailable.

[root@freebsd ~/tests/pci]$ file pci1*
pci13:     ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 13.2, FreeBSD-style, with debug_info, not stripped
pci13.out: ASCII text
pci14:     ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 14.1, FreeBSD-style, with debug_info, not stripped
pci14.out: ASCII text
pci15:     ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 15.0 (1500018), FreeBSD-style, with debug_info, not stripped
pci15.out: ASCII text

[root@freebsd ~/tests/pci]$ ./pci13 > pci13.out
[root@freebsd ~/tests/pci]$ ./pci14 > pci14.out
[root@freebsd ~/tests/pci]$ ./pci15 > pci15.out

[root@freebsd ~/tests/pci]$ diff pci13.out pci15.out
13a14,15
> NUMA node: 0
>
26a29,30
> NUMA node: 0
>
39a44,45
> NUMA node: 0
>
52a59,60
> NUMA node: 0
>
65a74,75
> NUMA node: 0
>
[root@freebsd ~/tests/pci]$ diff pci14.out pci15.out
13a14,15
> NUMA node: 0
>
26a29,30
> NUMA node: 0
>
39a44,45
> NUMA node: 0
>
52a59,60
> NUMA node: 0
>
65a74,75
> NUMA node: 0
>

The above output shows all binaries running successfully on FreeBSD 15. There was no difference between the 13,14 output and the 15 output besides numa domain.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jfree requested review of this revision.Mar 10 2024, 3:00 AM

Don't forget to update .Dd when content of manual page changed. :-)

Update manual page .Dd date

imp requested changes to this revision.Mar 17 2024, 10:49 PM

These changes aren't backward compatible. You likely need to keep the old struct to reply to the old ioctl...

And you can't change the current old structure. Old binaries will have buffer overflows

This revision now requires changes to proceed.Mar 17 2024, 10:49 PM

Do not update pci_conf_old and pci_conf_old32 structures. These
should stay static for compatibility with binaries that were
compiled to use the PCIOCGETCONF_OLD and PCIOCGETCONF_OLD32 ioctls.

Thanks for removing this from all the old versions...

However, we still have some compatibility issue. I think we need to rename all the _old names to _freebsd7 to make it comply with more-modern conventions. This can likely be a separate commit.

Then, we need to take the current structures and make _freebsd14 compat versions for them. Since the layout is the same, we'd likely be able to do this by just not setting pd_numa_domain if the IOCTL (which encodes the lenght) is the _freebsd14 version I'd like you to create. It sounds like a lot, but I think most of it is cut and paste and isn't that bad since it's an extra case statement and an extra if for the normal and compat32 branches.

So something like https://reviews.freebsd.org/D45867 for the rename (I can commit this if you'd like)

And something like https://reviews.freebsd.org/D45868 for the compat stuff... It's a little different than I originally thought, so there will need to be a new ioctl to cover this case since the sizes stay the same, we have to change the number so we don't overflow the receiver's conf buffer if it's run against a new kernel. I suspect the new ioctl# and the freebsd14 ioctl parts from that differential belong here, and then you can take mine for compat, make sure it works and then commit it maybe (or if you want me to, let me know)

jfree retitled this revision from PCIOCGETCONF: Report NUMA domain to pci_user: Report NUMA domain.Jul 29 2024, 10:44 PM
jfree edited the summary of this revision. (Show Details)
jfree edited the test plan for this revision. (Show Details)

Address compatibility concerns pointed out by Warner

sys/sys/pciio.h
80

Does it make sense to add some spare fields here, so that you can add additional fields later without needing compat shims?

sys/sys/pciio.h
80

Does it make sense to add some spare fields here, so that you can add additional fields later without needing compat shims?

I was thinking the same thing - this is a lot of compat cruft for a single field. It would be unfortunate if we needed to do it all again for another field in the next major version. I do not know how this should be handled. Maybe some simple char padding would do?

sys/sys/pciio.h
80

For most 'compat' items, the length is just appended and it's just an extra line or two.
The older structures were more of a re-arrangement.
So I'm not sure that the copying element by element we do for the FreeBSD14 section is strictly necessary. That's why one normally doesn't do extra fields... We just selectively copy the first bits for older ioctl numbers.

Also... just realized I likely wrote the field by f8eld copy... let me try again

Use memcpy(3) to copy pci_conf data into compat structures instead of assigning
members one by one.

jfree edited the test plan for this revision. (Show Details)

I think this is ok (though please add the padding), but let's give Warner another chance to look since he knows this code better than me.

sys/sys/pciio.h
80

To add padding for next time, I'd just include a uint64_t pd_spare[8] field at the end.

So I have two things to say.
As is, this is good. The other is about extra fields: how do clients know they are valid?

sys/sys/pciio.h
80

So the problem with doing this is 'how do clients know they can use these values'. While it nicely solves the size of data problem, we'd be left with an interpretation of data problem, unless the new data published defaults to 0 or 0 means unreported (like if it were a nvme namespace, 0 is illegal, so one could infer it was not reported). CAM does weird bitmasks (that are non-uniform for historical reasons). Others have a length of actual data reported. Without some care, we don't have this here. We'd need to add a field before this one that's the length of additional valid data. That may keep us from writing yet more compat routines. The cost, though, is we still might if pcisel every changes.

This suggests that the 'domain' is better off as a fixed datatype rather than an int. uint32_t. All the other fields are this way, and it means the 32-bit and 64-bit structures should be the same size.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 4 2025, 2:46 AM
This revision was automatically updated to reflect the committed changes.