Page MenuHomeFreeBSD

Use Linux semantics for the thread affinity syscalls.
ClosedPublic

Authored by dchagin on Apr 8 2022, 7:58 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Apr 22, 6:54 AM
Unknown Object (File)
Sun, Apr 21, 8:47 PM
Unknown Object (File)
Sun, Apr 21, 8:47 PM
Unknown Object (File)
Sun, Apr 21, 8:47 PM
Unknown Object (File)
Sun, Apr 21, 8:47 PM
Unknown Object (File)
Sun, Apr 21, 8:47 PM
Unknown Object (File)
Sun, Apr 21, 8:47 PM
Unknown Object (File)
Sun, Apr 21, 8:46 PM

Details

Summary

Linux has more tolerant checks of the user supplied cpuset_t's.

Minimum cpuset_t size that the Linux kernel permits in case of
getaffinity() is a number of active cpus / NBBY, the maximum
size is not limited.
For setaffinity(), Linux does not limit the size of user-provided
cpuset_t, internally using only a part of it, no larger than the
size of the kernel cpuset_t.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I'm not a big fan of the _raw suffix. If I'm reading the diff correctly, the intent of this change is to split out a variant that accepts a kernel pointer rather than a user pointer, with the malloc + copyin/copyout done in the user wrapper. In CheriBSD, @brooks has added user_* variants of several kern_* functions that are a similar split where the kern_* version only deals with kernel pointers. In that parlance, the new functions would retain the existing kern_* name, but the old functions would be renamed to user_* instead. I would perhaps suggest that for the names. I would also clarify in the commit log that the reason for the split is to permit a kernel pointer to the mask.

I agree with @jhb on the naming. I've introduced a couple following this convention in R10:28f047188492c8f2ddca66f162fd2ac9bdc170a6 and I've got another ~50 in my CheriBSD tree. That probably does require squashing this commit with D34850.

dchagin retitled this revision from Split kern_cpuset_[*]affinity on two counterparts for Linux emulation. to Use Linux semantics for the thread affinity syscalls..Apr 20 2022, 4:04 PM
dchagin edited the summary of this revision. (Show Details)
pauamma_gundo.com added inline comments.
lib/libc/sys/cpuset_getaffinity.2
76–77

I don't know how to interpret the removal of

The kernel tolerates large sets as long as all CPUs specified
in the set exist.

Without that text, the paragraph could mean either:

  • The kernel no longer tolerates large sets at all.
  • The kernel now tolerate large sets even if some CPUs specified don't exist.

Since these are just about opposite implications, please specify which you mean (or if something else, what you mean).

share/man/man3/pthread_attr_affinity_np.3
54–57

Same question as for cpuset_getaffinity(2).

This revision now requires changes to proceed.Apr 22 2022, 12:09 PM
lib/libc/sys/cpuset_getaffinity.2
76–77

Perhaps something like:

If the user-supplied set is not large enough to fit all of the matching CPUs,
.Fn cpusetgetaffinity
fails with
.Er ERANGE .

(I do think "matching CPUs" might be slightly better than "active CPUs" here)

sys/kern/kern_cpuset.c
2019

I think you still want this check? That is, if userland specifies a non-existent CPU we should reject it with an error?

lib/libc/sys/cpuset_getaffinity.2
76–77

That would probably work overall, if you s/cpusetgetaffinity/cpuset_getaffinity/. What do "matching" and or "active" mean here, though? I took "active" to mean "an existing CPU the scheduling object - process, userland thread, interrupt handler, or whatever - is willing and able to run on, but now that you introduced a different term, I no longer know.

sys/kern/kern_cpuset.c
1898–1899

Note that cpusetsize * NBBY can overflow. In this case this seems to be innocent due to the later cap of the size with min(cpusetsize, sizeof(cpuset_t)) but it misses to return ERANGE in some situations.

lib/libc/sys/cpuset_getaffinity.2
76–77

'matching CPUs' is a bit confusing to me also, may be 'existing'?

Reworked, also fixed libc affinity methods and manpages.

Add tests.

lib/libc/sys/cpuset_getaffinity.2
76–77

I guess I mean matching in terms of the CPUs that match the qualifiers (CPU_WHICH_*, a specific pid or tid, etc.). The difference is suppose you had a system with enough CPUs that you needed a set of two words to express all of them, but you were to query the set for a thread that was bound to just CPU 0. The set for that thread can be described by a smaller set containing just the single bit 0 (since only CPU 0 "matches" the query of "set of CPUs for thread X") vs if the size check is always done against what is required to hold the the set of all CPUs (so always requiring two words in this case).

To me, active implies "an active CPU in the system" and thus means you always require the two words for the case above.

sys/kern/kern_cpuset.c
1898–1899

So this also fails in the case I mention that you are querying the per-thread mask that fits in a smaller set than the thing being returned. I think instead you want to not do a check here but instead wait until just before the copyout() and then check to see if the highest set bit is too large to fit in the user's mask and fail with ERANGE if so.

2034

This should be using mp_maxid, not smp_cpus to account for sparse CPU IDs. However, all this really needs to verify is what the old check was doing. Other code will catch if we have bits set that can fit into a cpuset_t that are > mp_maxid. What this check needs to verify instead is that userland hasn't set bits that don't fit into the kernel's cpuset_t. In that case, the existing code that used sizeof(cpuset_t) directly in place of factsize is correct. However, since the malloc above now (correctly) is capped to only cpuset_t, you can't check the extra bits directly from mask. Instead, you need to be fetching the words from userland explicitly via fueword() or the like, so something like:

if (cpusetsize > sizeof(cpuset_t)) {
    char *end;
    char *cp;
    int val;

    end = cp = (char *)&maskp->__bits;
    end += cpusetsize;
    cp += sizeof(cpuset_t);
    while (cp  != end) {
        val = fubyte(cp);
        if (val == -1) {
            error = EFAULT;
            goto out;
        } else if (val != 0) {
            error = EINVAL;
            goto out;
        }
        cp++;
    }
}

(It may be worth trying to fetch words first if possible rather than fetching bytes by adding a second loop before the first one that uses while (cp + sizeof(long) < end) and using fueword instead of fubyte.)

2035

This can overflow. loadsize needs to be max, not min so that it is capped at sizeof(cpuset_t).

tests/sys/kern/sched_affinity.c
36 ↗(On Diff #105437)

This test assumes that CPU IDs are dense which we do not guarantee (hence the use of mp_maxid in the kernel everywhere rather than mp_ncpus). Perhaps more correct would be to write a helper routine that fetches the root cpuset and returns the highest bit set instead (so equivalent of mp_maxid) and use that in place of _SC_NPROCESSORS_CONF in these tests.

Almost there, for the manual pages.

lib/libc/sys/cpuset_getaffinity.2
74

"all CPUs" or "all of the CPUs" (I prefer the former).

75

Maybe s/on/in/ too? Not 100% sure.

80–87

Check I didn't change your intended meaning.

share/man/man3/pthread_attr_affinity_np.3
54–57

"all CPUs" or "all of the CPUs" (I prefer the former)

55

Maybe s/on/in/ here too? (See above.)

60–61

Again, checks it means what you intended.

tests/sys/kern/sched_affinity.c
36 ↗(On Diff #105437)

whoops, I missed that (mp_maxid),
it turns out that the getaffinity() is wrong too. I'll try to fix it all, thank you!

ok, first of all, sorry, I really missed that CPU ids are not dense. So, code
has been reworked according to jhb's suggestions.

Manpages rephrased accordingly.

Also, for getaffinity() I added a code which zero user space mask in case when
cpusetsize > copyout size.

Thanks!

dchagin added inline comments.
sys/kern/kern_cpuset.c
2034

Yep, this code can be optimized. However, the API does not require that cpusetsize should be a multiple of a word.
Therefore, a more sophisticated method is needed. I'll try))

2035

we copyin at least cpusetsize, maximum sizeof(cpuset_t) bytes, it should be min().

dchagin added inline comments.
lib/libc/sys/cpuset_getaffinity.2
76–77

I propose to agree with John's 'definition', it fully describes getaffinity behavior.

lib/libc/sys/cpuset_getaffinity.2
81

Still would go for the definite article here.

share/man/man3/pthread_attr_affinity_np.3
61

And here.

man pages fixed, thank you, Pau

This revision is now accepted and ready to land.Apr 29 2022, 9:07 PM

Improved Linux compatibility in the libc sched_setaffinity() method and
Linuxulator itself. Zero the high bits if set, as unlike FreeBSD, Linux
ignores them. To do this, the cpuset_setaffinity() had to be splited into
two counterparts.

Manpages does not touched at all.

This revision now requires review to proceed.May 4 2022, 3:16 PM

fixed sz type in sched_setaffinity

hi, ping, any objection to commit this?

Can't speak for others, but barring API changes the manual pages haven(t caught up with, they still LGTM.

This revision was not accepted when it landed; it landed in state Needs Review.May 11 2022, 7:41 AM
This revision was automatically updated to reflect the committed changes.

@jhb @dchagin I'm currently porting https://crates.io/crates/cpu-affinity to FreeBSD (right now its FreeBSD support consists of noops) and I stumbled over something that might be by design:

Linux semantics are to use thread-ids, while this wrapper uses process IDs - this showed up in unit tests, which are running in two separate threads:

Thread 1:

  1. Get CPU set ids based on current thread's mask (pid = 0), make sure it equals ncpu
  2. Set CPU mask for current thread

Thread 2:

  1. Get CPU set ids based on current thread's mask (pid = 0), make sure it equals ncpu
  2. Set CPU mask for current thread

When running single-threaded, this always fails (for obvious reasons).
When running multi-threaded, it should always succeed, but it doesn't (it does sometimes though, depending on the order of execution).

When changing the implementation to use CPU_WHICH_TID, this works as expected (as it's operating on the current thread).
I have no idea which side-effects such a change would have though, hence my question before trying to open a differential without really understanding the consequences.

@jhb @dchagin I'm currently porting https://crates.io/crates/cpu-affinity to FreeBSD (right now its FreeBSD support consists of noops) and I stumbled over something that might be by design:

hmm, I'll take a look this long weekend.

@jhb @dchagin I'm currently porting https://crates.io/crates/cpu-affinity to FreeBSD (right now its FreeBSD support consists of noops) and I stumbled over something that might be by design:

Linux semantics are to use thread-ids, while this wrapper uses process IDs - this showed up in unit tests, which are running in two separate threads:

Thread 1:

  1. Get CPU set ids based on current thread's mask (pid = 0), make sure it equals ncpu
  2. Set CPU mask for current thread

Thread 2:

  1. Get CPU set ids based on current thread's mask (pid = 0), make sure it equals ncpu
  2. Set CPU mask for current thread

When running single-threaded, this always fails (for obvious reasons).
When running multi-threaded, it should always succeed, but it doesn't (it does sometimes though, depending on the order of execution).

When changing the implementation to use CPU_WHICH_TID, this works as expected (as it's operating on the current thread).
I have no idea which side-effects such a change would have though, hence my question before trying to open a differential without really understanding the consequences.

fix is in main and stable/13, could you please test now?