Details

Reviewers

kib
jhb
pauamma_gundo.com

Group Reviewers

Linux Emulation
manpages

Commits

rGf35093f8d6d8: Use Linux semantics for the thread affinity syscalls.

Summary

Linux is more tolerant than FreeBSD when checking user-supplied cpuset_t's.

For getaffinity(), the minimum cpuset_t size that the Linux kernel accepts
is the number of active CPUs / NBBY; the maximum size is not limited.
For setaffinity(), Linux does not limit the size of the user-provided
cpuset_t; internally it uses only a part of it, no larger than the
size of the kernel cpuset_t.
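
A minimal user-space sketch of the two size rules in the summary, not the kernel code. The names `KERNEL_SET_BYTES`, `getaffinity_size_ok` and `setaffinity_used_bytes` are hypothetical, the value 32 is an arbitrary stand-in for sizeof(cpuset_t), and rounding the minimum size up to whole bytes is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef NBBY
#define NBBY 8                 /* bits per byte */
#endif
#define KERNEL_SET_BYTES 32    /* hypothetical stand-in for sizeof(cpuset_t) */

/* getaffinity(): the buffer must hold one bit per active CPU; no upper bound. */
bool
getaffinity_size_ok(size_t user_bytes, unsigned int active_cpus)
{
	size_t min_bytes;

	assert(active_cpus > 0);
	min_bytes = ((size_t)active_cpus + NBBY - 1) / NBBY;  /* round up (assumption) */
	return (user_bytes >= min_bytes);
}

/* setaffinity(): any size is accepted, but only the leading part is used. */
size_t
setaffinity_used_bytes(size_t user_bytes)
{
	return (user_bytes < KERNEL_SET_BYTES ? user_bytes : KERNEL_SET_BYTES);
}
```

With these assumptions a 1-byte buffer is enough for 8 active CPUs but not for 9, and a 4096-byte set passed to setaffinity() contributes only its first 32 bytes.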

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

dchagin created this revision. · Apr 8 2022, 7:58 PM

Herald added a subscriber: imp. · Apr 8 2022, 7:58 PM

dchagin requested review of this revision. · Apr 8 2022, 7:58 PM

Harbormaster completed remote builds in B45083: Diff 104786. · Apr 8 2022, 7:58 PM

dchagin added a reviewer: Linux Emulation. · Apr 8 2022, 8:01 PM

dchagin changed the repository for this revision from rS FreeBSD src repository - subversion to rG FreeBSD src repository.

dchagin added reviewers: kib, jhb. · Apr 12 2022, 10:22 AM

I'm not a big fan of the _raw suffix. If I'm reading the diff correctly, the intent of this change is to split out a variant that accepts a kernel pointer rather than a user pointer, with the malloc + copyin/copyout done in the user wrapper. In CheriBSD, @brooks has added user_* variants of several kern_* functions that are a similar split where the kern_* version only deals with kernel pointers. In that parlance, the new functions would retain the existing kern_* name, but the old functions would be renamed to user_* instead. I would perhaps suggest that for the names. I would also clarify in the commit log that the reason for the split is to permit a kernel pointer to the mask.

jhb added a child revision: D34850: linux(4); Adopt affinity syscalls to the recent kernel changes. · Apr 12 2022, 10:29 PM

I agree with @jhb on the naming. I've introduced a couple following this convention in R10:28f047188492c8f2ddca66f162fd2ac9bdc170a6 and I've got another ~50 in my CheriBSD tree. That probably does require squashing this commit with D34850.

rework as jhb@ suggested.

Herald added a reviewer: manpages. · Apr 20 2022, 3:56 PM

Harbormaster completed remote builds in B45251: Diff 105209. · Apr 20 2022, 3:56 PM

dchagin retitled this revision from "Split kern_cpuset_[*]affinity on two counterparts for Linux emulation." to "Use Linux semantics for the thread affinity syscalls." · Apr 20 2022, 4:04 PM

dchagin edited the summary of this revision. (Show Details)

pauamma_gundo.com requested changes to this revision. · Apr 22 2022, 12:09 PM

pauamma_gundo.com added a subscriber: pauamma_gundo.com.

pauamma_gundo.com added inline comments.

lib/libc/sys/cpuset_getaffinity.2
76–77	I don't know how to interpret the removal of "The kernel tolerates large sets as long as all CPUs specified in the set exist." Without that text, the paragraph could mean either "The kernel no longer tolerates large sets at all" or "The kernel now tolerates large sets even if some CPUs specified don't exist". Since these are just about opposite implications, please specify which you mean (or if something else, what you mean).
share/man/man3/pthread_attr_affinity_np.3
54–57	Same question as for cpuset_getaffinity(2).

This revision now requires changes to proceed. · Apr 22 2022, 12:09 PM

jhb added inline comments. · Apr 22 2022, 9:39 PM

lib/libc/sys/cpuset_getaffinity.2
76–77	Perhaps something like: "If the user-supplied set is not large enough to fit all of the matching CPUs, .Fn cpusetgetaffinity fails with .Er ERANGE ." (I do think "matching CPUs" might be slightly better than "active CPUs" here)
sys/kern/kern_cpuset.c
2019	I think you still want this check? That is, if userland specifies a non-existent CPU we should reject it with an error?

pauamma_gundo.com added inline comments. · Apr 23 2022, 9:27 PM

lib/libc/sys/cpuset_getaffinity.2
76–77	That would probably work overall, if you s/cpusetgetaffinity/cpuset_getaffinity/. What do "matching" and/or "active" mean here, though? I took "active" to mean "an existing CPU the scheduling object (process, userland thread, interrupt handler, or whatever) is willing and able to run on", but now that you introduced a different term, I no longer know.

kib added inline comments. · Apr 24 2022, 9:56 PM

sys/kern/kern_cpuset.c
1898–1899	Note that cpusetsize * NBBY can overflow. In this case it seems to be harmless due to the later cap of the size with `min(cpusetsize, sizeof(cpuset_t))`, but it fails to return ERANGE in some situations.
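
One way to sidestep the overflow mentioned above is to convert the bit count to bytes instead of multiplying the byte count by NBBY. The following is a hedged user-space sketch of that idea, not the code in the diff; `naive_holds_bits` and `set_holds_bits` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Naive form: bytes * 8 wraps around for very large `bytes`. */
bool
naive_holds_bits(size_t bytes, size_t nbits)
{
	return (bytes * 8 >= nbits);
}

/* Overflow-free form: round the bit count up to bytes and compare bytes. */
bool
set_holds_bits(size_t bytes, size_t nbits)
{
	size_t need = nbits / 8 + (nbits % 8 != 0);

	return (bytes >= need);
}
```

For a size such as SIZE_MAX / 8 + 1 the naive product wraps to zero, so the naive check wrongly reports that the buffer is too small, while the division-based check still answers correctly.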

dchagin added inline comments. · Apr 25 2022, 4:45 PM

lib/libc/sys/cpuset_getaffinity.2
76–77	'matching CPUs' is a bit confusing to me also, maybe 'existing'?

Reworked, also fixed libc affinity methods and manpages.

Add tests.

Harbormaster completed remote builds in B45377: Diff 105431. · Apr 26 2022, 2:58 PM

dchagin mentioned this in D34850: linux(4); Adopt affinity syscalls to the recent kernel changes. · Apr 26 2022, 2:59 PM

Remove assert

Harbormaster completed remote builds in B45383: Diff 105437. · Apr 26 2022, 4:02 PM

jhb added inline comments. · Apr 26 2022, 9:42 PM

lib/libc/sys/cpuset_getaffinity.2
76–77	I guess I mean matching in terms of the CPUs that match the qualifiers (CPU_WHICH_*, a specific pid or tid, etc.). The difference is suppose you had a system with enough CPUs that you needed a set of two words to express all of them, but you were to query the set for a thread that was bound to just CPU 0. The set for that thread can be described by a smaller set containing just the single bit 0 (since only CPU 0 "matches" the query of "set of CPUs for thread X") vs if the size check is always done against what is required to hold the set of all CPUs (so always requiring two words in this case). To me, active implies "an active CPU in the system" and thus means you always require the two words for the case above.
sys/kern/kern_cpuset.c
1898–1899	So this also fails in the case I mention that you are querying the per-thread mask that fits in a smaller set than the thing being returned. I think instead you want to not do a check here but instead wait until just before the copyout() and then check to see if the highest set bit is too large to fit in the user's mask and fail with ERANGE if so.
2034	This should be using mp_maxid, not smp_cpus, to account for sparse CPU IDs. However, all this really needs to verify is what the old check was doing. Other code will catch if we have bits set that can fit into a cpuset_t that are > mp_maxid. What this check needs to verify instead is that userland hasn't set bits that don't fit into the kernel's cpuset_t. In that case, the existing code that used `sizeof(cpuset_t)` directly in place of `factsize` is correct. However, since the malloc above now (correctly) is capped to only `cpuset_t`, you can't check the extra bits directly from mask. Instead, you need to be fetching the words from userland explicitly via fueword() or the like, so something like:

	if (cpusetsize > sizeof(cpuset_t)) {
		char *end;
		char *cp;
		int val;

		/* Walk the user bytes that lie past the kernel's cpuset_t. */
		end = cp = (char *)&maskp->__bits;
		end += cpusetsize;
		cp += sizeof(cpuset_t);
		while (cp != end) {
			val = fubyte(cp);
			if (val == -1) {
				error = EFAULT;
				goto out;
			} else if (val != 0) {
				error = EINVAL;
				goto out;
			}
			cp++;
		}
	}

	(It may be worth trying to fetch words first if possible rather than fetching bytes by adding a second loop before the first one that uses `while (cp + sizeof(long) < end)` and using `fueword` instead of `fubyte`.)
2035	This can overflow. `loadsize` needs to be `max`, not `min` so that it is capped at `sizeof(cpuset_t)`.
tests/sys/kern/sched_affinity.c
36 ↗	(On Diff #105437)	This test assumes that CPU IDs are dense which we do not guarantee (hence the use of mp_maxid in the kernel everywhere rather than mp_ncpus). Perhaps more correct would be to write a helper routine that fetches the root cpuset and returns the highest bit set instead (so equivalent of mp_maxid) and use that in place of _SC_NPROCESSORS_CONF in these tests.
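
The "highest set bit" idea that runs through the comments above (the ERANGE check just before copyout(), and the test helper that should stand in for mp_maxid) can be sketched in portable C as follows. This is only an illustration over a plain array of words; `mask_highest_bit` and `check_fits_user` are hypothetical names, not functions from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Index of the highest set bit in a mask of `nwords` words, or -1 if empty. */
long
mask_highest_bit(const unsigned long *words, size_t nwords)
{
	assert(words != NULL);
	for (size_t w = nwords; w-- > 0;) {
		if (words[w] == 0)
			continue;
		for (size_t b = WORD_BITS; b-- > 0;)
			if (words[w] & (1UL << b))
				return ((long)(w * WORD_BITS + b));
	}
	return (-1);
}

/* 0 if every set bit fits in a user buffer of `user_bytes` bytes, else ERANGE. */
int
check_fits_user(const unsigned long *words, size_t nwords, size_t user_bytes)
{
	long hi = mask_highest_bit(words, nwords);

	if (hi < 0)
		return (0);
	/* A buffer of n bytes has room for bits 0 .. n * CHAR_BIT - 1. */
	return ((size_t)hi / CHAR_BIT < user_bytes ? 0 : ERANGE);
}
```

In the two-word example from the discussion, a thread bound to CPU 0 has a highest bit of 0, so even a 1-byte user buffer passes the check, whereas a mask with a bit in the second word needs a buffer larger than one word.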

Almost there, for the manual pages.

lib/libc/sys/cpuset_getaffinity.2
74	"all CPUs" or "all of the CPUs" (I prefer the former).
75	Maybe s/on/in/ too? Not 100% sure.
80–87	Check I didn't change your intended meaning.
share/man/man3/pthread_attr_affinity_np.3
54–57	"all CPUs" or "all of the CPUs" (I prefer the former)
55	Maybe s/on/in/ here too? (See above.)
60–61	Again, check it means what you intended.

dchagin added inline comments. · Apr 27 2022, 10:14 AM

tests/sys/kern/sched_affinity.c
36 ↗	(On Diff #105437)	whoops, I missed that (mp_maxid), it turns out that the getaffinity() is wrong too. I'll try to fix it all, thank you!

OK, first of all, sorry, I really missed that CPU IDs are not dense. So the code
has been reworked according to jhb's suggestions.

Manpages rephrased accordingly.

Also, for getaffinity() I added code which zeroes the user-space mask when
cpusetsize > copyout size.

Thanks!
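
The zero-fill described for getaffinity() can be pictured with the user-space sketch below. It is an assumption-laden illustration rather than the committed code: memcpy() and memset() stand in for copyout(), and `copy_mask_zero_tail` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a mask of `src_bytes` bytes into a user buffer of `user_bytes` bytes
 * and clear whatever part of the user buffer lies past the copied bytes.
 */
void
copy_mask_zero_tail(unsigned char *user, size_t user_bytes,
    const unsigned char *src, size_t src_bytes)
{
	size_t n = user_bytes < src_bytes ? user_bytes : src_bytes;

	assert(user != NULL && src != NULL);
	memcpy(user, src, n);
	if (user_bytes > n)
		memset(user + n, 0, user_bytes - n);  /* no stale bytes past the copy */
}
```

Without the memset(), a caller that passes a buffer larger than the kernel set would see whatever garbage was already in the tail as CPUs set in the mask.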

Harbormaster completed remote builds in B45417: Diff 105532. · Apr 28 2022, 5:51 PM

dchagin marked 5 inline comments as done. · Apr 28 2022, 6:01 PM

dchagin added inline comments.

sys/kern/kern_cpuset.c
2034	Yep, this code can be optimized. However, the API does not require that cpusetsize should be a multiple of a word. Therefore, a more sophisticated method is needed. I'll try))
2035	We copy in cpusetsize bytes, but at most sizeof(cpuset_t) bytes, so it should be min().
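
The two points discussed for lines 2034 and 2035 (load at most the kernel set size, and reject non-zero bytes beyond it) can be sketched in user space as follows. This is a hedged analogue of the fubyte() loop suggested earlier, operating on an ordinary buffer; `load_size` and `check_tail_zero` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Number of bytes to copy in: the user size, capped at the kernel set size. */
size_t
load_size(size_t user_bytes, size_t kernel_bytes)
{
	return (user_bytes < kernel_bytes ? user_bytes : kernel_bytes);
}

/* 0 if all bytes in user[kernel_bytes .. user_bytes) are zero, else EINVAL. */
int
check_tail_zero(const unsigned char *user, size_t user_bytes,
    size_t kernel_bytes)
{
	assert(user != NULL);
	for (size_t i = kernel_bytes; i < user_bytes; i++)
		if (user[i] != 0)
			return (EINVAL);
	return (0);
}
```

Because the loop works byte by byte, it does not need cpusetsize to be a multiple of the word size, which is the concern raised in the reply above.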

dchagin marked 4 inline comments as done. · Apr 28 2022, 6:05 PM

dchagin added inline comments.

lib/libc/sys/cpuset_getaffinity.2
76–77	I propose to agree with John's 'definition', it fully describes getaffinity behavior.

whoops, forgot the tests

Harbormaster completed remote builds in B45419: Diff 105535. · Apr 28 2022, 6:09 PM

pauamma_gundo.com added inline comments. · Apr 28 2022, 9:28 PM

lib/libc/sys/cpuset_getaffinity.2
81	Still would go for the definite article here.
share/man/man3/pthread_attr_affinity_np.3
61	And here.

man pages fixed, thank you, Pau

Harbormaster completed remote builds in B45425: Diff 105544. · Apr 29 2022, 10:07 AM

dchagin marked 3 inline comments as done. · Apr 29 2022, 10:07 AM

pauamma_gundo.com accepted this revision. · Apr 29 2022, 9:07 PM

This revision is now accepted and ready to land. · Apr 29 2022, 9:07 PM

Improved Linux compatibility in the libc sched_setaffinity() method and
the Linuxulator itself. Zero the high bits if set, since, unlike FreeBSD, Linux
ignores them. To do this, cpuset_setaffinity() had to be split into
two counterparts.

The manpages are not touched at all.

This revision now requires review to proceed. · May 4 2022, 3:16 PM

Harbormaster completed remote builds in B45463: Diff 105675. · May 4 2022, 3:16 PM

dchagin mentioned this in D34982: kdump: Decode cpuset_t. · May 4 2022, 3:17 PM

dchagin added a child revision: D34982: kdump: Decode cpuset_t. · May 4 2022, 3:19 PM

fixed sz type in sched_setaffinity

Harbormaster completed remote builds in B45466: Diff 105678. · May 4 2022, 5:23 PM

hi, ping, any objection to commit this?

Can't speak for others, but barring API changes the manual pages haven't caught up with, they still LGTM.

This revision was not accepted when it landed; it landed in state Needs Review. · May 11 2022, 7:41 AM

Closed by commit rGf35093f8d6d8: Use Linux semantics for the thread affinity syscalls. (authored by dchagin).

This revision was automatically updated to reflect the committed changes.

dchagin added a commit: rGf35093f8d6d8: Use Linux semantics for the thread affinity syscalls..

dchagin mentioned this in rG72bc1e6806cc: cpuset: Byte swap cpuset for compat32 on big endian architectures. · Jun 17 2022, 7:45 PM

@jhb @dchagin I'm currently porting https://crates.io/crates/cpu-affinity to FreeBSD (right now its FreeBSD support consists of noops) and I stumbled over something that might be by design:

Linux semantics are to use thread-ids, while this wrapper uses process IDs - this showed up in unit tests, which are running in two separate threads:

Thread 1:

Get CPU set ids based on current thread's mask (pid = 0), make sure it equals ncpu
Set CPU mask for current thread

Thread 2:

Get CPU set ids based on current thread's mask (pid = 0), make sure it equals ncpu
Set CPU mask for current thread

When running single-threaded, this always fails (for obvious reasons).
When running multi-threaded, it should always succeed, but it doesn't (it does sometimes though, depending on the order of execution).

When changing the implementation to use CPU_WHICH_TID, this works as expected (as it's operating on the current thread).
I have no idea which side-effects such a change would have though, hence my question before trying to open a differential without really understanding the consequences.

In D34849#857607, @grembo wrote:

@jhb @dchagin I'm currently porting https://crates.io/crates/cpu-affinity to FreeBSD (right now its FreeBSD support consists of noops) and I stumbled over something that might be by design:

hmm, I'll take a look this long weekend.

In D34849#857607, @grembo wrote:

@jhb @dchagin I'm currently porting https://crates.io/crates/cpu-affinity to FreeBSD (right now its FreeBSD support consists of noops) and I stumbled over something that might be by design:

fix is in main and stable/13, could you please test now?

Use Linux semantics for the thread affinity syscalls.
ClosedPublic

Revision Contents
Changeset List

Diff 105851

lib/libc/gen/sched_getaffinity.c

lib/libc/gen/sched_setaffinity.c

lib/libc/sys/cpuset_getaffinity.2

share/man/man3/pthread_attr_affinity_np.3

sys/compat/freebsd32/freebsd32_misc.c

sys/compat/linux/linux_misc.c

sys/kern/kern_cpuset.c

sys/sys/syscallsubr.h