Page MenuHomeFreeBSD

Add membarrier(2)
Needs ReviewPublic

Authored by kib on Oct 7 2021, 10:31 PM.

Details

Summary
This is an attempt at clean-room implementation of the Linux'
membarrier(2) syscall.  For documentation, you would need to read
both membarrier(2) Linux man page, the comments in Linux
kernel/sched/membarrier.c implementation and possibly look at
actual uses.

Man page used as a reference https://kib.kiev.ua/kib/membarrier.pdf

Test Plan

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

kib requested review of this revision.Oct 7 2021, 10:31 PM
kib removed a reviewer: brooks.

Complete and somewhat tested amd64 implementation.
Theoretically complete (but not even compiled) arm64.

i386 bits
Add comments to arm64 and amd64 cpu_core_sync

Fill MD functions on all arches

kib added reviewers: alc, markj, jhb.
kib added a subscriber: theraven.
sys/sys/proc.h
842

Is it intentional that these have the same value? I presume that the membarrier()'s "registration" verbs exist to handle races where the initiating thread fails to observe a concurrent pm_active modification by another thread that is switching on to a CPU.

Thanks for doing this, it will make porting our work to FreeBSD easier. There are some missing man pages (both for the new pmap interfaces and for the syscall).

Do we need any kind of rate limiting on this? In particular, if a thread sits in a loop doing the global broadcast one, can it mount a denial of service attack on the whole system? I don't think there are any other system calls that do so little work on the calling thread and do an IPI.

The RSEQ variant relates to restartable sequences. These are a generalised form of the mechanism used for ARMv4 atomics, where userspace specified an alternative return address if preempted. glibc uses these for some optimistic concurrency and userspace per-CPU data structures. It would be *fantastic* to see them in FreeBSD as well. They don't make sense on systems without restartable sequences. See: https://www.efficios.com/pub/gnu-cauldron-2019/rseq-glibc.pdf

sys/kern/kern_membarrier.c
106

This is functionally correct, but does not have the performance characteristics that are expected from the Linux documentation. The non-_EXPEDITED versions are allowed to block and are expected to wait until all of the other cores have done at least one context switch. Callers are likely to expect this to potentially take one scheduling quantum to return but not to disrupt the other threads' execution at all.

137

If I understand correctly (some comments would be nice so I could check my understanding), this is collecting the set of CPUs that are running threads in the same process. It would be worth a comment explaining the inherent optimisation. For functional correctness, this does not need to be an accurate snapshot of the set of CPUs: it can be a racy access because any thread that is [de]scheduled out from under you during the check is implicitly meeting the requirements (anything that does a context switch is sufficiently ordered).

sys/sys/membarrier.h
34–42

It's confusing reading this that each command is a separate bit, yet the membarrier call requires that you provide a single one. Please can you add a comment that this is because MEMBARRIER_CMD_QUERY returns a bitmask of all supported commands.

I was going to ask if we can be better than Linux here and provide an enum so that these are somewhat type-safe, but I note that the Linux headers actually do define these as an enum, rather than as macros. Please can we do the same?

45

Source compatibility with what? Source code that relies on restartable sequences won't work on FreeBSD. glibc does not provide a wrapper around this, so any source code that actually uses it is using the system call via syscall.

51

This is unused and relates to a flag that is not supported. It is confusing for consuming code if flags are defined in headers for features that the OS doesn't support.

55

Note that until very recently this was a two-argument function on Linux. My Ubuntu 20.04 install with a 5.11 kernel has the two-argument version. I wouldn't be surprised if Linux changes the number of arguments in the future. If we want to avoid breakage then we may want to do an open-like thing here and make the libc wrapper membarrier(int, unsigned, ...) and forward to the correct kernel variant based on the second argument.

sys/sys/proc.h
841

Please can we have comments that explain what each of these are for?

sys/vm/vm_kern.c
893

I don't see any references to this function in the diff, what calls it?

sys/arm64/include/pmap.h
154

We could add pm_active to the pmap struct on arm64 at the cost of 2 atomic operations per-process switch and the extra used space in the struct. We may not want to MFC it if we consider pmap_t to be have stable size outside of pmap.c so would be better added later if so.

kib marked 8 inline comments as done.Oct 11 2021, 7:03 PM
kib added inline comments.
sys/kern/kern_membarrier.c
106

In theory this is unbounded sleep, but ok. I changed the _GLOBAL implementation to wait for each core to do at least one non-idle context switch.

137

Technically this is not just running threads, it is also e.g. rfork()ed siblings.

More correct explanation is the set of threads sharing the same address space. I believe this is identical to the set which is IPI-ed by linux.

sys/sys/membarrier.h
34–42

I found it very strange that MEMBARRIER_CMDs are bitmask values but packed in enum. But I do not have strong opinion.

45

Source compatible with Linux. For instance I see a code in urcu which tests for presence of _EXPEDITED_RSEQ in _QUERY response. With this constant present in the header, the code just compiles and works without RSEQ support (well, it requires replacing __syscall with direct function call, but you should see what I mean)

51

It reserves the ABI values for future extensions, there is nothing unusual.

I will probably look at rseq stuff after this bits are landed.

55

My instance of the manual page states that the syscall takes three args. I certainly can go with libc wrapper as you suggested, but lets postpone it.

I want to do some more systematic review of consumers. I looked at urcu, which requires a patch anyway due to use of syscall, and I was not able to pace verona build system.

sys/sys/proc.h
842

No, it is a stupidity.

sys/vm/vm_kern.c
893

It is actually pmap_active_cpus(), I did some rototiling for arm64 and did not rechecked afterward.

kib marked 8 inline comments as done.Oct 11 2021, 7:05 PM
kib added inline comments.
sys/arm64/include/pmap.h
154

I do not see a point. This would make each context switch to pay two atomics for the benefit of rarely used API. It would be esp. saddle because armv8 indeed does not need to maintain pm_active for TLB invalidations due to availability of broadcast invl instructions.

kib marked an inline comment as done.

First round of review feedback.

sys/arm64/include/pmap.h
154

Having looked at the other pmap implementations I noticed on powerpc pm_active is managed, but doesn't seem to be currently used (other than by this patch).

kib marked an inline comment as done.Oct 11 2021, 8:48 PM
kib added inline comments.
sys/arm64/include/pmap.h
154

Which makes powerpc another candidate for the common (naive) implementation of pmap_active_cpus().

sys/sys/membarrier.h
34–42

Please can you add the requested comment explaining *why* they are separate bits (i.e. so that MEMBARRIER_CMD_QUERY can return the set of supported commands as a bitfield)?

kib marked 2 inline comments as done.Oct 12 2021, 9:33 AM

Is there any useful reference/doc about rseq? So far the best I can find is the comment in Linux/kernel/rseq.c which is somewhat discouraging.

Add comment about bits.

sys/compat/freebsd32/capabilities.conf
431

This is fine, but I'm somewhat inclined to add CAPENABLED to syscalls.master (or additionally). It's less consistent, but better matches kern/syscalls.master and people are less likely to forget.

kib marked an inline comment as done.Oct 15 2021, 6:04 PM
kib added inline comments.
sys/compat/freebsd32/capabilities.conf
431

I added CAPENABLED to freebsd32/syscalls.master both for membarrier and rseq. I prefer to keep the entry in capabilities.conf until this file exists.

[I will upload new patch when more changes accumulate]

kib marked an inline comment as done.

mp_maxid + 1, reported by markj
add CAPENABLED to freebsd32/syscalls.master

Rework MEMBARRIER_CMD_GLOBAL loop.
Make it interruptible by a signal.