
Remove a couple of uma_prealloc() calls.
Accepted · Public

Authored by markj on Aug 12 2020, 6:34 PM.

Details

Reviewers: alc, kib, jeff
Summary

uma_prealloc() simply allocates slabs ahead of time. However, unless
uma_reserve() or uma_reserve_kva() is also called, or UMA_ZONE_NOFREE is
set, the uma_prealloc() call doesn't really accomplish anything since any
free slabs will be reclaimed well before slab allocations start to fail.

In the vm_map case, we specify UMA_ZONE_NOFREE, but the preallocation
does nothing to ensure that allocations will be successful. The
preallocation size is also quite small on modern systems. In the
siginfo case I think the preallocation accomplishes nothing. I propose
removing them.
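
For context, a minimal sketch of the distinction drawn above, using the public UMA interfaces; the zone name, item type, and counts are made up for illustration:

zone = uma_zcreate("example", sizeof(struct example_item), NULL, NULL,
    NULL, NULL, UMA_ALIGN_PTR, 0);

/*
 * Allocates slabs for 32 items ahead of time, but nothing prevents
 * those slabs from being reclaimed later, so by itself this provides
 * no guarantee that future allocations will succeed.
 */
uma_prealloc(zone, 32);

/*
 * A reservation, in contrast, keeps 32 items set aside; they are
 * handed out only to callers that pass M_USE_RESERVE.
 */
uma_zone_reserve(zone, 32);
item = uma_zalloc(zone, M_NOWAIT | M_USE_RESERVE);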

Diff Detail

Lint: Passed
Unit: No Test Coverage
Build Status: Buildable 32951, Build 30346 (arc lint + arc unit)

Event Timeline

markj requested review of this revision. Aug 12 2020, 6:34 PM
markj created this revision.
markj added reviewers: alc, kib, jeff.

I'll go a step further and question the point of even having a zone for the maps. Once upon a time, there were user-space "share" maps, but once those were eliminated and "regular" maps were embedded in the vmspace, supporting dynamic allocation of maps stopped being necessary.

This revision is now accepted and ready to land. Aug 12 2020, 6:46 PM
In D26047#577923, @alc wrote:

I'll go a step further and question the point of even having a zone for the maps. Once upon a time, there were user-space "share" maps, but once those were eliminated and "regular" maps were embedded in the vmspace, supporting dynamic allocation of maps stopped being necessary.

Indeed, and if we remove support for dynamic allocation of kernel submaps I start to wonder if the whole submap mechanism can't itself be replaced by something simpler like a pair of vmem arenas for pipe and execve KVA ranges.
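
As a rough sketch of that idea (the names, bounds, and sizes below are invented for illustration, not part of this patch), pipe buffer KVA could come from a dedicated vmem arena rather than a submap:

static vmem_t *pipe_arena;
vmem_addr_t addr;
int error;

/* At boot: create an arena managing a hypothetical pipe KVA range. */
pipe_arena = vmem_create("pipe kva", 0, 0, PAGE_SIZE, 0, M_WAITOK);
vmem_add(pipe_arena, pipe_kva_base, pipe_kva_size, M_WAITOK);

/* Per pipe: allocate, use, and later release a buffer-sized KVA range. */
error = vmem_alloc(pipe_arena, bufsize, M_BESTFIT | M_WAITOK, &addr);
/* ... back the range with pages and use it ... */
vmem_free(pipe_arena, addr, bufsize);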

In D26047#577923, @alc wrote:

I'll go a step further and question the point of even having a zone for the maps. Once upon a time, there were user-space "share" maps, but once those were eliminated and "regular" maps were embedded in the vmspace, supporting dynamic allocation of maps stopped being necessary.

Indeed, and if we remove support for dynamic allocation of kernel submaps I start to wonder if the whole submap mechanism can't itself be replaced by something simpler like a pair of vmem arenas for pipe and execve KVA ranges.

Then vm_fault() would need to handle detour into vmem arenas.

sys/kern/kern_sig.c
254

Hm, I think there is a reason for this, possibly badly executed. We should provide some robustness for realtime signal delivery. I do not remember the exact wording from POSIX, but it does require some guaranteed minimum number of realtime signals queued in the system.

That said, the ksiginfo_alloc() wait argument can be removed; the function never uses M_WAITOK.

In D26047#577927, @kib wrote:
In D26047#577923, @alc wrote:

I'll go a step further and question the point of even having a zone for the maps. Once upon a time, there were user-space "share" maps, but once those were eliminated and "regular" maps were embedded in the vmspace, supporting dynamic allocation of maps stopped being necessary.

Indeed, and if we remove support for dynamic allocation of kernel submaps I start to wonder if the whole submap mechanism can't itself be replaced by something simpler like a pair of vmem arenas for pipe and execve KVA ranges.

Then vm_fault() would need to handle detour into vmem arenas.

Why? The mapped ranges can correspond to kernel_map entries with MAP_ENTRY_NOFAULT clear. exec_map is subdivided this way at boot time. In pipe_map, entries are allocated dynamically, and today I guess each entry gets a separate VM object.
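
For illustration (my sketch, not code from this patch; the object type and protections are assumptions), a faultable range can be carved directly out of kernel_map by leaving MAP_NOFAULT out of the cow flags, which keeps MAP_ENTRY_NOFAULT clear on the new entry:

vm_object_t obj;
vm_offset_t addr;
int error;

/* A dedicated object backing one dynamically allocated range. */
obj = vm_object_allocate(OBJT_DEFAULT, atop(size));
addr = vm_map_min(kernel_map);

/* No MAP_NOFAULT in the cow flags, so faults on this entry are legal. */
error = vm_map_find(kernel_map, obj, 0, &addr, size, 0, VMFS_ANY_SPACE,
    VM_PROT_READ | VM_PROT_WRITE, VM_PROT_READ | VM_PROT_WRITE, 0);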

sys/kern/kern_sig.c
254

Hmm, from my reading of the sigqueue() errors, EAGAIN may be returned if a "system-wide resource limit" has been reached. It's not clear to me that we are required to make some additional effort here.

ksiginfo_alloc is called with M_WAITOK in exit1() and proc_linkup().

In D26047#577927, @kib wrote:
In D26047#577923, @alc wrote:

I'll go a step further and question the point of even having a zone for the maps. Once upon a time, there were user-space "share" maps, but once those were eliminated and "regular" maps were embedded in the vmspace, supporting dynamic allocation of maps stopped being necessary.

Indeed, and if we remove support for dynamic allocation of kernel submaps I start to wonder if the whole submap mechanism can't itself be replaced by something simpler like a pair of vmem arenas for pipe and execve KVA ranges.

Then vm_fault() would need to handle detour into vmem arenas.

Why? The mapped ranges can correspond to kernel_map entries with MAP_ENTRY_NOFAULT clear. exec_map is subdivided this way at boot time. In pipe_map, entries are allocated dynamically, and today I guess each entry gets a separate VM object.

Unless things have changed when I wasn't looking, maps that we can fault on, including the exec and pipe maps, use an sx lock for synchronization. In contrast, "system" maps, such as the kernel map, use a mutex. Once upon a time, using a mutex (as opposed to an sx lock) was necessary for (some) in-kernel memory allocations. It's not clear to me that things have changed in this regard.

Anyway, given that there are so few (sub)maps now (and I don't foresee that changing), I'd argue that we should allocate storage for them statically as variables.
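
A minimal sketch of that static allocation (names below are hypothetical; the actual submap setup goes through kmem_suballoc(), which would need a variant accepting caller-provided storage):

/* Hypothetical: static storage for the few remaining submaps. */
static struct vm_map exec_map_store;
static struct vm_map pipe_map_store;

/* At boot, initialize in place instead of allocating from the zone. */
vm_map_init(&exec_map_store, kernel_pmap, exec_kva_min, exec_kva_max);
exec_map = &exec_map_store;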

sys/kern/kern_sig.c
254

For what it's worth, this code predates the introduction of uma_reserve().

sys/kern/kern_sig.c
254

My take on it is that the system should ensure that apps can queue at least kern.sigqueue.preallocate RT signals, globally.

sys/kern/kern_sig.c
254

Indeed, it looks like that was the intent, but it never worked. Even if UMA somehow provided the guarantee, the siginfo_t allocator is used for both RT and non-RT signals.

UMA has no good mechanism to guarantee successful M_NOWAIT allocations. One part of the problem is that it has no interface to free an item directly to the slab layer, where items can be reserved.

One thing we could do is extend the semantics of uma_zone_reserve() so that uma_zfree() avoids the caching layer if the number of free items in the keg is below the reservation. Then the siginfo allocator can do:

ksi = uma_zalloc(ksiginfo_zone, M_NOWAIT);
if (ksi == NULL && is_rt_signal) {
    /* Succeeds unless all reserved items are used for pending RT signals. */
    ksi = uma_zalloc(ksiginfo_zone, M_NOWAIT | M_USE_RESERVE);
}

We might be able to add a zone flag similar to UMA_ZFLAG_LIMIT that short-circuits the cache layer in uma_zfree_arg(), so that when the number of free items falls below the reservation, the reserve is replenished immediately. My main concern is to not make the fast path more expensive.
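
To make that concrete, a rough sketch of the proposed free-path check; the flag, the keg free counter, and the placement are all invented here, not existing UMA code:

/* Hypothetical fragment for uma_zfree_arg(). */
if (__predict_false((zone->uz_flags & UMA_ZONE_RESERVE_BYPASS) != 0) &&
    keg->uk_free_items < keg->uk_reserve) {
    /* Skip the per-CPU caches; return the item to the keg directly. */
    zone_free_item(zone, item, udata, SKIP_NONE);
    return;
}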

I am not sure if it is worth the effort, though, given that a) the mechanism that exists today doesn't work, and b) POSIX doesn't seem to require it. If you think it is necessary I will try to implement it.

sys/kern/kern_sig.c
254

For FreeBSD, all signals are queued, so all signals are RT. I believe this is allowed by POSIX.

Also, I believe that POSIX requires a guarantee on the total number of queued signals:

{SIGQUEUE_MAX}
Maximum number of queued signals that a process may send and have pending at the receiver(s) at any time.

Since they specifically discuss the decision that signals are not accounted to the sender, it effectively means that SIGQUEUE_MAX signals must be guaranteed.

sys/kern/kern_sig.c
254

I guess you are referring to:

The sigqueue() function can fail if the system has insufficient resources to queue the signal. An explicit limit on the number of queued signals that a process could send was introduced. While the limit is "per-sender", this volume of POSIX.1-2017 does not specify that the resources be part of the state of the sender. This would require either that the sender be maintained after exit until all signals that it had sent to other processes were handled or that all such signals that had not yet been acted upon be removed from the queue(s) of the receivers. This volume of POSIX.1-2017 does not preclude this behavior, but an implementation that allocated queuing resources from a system-wide pool (with per-sender limits) and that leaves queued signals pending after the sender exits is also permitted.

I don't see how this translates to a guarantee that some minimum number of global pending signals must be supported, especially in light of the first sentence. "with per-sender limits" implies that signals are still accounted to each sender, and in fact we are currently applying the limit in the receiver, in p_pendingcnt.
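
For reference, the receiver-side accounting mentioned here boils down to a check of this shape in sigqueue_add() (a condensed paraphrase, not verbatim kernel code):

/* Non-RT signals are limited per receiving process. */
if (p->p_pendingcnt >= max_pending_per_proc) {
    /* Refuse to queue; the sender ultimately sees EAGAIN. */
    return (EAGAIN);
}
p->p_pendingcnt++;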

sys/kern/kern_sig.c
254

What you cite is the discussion of the fact that I called 'signals are not accounted to senders'. But if you look at the definition of SIGQUEUE_MAX, it is rather clear that, in the absence of other senders, a process should be able to queue SIGQUEUE_MAX signals.

sys/kern/kern_sig.c
254

Sorry, I am missing something. I claim that the cited paragraph means that signals _are_ accounted to senders. It just says that pending signals do not need to be reclaimed when a sender exits.

SIGQUEUE_MAX defines a maximum, i.e., an upper limit on the number of pending signals that a sender may have. I cannot see how this can be interpreted as a guarantee about the minimum number of pending signals.

sys/kern/kern_sig.c
254

In short, to the best of my knowledge, Kostik's interpretation is correct (or the intended interpretation, if you prefer :-)).

I believe that POSIX originally specified that a process must be able to send SIGQUEUE_MAX queued signals. Here is a relevant sentence from some version of a Linux signal(7) man page:

According to POSIX, an implementation should permit at least _POSIX_SIGQUEUE_MAX (32) real-time signals to be queued to a process.

So, I would agree with Mark that this is not a normal use of the word "MAX". Arguably, "MIN" would make more sense.

Again, quoting the same Linux man page:

However, Linux does things differently. In kernels up to and including 2.6.7, Linux imposes a system-wide limit on the number of queued real-time signals for all processes.

So, ultimately, the standards changed to allow the (early) Linux implementation to be a compliant one. And, I believe from context that the intended meaning of "limit" here is that the system must provide for at least the specified number of queued signals.