Details

Reviewers

shurd
olce
andrew
manu

Group Reviewers

iflib

Commits

rG245157fd8a38: ktrcsw(): should not be called when the thread is owning interlock or on sleepq
rG783b8a0fd880: kern/sched: deduplicate dtrace hook vars
rG9409e8698030: kern/sched: deduplicate sdt probes
rG1322760fd127: sys: enable both SCHED_ULE and SCHED_4BSD for some configs
rGa84a39dfe5d1: kern/sched: move duplicate preemption stat vars into sched_shim.c
rGff870b783f09: sched_shim: restore kern.ccpu sysctl
rG5a6e0e31bc2e: sysctl kern.sched.ule.topology_spec: allow to run if ULE is not initialized
rGb602ba1b5fd9: net/iflib.c: move out scheduler-depended code into the hook
rG1c4e16f6db81: x86/cpu_machdep.c: unconditionally fence
rGc384b35e42ee: x86/local_apic.c: remove direct SCHED_ULE use
rG377c053a43f3: cpu_switch(): unconditionally wait on the blocked mutex transient
rGba8f429f42ec: kern/sched_shim.c: Add sysctl kern.sched.available
rG8aa8289d991b: sys: Move 4BSD sysctls under kern.sched.4bsd
rGb125c4d13095: sys: Make sched_4bsd a sched instance
rGeb454937a3c0: sys: Move ULE sysctls under kern.sched.ule
rGd14e018024bb: sys: Make sched_ule a sched instance
rG7efbfd6ff649: kern/sched_shim.c: provide required SYSINIT hooks
rGce38acee8d0b: Add kern/sched_shim.c
rGbab24f22ba45: kern/sched_shim.c: Provide a scheduler selection machinery
rG0b474a48dc58: sys/sched.h: add SCHED_STAT_DECLARE()
rGa556ec46d313: kern/sched_{ule,4bsd}.c: cleanup headers
rG610d7062c60b: sched_4bsd: remove unused function sched_pctcpu_delta()
rG8515934ce3c2: sys/sched.h: make sched_clear_tdname() function prototypes unconditional
rG03d61fe97857: arm, riscv: add a preprocessor symbol indicating missed support of ifunc
rG57bb132e98b0: maybe_preempt(): make static in sched_4bsd.c
rG23266bc9928f: amd64/machdep.c: remove extra empty line

Summary

Make both our schedulers configurable into the same kernel image. Add tunable kern.sched.name to select the scheduler on boot (ULE or 4BSD).

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline


minsoochoo0122_proton.me added inline comments. Jan 22 2026, 10:11 PM

sys/kern/sched_ule.c
3351 ↗	(On Diff #170251)	I mean the parameter. Is it considered as used if the function pointer is stored in the slots table?

kib marked 2 inline comments as done. Jan 22 2026, 10:43 PM

kib added inline comments.

sys/kern/sched_ule.c
3351 ↗	(On Diff #170251)	Well, the compiler does not complain, so why add this?

minsoochoo0122_proton.me added inline comments. Jan 22 2026, 10:53 PM

sys/kern/sched_ule.c
3351 ↗	(On Diff #170251)	Interesting... I see `__unused` everywhere in sys so I thought compiler was strict on this. But if not this should be fine...

In D54831#1253408, @kib wrote:

In D54831#1253398, @olce wrote:

I'd like also that we investigate why the loop on the blocked lock was elided for 4BSD.

Simply because threads are never blocked because there is only one global scheduler lock on 4BSD instead of per-runq lock on ULE. So on ULE you are not guaranteed that the sched_switch() finished with removing the thread from CPU while cpu_switch() already tries to switch to the new context. The blocked state makes this race closed without the need to somehow lock several runqs. On 4BSD it is impossible (global lock).

I know well what it is used for in ULE. I was not sure for 4BSD, as it uses the blocked lock as well. After checking, that is just to release the locks of sleepqueues et al. earlier (as ULE also does), and indeed the newly selected thread cannot hold the blocked lock.

Checking the blocked state on 4BSD would give 4 instructions overhead which I dismiss with prejudice.

And rightly so, as it's not going to make any noticeable performance difference.

olce added a reviewer: olce. Jan 23 2026, 10:48 AM

olce added inline comments. Jan 23 2026, 10:53 AM

sys/amd64/amd64/cpu_switch.S
495 ↗	(On Diff #170251)	Since the idea of scheduler as kernel module takes time for more investigation, I'm fine with this approach. Once merged, I'll ask 4BSD users on the mailing list to benchmark 4BSD vs ULE. Please coordinate at least with me before sending anything. We have to be careful in the choice of words, since the really important thing is that users do these tests after my ULE fixes. If people want to test 4BSD vs. ULE in the meantime, that's of course fine, but we don't want to press them to do that.

In D54831#1253496, @olce wrote:

In D54831#1253408, @kib wrote:

In D54831#1253398, @olce wrote:

I'd like also that we investigate why the loop on the blocked lock was elided for 4BSD.

Simply because threads are never blocked because there is only one global scheduler lock on 4BSD instead of per-runq lock on ULE. So on ULE you are not guaranteed that the sched_switch() finished with removing the thread from CPU while cpu_switch() already tries to switch to the new context. The blocked state makes this race closed without the need to somehow lock several runqs. On 4BSD it is impossible (global lock).

I know well what it is used for in ULE. I was not sure for 4BSD, as it uses the blocked lock as well. After checking, that is just to release the locks of sleepqueues et al. earlier (as ULE also does), and indeed the newly selected thread cannot hold the blocked lock.

I do not see what the code fragment in cpu_switch.S has to do with sleepqueue. It can only observe either runq lock, or blocked lock, if ever.
But anyway, the outcome is same.

Checking the blocked state on 4BSD would give 4 instructions overhead which I dismiss with prejudice.

And rightly so, as it's not going to make any noticeable performance difference.

So we agreed, I believe.
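
The wait on the blocked lock discussed in this exchange can be sketched in portable C. This is only an illustration using C11 atomics, not the assembly in cpu_switch.S: the struct layouts and the helper names wait_until_unblocked() and sched_unblock_sketch() are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Simplified stand-ins for the kernel types (assumptions for this sketch). */
struct mtx { int unused; };
struct thread { _Atomic(struct mtx *) td_lock; };

/* Sentinel: td_lock points here while the old CPU is still switching out. */
struct mtx blocked_lock;

/* Publish the real lock once the thread is fully off its previous CPU. */
void
sched_unblock_sketch(struct thread *td, struct mtx *lock)
{
	atomic_store_explicit(&td->td_lock, lock, memory_order_release);
}

/*
 * What the switch-in path does: spin until td_lock no longer points at
 * the blocked lock.  With a single global scheduler lock the condition is
 * never true, so the loop costs only the load and compare.
 */
void
wait_until_unblocked(struct thread *newtd)
{
	while (atomic_load_explicit(&newtd->td_lock,
	    memory_order_acquire) == &blocked_lock)
		;	/* the real code would also issue a pause/yield hint */
}

void
blocked_lock_demo(void)
{
	struct mtx runq_lock;
	struct thread td;

	atomic_init(&td.td_lock, &blocked_lock);
	sched_unblock_sketch(&td, &runq_lock);
	wait_until_unblocked(&td);	/* returns at once: already unblocked */
	assert(atomic_load(&td.td_lock) == &runq_lock);
}
```

The acquire/release pairing is this sketch's way of expressing that the new CPU must observe all of the old CPU's switch-out work before it continues.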

Make kern.sched sysctls work when 4BSD is selected:
sysctl kern.sched.ule.topology_spec: allow to run if ULE is not initialized
sched_shim: restore kern.ccpu sysctl

It apparently should be considered part of the ABI, and is used by
the base top(1).  But do not declare the ccpu variable in headers, it is
needed only by 4bsd. So put the variable definition into sched_shim.c to
make the kernel buildable without SCHED_4BSD.

In D54831#1253557, @kib wrote:

I do not see what the code fragment in cpu_switch.S has to do with sleepqueue. It can only observe either runq lock, or blocked lock, if ever.

I was talking about the use of the blocked lock in sched_4bsd.c.

So we agreed, I believe.

Sure. I just wanted to understand why the code elision for 4BSD, in case it had been a mistake, in addition to making sure executing it now does not change anything functional.

minsoochoo0122_proton.me added inline comments. Jan 23 2026, 3:55 PM

sys/amd64/amd64/cpu_switch.S
495 ↗	(On Diff #170251)	Got that.

Handle all arches for cpu_switch.S.

Globally enable both schedulers for LINT.
Enable both schedulers for GENERIC on amd64.

I believe this variant of the branch is more or less complete, modulo the feedback.

Herald added a reviewer: andrew. Jan 23 2026, 4:17 PM

Herald added a reviewer: manu.

Herald added subscribers: riscv, jhibbits.

kib marked 2 inline comments as done. Jan 23 2026, 4:18 PM

minsoochoo0122_proton.me added inline comments. Jan 23 2026, 4:27 PM

sys/conf/NOTES
214 ↗	(On Diff #170271)	I have no idea what this character does. My pending review removes this as a cleanup but it will take some time to land. Could you check if this is a typo?

kib marked an inline comment as done. Jan 23 2026, 4:37 PM

kib added inline comments.

sys/conf/NOTES
214 ↗	(On Diff #170271)	This is the ASCII 'form feed' character, sometimes referred to as 'next page'. It skips to a new page on printers, and some editors might use it to split the view. Why remove something that you do not understand?

minsoochoo0122_proton.me added inline comments. Jan 23 2026, 4:42 PM

sys/conf/NOTES
214 ↗	(On Diff #170271)	Sorry. There was no documentation or comment on that and different editors (phab, vim, github) displayed it differently, so I was confused.

In D54831#1253559, @kib wrote:

Make kern.sched sysctls work when 4BSD is selected:

I do not see any new change related to that (compared to the previous diff)?

sysctl kern.sched.ule.topology_spec: allow to run if ULE is not initialized

Mmm... That kind of problem would be automatically avoided if not exporting settings for both 4BSD and ULE, which also changes the sysctl hierarchy.

Could we just have the same kern.sched as before with all applicable settings of the current scheduler (and not showing those that are not applicable), without the intermediate ule or 4bsd? E.g., by converting all the static SYSCTL_*() macros into SYSCTL_ADD_*() ones in the particular sched_setup() functions.

Once done, the kern.sched.instance_name can just be removed (there is already kern.sched.name). kern.sched.available can of course be kept.

sched_shim: restore kern.ccpu sysctl

(It is used by ps(1) as well.)

Other points:

I find having the sched_ule_ and sched_4bsd_ prefixes of function names a bit ugly. Could we have a macro similar to SLOT() for these, so that we would write, e.g.:

...
static void
SCHED(fork)(struct thread *td, struct thread *childtd)
...

instead of:

...
static void
sched_4bsd_fork(struct thread *td, struct thread *childtd)
...
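
One way such a SCHED() macro could be built is by token pasting. The sketch below is only an illustration of the proposal: SCHED_PREFIX, SCHED_PASTE() and SCHED_EXPAND() are hypothetical names, and the function body is a placeholder.

```c
#include <assert.h>

/* Hypothetical: each scheduler source file would define its own prefix. */
#define	SCHED_PREFIX		sched_4bsd_
#define	SCHED_PASTE(p, n)	p##n
/* Extra level so SCHED_PREFIX is expanded before the paste happens. */
#define	SCHED_EXPAND(p, n)	SCHED_PASTE(p, n)
#define	SCHED(n)		SCHED_EXPAND(SCHED_PREFIX, n)

/* Simplified stand-in for the kernel's struct thread. */
struct thread { int td_pri; };

/* Expands to: void sched_4bsd_fork(struct thread *, struct thread *) */
void
SCHED(fork)(struct thread *td, struct thread *childtd)
{
	childtd->td_pri = td->td_pri;	/* placeholder body for the sketch */
}

void
sched_macro_demo(void)
{
	struct thread parent = { 7 }, child = { 0 };

	/* The generated symbol is reachable under its full name. */
	sched_4bsd_fork(&parent, &child);
	assert(child.td_pri == 7);
}
```

The cost is that the full symbol name no longer appears in the source, so a plain text search for sched_4bsd_fork would not find the definition.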

Please see inline comments.
sswitch is unfortunate, but I agree bypasses are not really worth the trouble. :-)

sys/kern/sched_shim.c
255 ↗	(On Diff #170271)	There's the problem here that 4BSD is not the default when it is the only one compiled-in. Could you change the logic so that, if there's only one scheduler, it is automatically selected? Additionally, if an explicit scheduler has been passed and is not available, then what about printing a warning and continuing running with some arbitrary one?
253 ↗	(On Diff #170267)	I'd use `kern.sched.name` here, to match the `kern.sched.name` knob (see herald comment).

In D54831#1253590, @olce wrote:

In D54831#1253559, @kib wrote:

Make kern.sched sysctls work when 4BSD is selected:

I do not see any new change related to that (compared to the previous diff)?

The KASSERT() in sysctl_kern_sched_ule_topology_spec() was changed to if(), so it avoids a panic or UB when ULE is not initialized.
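
The KASSERT()-to-if() change described above can be sketched in userland C. This is not the code of sysctl_kern_sched_ule_topology_spec(): the state variable, the function names and the ENOENT return value are all assumptions chosen for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for ULE's topology state; NULL until ULE sets up. */
static const char *ule_topology_sketch;

/* Would run only when ULE is the scheduler selected at boot. */
void
ule_setup_sketch(void)
{
	ule_topology_sketch = "topology";	/* placeholder contents */
}

/*
 * Handler shape after the change: an uninitialized scheduler yields an
 * error to the caller instead of tripping an assertion.
 */
int
topology_spec_read_sketch(const char **out)
{
	/* Before (in spirit): assert(ule_topology_sketch != NULL); */
	if (ule_topology_sketch == NULL)
		return (ENOENT);	/* errno choice is an assumption */
	*out = ule_topology_sketch;
	return (0);
}

void
topology_demo(void)
{
	const char *spec = NULL;

	assert(topology_spec_read_sketch(&spec) == ENOENT);	/* 4BSD active */
	ule_setup_sketch();
	assert(topology_spec_read_sketch(&spec) == 0 && spec != NULL);
}
```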

sysctl kern.sched.ule.topology_spec: allow to run if ULE is not initialized

Mmm... That kind of problem would be automatically avoided if not exporting settings for both 4BSD and ULE, which also changes the sysctl hierarchy.

I want users who tweak schedulers to be able to put tweaks for any scheduler into their /etc/sysctl.conf without causing errors if one of the schedulers is not enabled or not configured.
Such users must be intelligent enough to understand that settings for non-active schedulers are not active.
Also, I want it to be very clear which settings belong to which scheduler.
So I prefer the new scheme.

Could we just have the same kern.sched as before with all applicable settings of the current scheduler (and not showing those that are not applicable), without the intermediate ule or 4bsd? E.g., by converting all the static SYSCTL_*() macros into SYSCTL_ADD_*() ones in the particular sched_setup() functions.

See above, I want this to be explicit and always available.

Once done, the kern.sched.instance_name can just be removed (there is already kern.sched.name). kern.sched.available can of course be kept.

I renamed the tunable to kern.sched.name.
I am not sure what you mean by kern.sched.instance_name; I cannot find such a thing.

sched_shim: restore kern.ccpu sysctl

(It is used by ps(1) as well.)

Ok, so restored already.

Other points:

I find having the sched_ule_ and sched_4bsd_ prefixes of function names a bit ugly. Could we have a macro similar to SLOT() for these, so that we would write, e.g.:
...
static void
SCHED(fork)(struct thread *td, struct thread *childtd)
...
instead of:
...
static void
sched_4bsd_fork(struct thread *td, struct thread *childtd)
...

I do not like it. It adds a level of obfuscation and I do not see any advantages. I have been there with the i386 pmap variants, but for pmap it at least was needed. It is still ugly.

Please see inline comments.

sswitch is unfortunate, but I agree bypasses are not really worth the trouble. :-)

Add sched_instance_select() call to all arches.
Rename the tunable to kern.sched.name.
Automatically fall back to some scheduler if the named one is not found, and there is one.
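
The selection policy summarized in these notes (use the named scheduler if present, otherwise fall back to one that is compiled in) might look roughly like this. It is a userland sketch under assumptions: struct sched_desc and sched_pick() are hypothetical, the fallback is simply the first table entry, and libc strcmp() is used where the patch uses its own inlined comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor; the real instance carries the slot table too. */
struct sched_desc { const char *name; };

/*
 * Pick a scheduler: an exact name match wins; an unknown name produces a
 * warning and falls back to the first compiled-in scheduler; no name at
 * all silently selects the first one (which covers the case of a single
 * compiled-in scheduler).
 */
const struct sched_desc *
sched_pick(const struct sched_desc *avail, size_t n, const char *wanted)
{
	size_t i;

	if (n == 0)
		return (NULL);
	if (wanted != NULL) {
		for (i = 0; i < n; i++) {
			if (strcmp(avail[i].name, wanted) == 0)
				return (&avail[i]);
		}
		fprintf(stderr, "scheduler \"%s\" not available, using \"%s\"\n",
		    wanted, avail[0].name);
	}
	return (&avail[0]);
}

void
sched_pick_demo(void)
{
	const struct sched_desc only_4bsd[] = { { "4BSD" } };

	/* A kernel with only 4BSD still boots when ULE was requested. */
	assert(sched_pick(only_4bsd, 1, "ULE") == &only_4bsd[0]);
	assert(sched_pick(only_4bsd, 1, NULL) == &only_4bsd[0]);
}
```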

Apparently, both armv7 and riscv lack ifunc support in the kernel.

In D54831#1253606, @kib wrote:

Mmm... That kind of problem would be automatically avoided if not exporting settings for both 4BSD and ULE, which also changes the sysctl hierarchy.

I want users who tweak schedulers to be able to put tweaks for any scheduler into their /etc/sysctl.conf without causing errors if one of the schedulers is not enabled or not configured.
Such users must be intelligent enough to understand that settings for non-active schedulers are not active.
Also, I want it to be very clear which settings belong to which scheduler.
So I prefer the new scheme.

I see. I don't like it very much as it is slightly confusing for regular users and breaking the existing interface, but OK.

I am not sure what you mean by kern.sched.instance_name; I cannot find such a thing.

Indeed, I meant kern.sched.sched_instance_name. Could you rename it to simply kern.sched.name?

While here, I'd also rename the sched_instance_name variable to just sched_name.

I do not like it. It adds a level of obfuscation and I do not see any advantages. I have been there with the i386 pmap variants, but for pmap it at least was needed. It is still ugly.

Still ugly, yes, but less ugly IMHO. Anyway.

In D54831#1253627, @olce wrote:

In D54831#1253606, @kib wrote:

I am not sure what you mean by kern.sched.instance_name; I cannot find such a thing.

Indeed, I meant kern.sched.sched_instance_name. Could you rename it to simply kern.sched.name?

I still do not understand. There is no such sysctl, and I do not remember introducing it. On the patched kernel,

root@:/ # sysctl kern.sched | grep instance
root@:/ #

While here, I'd also rename the sched_instance_name variable to just sched_name.

ok.

Rename sched_instance_name variable to sched_name.

Looks good.

You're better placed than me to know the policy about arm and riscv and if this can be committed even before ifuncs are developed for them.

In D54831#1253638, @kib wrote:

I still do not understand. There is no such sysctl, and I do not remember introducing it. On the patched kernel,

My bad, I misread twice, the sysctl is already named kern.sched.name.

armv7 and riscv64 are tier 2, breaking them is not permitted. I have an idea for how to make them work though, without being too invasive.

I think this warrants a "Relnotes: yes" tag.

In D54831#1253740, @jrtc27 wrote:

armv7 and riscv64 are tier 2, breaking them is not permitted. I have an idea for how to make them work though, without being too invasive.

ifunc can be emulated with a function pointer. This of course loses the micro-optimizations that a real ifunc provides, but should keep the arches going until the maintainers finally implement the proper solution.
I do not want to code that.

In D54831#1253758, @kib wrote:

In D54831#1253740, @jrtc27 wrote:

armv7 and riscv64 are tier 2, breaking them is not permitted. I have an idea for how to make them work though, without being too invasive.

ifunc can be emulated with a function pointer. This of course loses the micro-optimizations that a real ifunc provides, but should keep the arches going until the maintainers finally implement the proper solution.
I do not want to code that.

I was going to make the shims actual shims rather than ifuncs, yes. I didn't implement kernel ifuncs for riscv when I did userspace because there was no use for them at the time, unlike userspace where they had been requested. But given armv7 needs support too and AFAIR is the only arch to not even have support in rtld you'd still need such a workaround. I plan to write the shim version later tonight.

Add a function-pointer-based workaround for riscv and arm.
Hopefully at least riscv will grow ifuncs at some point.

For now such arches are marked with __DO_NOT_HAVE_SYS_IFUNC.
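
The function-pointer workaround can be pictured as a wrapper that dispatches through the selected instance on every call. This is a minimal sketch, not the contents of sched_shim.c: the one-slot struct sched_instance, the *_sketch names and the returned values are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, one-slot table; the real one has a slot per sched_*() call. */
struct sched_instance {
	const char *name;
	int (*rr_interval)(void);
};

static int ule_rr_interval_sketch(void) { return (10); }	/* made-up value */
static int bsd_rr_interval_sketch(void) { return (100); }	/* made-up value */

const struct sched_instance sched_ule_sketch =
    { "ULE", ule_rr_interval_sketch };
const struct sched_instance sched_4bsd_sketch =
    { "4BSD", bsd_rr_interval_sketch };

/* Set once, early at boot, before any shim can be called. */
static const struct sched_instance *active_sched = &sched_ule_sketch;

void
sched_instance_set_sketch(const struct sched_instance *si)
{
	active_sched = si;
}

/*
 * Wrapper variant, as used where kernel ifuncs are missing: every call
 * pays a pointer load plus an indirect call, which a real ifunc would
 * resolve once when the kernel is relocated.
 */
int
sched_rr_interval_sketch(void)
{
	return (active_sched->rr_interval());
}

void
shim_demo(void)
{
	sched_instance_set_sketch(&sched_4bsd_sketch);
	assert(sched_rr_interval_sketch() == 100);
}
```

Callers see the same function name in both variants, so only the shim file needs to know whether the architecture has ifunc support.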

https://reviews.freebsd.org/P685 is a proposal for how to drastically reduce the amount of boilerplate here (even compared with the ifunc-only version). The wrapper vs ifunc difference is mostly hidden, with the only way it leaks out being that the macro needs to take argument types and names separately so it can use just the names in the call for the wrapper variant. That could be simplified back down in a future ifunc-only version. Built and test-booted arm64 GENERIC and riscv64 QEMU (with other patches to deal with the other issues here).

sys/kern/sched_4bsd.c
1643 ↗	(On Diff #170297)	This has been unused since c72188d85a793c7610208beafb83af544de6e3b7, no need to keep it and add a stub for ULE
1801 ↗	(On Diff #170297)	This and the corresponding ULE change don't build on non-KTR kernels, which is most of them (powerpc's DPAA is the only non-LINT one), since ts_name isn't defined in the corresponding struct for !KTR. Either the slot and shim need to be conditional too or it should be made a stub in each for !KTR (and the callers can drop the #ifdef). Did you test build this at all?
sys/kern/sched_shim.c
373 ↗	(On Diff #170297)
379 ↗	(On Diff #170297)
526–537 ↗	(On Diff #170297)	This currently matches if one is a prefix of the other, since the loop terminates as soon as you hit the terminator of one string and treats it as a match. This is one way of writing a fixed version (I believe), based on lib/libc/string/strcmp.c's structure.
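
The prefix-match problem in the last comment is easy to reproduce outside the kernel. Both functions below are reconstructions written for illustration, not the code from the diff or from lib/libc/string/strcmp.c, and their names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Buggy shape: the loop stops as soon as either string ends and reports a
 * match, so one name being a prefix of the other counts as equal.
 */
bool
name_eq_prefix_bug(const char *a, const char *b)
{
	for (; *a != '\0' && *b != '\0'; a++, b++) {
		if (*a != *b)
			return (false);
	}
	return (true);
}

/*
 * Fixed shape: keep walking while the characters agree, and report a
 * match only when both strings reach their terminator together.
 */
bool
name_eq(const char *a, const char *b)
{
	while (*a == *b) {
		if (*a == '\0')
			return (true);
		a++;
		b++;
	}
	return (false);
}

void
name_eq_demo(void)
{
	assert(name_eq_prefix_bug("ULE", "ULE2"));	/* wrongly equal */
	assert(!name_eq("ULE", "ULE2"));
	assert(name_eq("4BSD", "4BSD"));
}
```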

Take the DEFINE_SHIM() proposal.
Fix inlined strcmp().
Also hopefully fix the compilation issues, but I have only just started tinderbox.

jrtc27 added inline comments. Jan 24 2026, 2:04 AM

sys/kern/sched_4bsd.c
1801 ↗	(On Diff #170297)	This one's still missing a KTR #ifdef, no?

sys/kern/sched_4bsd.c
1801 ↗	(On Diff #170297)	I built and run-time tested my debug config on amd64. I started tinderbox after posting the patch, and have now restarted it for the latest update. The patch is still under discussion, so I did not think it reasonable to spend those resources on it. Thank you for trying it and finding the issues.