Page MenuHomeFreeBSD

riscv cpu_fork(), saving fpe state
ClosedPublic

Authored by jsihv_gmx.com on Nov 18 2025, 3:18 PM.
Tags
Referenced Files
Unknown Object (File)
Fri, May 15, 12:44 PM
Unknown Object (File)
Fri, May 15, 11:02 AM
Unknown Object (File)
Thu, May 14, 11:38 PM
Unknown Object (File)
Thu, May 14, 8:28 PM
Unknown Object (File)
Thu, May 14, 3:03 PM
Unknown Object (File)
Tue, May 12, 3:21 PM
Unknown Object (File)
Wed, May 6, 8:03 PM
Unknown Object (File)
Tue, Apr 28, 8:41 AM
Subscribers

Details

Summary

This is an attempt to replace TODO in riscv's cpu_fork(). The code updates the floating point state before it proceeds copying the pcb struct.

pcb_fp_started flag is used as a condition as in a corresponding arm64 code and eg. in riscv's exec_machdep.c:fill_fpregs(). It indicates that fp is in use. I have verified with a debug code that when PCB_FP_STARTED is off, pcb_x registers are always zero.

A corresponding function in arm64 has a condition (td1 == curthread) for saving the fp state (and amd64 has it in MPASS).
arm64 added this condition here: https://github.com/freebsd/freebsd-src/commit/2db317ca8577a0b208ecae38ecfa4a8a690a0fa8
However, so far I have not found out in which kind of situation td1 might not be curthread. I've put a such condition (td1 != curthread) to my debug code in order to see if this condition is ever true but after booting and making some forking as a user, the condition has not ever been true. I also asked about it on our IRC channel and nobody seemed to know what's the meaning of this condition. So I have not included this condition here.

critical_enter() and critical_exit() are used here similarly as arm64 and amd64 use it in corresponding places to prevent a thread being preempted when thread's state is being saved.

fpe_state_save() leads to an assembly code in swtch.S which contains instructions which are equivalent to fpe_enable() and fpe_disable() and these calls are therefore not used here.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

if td1 is always curthread, then a kassert is needed

looks like td comes from ecall_handler(), which is curthread

jsihv_gmx.com added a reviewer: riscv.

I added MPASS macro following the example of AMD64's code in a corresponding place. I verified that the execution always visits ecall_handler() before it arrives here.

This revision is now accepted and ready to land.Feb 27 2026, 9:41 AM
This revision was automatically updated to reflect the committed changes.

I am sorry to speak up only now, after the review sat open for so long. But in looking over the change it seems to be incomplete.

sys/riscv/riscv/vm_machdep.c
177

Looking at other implementations, we need to save the FPE context here as well.

This one is invoked when creating a new thread in the same process, while above is invoked when forking/creating a new process.

sys/riscv/riscv/vm_machdep.c
177

AMD64 & PowerPC seem to be like what you say but it looks like ARM64 does not save FP registers in cpu_copy_thread()

ARM64 calls vfp_new_thread() in both functions but this function just makes some modifications to a new thread _after_ it has been copied. In the case of AMD64 these both functions cpu_copy_thread() and cpu_fork() share a lot of common code in copy_thread() which calls fpuexit() which saves (after some further calls) fp-registers and thus corresponds the new code in this patch. PowerPC seems to be similar to AMD64. Both functions call cpu_update_pcb() which saves fp-registers to pcb.

So a question can be asked, do AMD64 and PowerPC do this just because their functions share a common function call together or do these differences (AMD64 & PowerPC vs. ARM64) reflect more genuine differences between these archs.

sys/riscv/riscv/vm_machdep.c
177

So, comments in ARM64's vm_machdep.c and differential (2db317c, April 2015) explain that cpu_fork() updates floating point registers to pcb because if the userland changes float values and then forks, it's possible to proceed to cpu_fork() without first visiting cpu_switch (which stores the registers). So ARM64 developers probably think this issue does not matter in the case of cpu_copy_thread(). But it would be good to get it confirmed ...and also if there's differences between archs here.

sys/riscv/riscv/vm_machdep.c
177

This function is called in the path of thr_new(2) and thr_create() system calls.

My belief is that exactly the same risk exists here as for cpu_fork(); the kernel avoids saving floating point context while handling exceptions (such as syscalls), and will defer this to the time of context switch. (There is a performance benefit from this.)

By inspection, I did not see anything to ensure it would be up to date here.

I did not verify this empirically. A couple years ago I tried (briefly) to produce the problem via cpu_fork(), without success. That is why this TODO remained so long.

It is likely that ARM implementations have this wrong. But the window for failure is small.

I will handle this further... Thanks for the patch.