Page MenuHomeFreeBSD

arm64: Fix kernel panic in get_arm64_sve during core dump
Needs ReviewPublic

Authored by wac_gmail.com on Mon, Jan 5, 6:17 PM.
Tags
None
Referenced Files
F141880558: D54532.diff
Sun, Jan 11, 10:26 PM
Unknown Object (File)
Sat, Jan 10, 5:21 AM
Unknown Object (File)
Sat, Jan 10, 1:50 AM
Unknown Object (File)
Fri, Jan 9, 7:58 PM
Unknown Object (File)
Fri, Jan 9, 3:19 PM
Unknown Object (File)
Fri, Jan 9, 12:58 PM
Unknown Object (File)
Fri, Jan 9, 3:48 AM
Unknown Object (File)
Thu, Jan 8, 3:09 PM
Subscribers

Details

Reviewers
andrew
manu
Summary

The coredump logic calls get_arm64_sve twice: once to get the note size, and once to get the data. The note size calculation depended on the volatile PCB_FP_SVEVALID flag. If this flag was cleared between the two calls (e.g., due to a context switch clearing the flag to comply with the ABI), the second call would expect a smaller buffer size than the first, triggering a KASSERT panic ("invalid size").

Fix this by:

  1. Using pcb->pcb_svesaved != NULL (a stable indicator of SVE usage) to determine the note size.
  2. Calling vfp_to_sve_sync() if SVE is not currently valid but SVE state is saved, ensuring the SVE buffer in the core dump contains the most up-to-date VFP/NEON register values.

PR: 292195

Test Plan

Triggered core dump repro from bug 292177 and observed a good core file and no kernel panic.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Would it be enough to move the td == curthread .. check & call to vfp_save_state before checking for PCB_FP_SVEVALID in line 942?

I don't think moving vfp_save_state(td) is sufficient because of lazy SVE restoration. If the thread hasn't executed SVE instructions since a context switch, the trap to re-enable the unit won't have fired. In that state, vfp_save_state(td) sees the unit as disabled and won't set PCB_FP_SVEVALID and we'll still potentially see this panic. Checking pcb_svesaved != NULL avoids this by checking the backing store directly.

After the first call to vfp_save_state from get_arm64_sve with buf == NULL the PCB_FP_SVEVALID flag will be set correctly based on if pcb_svesaved contains valid data or not (it may be clear if called while in a syscall). If we then context switch then vfp_save_state_switch will exit early as VFP is disabled so PCB_FP_SVEVALID isn't changed & later calls to vfp_save_state will also not change the state.

It's ok to drop the SVE state if the thread is in a system call (other than sigreturn) as the ABI doesn't guarantee it will remain valid.

I tested a patch to move vfp_save_state & haven't hit the KASSERT, even when loading the system down to try cause more context switching.