Page MenuHomeFreeBSD

Enable interrupts while handling traps
ClosedPublic

Authored by mhorne on Aug 9 2020, 4:49 PM.

Details

Summary

I observed hangs post-r362977 in QEMU with -smp 2, in which one thread
would acquire write access to an rm_lock (sysctllock) and get stuck
waiting in smp_rendezvous_cpus while the other CPU was servicing a trap.
The other thread was waiting for read access to the same lock, thus
causing deadlock.

It's clear that this is just one symptom of a larger problem. The general
expectation of MI kernel code is that interrupts are enabled. Violating
this assumption will at best create some additional latency, but otherwise
might cause locking or other unforeseen issues. All other architectures do so
for some subset of trap values, but this somehow got missed in the RISC-V
port. Enable interrupts now during kernel page faults and all user trap
types.

The code in exception.S already knows to disable interrupts while
handling the return from exception, so there are no changes required
there.

Test Plan

I could pretty reliably hang the system while running the test suite for
sys/net. With this patch I was no longer able to do so.

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 32891
Build 30293: arc lint + arc unit

Event Timeline

mhorne requested review of this revision.Aug 9 2020, 4:49 PM
mhorne created this revision.

It seems this should also be enabled when handling certain supervisor exceptions. For which exception types would it be appropriate to do so?

arm64 only enables interrupts while servicing page faults, whereas i386 does so during most trap types. Is there a general guideline to this?

At least for x86, (almost) all traps are same as the system calls. We typically have to make hardware to disable interrupts on entry because switching from usermode to proper kernel hardware state is not atomic, and interrupt handlers need to distinguish between entry from kernel mode vs. usermode to avoid doing some stuff that must not be done. Most typical problem for amd64 is SWAPGS which should be done when coming from user, but must be not when coming from kernel.

After we entered the kernel environment, and saved any additional hardware state that might be clobbered by interrupt or context switch, like %cr2 the address of the last page fault, or fs/gsbase, we typically enable interrupts when coming from userspace. For amd64 interrupts are enabled in asm code from exception.S, i386 was unified to do it in trap.c.

Normal kernel code expects that interrupts are enabled. Consider disabling interrupts as something very similar to holding a spinlock.

arm64 enables interrupts for userspace exceptions in do_el0_sync. It also needs to enable them in data_abort when not in the kernel debugger. See rS287961 and rS353920 for details.

We should enable interrupts for kernel page faults (when not in the kernel debugger as Andrew noted). There are cases where we handle supervisor-mode page faults such as copyin()/copyout() and friends, and accesses to pageable kernel memory (execve argument and pipe buffers).

Enable interrupts while handling kernel page faults.

mhorne retitled this revision from Enable interrupts while handling user mode traps to Enable interrupts while handling traps.Aug 11 2020, 9:10 PM
mhorne edited the summary of this revision. (Show Details)
sys/riscv/riscv/trap.c
209

You might assert, or at least comment, that interrupts are already enabled in the user mode case.

Looks ok to me. Others have already explained the rationale well I think. I don't think there are any other kernel faults for which interrupts should be enabled. ddb and panic() are probably fine to run with interrupts disabled. dtrace is the only part I'm less sure of, but I suspect it is fine to run with them disabled?

This revision is now accepted and ready to land.Aug 12 2020, 8:58 PM

Add a comment to data_abort explaining that interrupts are already enabled in the user case.

This revision now requires review to proceed.Aug 13 2020, 1:07 PM
This revision is now accepted and ready to land.Aug 13 2020, 1:56 PM
This revision was automatically updated to reflect the committed changes.