Page MenuHomeFreeBSD

Improve MD page fault handlers.
Needs ReviewPublic

Authored by kib on Sun, Sep 8, 4:27 PM.

Details

Reviewers
alc
markj
jilles
manu
Group Reviewers
bhyve
Summary

Centralize calculation of signal and ucode delivered on unhandled page fault.

Change the delivered signal for accesses to mapped area after the backing object was truncated.
According to POSIX description for mmap(2),

The system shall always zero-fill any partial page at the end of an object. Further, the system
shall never write out any modified portions of the last page of an object which are beyond its
end. References within the address range starting at pa and continuing for len bytes to whole
pages following the end of an object shall result in delivery of a SIGBUS signal.

An implementation may generate SIGBUS signals when a reference would cause an error in the
mapped object, such as out-of-space condition.

Adjust according to the description, keeping the existing compatibility code for SIGSEGV/SIGBUS on protection failures.

Use the opportunity and move the signal calculation to trap_pfault().

Not sure what better ucode to specify when SIGBUS is delivered for swap reservation failure on CoW.

PR: 211924

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 26497

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
kib updated this revision to Diff 61808.Sun, Sep 8, 4:31 PM
kib edited the summary of this revision. (Show Details)

Handle swap reservation failure with SIGBUS.

kib edited the summary of this revision. (Show Details)Sun, Sep 8, 4:33 PM
kib added reviewers: alc, markj, jilles.
kib added subscribers: lwhsu, emaste.
kib updated this revision to Diff 61833.Mon, Sep 9, 12:00 PM

Move signal translation to vm_fault.c. Handle i386.

markj added inline comments.Mon, Sep 9, 11:51 PM
sys/i386/i386/trap.c
394

Perhaps use bool instead since you're changing the function anyway?

sys/vm/vm_fault.c
573

"Unexpected" would be more accurate than "Unknown".

589

Extra space after else.

601

I guess it is fine to change the ucode for old binaries on non-tier 1 arches since we do not attempt to maintain compatibility for them? It might be worth explaining in a comment that this is only really intended on i386/amd64.

kib marked 3 inline comments as done.Tue, Sep 10, 10:19 AM
kib added inline comments.
sys/vm/vm_fault.c
601

I already looked at non-x86 arches, and they have typically huge non-compliance there. E.g. arm always deliver SIGSEGV.

Enabling compat mode requires manually frobbing the sysctl from user, so I think this change is fine.

kib updated this revision to Diff 61875.Tue, Sep 10, 10:21 AM

Adjust for Mark notes.

markj added a comment.Tue, Sep 10, 2:10 PM

The change LGTM. I would consider keeping the vm_fault() wrapper for vm_fault_hold(), to avoid churn if vm_fault_hold() ever needs to grow additional arguments.

kib added a comment.Tue, Sep 10, 2:44 PM

The change LGTM. I would consider keeping the vm_fault() wrapper for vm_fault_hold(), to avoid churn if vm_fault_hold() ever needs to grow additional arguments.

There is only one caller of vm_fault() outside of the trap_pfault()s. Note that vm_map_wire() use of vm_fault() caused ktrace recording the wired pages, which IMO is not right.

I think we can restore vm_fault() if needed, but currently there is no need.

markj added a comment.Tue, Sep 10, 2:47 PM
In D21566#470372, @kib wrote:

The change LGTM. I would consider keeping the vm_fault() wrapper for vm_fault_hold(), to avoid churn if vm_fault_hold() ever needs to grow additional arguments.

There is only one caller of vm_fault() outside of the trap_pfault()s. Note that vm_map_wire() use of vm_fault() caused ktrace recording the wired pages, which IMO is not right.

There was vmm as well.

I think we can restore vm_fault() if needed, but currently there is no need.

Ok.

I like that this makes signal numbers and codes more architecture-independent and reduces the number of conversions of this information.

I think additional testcases would be helpful to make and keep this correct. I can write some this week.

sys/amd64/amd64/trap.c
756

It looks like this was originally SEGV_MAPERR and I like that better since usermode cannot map anything in kernel space.

800–805

It looks like this leaves *signo and *ucode uninitialized (if usermode). It used to send signal SIGSEGV with code SEGV_ACCERR and I think that is correct.

I do notice that Linux has a specific code SEGV_PKUERR and an additional siginfo field si_pkey for this case but that change, if we want it at all, should not be in this review.

sys/vm/vm_fault.c
583–585

BUS_OBJERR might be more correct, but it may also be helpful to have a different value to help in diagnosis.

1091–1092

I expect signal SIGBUS with code BUS_OBJERR in both of these cases.

kib marked 5 inline comments as done.Wed, Sep 11, 8:14 PM
kib added inline comments.
sys/amd64/amd64/trap.c
800–805

I added SEGV_PKUERR, since I also added BUS_OOMERR. Do you know what is written into si_pkey ?

sys/vm/vm_fault.c
583–585

I added BUS_OOMERR.

kib updated this revision to Diff 61952.Wed, Sep 11, 8:15 PM
kib edited the summary of this revision. (Show Details)
kib marked 2 inline comments as done.

Handle jilles' notes. Add changes for !x86 (not even compiled).

kib retitled this revision from Change the delivered signal for accesses to mapped area after the backing object was truncated. to Improve MD page fault handlers..Wed, Sep 11, 8:18 PM
kib edited the summary of this revision. (Show Details)
alc added inline comments.Wed, Sep 11, 10:46 PM
sys/amd64/amd64/trap.c
114

The return value is never used.

alc added inline comments.Wed, Sep 11, 11:11 PM
sys/amd64/amd64/trap.c
348–353

I think that you actually want to use the return value from trap_pfault() here rather than signo.

alc added inline comments.Wed, Sep 11, 11:26 PM
sys/amd64/amd64/trap.c
800–804

This assignment no longer serves any purpose.

kib updated this revision to Diff 61966.Thu, Sep 12, 6:28 AM
kib marked 3 inline comments as done.

Fix bugs pointed out by alc, for amd64 and i386.

alc added inline comments.Thu, Sep 12, 3:26 PM
sys/amd64/amd64/trap.c
191

Style: "pf" should be first.

195–196

As an aside, I question if any of these three assignments are needed, even by gcc 4.2.1. Arguably, these assignments are harmful in the respect that they guarantee that the compiler won't generate a warning in case a future change failed to assign a meaningful value to one of these variables and that value was needed, i.e., we were generating a signal.

407

Style: Eliminate the space after the cast.

701

Please consider adding KASSERT((signo != NULL && ucode != NULL) || !usermode, ...) as "documentation" for the new parameters.

749–750

The truncation here is pointless. In fact, the truncation is repeated in vm_fault_trap().

kib marked 5 inline comments as done.Thu, Sep 12, 3:57 PM
kib added inline comments.
sys/amd64/amd64/trap.c
195–196

Yes, I believe signo/ucode/addr initializations can be eliminated. But dr6 cannot, because of T_BPTFLT for kernel.

gcc 4.2 should be irrelevant already, I think -Werror was turned off for it recently.

kib updated this revision to Diff 61984.Thu, Sep 12, 3:58 PM
kib marked an inline comment as done.

More changes requested by alc.

alc added inline comments.Thu, Sep 12, 5:43 PM
sys/amd64/amd64/trap.c
678–689

Please add a comment here describing the meaning of the three possible return values.

sys/amd64/vmm/vmm.c
1414

I would suggest renaming vm_fault_hold() to vm_fault().

sys/arm/arm/trap-v4.c
358

Shouldn't this be error == KERN_SUCCESS

sys/arm64/arm64/trap.c
212–213

Like amd64, the truncation here is pointless.

kib updated this revision to Diff 61994.Thu, Sep 12, 6:21 PM
kib marked 4 inline comments as done.

Rename vm_fault_hold->vm_fault.
Add comment for amd64 trap_pfault.
Minor changes for arm*.

kib updated this revision to Diff 62028.Fri, Sep 13, 9:47 AM
  • Stop overloading KERN_INVALID_ADDRESS for out-of-object error. This breaks other situations, it seems that the best route is to add a new Mach-style error.
  • Switch to use switch in signal decoding in vm_fault_trap().
  • Do not assume that map == kernel_map implies signo != NULL. copyin faults on user address, but in kernel mode.

I imported the test from D21624 to ease testing.

alc added inline comments.Fri, Sep 13, 3:00 PM
sys/riscv/riscv/trap.c
219–220

I'm surprised to see this snippet brought back to life in the riscv code. We removed it from elsewhere in the source tree almost exactly 4 years ago (r287625).

I would suggest committing the removal of this snippet now, and thus separately from the rest.

sys/vm/vm_fault.c
620

1->0

alc added a comment.Fri, Sep 13, 5:48 PM
In D21566#470372, @kib wrote:

... Note that vm_map_wire() use of vm_fault() caused ktrace recording the wired pages, which IMO is not right.

I agree.

kib marked 2 inline comments as done.Fri, Sep 13, 7:24 PM
kib updated this revision to Diff 62066.Fri, Sep 13, 8:21 PM

Rebase after riscv commit.

jilles added inline comments.Sat, Sep 14, 11:10 AM
sys/amd64/amd64/trap.c
800–805

si_pkey contains the protection key (between 0 and 15) that caused the fault.

kib added inline comments.Sat, Sep 14, 1:45 PM
sys/amd64/amd64/trap.c
800–805

I cannot provide this information without race, where userspace might update pk for the faulting address whle handler did not yet fetched the pte.

I think FreeBSD would not provide si_pkey.

kib updated this revision to Diff 62097.Sat, Sep 14, 4:49 PM

Build fixes: tb passes.

kib added a comment.Sat, Sep 21, 3:30 PM

Any further comments on the change ?

alc added inline comments.Sat, Sep 21, 11:00 PM
sys/amd64/amd64/trap.c
341

"Emulator can handle this trap?"

347

This assignment could be moved down below the two "if" statements, just before the "break".

682

"If this fault was fatal, typically from kernel mode"

684

"If this fault was handled by updating either the user or kernel page table, execution ..."

686

"If this fault was from user mode and it was not handled, a synchronous signal

sys/i386/i386/trap.c
733–735

The amd64 comment could be used here with the addition of -2 to the list of return values.

sys/vm/vm_fault.c
1092

Can you double-check this case? Does jilles@ comment still apply here? I ask because KERN_INVALID_ADDRESS is going to generate a SIGSEGV, not a SIGBUS.

sys/vm/vm_param.h
116

I would suggest "KERN_OUT_OF_BOUNDS". I have never seen this phrase used except with the plural "bounds".