Page MenuHomeFreeBSD

Stop early clipping in vm_map_protect
ClosedPublic

Authored by dougm on Jun 20 2019, 4:19 PM.

Details

Summary

vm_map_protect may return an error response after clipping the first or last map entry in the region to be reserved. This creates a pair of matching entries that should have been "simplified" back into one, or never created. This change defers the clipping of those entries until most of the vm_map_protect failure cases have been ruled out.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

dougm created this revision.Jun 20 2019, 4:19 PM
dougm added inline comments.Jun 21 2019, 3:12 AM
vm_map.c
2535 ↗(On Diff #58840)

Should we be accumulating the total swap reserved in this loop and releasing it on an error return?

2538 ↗(On Diff #58840)

Should we be delaying the assignment of current->cred until the next loop, since otherwise we may cause two entries to become mergeable, and return early with KERN_RESOURCE_SHORTAGE without either restoring them to their original states or merging them?

markj added inline comments.Jun 21 2019, 6:13 PM
vm_map.c
2535 ↗(On Diff #58840)

Ideally, I think so. Note though that you would also need to go back and release any acquired cred refs. If you do as you suggest below and defer assignment of current->cred to the second loop, you wouldn't have to worry about this.

2555 ↗(On Diff #58840)

We may have reserved swap space for an earlier entry without clipping it. If the entry spanned one of start or end we will have marked the entry as charged by setting entry->cred, but the amount of reserved swap space will be less than entry->end - entry->start. So when the entry is deleted, we may release more swap space than was reserved. At least, I don't see anything preventing this scenario.

dougm retitled this revision from Stop early clipping in vm_map_reserve to Stop early clipping in vm_map_protect.Jun 22 2019, 3:55 AM
dougm edited the summary of this revision. (Show Details)
dougm updated this revision to Diff 58889.Jun 22 2019, 4:42 AM

Reduce the swap_reserve calls to one, and not clip or modify map entry fields until it succeeds. This change makes an assumption about the need to check ptoa(obj->size) - that it isn't necessary when you the start and end of the map entry - that I'm believing because alc told me so.

dougm updated this revision to Diff 58890.Jun 22 2019, 5:40 AM

Relative to the previous version, combine the last two loops of vm_map_protect into one.

dougm updated this revision to Diff 58891.Jun 22 2019, 5:44 AM

Don't allocate swap space until the in-transition test is passed.

pho added a comment.Jun 22 2019, 7:44 AM

With D20711.58891.diff I got:

20190622 09:11:08 all (10/70): mmap18.sh
panic: no cred for entry 0xfffff8060ee67310
cpuid = 1
time = 1561187513
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00adcfe750
vpanic() at vpanic+0x19d/frame 0xfffffe00adcfe7a0
panic() at panic+0x43/frame 0xfffffe00adcfe800
vm_fault_copy_entry() at vm_fault_copy_entry+0x65a/frame 0xfffffe00adcfe8b0
vm_map_protect() at vm_map_protect+0x496/frame 0xfffffe00adcfe960
kern_mprotect() at kern_mprotect+0xc0/frame 0xfffffe00adcfe990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00adcfeab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00adcfeab0
--- syscall (74, FreeBSD ELF64, sys_mprotect), rip = 0x80042f00a, rsp = 0x7fffdfbfbf78, rbp = 0x7fffdfbfbfb0 ---
KDB: enter: panic
[ thread pid 60596 tid 100406 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> run pho
db:0:pho> set $lines 20000
db:0:pho>  run pho1
db:1:pho1> bt
Tracing pid 60596 tid 100406 td 0xfffff801f87d8000
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe00adcfe750
vpanic() at vpanic+0x1ba/frame 0xfffffe00adcfe7a0
panic() at panic+0x43/frame 0xfffffe00adcfe800
vm_fault_copy_entry() at vm_fault_copy_entry+0x65a/frame 0xfffffe00adcfe8b0
vm_map_protect() at vm_map_protect+0x496/frame 0xfffffe00adcfe960
kern_mprotect() at kern_mprotect+0xc0/frame 0xfffffe00adcfe990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00adcfeab0

https://people.freebsd.org/~pho/stress/log/dougm035.txt

dougm updated this revision to Diff 58892.Jun 22 2019, 7:56 AM

Reorder events in the final loop now that the significance of 'cred' has been partially revealed to me.

pho added a comment.Jun 22 2019, 4:06 PM

I ran all of the mmap() tests I have on D20711.58892.diff for a total of 8 hours.
I also did a buildworld / installworld.
No problems seen with this partial test.

dougm updated this revision to Diff 58905.Jun 23 2019, 8:32 AM

Don't account for clipping in computing the amount of swap space to reserve. Fix some style glitches.

kib added inline comments.Jun 23 2019, 6:53 PM
vm_map.c
2555 ↗(On Diff #58840)

This is still not done, right ?

2477 ↗(On Diff #58905)

{} are not needed.

2504 ↗(On Diff #58905)

(...eflags & MAP_ENTRY_NEEDS_COPY) != 0

You can check that in_tran == NULL there, I am not sure.

2549 ↗(On Diff #58905)

!= 0

dougm updated this revision to Diff 58919.Jun 23 2019, 7:36 PM
dougm marked 3 inline comments as done.

Address easy suggestions from reviewer.

dougm added inline comments.Jun 23 2019, 7:51 PM
vm_map.c
2555 ↗(On Diff #58840)

I'm not clear about whether this comment expresses a concern about a bug that this patch would introduce, or a bug that's already there that this patch is being asked to fix.

So far, I'm not trying to fix anything except the cases where vm_map_protect clips or modifies entries and then fails without undoing those changes.

Is this about using entry->end - entry->size instead of ptoa(obj->size)? I'm happy to go back to ptoa(obj->size). Somebody told me that they were the same.

If that's not it, then I'm not sure what we're talking about.

kib added inline comments.Jun 23 2019, 8:06 PM
vm_map.c
2555 ↗(On Diff #58840)

Look at how the clip_start entry is handled, for instance for NEED_COPY case. You reserve the amount of swap needed for non-clipped entry, then set cred for clipped. This means that the swap reserved is bigger than the size of the entry, and the difference would be never released.

markj added inline comments.Jun 23 2019, 8:09 PM
vm_map.c
2555 ↗(On Diff #58840)

I was pointing out a bug in the initial version of the patch, which is still present and related to my comment above. Basically, with your patch we are reserving swap space based on the size of unclipped map entries, and I believe that that is incorrect. I will try to provide more context in my comments in the future.

2508 ↗(On Diff #58919)

Suppose an entry spans end. That entry will be clipped below, but we will reserve space for the new entry created by that clipping without charging that entry for the space.

dougm added inline comments.Jun 23 2019, 8:31 PM
vm_map.c
2555 ↗(On Diff #58840)

That's right now. But back up to diff 58892, where the critical line was:

to_reserve += ulmin(end, current->end) -
                 ulmax(start, current->start);

so I was reserving just the amount needed for clipped entries. But Mark's first comment was made with this line in place, so I didn't understand it to be about accounting-for-clipping.

Why did I change this line 12 hours ago? Because a Peter Holm crash suggested I wasn't reserving enough swap space, and I thought this line might be the problem.

2555 ↗(On Diff #58840)

Thanks for all the help you can give me. But, as explained in the response to Kostik, I was accounting for clipping at the time you made your comment. I'm just not doing it now.

markj added inline comments.Jun 23 2019, 8:38 PM
vm_map.c
2555 ↗(On Diff #58840)

The initial version had a different variant of the problem: it accounted for the clipping when reserving swap space, but the first loop could return an error before any clipping actually happened. So the amount of swap space reserved may not have matched the entry size.

dougm added inline comments.Jun 23 2019, 8:38 PM
vm_map.c
2555 ↗(On Diff #58840)

Isn't this what happens now? That is, clip_start calls vm_entry_charge_object, which leaves the two halves of the clipped entry both with the same underlying object and the same charge. Isn't that range being charged twice *now*? I don't see where clip_start is later making two objects, one for each of the two map entries, with reduced charges.

dougm updated this revision to Diff 58927.Jun 23 2019, 8:56 PM

Don't assume I can ignore ptoa(obj->size). Account for clipping when there's no object.

dougm updated this revision to Diff 58932.Jun 23 2019, 11:14 PM

If vm_map_entry_back is going to be invoked by clipping, then use the size of the untrimmed entry, which is what vm_map_entry_back will set as the size of the object that it creates. Otherwise, if there's no object, there won't be an object made and use the clipped map entry difference. Otherwise, use the size from the pre-existing object, the size of which won't be modified by vm_map_entry_charge_object, or by clip_start or clip_end.

dougm updated this revision to Diff 58933.Jun 23 2019, 11:17 PM

Remove a redundant condition, one which is tested for at the top of the loop.

dougm updated this revision to Diff 58934.Jun 24 2019, 2:35 AM

Add an assertion that the amount of swap space to reserved has been calculated correctly.

pho added a comment.Jun 24 2019, 4:04 AM

With D20711.58934.diff I get:

20190624 05:49:07 all (2/15): mmap11.sh
panic: vm_map_protect: wrong amount reserved
cpuid = 1
time = 1561348157
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00aca04800
vpanic() at vpanic+0x19d/frame 0xfffffe00aca04850
panic() at panic+0x43/frame 0xfffffe00aca048b0
vm_map_protect() at vm_map_protect+0x7b9/frame 0xfffffe00aca04960
kern_mprotect() at kern_mprotect+0xc0/frame 0xfffffe00aca04990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00aca04ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00aca04ab0

https://people.freebsd.org/~pho/stress/log/dougm037.txt

dougm updated this revision to Diff 58939.Jun 24 2019, 4:33 AM

Move the simplification of entries to a separate, third loop, not because I want to but because I want to see if simplification is what's making my accounting assertion fail, or if the cause is some other thing I don't understand.

So everyone but Peter is free to ignore this version.

dougm updated this revision to Diff 58941.Jun 24 2019, 5:29 AM

Retreat. I know I can improve something, and I know that I'm misunderstanding something else. I'll try to take the small win for now.

dougm edited the summary of this revision. (Show Details)Jun 24 2019, 7:46 AM
alc added inline comments.Jun 24 2019, 7:03 PM
vm_map.c
2515 ↗(On Diff #58941)

Isn't this test redundant? In other words, doesn't vm_map_clip_start() perform the same test?

dougm updated this revision to Diff 58962.Jun 24 2019, 7:40 PM

Remove redundant test.

alc accepted this revision.Jun 25 2019, 2:54 AM
This revision is now accepted and ready to land.Jun 25 2019, 2:54 AM
markj accepted this revision.Jun 25 2019, 7:28 AM