Page MenuHomeFreeBSD

More fixes for stacks
ClosedPublic

Authored by kib on Jul 19 2023, 5:41 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 24, 1:33 PM
Unknown Object (File)
Sun, Jan 5, 2:09 AM
Unknown Object (File)
Dec 13 2024, 6:17 PM
Unknown Object (File)
Dec 6 2024, 9:59 PM
Unknown Object (File)
Dec 5 2024, 4:11 AM
Unknown Object (File)
Dec 3 2024, 9:18 PM
Unknown Object (File)
Nov 30 2024, 6:45 AM
Unknown Object (File)
Nov 29 2024, 7:22 AM

Details

Summary
Add vm_map_insert1(9)

The function returns the newly created entry.
Use vm_map_insert1() in stack grow code to avoid gap entry re-lookup.

Suggested by:   alc
vm_map_growstack(): handle max protection for stacks

Do not assume that protection is same as max_protection.  Store both in
offset, packed in the same way as the prot syscall parameter.
vm_map_protect(): handle stack protection stored in the stack guard

mprotect(2) on the stack region needs to adjust guard stored protection,
so that e.g. enable executing on stack worked properly on stack growth.

Reported by:    dchagin
vm_map: do not allow to merge stack gap entries

At least, offset handling is wrong for them.
vm_map_growstack(): restore stack gap data if gap entry was removed

and then restored.
vm_map_protect(): propagate lowest stack segment protection to the grow gap

This seems to be required for Linux emulation.
vm_map_protect(): add VM_MAP_PROTECT_GROWSDOWN flag

which requests to propagate lowest stack segment protection to the grow gap.
This seems to be required for Linux emulation.

Reported by:    dchagin
linuxolator: implement Linux' PROT_GROWSDOWN

From the Linux man page for mprotect(2):
   PROT_GROWSDOWN
       Apply  the  protection  mode  down to the beginning of a mapping
       that grows downward (which should be a stack segment or a
       segment mapped with the MAP_GROWSDOWN flag set).

Reported by:    dchagin

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Thank you for fixing this. Native test fine, however glibc rtld still fails under Linuxulator.
Glibc rtld calls mprotect of stack with len=pagesize, so after stack looks like:
00007ffffffdd000-00007fffffffc000 rw-p 00000000 00:00 0
00007fffffffc000-00007fffffffd000 rwxp 0003e000 00:00 0 [stack]

Can this be fixed as well? it's hard to fix at Linuxulator level.

Can you provide me with the self-contained native test for this case?

dougm added inline comments.
sys/vm/vm_map.c
1617

What if res is NULL? If I didn't read carefully, I'd think I could pass NULL as the last argument if I didn't care about the new entry result.

Consider dropping insert1 and just have insert take an extra argument that can be a NULL when the new entry is unwanted.

2782

if ((new_prot & ~gmax_prot) != 0 ||

is an alternative.

2870

It's a small thing, but everywhere else in the code, I see GAP_DN before GAP_UP.

4551

Why not use insert1 here to get new_entry?

sys/vm/vm_map.h
528 ↗(On Diff #125137)

I don't see any callers outside vm_map.c, so why is this externally visible?

kib marked 4 inline comments as done.Jul 28 2023, 11:32 AM
kib added inline comments.
sys/vm/vm_map.c
1617

vm_map_insert() is quite popular KPI, used a lot outside of vm/. I decided that it is too much churn to change it.

I caller does not need the entry, it can use vm_map_insert(). vm_map_insert1() does not accept NULL res, which avoids NULL checks.

2782

This change is dropped in the latest version of the patch. For new gap entry code, I prefer to use literally the same text as for regular map entries.

sys/vm/vm_map.h
528 ↗(On Diff #125137)

There is a strange code in sys/security/mac/mac_process.c mac_proc_vm_revoke_recurse().

kib marked 2 inline comments as done.
kib edited the summary of this revision. (Show Details)

Handle dougm notes. Add a hack to propagate lowest stack segment permissions to the gap, which is what Linux requires.

sys/vm/vm_map.c
2718

if (!CONTAINS_SET(max_prot, new_prot) || !CONTAINS_SET(max_prot, new_maxprot))

2723

if (CONTAINS_BITS(flags, VM_MAP_PROTECT_SET_ROOT | VM_MAP_PROTECT_SET_MAXROOT) && !CONTAINS_BITS(new_maxprot, new_prot))

2733

CONTAINS_BITS(new_prot, VM_PROT_WRITE | VM_PROT_EXECUTE)

2743

#define CONTAINS_BITS(set, bits) ((~(set) & (bits)) == 0)

2762

if ((flags & VM_MAP_PROTECT_SET_PROT) == 0)
new_prot = entry->protection;
else if (!CONTAINS_BITS(entry->max_protection, new_prot) {

vm_map_unlock(map);
return (KERN_PROTECTION_FAILURE);

}

2764

else if (!CONTAINS_BITS(entry->max_protection, new_maxprot)) {
unlock; return;
}

2772

I think you don't have to do this if VM_MAP_PROTECT_SET_PROT and VM_MAP_PROTECT_SET_MAXPROT are set in flags; if they are set, then new_prot and new_maxprot haven't changed since the last iteration, when you checked them already.

2776

if (CONTAINS_BITS(prev_entry->eflags, MAP_ENTRY_GUARD | MAP_ENTRY_STACK_GAP_DN)

2905

CONTAINS_SET(prev_entry->eflags, MAP_ENTRY_GUARD | MAP_ENTRY_STACK_GAP_DN)

kib marked 12 inline comments as done.Jul 29 2023, 6:39 PM
kib added inline comments.
sys/vm/vm_map.c
2764

I did both of chunk in the spirit of your suggestion, but differently. There is no need to recalculate new_maxprot and new_prot if the checks are local.

2772

prev_entry might be out of the range specified for this call to vm_map_protect(), this is why I have to check prev_entry for perms.

kib marked 2 inline comments as done.

CONTAINS_BITS

kib edited the summary of this revision. (Show Details)

Add VM_MAP_PROTECT_GROWSDOWN and implement LINUX_PROT_GROWSDOWN

This version should fix all known regressions with the stack/gap protection handling so far.

sys/vm/vm_map.c
2716

if (!CONTAINS_BITS(max_prot, new_prot | new_maxprot))

2752

old_prot = 0;
if ((flags & VM_MAP_PROTECT_SET_PROT) != 0)

old_prot |= new_prot;

if ((flags & VM_MAP_PROTECT_SET_MAXPROT) != 0)

old_prot |= new_maxprot;
2775

if (!CONTAINS_BITS(entry->max_protection, old_prot)) {

 vm_map_unlock(map);
return (KERN_PROTECTION_FAILURE);

}

kib marked 3 inline comments as done.

More CONTAINS_BITS() code compaction.
Stop VM_MAP_PROTECT_GROWSDOWN loop to find stack gap, when we found the first map entry.

sys/vm/vm_map.c
2775

I meant that you could compute old_prot before the for loop, since it will have the same value in every iteration.

kib marked an inline comment as done.

Move old_prot calculation for phase1 out of loop.
Kill phase1 helper since it is shorter to inline.

Unlock map on error.
Simplify and unify phase1 guard and non-guard entries handling even more.

sys/vm/vm_map.c
2702

I think you can drop "_phase3" from the name.

kib marked an inline comment as done.Jul 30 2023, 6:53 AM

Just to say, all glibc stack related tests are passed now, thank you.

sys/vm/vm_map.c
1613
1614
2722

There is an extra space.

kib marked 3 inline comments as done.

Rename vm_map_protect_gap_phase3().
Editorial changes and comments and style.

sys/vm/vm_map.c
2750

Add a comment explaining this next bit.

kib marked an inline comment as done.

Added comment for vm_map_protect() implementation of PROT_GROWSDOWN.
Reset start to the original value on 'again' iteration, since we must re-lookup gap from the user-specified point.

sys/vm/vm_map.c
2753
2757

old_prot isn't the right name for this value. Maybe check_prot?

2768
max_prot = (entry->eflags & MAP_ENTRY_GUARD) != 0 ?
    PROT_MAX_EXTRACT(entry->offset) : entry->max_protection;
if (!CONTAINS_BITS(max_prot, check_prot)) {
    ...
}

would be clearer to me.

4575

Does vm_map_mergeable_neighbors() need an update to handle this overloading?

kib marked 4 inline comments as done.Aug 4 2023, 4:34 PM
kib added inline comments.
sys/vm/vm_map.c
4575

Yes. The MAP_ENTRY_NOMERGE_MASK was updated to include gaps. See the corresponding commit in the list.

kib marked an inline comment as done.

Add check_prot and max_prot locals.

This revision is now accepted and ready to land.Aug 9 2023, 8:04 PM
sys/vm/vm_map.c
1612
1837

In fact, only a part of the object may be mapped. This is an old error.

sys/vm/vm_map.c
2320

A suggested comment rewrite, to include something about the return value:

Compare two consecutive map entries. If they can be merged, expand the range of the second to cover the range of the first and delete the first. Return the surviving map entry that covers the start of the combined range.

The map must be locked.

sys/vm/vm_map.c
2320

After verbal feedback from another reviewer, another try:

Consider two map entries that represent consecutive ranges. If the entries can be merged, expand the range of the second to cover the range of the first and delete the first. Then return the map entry that includes the first range.

kib marked 4 inline comments as done.Aug 10 2023, 7:58 AM
kib added inline comments.
sys/vm/vm_map.c
1837

The last sentence in this paragraph is also not relevant, I removed it.

2320

I changed the first word from 'consider' to 'compare'. Is it fine?

kib marked 2 inline comments as done.

Comment editing.

In the first phase of vm_map_protect(), do skip MAP_ENTRY_GUARDs that are not stack grow areas (reported by pho).

This revision now requires review to proceed.Aug 10 2023, 8:04 AM
sys/vm/vm_map.c
2320

Yes, it's fine.

This revision is now accepted and ready to land.Aug 10 2023, 4:54 PM