Details

Reviewers

markj
alc
dougm

Group Reviewers

manpages

Commits

rS351774: Add stackgap control mode to proccontrol(1).
rS351773: Add procctl(PROC_STACKGAP_CTL)

Summary

Allows a process to request that stack gap was not applied to its stacks.

This is actually of limited usefulness since the gap where the stack grows is still maintained, it just allows to completely exhaust the gap on per-process basis instead of setting global security.bsd.stack_guard_pages.

[I will write the man pages updates after the code is agreed upon]

Diff Detail

Repository

rS FreeBSD src repository - subversion

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 26214

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

In D21352#464519, @kib wrote:

In D21352#464514, @alc wrote:

If I remember correctly, this would be applicable to openjdk. Specifically, my recollection is that there were different classes of threads, such as system maintenance threads and application code execution threads, and we wound up with two guard entries for application threads, one created by the kernel and another by the JVM.

See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239894

Just to make sure that we're on the same page, here is what I have inferred is happening based on that PR:

First, the JVM is doing a mmap(MAP_STACK), which creates two vm map entries, a normal entry and beneath it a guard entry. Let X be the starting address of this guard entry. Then, the JVM is doing a mmap(X, "HotSpot Guard Size", MAP_ANON | MAP_FIXED), which creates the region labeled "HotSpot Guard Pages", replacing the initial part of the guard entry. Now, the "HostSpot Guard Pages" start at address X, and the starting address of the guard entry is address Y, where Y > X by the size of the "HotSpot Guard Pages". The trouble is vm_map_growstack() is still going to implement the security.bsd.stack_guard_page but at address Y rather than X. And, implementing it at Y interferes with the JVM's own guard mechanism.

Yes?

In D21352#464862, @alc wrote:

Yes?

Almost, but not quite. There is pthread feature where thread can be created with some attributes. Among them are the stack size and guard size. We create the stack of the specified size as you described (two entries, guard has reserved part which cannot be grown into), and below the libthr.so puts yet another guard entry, of the size specified in the attribute. JVM hacks on that guard, not on the implicit kernel guard.

I suspect that eliminating implicit kernel guard would fix JVM, but so far I was unable to receive the confirmation. Another suggestion I made was to do mmap(2) on a guard, and it seems that JVM already uses this, but again I am not sure. I probably need to spend several hours reading ktrace output from openjdk runs.

In D21352#464943, @kib wrote:

In D21352#464862, @alc wrote:

Yes?

Almost, but not quite. There is pthread feature where thread can be created with some attributes. Among them are the stack size and guard size. We create the stack of the specified size as you described (two entries, guard has reserved part which cannot be grown into), and below the libthr.so puts yet another guard entry, of the size specified in the attribute. JVM hacks on that guard, not on the implicit kernel guard.

Does libthr create a third, guard entry by default, or only if you pass an explicit "attr"?

I suspect that eliminating implicit kernel guard would fix JVM, but so far I was unable to receive the confirmation. Another suggestion I made was to do mmap(2) on a guard, and it seems that JVM already uses this, but again I am not sure. I probably need to spend several hours reading ktrace output from openjdk runs.

I suspect that it does too.

In D21352#464984, @alc wrote:

In D21352#464943, @kib wrote:

In D21352#464862, @alc wrote:

Yes?

Almost, but not quite. There is pthread feature where thread can be created with some attributes. Among them are the stack size and guard size. We create the stack of the specified size as you described (two entries, guard has reserved part which cannot be grown into), and below the libthr.so puts yet another guard entry, of the size specified in the attribute. JVM hacks on that guard, not on the implicit kernel guard.

Does libthr create a third, guard entry by default, or only if you pass an explicit "attr"?

Default guard size is zero, so it is not created explicitly. Third is it because you count two guards and mapped part ?

There is one detail, I forgot it, and think that it is mostly irrelevant, but it is still interesting. This was the reason why vm_map_protect() does clip and then skips GUARD entries. When creating a guarded stack, libthr does mmap(total size of stack and guard), then mprotect(guard). This clips out part of the guard to the permanent guard, the rest is the shrinking entry.

In D21352#464986, @kib wrote:

In D21352#464984, @alc wrote:

In D21352#464943, @kib wrote:

In D21352#464862, @alc wrote:

Yes?

Almost, but not quite. There is pthread feature where thread can be created with some attributes. Among them are the stack size and guard size. We create the stack of the specified size as you described (two entries, guard has reserved part which cannot be grown into), and below the libthr.so puts yet another guard entry, of the size specified in the attribute. JVM hacks on that guard, not on the implicit kernel guard.

Does libthr create a third, guard entry by default, or only if you pass an explicit "attr"?

Default guard size is zero, so it is not created explicitly. Third is it because you count two guards and mapped part ?

Yes, that was how I arrived at 3. However, by default, i.e., without passing an "attr" that sets a non-zero guard size, there should only be 2 map entries, a guard entry and a normal entry for the "mapped part". Yes?

Here is the code that the JVM uses for creating a thread:

https://github.com/battleblow/openjdk-jdk11u/blob/b7c3885ee7b1a13dca3293e9cb19c4f21887d9fa/src/hotspot/os/bsd/os_bsd.cpp#L772

Note that the "attr" passed to pthread_create() doesn't include a guard size, so there shouldn't be a libthr-created guard.

There is one detail, I forgot it, and think that it is mostly irrelevant, but it is still interesting. This was the reason why vm_map_protect() does clip and then skips GUARD entries. When creating a guarded stack, libthr does mmap(total size of stack and guard), then mprotect(guard). This clips out part of the guard to the permanent guard, the rest is the shrinking entry.

Actually, I think that it is relevant. As far as I can tell, the explicitly created libthr guard doesn't disable the security.bsd.stack_guard_page guard. If so, we are actually killing the thread before it hits the libthr guard.

In D21352#465045, @alc wrote:

In D21352#464986, @kib wrote:

In D21352#464984, @alc wrote:

In D21352#464943, @kib wrote:

In D21352#464862, @alc wrote:

Yes?

Almost, but not quite. There is pthread feature where thread can be created with some attributes. Among them are the stack size and guard size. We create the stack of the specified size as you described (two entries, guard has reserved part which cannot be grown into), and below the libthr.so puts yet another guard entry, of the size specified in the attribute. JVM hacks on that guard, not on the implicit kernel guard.

Does libthr create a third, guard entry by default, or only if you pass an explicit "attr"?

Default guard size is zero, so it is not created explicitly. Third is it because you count two guards and mapped part ?

Yes, that was how I arrived at 3. However, by default, i.e., without passing an "attr" that sets a non-zero guard size, there should only be 2 map entries, a guard entry and a normal entry for the "mapped part". Yes?

Yes.

Here is the code that the JVM uses for creating a thread:

https://github.com/battleblow/openjdk-jdk11u/blob/b7c3885ee7b1a13dca3293e9cb19c4f21887d9fa/src/hotspot/os/bsd/os_bsd.cpp#L772

Note that the "attr" passed to pthread_create() doesn't include a guard size, so there shouldn't be a libthr-created guard.

There is one detail, I forgot it, and think that it is mostly irrelevant, but it is still interesting. This was the reason why vm_map_protect() does clip and then skips GUARD entries. When creating a guarded stack, libthr does mmap(total size of stack and guard), then mprotect(guard). This clips out part of the guard to the permanent guard, the rest is the shrinking entry.

Actually, I think that it is relevant. As far as I can tell, the explicitly created libthr guard doesn't disable the security.bsd.stack_guard_page guard. If so, we are actually killing the thread before it hits the libthr guard.

Well, it is relevant in the sense that the guard is created. I mean that it is irrelevant whether libthr does mprotect() or mmap(MAP_GUARD) to create the extra guard at the bottom.

In D21352#465082, @kib wrote:

In D21352#465045, @alc wrote:

In D21352#464986, @kib wrote:

There is one detail, I forgot it, and think that it is mostly irrelevant, but it is still interesting. This was the reason why vm_map_protect() does clip and then skips GUARD entries. When creating a guarded stack, libthr does mmap(total size of stack and guard), then mprotect(guard). This clips out part of the guard to the permanent guard, the rest is the shrinking entry.

Actually, I think that it is relevant. As far as I can tell, the explicitly created libthr guard doesn't disable the security.bsd.stack_guard_page guard. If so, we are actually killing the thread before it hits the libthr guard.

Well, it is relevant in the sense that the guard is created. I mean that it is irrelevant whether libthr does mprotect() or mmap(MAP_GUARD) to create the extra guard at the bottom.

When a guard map entry is created the (object, offset) pair is (NULL, 0). Later, when that entry is clipped at the start, whether by mprotect() or mmap(MAP_FIXED), the offset for the remainder of the guard entry will be adjusted by the size of the clipping, even though the object is NULL. We could use the offset to recognize that an explicit guard exists, and automatically override the implicit security.bsd.stack_guard_page in vm_map_growstack(). Essentially,

max_grow = gap_entry->end - gap_entry->start;
if (guard > gap_entry->offset &&
    guard - gap_entry->offset > max_grow)
        return (KERN_NO_SPACE);

That said, I don't have an immediate answer to how this would handle the case of stacks that grow up. Perhaps, we would have to manually adjust the offset.

I would also like to suggest that we store the stack_guard_page value that was in force at the time that the guard entry was created in the map entry and use this stored value in vm_map_growstack(). The next_read field should be unused by guard entries.

In D21352#465204, @alc wrote:

I would also like to suggest that we store the stack_guard_page value that was in force at the time that the guard entry was created in the map entry and use this stored value in vm_map_growstack(). The next_read field should be unused by guard entries.

This is something I will do definitely.

For the proposal of automatic use of the offset for the guard entry to decrease the forced stack gap, I am not sure. First, it basically defeats the gap, since an appropriate mapping that is created in the place of the gap allows to clash the stack into it. The only protection left is that we now require MAP_FIXED to get something mapped into the gap area. Second, if doing that, I would prefer explicit checks and then explicit action of reduction of the stored gap size.

That said, do you have any objections or suggestions for the procctl(2) knob ?

rebase after r351453

Harbormaster completed remote builds in B26074: Diff 61223.Aug 24 2019, 2:38 PM

In D21352#465323, @kib wrote:

That said, do you have any objections or suggestions for the procctl(2) knob ?

I have no objections. I think that this is a feature that we should support. I only question whether disabling the stack gap on a process should apply retroactively to the existing stacks and not just the ones created after the procctl() operation.

If stackgap is disabled, do not apply gap for already created stacks as well.

Harbormaster completed remote builds in B26103: Diff 61281.Aug 26 2019, 7:05 AM

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239894#c17

 Greg Lewis freebsd_committer 2019-08-30 22:58:04 UTC

I got a VM set up with a recent -CURRENT, applied the patches from the review and rebuilt world.
I can confirm that after that I can run
proccontrol -m stackgap -s disable java -cp . InfiniteRecursion
and get a StackOverflowError as expected.  Running without the proccontrol line yields a crash as it did previously.

I plan to commit this later today.

alc added inline comments.Aug 31 2019, 4:05 PM

sys/kern/kern_procctl.c
526	The placement of blank lines around PROC_LOCK_ASSERT() is different between this function and stackgap_status() for no obvious reason.

Unify use of blank lines.

Harbormaster completed remote builds in B26210: Diff 61507.Aug 31 2019, 4:25 PM

My only hesitation from "rubber stamping" this change is the difficulty of describing the complete semantics of this option, specifically, the semantics that arise when reenabling the stack gap. Suppose that we have disabled and then reenabled stack gaps for a process. Stacks created before gaps are disabled will then have their original gap enforced, unless the stack has grown into the previously defined gap region. Stacks created while gaps are disabled will have no gap after gaps are reenabled. Consequently, I want to ask: Should we actually support reenabling stack gaps on a process once they've been disabled?

Split current program stackgap state control, and the state after execve. Current state can be only disabled, while state after exec can be re-enabled.

Harbormaster completed remote builds in B26211: Diff 61510.Aug 31 2019, 7:03 PM

Upload the right patch:
Split current program stackgap state control, and the state after execve. Current state can be only disabled, while state after exec can be re-enabled.

Also update procctl(2) man page.

Herald added a reviewer: manpages. · View Herald TranscriptAug 31 2019, 7:05 PM

Harbormaster completed remote builds in B26212: Diff 61511.Aug 31 2019, 7:05 PM

dougm added inline comments.Aug 31 2019, 7:11 PM

sys/vm/vm_map.c
4204	if (gap_bot == gap_top) return (KERN_SUCCESS); seems easier than indenting the lines below.

Unindent.

Harbormaster completed remote builds in B26214: Diff 61513.Aug 31 2019, 7:45 PM

In D21352#467724, @kib wrote:

Split current program stackgap state control, and the state after execve. Current state can be only disabled, while state after exec can be re-enabled.

These semantics make sense to me.

alc added inline comments.Sep 1 2019, 12:35 AM

lib/libc/sys/procctl.2
508	"A stack gap ... area for a
510	"filled by memory.
511	"Instead, the process ... to receive a
513	"... accessing pages in ...
514	"... corrupting memory adjacent
522	"... enabled for programs
523	"started after an
525	"by the ...
528	"For existing stacks, the ...
530	"... after they are
538	"... to an integer ..., which is used to return a bitmask consisting

Grammar.

Harbormaster completed remote builds in B26217: Diff 61522.Sep 1 2019, 8:01 AM

The code looks good.

lib/libc/sys/procctl.2
508	"growing area" -> "growth area"
510	I would replace ", which" by "that". See, for example, which-vs-that.
511	I would start a new sentence here.
519–521	Shouldn't this be: "containing either the PROC_STACKGAP_ENABLE or PROC_STACKGAP_DISABLE flag."
522	"If the PROC_STACKGAP_ENABLE flag is passed, ...
526	"If the PROC_STACKGAP_DISABLE flag is passed, ...
527	"..., the gap becomes part of the normal growth area
529	I would mention execve here: "After gaps are disabled in a process, they can only be re-enabled when an execve is performed.
534	"Returns the current stack gap state ...
535	Drop the "The" here.
538	spelling: "constisting"

This revision is now accepted and ready to land.Sep 1 2019, 5:22 PM

Switch to interpret each flag in PROC_STACKGAP_CTL, instead of PROC_STACKGAP_ENABLE manage it all.
Update man page.

This revision now requires review to proceed.Sep 1 2019, 7:39 PM

Harbormaster completed remote builds in B26222: Diff 61535.Sep 1 2019, 7:39 PM

alc added inline comments.Sep 1 2019, 9:44 PM

lib/libc/sys/procctl.2
510	The comma should be deleted.
519	"... integer variable containing flags.
523	"This flag is only accepted ...
526	"... flag causes an
528	"error to be returned.
529	There is an extra space: "... in a ...

Edits for the man page.

Harbormaster completed remote builds in B26225: Diff 61543.Sep 2 2019, 6:19 AM

alc added inline comments.Sep 2 2019, 4:27 PM

lib/libc/sys/procctl.2
511	The comma got deleted from the wrong place. This line and the previous one should be: "mapped region that is reserved and never filled by memory. Instead, the process is guaranteed to receive a
534–535	Instead of the word "normal", the meaning of which may not be clear, consider saying, "For existing stacks, the gap is no longer a reserved part of the growth area and can be filled by memory on access." I'm trying to reuse the same terms that are used in the sentence defining what a gap is.
540–542	This sentence is not entirely consistent with the code. I would characterize this flags as allowing the current value of PROC_STACKGAP_{DIS,EN}ABLE to be inherited. In other words, the code won't disable gaps after an execve unless gaps were disabled in the process before the execve.

More manpage clarifications.

Harbormaster completed remote builds in B26230: Diff 61558.Sep 2 2019, 5:03 PM

alc added inline comments.Sep 2 2019, 9:19 PM

lib/libc/sys/procctl.2
543	"..., if the currently executing program has ...
545	I think that the phrase "regardless of the flag" will be confusing, because it's not entirely obvious which flag it refers to. It would be okay just to delete the phrase. I think that the sentence is clear without it.

More man page editing.

Harbormaster completed remote builds in B26239: Diff 61570.Sep 2 2019, 9:38 PM

alc accepted this revision.Sep 2 2019, 10:43 PM

This revision is now accepted and ready to land.Sep 2 2019, 10:43 PM

Closed by commit rS351773: Add procctl(PROC_STACKGAP_CTL) (authored by kib). · Explain WhySep 3 2019, 6:56 PM

This revision was automatically updated to reflect the committed changes.

kib added a commit: rS351773: Add procctl(PROC_STACKGAP_CTL).

kib added a commit: rS351774: Add stackgap control mode to proccontrol(1)..Sep 3 2019, 6:58 PM

procctl(PROC_STACKGAP_CTL)
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 61513

lib/libc/sys/procctl.2

sys/compat/freebsd32/freebsd32_misc.c

sys/kern/kern_exec.c

sys/kern/kern_fork.c

sys/kern/kern_procctl.c

sys/sys/proc.h

sys/sys/procctl.h

sys/vm/vm_map.c

usr.bin/proccontrol/proccontrol.c

procctl(PROC_STACKGAP_CTL)ClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 61513

lib/libc/sys/procctl.2

sys/compat/freebsd32/freebsd32_misc.c

sys/kern/kern_exec.c

sys/kern/kern_fork.c

sys/kern/kern_procctl.c

sys/sys/proc.h

sys/sys/procctl.h

sys/vm/vm_map.c

usr.bin/proccontrol/proccontrol.c

procctl(PROC_STACKGAP_CTL)
ClosedPublic
Actions

Revision Contents
Changeset List