Page MenuHomeFreeBSD

procctl(PROC_STACKGAP_CTL)
ClosedPublic

Authored by kib on Aug 21 2019, 3:33 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 3, 6:27 PM
Unknown Object (File)
Fri, Jan 3, 6:22 PM
Unknown Object (File)
Fri, Jan 3, 6:08 PM
Unknown Object (File)
Fri, Jan 3, 5:55 PM
Unknown Object (File)
Fri, Jan 3, 5:50 PM
Unknown Object (File)
Fri, Jan 3, 5:44 PM
Unknown Object (File)
Sat, Dec 28, 6:12 PM
Unknown Object (File)
Sat, Dec 28, 2:21 AM
Subscribers

Details

Summary

Allows a process to request that stack gap was not applied to its stacks.

This is actually of limited usefulness since the gap where the stack grows is still maintained, it just allows to completely exhaust the gap on per-process basis instead of setting global security.bsd.stack_guard_pages.

[I will write the man pages updates after the code is agreed upon]

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 25993

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
In D21352#464519, @kib wrote:
In D21352#464514, @alc wrote:

If I remember correctly, this would be applicable to openjdk. Specifically, my recollection is that there were different classes of threads, such as system maintenance threads and application code execution threads, and we wound up with two guard entries for application threads, one created by the kernel and another by the JVM.

See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239894

Just to make sure that we're on the same page, here is what I have inferred is happening based on that PR:

First, the JVM is doing a mmap(MAP_STACK), which creates two vm map entries, a normal entry and beneath it a guard entry. Let X be the starting address of this guard entry. Then, the JVM is doing a mmap(X, "HotSpot Guard Size", MAP_ANON | MAP_FIXED), which creates the region labeled "HotSpot Guard Pages", replacing the initial part of the guard entry. Now, the "HostSpot Guard Pages" start at address X, and the starting address of the guard entry is address Y, where Y > X by the size of the "HotSpot Guard Pages". The trouble is vm_map_growstack() is still going to implement the security.bsd.stack_guard_page but at address Y rather than X. And, implementing it at Y interferes with the JVM's own guard mechanism.

Yes?

In D21352#464862, @alc wrote:

Yes?

Almost, but not quite. There is pthread feature where thread can be created with some attributes. Among them are the stack size and guard size. We create the stack of the specified size as you described (two entries, guard has reserved part which cannot be grown into), and below the libthr.so puts yet another guard entry, of the size specified in the attribute. JVM hacks on that guard, not on the implicit kernel guard.

I suspect that eliminating implicit kernel guard would fix JVM, but so far I was unable to receive the confirmation. Another suggestion I made was to do mmap(2) on a guard, and it seems that JVM already uses this, but again I am not sure. I probably need to spend several hours reading ktrace output from openjdk runs.

In D21352#464943, @kib wrote:
In D21352#464862, @alc wrote:

Yes?

Almost, but not quite. There is pthread feature where thread can be created with some attributes. Among them are the stack size and guard size. We create the stack of the specified size as you described (two entries, guard has reserved part which cannot be grown into), and below the libthr.so puts yet another guard entry, of the size specified in the attribute. JVM hacks on that guard, not on the implicit kernel guard.

Does libthr create a third, guard entry by default, or only if you pass an explicit "attr"?

I suspect that eliminating implicit kernel guard would fix JVM, but so far I was unable to receive the confirmation. Another suggestion I made was to do mmap(2) on a guard, and it seems that JVM already uses this, but again I am not sure. I probably need to spend several hours reading ktrace output from openjdk runs.

I suspect that it does too.

In D21352#464984, @alc wrote:
In D21352#464943, @kib wrote:
In D21352#464862, @alc wrote:

Yes?

Almost, but not quite. There is pthread feature where thread can be created with some attributes. Among them are the stack size and guard size. We create the stack of the specified size as you described (two entries, guard has reserved part which cannot be grown into), and below the libthr.so puts yet another guard entry, of the size specified in the attribute. JVM hacks on that guard, not on the implicit kernel guard.

Does libthr create a third, guard entry by default, or only if you pass an explicit "attr"?

Default guard size is zero, so it is not created explicitly. Third is it because you count two guards and mapped part ?

There is one detail, I forgot it, and think that it is mostly irrelevant, but it is still interesting. This was the reason why vm_map_protect() does clip and then skips GUARD entries. When creating a guarded stack, libthr does mmap(total size of stack and guard), then mprotect(guard). This clips out part of the guard to the permanent guard, the rest is the shrinking entry.

In D21352#464986, @kib wrote:
In D21352#464984, @alc wrote:
In D21352#464943, @kib wrote:
In D21352#464862, @alc wrote:

Yes?

Almost, but not quite. There is pthread feature where thread can be created with some attributes. Among them are the stack size and guard size. We create the stack of the specified size as you described (two entries, guard has reserved part which cannot be grown into), and below the libthr.so puts yet another guard entry, of the size specified in the attribute. JVM hacks on that guard, not on the implicit kernel guard.

Does libthr create a third, guard entry by default, or only if you pass an explicit "attr"?

Default guard size is zero, so it is not created explicitly. Third is it because you count two guards and mapped part ?

Yes, that was how I arrived at 3. However, by default, i.e., without passing an "attr" that sets a non-zero guard size, there should only be 2 map entries, a guard entry and a normal entry for the "mapped part". Yes?

Here is the code that the JVM uses for creating a thread:

https://github.com/battleblow/openjdk-jdk11u/blob/b7c3885ee7b1a13dca3293e9cb19c4f21887d9fa/src/hotspot/os/bsd/os_bsd.cpp#L772

Note that the "attr" passed to pthread_create() doesn't include a guard size, so there shouldn't be a libthr-created guard.

There is one detail, I forgot it, and think that it is mostly irrelevant, but it is still interesting. This was the reason why vm_map_protect() does clip and then skips GUARD entries. When creating a guarded stack, libthr does mmap(total size of stack and guard), then mprotect(guard). This clips out part of the guard to the permanent guard, the rest is the shrinking entry.

Actually, I think that it is relevant. As far as I can tell, the explicitly created libthr guard doesn't disable the security.bsd.stack_guard_page guard. If so, we are actually killing the thread before it hits the libthr guard.

In D21352#465045, @alc wrote:
In D21352#464986, @kib wrote:
In D21352#464984, @alc wrote:
In D21352#464943, @kib wrote:
In D21352#464862, @alc wrote:

Yes?

Almost, but not quite. There is pthread feature where thread can be created with some attributes. Among them are the stack size and guard size. We create the stack of the specified size as you described (two entries, guard has reserved part which cannot be grown into), and below the libthr.so puts yet another guard entry, of the size specified in the attribute. JVM hacks on that guard, not on the implicit kernel guard.

Does libthr create a third, guard entry by default, or only if you pass an explicit "attr"?

Default guard size is zero, so it is not created explicitly. Third is it because you count two guards and mapped part ?

Yes, that was how I arrived at 3. However, by default, i.e., without passing an "attr" that sets a non-zero guard size, there should only be 2 map entries, a guard entry and a normal entry for the "mapped part". Yes?

Yes.

Here is the code that the JVM uses for creating a thread:

https://github.com/battleblow/openjdk-jdk11u/blob/b7c3885ee7b1a13dca3293e9cb19c4f21887d9fa/src/hotspot/os/bsd/os_bsd.cpp#L772

Note that the "attr" passed to pthread_create() doesn't include a guard size, so there shouldn't be a libthr-created guard.

There is one detail, I forgot it, and think that it is mostly irrelevant, but it is still interesting. This was the reason why vm_map_protect() does clip and then skips GUARD entries. When creating a guarded stack, libthr does mmap(total size of stack and guard), then mprotect(guard). This clips out part of the guard to the permanent guard, the rest is the shrinking entry.

Actually, I think that it is relevant. As far as I can tell, the explicitly created libthr guard doesn't disable the security.bsd.stack_guard_page guard. If so, we are actually killing the thread before it hits the libthr guard.

Well, it is relevant in the sense that the guard is created. I mean that it is irrelevant whether libthr does mprotect() or mmap(MAP_GUARD) to create the extra guard at the bottom.

In D21352#465082, @kib wrote:
In D21352#465045, @alc wrote:
In D21352#464986, @kib wrote:

There is one detail, I forgot it, and think that it is mostly irrelevant, but it is still interesting. This was the reason why vm_map_protect() does clip and then skips GUARD entries. When creating a guarded stack, libthr does mmap(total size of stack and guard), then mprotect(guard). This clips out part of the guard to the permanent guard, the rest is the shrinking entry.

Actually, I think that it is relevant. As far as I can tell, the explicitly created libthr guard doesn't disable the security.bsd.stack_guard_page guard. If so, we are actually killing the thread before it hits the libthr guard.

Well, it is relevant in the sense that the guard is created. I mean that it is irrelevant whether libthr does mprotect() or mmap(MAP_GUARD) to create the extra guard at the bottom.

When a guard map entry is created the (object, offset) pair is (NULL, 0). Later, when that entry is clipped at the start, whether by mprotect() or mmap(MAP_FIXED), the offset for the remainder of the guard entry will be adjusted by the size of the clipping, even though the object is NULL. We could use the offset to recognize that an explicit guard exists, and automatically override the implicit security.bsd.stack_guard_page in vm_map_growstack(). Essentially,

max_grow = gap_entry->end - gap_entry->start;
if (guard > gap_entry->offset &&
    guard - gap_entry->offset > max_grow)
        return (KERN_NO_SPACE);

That said, I don't have an immediate answer to how this would handle the case of stacks that grow up. Perhaps, we would have to manually adjust the offset.

I would also like to suggest that we store the stack_guard_page value that was in force at the time that the guard entry was created in the map entry and use this stored value in vm_map_growstack(). The next_read field should be unused by guard entries.

In D21352#465204, @alc wrote:

I would also like to suggest that we store the stack_guard_page value that was in force at the time that the guard entry was created in the map entry and use this stored value in vm_map_growstack(). The next_read field should be unused by guard entries.

This is something I will do definitely.

For the proposal of automatic use of the offset for the guard entry to decrease the forced stack gap, I am not sure. First, it basically defeats the gap, since an appropriate mapping that is created in the place of the gap allows to clash the stack into it. The only protection left is that we now require MAP_FIXED to get something mapped into the gap area. Second, if doing that, I would prefer explicit checks and then explicit action of reduction of the stored gap size.

That said, do you have any objections or suggestions for the procctl(2) knob ?

In D21352#465323, @kib wrote:

That said, do you have any objections or suggestions for the procctl(2) knob ?

I have no objections. I think that this is a feature that we should support. I only question whether disabling the stack gap on a process should apply retroactively to the existing stacks and not just the ones created after the procctl() operation.

If stackgap is disabled, do not apply gap for already created stacks as well.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239894#c17

 Greg Lewis freebsd_committer 2019-08-30 22:58:04 UTC

I got a VM set up with a recent -CURRENT, applied the patches from the review and rebuilt world.
I can confirm that after that I can run
proccontrol -m stackgap -s disable java -cp . InfiniteRecursion
and get a StackOverflowError as expected.  Running without the proccontrol line yields a crash as it did previously.

I plan to commit this later today.

sys/kern/kern_procctl.c
526

The placement of blank lines around PROC_LOCK_ASSERT() is different between this function and stackgap_status() for no obvious reason.

kib marked an inline comment as done.

Unify use of blank lines.

My only hesitation from "rubber stamping" this change is the difficulty of describing the complete semantics of this option, specifically, the semantics that arise when reenabling the stack gap. Suppose that we have disabled and then reenabled stack gaps for a process. Stacks created before gaps are disabled will then have their original gap enforced, unless the stack has grown into the previously defined gap region. Stacks created while gaps are disabled will have no gap after gaps are reenabled. Consequently, I want to ask: Should we actually support reenabling stack gaps on a process once they've been disabled?

Split current program stackgap state control, and the state after execve. Current state can be only disabled, while state after exec can be re-enabled.

Upload the right patch:
Split current program stackgap state control, and the state after execve. Current state can be only disabled, while state after exec can be re-enabled.

Also update procctl(2) man page.

sys/vm/vm_map.c
4190–4197

if (gap_bot == gap_top)

return (KERN_SUCCESS);

seems easier than indenting the lines below.

kib marked an inline comment as done.

Unindent.

In D21352#467724, @kib wrote:

Split current program stackgap state control, and the state after execve. Current state can be only disabled, while state after exec can be re-enabled.

These semantics make sense to me.

lib/libc/sys/procctl.2
508 ↗(On Diff #61513)

"A stack gap ... area for a

510 ↗(On Diff #61513)

"filled by memory.

511 ↗(On Diff #61513)

"Instead, the process ... to receive a

513 ↗(On Diff #61513)

"... accessing pages in ...

514 ↗(On Diff #61513)

"... corrupting memory adjacent

522 ↗(On Diff #61513)

"... enabled for programs

523 ↗(On Diff #61513)

"started after an

525 ↗(On Diff #61513)

"by the ...

528 ↗(On Diff #61513)

"For existing stacks, the ...

530 ↗(On Diff #61513)

"... after they are

538 ↗(On Diff #61513)

"... to an integer ..., which is used to return a bitmask consisting

kib marked 11 inline comments as done.

Grammar.

The code looks good.

lib/libc/sys/procctl.2
508 ↗(On Diff #61522)

"growing area" -> "growth area"

510 ↗(On Diff #61522)

I would replace ", which" by "that". See, for example, which-vs-that.

511 ↗(On Diff #61522)

I would start a new sentence here.

519–521 ↗(On Diff #61522)

Shouldn't this be: "containing either the PROC_STACKGAP_ENABLE or PROC_STACKGAP_DISABLE flag."

522 ↗(On Diff #61522)

"If the PROC_STACKGAP_ENABLE flag is passed, ...

526 ↗(On Diff #61522)

"If the PROC_STACKGAP_DISABLE flag is passed, ...

527 ↗(On Diff #61522)

"..., the gap becomes part of the normal growth area

529 ↗(On Diff #61522)

I would mention execve here: "After gaps are disabled in a process, they can only be re-enabled when an execve is performed.

534 ↗(On Diff #61522)

"Returns the current stack gap state ...

535 ↗(On Diff #61522)

Drop the "The" here.

538 ↗(On Diff #61522)

spelling: "constisting"

This revision is now accepted and ready to land.Sep 1 2019, 5:22 PM
kib marked 11 inline comments as done.

Switch to interpret each flag in PROC_STACKGAP_CTL, instead of PROC_STACKGAP_ENABLE manage it all.
Update man page.

This revision now requires review to proceed.Sep 1 2019, 7:39 PM
lib/libc/sys/procctl.2
510 ↗(On Diff #61535)

The comma should be deleted.

519 ↗(On Diff #61535)

"... integer variable containing flags.

523 ↗(On Diff #61535)

"This flag is only accepted ...

526 ↗(On Diff #61535)

"... flag causes an

528 ↗(On Diff #61535)

"error to be returned.

529 ↗(On Diff #61535)

There is an extra space: "... in a ...

kib marked 6 inline comments as done.

Edits for the man page.

lib/libc/sys/procctl.2
511 ↗(On Diff #61543)

The comma got deleted from the wrong place. This line and the previous one should be:

"mapped region that is reserved and never filled by memory.
Instead, the process is guaranteed to receive a

534–535 ↗(On Diff #61543)

Instead of the word "normal", the meaning of which may not be clear, consider saying, "For existing stacks, the gap is no longer a reserved part of the growth area and can be filled by memory on access." I'm trying to reuse the same terms that are used in the sentence defining what a gap is.

540–542 ↗(On Diff #61543)

This sentence is not entirely consistent with the code. I would characterize this flags as allowing the current value of PROC_STACKGAP_{DIS,EN}ABLE to be inherited. In other words, the code won't disable gaps after an execve unless gaps were disabled in the process before the execve.

kib marked 3 inline comments as done.

More manpage clarifications.

lib/libc/sys/procctl.2
543 ↗(On Diff #61558)

"..., if the currently executing program has ...

545 ↗(On Diff #61558)

I think that the phrase "regardless of the flag" will be confusing, because it's not entirely obvious which flag it refers to. It would be okay just to delete the phrase. I think that the sentence is clear without it.

kib marked 2 inline comments as done.

More man page editing.

This revision is now accepted and ready to land.Sep 2 2019, 10:43 PM
This revision was automatically updated to reflect the committed changes.