As popstackmark may be called on this without pushstackmark having
been called, we need to initialize it so that we don't get a bogus
comparison inside popstackmark, which would have resulted in a
NULL pointer dereference.
Details
- Reviewers
- jilles - imp 
- Commits
- rGfa6fa98ac712: sh(1): initialize smark to zero in main()
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
- Lint Not Applicable 
- Unit
- Tests Not Applicable 
Event Timeline
To me, it looks like that situation can't actually happen. If there is an error or SIGINT during early initialization, state will still be 0 and the if (setjmp(main_handler.loc)) block will immediately call exitshell(exitstatus) without touching smark. If I'm wrong, please provide a reproduction scenario.
We might still want this change to make static analyzers happier.
the issue was caught by the author of chimera linux when compiling it with lto and clang16
<@q66> bapt: anyway, to reproduce: i added this to memalloc.c https://gist.github.com/q66/978ca271db9ea32c8641209eb90f2f85
<@q66> then all you have to do is exec the sh binary, and press ctrl-c
<@q66> in plain prompt
<@q66> https://gist.github.com/q66/99295f340eeaffae740d53e0cc603ad0
<@q66> actually, printf("POPSTACKMARK %p %p\n", mark->stackp, stackp); is more explanatory
<@q66> you will see a POPSTACKMARK 0 0 upon ctrl-c
<@q66> e.g. like https://gist.github.com/q66/61b70ecabb510bbdf86a11e330f89893
<@q66> if i compile without the zeroing
<@q66> i will get a bogus address
<@q66> in mark->stackp
@jilles I looked into this more and it seems that this is actually a miscompilation, possibly caused by the LLVM optimizer getting confused because of the long jumps.
If I LTO-compile the shell (with -O2) with LLVM 16 as is, I get a crash when i spawn it and ctrl-c. If I add printfs in setstackmark/popstackmark that don't touch any of the addresses, I can see that setstackmark is being called for smark as the first thing (so it should in fact be initialized) but then I still get a crash because in popstackmark, reading the mark->stackp gets me a bogus address (like 0x30103). However, if I add a printf of the passed mark's address in the set/popstackmark, this stops the crashes.
Initializing smark in main() to zero will apparently prevent that assumption in the optimizer, which results in a working binary. What likewise prevents the crash is calling setstackmark for smark in main() before the if (setjmp(...)), as I guess that also breaks the assumption, as LLVM can no longer assume that popstackmark may be called on a dirty struct.
I can reproduce this on multiple architectures, so it does not appear to be codegen-related.
and indeed, further testing reveals that the issue does not get fixed by this change (nor by moving setstackmark), only hidden; any changes to smark contents are *not* visible after the jump, which in practice means we are most likely leaking memory, or worse
This patch actually works around the problem:
From f1ddc1f587e89bab41263ad9b71207da8e16b6ff Mon Sep 17 00:00:00 2001 From: q66 <q66@chimera-linux.org> Date: Sun, 4 Jun 2023 00:21:55 +0200 Subject: [PATCH] sh(1): make smark/smark2 static This prevents the compiler from not considering changes done to the members from within the setjmp conditional. --- src.freebsd/sh/main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src.freebsd/sh/main.c b/src.freebsd/sh/main.c index 4d1a0e5..955b2bb 100644 --- a/src.freebsd/sh/main.c +++ b/src.freebsd/sh/main.c @@ -97,10 +97,11 @@ static char *find_dot_file(char *); * is used to figure out how far we had gotten. */ +static struct stackmark smark, smark2; + int main(int argc, char *argv[]) { - struct stackmark smark = {0}, smark2; volatile int state; char *shinit; -- 2.41.0
Now that I am thinking about it, this may not be a compiler issue - the C standard specifies that the value of non-volatile local variables modified between the setjmp call and the longjmp is unspecified. So making these static is actually the right way to go.