Page MenuHomeFreeBSD

Load the new sp_el0 with interrupts disabled in fork_trampoline
ClosedPublic

Authored by andrew on Feb 14 2017, 4:55 PM.

Details

Summary

If an interrupt arrives in fork_trampoline after sp_el0 was written
we may then switch to a new thread, enter userland so change this
stack pointer, then return to this code with the wrong value. This
fixes this case by moving the load of sp_el0 until after interrupts
have been disabled.

Reported by: Mark Millard

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

andrew updated this revision to Diff 25164.Feb 14 2017, 4:55 PM
andrew retitled this revision from to Load the new sp_el0 with interrupts disabled in fork_trampoline.
andrew updated this object.
andrew edited the test plan for this revision. (Show Details)
andrew added a reviewer: arm64.

One minor point:

/* Skip x30 as it was restored above as lr */

now should say something like:

/* Skip x30 as it is restored below as lr */

Test status update at ~3 hours 40 minutes in:

~3.6 hours into a self-hosted buildworld buildkernel: so far
so good.

I first updated the pine64's kernel via a cross build using
your source via applying your patch. Then I rebooted with
the updated kernel.

The pine64 still has the sh with a forkshell that tests for
stack pointer changes across fork: it calls abort if it
detects such a change.

On the pine64 I then started a buildworld buildkernel that
is still running after ~3.6 hours (yea!) and there have been no
core files generated so far.

I'll report after it completes or fails. (There is some possibility
of a sudden thermal shutdown despite a heatsink and a fan:
4 "openssl speed" commands in parallel ends up with such
despite powerd running.)

FYI:

The -r312982 build in use has a non-debug kernel.

The pine 64's root file system is on a usb ssd on a
powered hub.

I've not done anything to control jemalloc behavior.
(Others have reported that disabling tcache (?) use
made things more stable for the sh issue.)

The buildworld buildkernel is running on the console,
not over an ethernet ssd session. I mention this
because I and others have seen all ethernet
communication hang-up/die on occasion if it is in
sustained, fairly frequent use.

Drat: buildworld got a different failure in sh (see below).

I'll see what I can figure out about it. I might revert
to my old port building test since this might be a
distinct issue.

Further material about the below will likely be via
the lists unless I find it ties back to this somehow.

For reference:

pid 15549 (sh), uid 0: exited on signal 6 (core dumped)
pid 715 (sh), uid 0: exited on signal 6 (core dumped)

(lldb) bt

  • thread #1: tid = 100139, 0x0000000040554e54 libc.so.7`_thr_kill + 8, name = 'sh', stop reason = signal SIGABRT
    • frame #0: 0x0000000040554e54 libc.so.7`_thr_kill + 8

      frame #1: 0x0000000040554e18 libc.so.7`__raise(s=6) + 64 at raise.c:52

      frame #2: 0x0000000040554d8c libc.so.7`abort + 84 at abort.c:65

      frame #3: 0x00000000405340fc libc.so.7`huge_node_get [inlined] __je_rtree_get(dependent=true) + 308 at rtree.h:328

      frame #4: 0x00000000405340dc libc.so.7`huge_node_get [inlined] __je_chunk_lookup(dependent=true) at chunk.h:89

      frame #5: 0x00000000405340dc libc.so.7`huge_node_get(ptr=<unavailable>) + 276 at jemalloc_huge.c:11

      frame #6: 0x0000000040534114 libc.so.7`__je_huge_salloc(tsdn=<unavailable>, ptr=<unavailable>) + 24 at jemalloc_huge.c:434

      frame #7: 0x000000004054c84c libc.so.7`ifree [inlined] __je_arena_salloc(demote=false) + 32 at arena.h:1426

      frame #8: 0x000000004054c82c libc.so.7`ifree [inlined] __je_isalloc(demote=false) at jemalloc_internal.h:1045

      frame #9: 0x000000004054c82c libc.so.7`ifree(tsd=<unavailable>, ptr=<unavailable>, tcache=<unavailable>, slow_path=<unavailable>) + 304 at jemalloc_jemalloc.c:1889

      frame #10: 0x000000004054cd94 libc.so.7`__free(ptr=0x0000000040a17720) + 148 at jemalloc_jemalloc.c:2016

      frame #11: 0x0000000000411328 sh`ckfree(p=<unavailable>) + 32 at memalloc.c:88

      frame #12: 0x0000000000407cd8 sh`clearcmdentry + 76 at exec.c:505

      frame #13: 0x0000000000406bfc sh`evalcommand(cmd=<unavailable>, flags=<unavailable>, backcmd=<unavailable>) + 3476 at eval.c:1182

      frame #14: 0x0000000000405570 sh`evaltree(n=0x0000000040a1cde8, flags=<unavailable>) + 212 at eval.c:290

      frame #15: 0x000000000041105c sh`cmdloop(top=<unavailable>) + 252 at main.c:231

      frame #16: 0x0000000000410ed0 sh`main(argc=<unavailable>, argv=<unavailable>) + 660 at main.c:178

      frame #17: 0x0000000000402f30 sh`__start + 360

      frame #18: 0x0000000040434658 ld-elf.so.1`.rtld_start + 24 at rtld_start.S:41

(lldb) bt

  • thread #1: tid = 100137, 0x0000000040554e54 libc.so.7`_thr_kill + 8, name = 'sh', stop reason = signal SIGABRT
    • frame #0: 0x0000000040554e54 libc.so.7`_thr_kill + 8

      frame #1: 0x0000000040554e18 libc.so.7`__raise(s=6) + 64 at raise.c:52

      frame #2: 0x0000000040554d8c libc.so.7`abort + 84 at abort.c:65

      frame #3: 0x00000000405340fc libc.so.7`huge_node_get [inlined] __je_rtree_get(dependent=true) + 308 at rtree.h:328

      frame #4: 0x00000000405340dc libc.so.7`huge_node_get [inlined] __je_chunk_lookup(dependent=true) at chunk.h:89

      frame #5: 0x00000000405340dc libc.so.7`huge_node_get(ptr=<unavailable>) + 276 at jemalloc_huge.c:11

      frame #6: 0x0000000040534114 libc.so.7`__je_huge_salloc(tsdn=<unavailable>, ptr=<unavailable>) + 24 at jemalloc_huge.c:434

      frame #7: 0x000000004054c84c libc.so.7`ifree [inlined] __je_arena_salloc(demote=false) + 32 at arena.h:1426

      frame #8: 0x000000004054c82c libc.so.7`ifree [inlined] __je_isalloc(demote=false) at jemalloc_internal.h:1045

      frame #9: 0x000000004054c82c libc.so.7`ifree(tsd=<unavailable>, ptr=<unavailable>, tcache=<unavailable>, slow_path=<unavailable>) + 304 at jemalloc_jemalloc.c:1889

      frame #10: 0x000000004054cd94 libc.so.7`__free(ptr=0x0000000040a17520) + 148 at jemalloc_jemalloc.c:2016

      frame #11: 0x0000000000411328 sh`ckfree(p=<unavailable>) + 32 at memalloc.c:88

      frame #12: 0x0000000000407cd8 sh`clearcmdentry + 76 at exec.c:505

      frame #13: 0x0000000000406bfc sh`evalcommand(cmd=<unavailable>, flags=<unavailable>, backcmd=<unavailable>) + 3476 at eval.c:1182

      frame #14: 0x0000000000405570 sh`evaltree(n=0x0000000040a1c270, flags=<unavailable>) + 212 at eval.c:290

      frame #15: 0x000000000041105c sh`cmdloop(top=<unavailable>) + 252 at main.c:231

      frame #16: 0x0000000000410ed0 sh`main(argc=<unavailable>, argv=<unavailable>) + 660 at main.c:178

      frame #17: 0x0000000000402f30 sh`__start + 360

      frame #18: 0x0000000040434658 ld-elf.so.1`.rtld_start + 24 at rtld_start.S:41

Both of these got to rtree.h line:

328 default: not_reached();

They are near each other in time:

ls -ltT /var/crash/
total 78884
-rw------- 1 root wheel 4702208 Feb 14 14:22:02 2017 sh.715.core
-rw------- 1 root wheel 4702208 Feb 14 14:22:01 2017 sh.15549.core
. . .

The buildworld was using -j 4 . I do not use any in-memory file
systems. ufs file system. The build had completed liblldb somewhat
before.

The core files did not suggest a stack corruption
to me, nor was fork active. My test code
recorded its before and after fork stack address
examples and they were equal as they should be.

It appeared that simply starting the buildworld
buildkernel would continue on so I restarted it.
It in fact continued on.

I see no reason to take the stoppage as something
to count against the change.

But once buildworld completed another failure
happend of the same type as the prior two:

(lldb) up 3
frame #3: 0x00000000405340fc libc.so.7`huge_node_get [inlined] __je_rtree_get(dependent=true) + 308 at rtree.h:328

325 		RTREE_GET_LEAF(RTREE_HEIGHT_MAX-1)
326 	#undef RTREE_GET_SUBTREE
327 	#undef RTREE_GET_LEAF

-> 328 default: not_reached();

329 		}
330 	#undef RTREE_GET_BIAS
331 		not_reached();

The build did not get to the point of saying the buildkernel
had started, only that buildworld had completed.

It looks like trying again will simply continue on like
last time. So I'm trying that, including buildworld even
though it should be a large no-op before buildkernel
starts. [Later: And if fact buildkernel is now in progress
after that big no-op for buildworld.]

The failures do look systematic. But I'm not familiar
with what jemalloc is doing in the failure area.

I'll also note that this is far better than my past
buildworld buildkernel attempts. (Although I had
given up on buildworld buildkernel some time ago
and only recently retried the pine64, my only
amd64 context.)

That last buildworld buildkernel completed just
fine (really only building the kernel since world
had completed before).

I have not found anything that I can trace back
to this change to fork_trampoline. This change
seems to be working fine.

But at this point I've no clue about what is
specifically at issue for the 3 __je_rtree_get
failures that happened during the testing.

This revision was automatically updated to reflect the committed changes.

I got a panic that involved fork_trampoline in the back trace.
I've no clue if it is related.

a10_mmc0: controller timeout
a10_mmc0: mmcsd0: Spurious interrupt - no active request, rint: 0x00000000
Error indicated: 1 Timeout
panic: vm_fault_hold: fault on nofault entry, addr: 0xffff000078ffc000
cpuid = 0
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28

pc = 0xffff00000060ebfc  lr = 0xffff000000085e00
sp = 0xffff0000839f40c0  fp = 0xffff0000839f42d0

db_trace_self_wrapper() at vpanic+0x150

pc = 0xffff000000085e00  lr = 0xffff00000031994c
sp = 0xffff0000839f42e0  fp = 0xffff0000839f4350

vpanic() at panic+0x4c

pc = 0xffff00000031994c  lr = 0xffff0000003197f8
sp = 0xffff0000839f4360  fp = 0xffff0000839f43e0

panic() at vm_fault_hold+0x1d7c

pc = 0xffff0000003197f8  lr = 0xffff0000005c2fd8
sp = 0xffff00008540

vm_fault_hold() at vm_fault+0x70

pc = 0xffff0000005c2fd8  lr = 0xffff0000005c120c
sp = 0xffff0000839f4550  fp = 0xffff0000839f4580

vm_fault() at data_abort+0xdc

pc = 0xffff0000005c120c  lr = 0xffff000000623ed8
sp = 0xffff0000839f4590  fp = 0xffff0000839f4640

data_abort() at handle_el1h_sync+0x70

pc = 0xffff000000623ed8  lr = 0xffff000000610870
sp = 0xffff0000839f4650  fp = 0xffff0000839f4760

handle_el1h_sync() at bounce_bus_dmamap_sync+0x1f0

pc = 0xffff000000610870  lr = 0xffff00000060d6cc
sp = 0xffff0000839f4770  fp = 0xffff0000839f4850

bounce_bus_dmamap_sync() at a10_mmc_request+0x2b4

pc = 0xffff00000060d6cc  lr = 0xffff0000005ecb24
sp = 0xffff0000839f4860  fp = 0xffff0000839f4890

a10_mmc_request() at mmc_wait_for_req+0xd4

pc = 0xffff0000005ecb24  lr = 0xffff00000014f318
sp = 0xffff0000839f48a0  fp = 0xffff0000839f48d0

mmc_wait_for_req() at mmc_wait_for_request+0x18

pc = 0xffff00000014f318  lr = 0xffff00000014bab4
sp = 0xffff0000839f48e0  fp = 0xffff0000839f48f0

mmc_wait_for_request() at mmcsd_rw+0x260

pc = 0xffff00000014bab4  lr = 0xffff000000151038
sp = 0xffff0000839f4900  fp = 0xffff0000839f4a30

mmcsd_rw() at mmcsd_task+0x230

pc = 0xffff000000151038  lr = 0xffff000000150844
sp = 0xffff0000839f4a40  fp = 0xffff0000839f4b50

mmcsd_task() at fork_exit+0x94

pc = 0xffff000000150844  lr = 0xffff0000002dc4d4
sp = 0xffff0000839f4b60  fp = 0xffff0000839f4b90

fork_exit() at fork_trampoline+0x10

         pc = 0xffff0000002dc4d4  lr = 0xffff000000623a84
         sp = 0xffff0000839f4ba0  fp = 0x0000000000000000
 
		 db>

Note: The "truncated" sp/fp line for "panic() at . . ."
is from the serial console tending to drop text.

The "540" part of the partial sp is not necessarily
to be believed as being for the character columns
where they are shown.

I actually had 2 partially messed up tracebacks
in the log (messed up over different ranges) and
the other one shows for the line in question:

sp = 0xffff0000839f43f0 fp = 0xffff0000839f4540