This change adds a trap handler for exception 0x1500 that normalizes all
VSX registers and then returns.
This avoids the kernel panics caused by the previously unknown exception,
and programs that trigger it now run successfully in about 2 of 3 runs.
In the remaining runs, however, they receive a segmentation fault, so
something is apparently still wrong with the handler.
Diff Detail
Lint: passed
Unit: no test coverage
Build status: Buildable 21430, Build 20752 (arc lint + arc unit)
Event Timeline
Running this on the Tyan Power8 hosts now. I'll report back in a day or two after we get some poudriere runs under our belt.
Oof. I got a different panic that may or may not be useful:
panic: Memory modified after free 0xc000000095a0dda0(32) val=0 @ 0xc000000095a0dda0
cpuid = 31
time = 1542485903
KDB: stack backtrace:
0xe000000090207030: at .kdb_backtrace+0x5c
0xe000000090207160: at .vpanic+0x1b4
0xe000000090207220: at .panic+0x38
0xe0000000902072b0: at .trash_ctor+0x58
0xe000000090207330: at .trash_fini+0x1c
0xe0000000902073b0: at .uma_zdestroy+0x164
0xe000000090207460: at .uma_zdestroy+0x42c
0xe0000000902074f0: at .sys_swapoff+0x2c4
0xe000000090207580: at .uma_zfree_pcpu_arg+0x340
0xe000000090207610: at .zone_drain+0x18
0xe000000090207690: at .uma_avail+0x4c4
0xe000000090207720: at .zone_drain+0x410
0xe0000000902077b0: at .uma_reclaim_worker+0x20c
0xe000000090207860: at .fork_exit+0xd0
0xe000000090207900: at .fork_trampoline+0x10
0xe000000090207930: at -0x4
KDB: enter: panic
[ thread pid 15 tid 100219 ]
The segmentation fault occurred because exception 0x1500 was being
handled as a non-hypervisor trap, whose entry code reads the saved
state from srr0/srr1. Since 0x1500 is a hypervisor interrupt, the
hardware places those values in hsrr0/hsrr1 instead.
The fix changes the entry point to hypertrap, which copies the values
of hsrr0 and hsrr1 into srr0 and srr1.
sys/powerpc/include/trap.h, line 105: Whitespace here (nitpick): tab after #define, not space.
It didn't take long to crash this morning:
panic: Memory modified after free 0xc0000001843ea8c0(32) val=0 @ 0xc0000001843ea8c0
Tracing pid 21731 tid 101169 td 0xc000000021740000
0xe0000000c647c060: at .vpanic+0x1d4
0xe0000000c647c120: at .panic+0x38
0xe0000000c647c1b0: at .trash_ctor+0x58
0xe0000000c647c230: at .uma_zalloc_arg+0x1f0
0xe0000000c647c2f0: at .uma_zalloc_pcpu_arg+0x174
0xe0000000c647c390: at .uma_zalloc_arg+0x51c
0xe0000000c647c450: at .malloc+0xc4
0xe0000000c647c500: at .zfs_kmem_alloc+0x1c
0xe0000000c647c580: at .zio_data_buf_alloc+0x88
0xe0000000c647c610: at .arc_space_consume+0x710
0xe0000000c647c6b0: at .arc_buf_access+0x7c0
0xe0000000c647c760: at .arc_alloc_buf+0xbc
0xe0000000c647c800: at .dbuf_read+0x5c8
0xe0000000c647c920: at .dmu_tx_dirty_buf+0x4ac
0xe0000000c647c9d0: at .dmu_tx_hold_sa_create+0x240
0xe0000000c647ca80: at .dmu_tx_hold_write+0xec
0xe0000000c647cb20: at .zfs_get_data+0x4c28
0xe0000000c647cdb0: at .VOP_WRITE_APV+0x1b8
0xe0000000c647cf20: at .vn_open+0x7fc
0xe0000000c647d010: at .vn_utimes_perm+0x15c
0xe0000000c647d0e0: at .vn_utimes_perm+0x3c4
0xe0000000c647d2a0: at .vn_utimes_perm+0x79c
0xe0000000c647d380: at .selrecord+0x644
0xe0000000c647d440: at .kern_writev+0x60
0xe0000000c647d4f0: at .sys_write+0x78
0xe0000000c647d5c0: at .trap+0x664
0xe0000000c647d780: at .powerpc_interrupt+0x290
0xe0000000c647d820: user SC trap by 0x104733e8: srr1=0x900000000000f032
    r1=0x3fffffffffff2a80 cr=0x28222042 xer=0x20000000 ctr=0x101ddcd4 r2=0x10660ef0
This panic is fairly reproducible on our Power8 box. Is it related to this change? I'm not convinced it should keep this review blocked, since the change resolves other problems.
panic: Memory modified after free 0xc0000000850584a0(32) val=0 @ 0xc0000000850584a0
cpuid = 31
time = 1544381394
KDB: stack backtrace:
0xe000000090207030: at .kdb_backtrace+0x5c
0xe000000090207160: at .vpanic+0x1b4
0xe000000090207220: at .panic+0x38
0xe0000000902072b0: at .trash_ctor+0x58
0xe000000090207330: at .trash_fini+0x1c
0xe0000000902073b0: at .uma_zdestroy+0x164
0xe000000090207460: at .uma_zdestroy+0x42c
0xe0000000902074f0: at .sys_swapoff+0x2c4
0xe000000090207580: at .uma_zfree_pcpu_arg+0x340
0xe000000090207610: at .zone_drain+0x18
0xe000000090207690: at .uma_avail+0x4c4
0xe000000090207720: at .zone_drain+0x410
0xe0000000902077b0: at .uma_reclaim_worker+0x20c
0xe000000090207860: at .fork_exit+0xd0
0xe000000090207900: at .fork_trampoline+0x10
0xe000000090207930: at -0x4
KDB: enter: panic
FreeBSD pylon.nyi.freebsd.org 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r341766M: Sun Dec 9 18:59:51 UTC 2018 sbruno@build-13.freebsd.org:/usr/obj/powerpc.powerpc64/usr/src/sys/CLUSTER13 powerpc
I've updated pylon (Tyan P8) to the top of tree and applied this diff. I'm running a full build, so let's see whether we still have this problem or whether it's fixed.
panic: Memory modified after free 0xc0000000e2f31580(32) val=0 @ 0xc0000000e2f31580
cpuid = 28
time = 1544388155
KDB: stack backtrace:
0xe0000000c9457e10: at .kdb_backtrace+0x5c
0xe0000000c9457f40: at .vpanic+0x1b4
0xe0000000c9458000: at .panic+0x38
0xe0000000c9458090: at .trash_ctor+0x58
0xe0000000c9458110: at .uma_zalloc_arg+0x1f0
0xe0000000c94581d0: at .uma_zalloc_pcpu_arg+0x174
0xe0000000c9458270: at .uma_zalloc_arg+0x4d0
0xe0000000c9458330: at .malloc+0xc4
0xe0000000c94583e0: at .zfs_kmem_alloc+0x1c
0xe0000000c9458460: at .zio_buf_alloc+0xac
0xe0000000c94584f0: at .arc_space_consume+0x6e4
0xe0000000c9458590: at .arc_buf_access+0x1028
0xe0000000c9458640: at .arc_read+0x5bc
0xe0000000c9458770: at .dbuf_read+0xbbc
0xe0000000c9458890: at .dnode_hold_impl+0x48c
0xe0000000c94589b0: at .dnode_hold+0x24
0xe0000000c9458a30: at .dmu_bonus_hold+0x40
0xe0000000c9458ae0: at .sa_buf_hold+0x14
0xe0000000c9458b60: at .zfs_rezget+0x124
0xe0000000c9458db0: at .zfs_resume_fs+0x194
0xe0000000c9458e70: at .getzfsvfs+0x3b4
0xe0000000c9458f20: at .zfs_secpolicy_share+0x92c
0xe0000000c9459030: at .fiodgname_buf_get_ptr+0x434
0xe0000000c94590f0: at .VOP_IOCTL_APV+0x148
0xe0000000c9459180: at .vn_open+0x2f8
0xe0000000c9459310: at .devfs_unmount_final+0x5f4
0xe0000000c94593a0: at .kern_ioctl+0x31c
0xe0000000c9459470: at .sys_ioctl+0x16c
0xe0000000c94595b0: at .trap+0x664
0xe0000000c9459770: at .powerpc_interrupt+0x290
0xe0000000c9459810: user SC trap by 0x81035a568: srr1=0x900000000000f032
    r1=0x3fffffffffffb430 cr=0x24004028 xer=0 ctr=0x81035a560 r2=0x8103b1e00
This looks fairly reproducible, so I'm stopping the package build attempts.
This panic is almost certainly not related to this change.
@leonardo.bianconi_eldorado.org.br and I just thought that the fix for the exception 0x1500 issue might also fix the "memory modified after free" issue, but, as your tests have shown, it does not.
Since this change now fully fixes the exception 0x1500 issue, I vote for checking it in and then investigating and fixing the "memory modified after free" issue in a separate change.
sys/powerpc/include/trap.h, line 105: Changed before committing.