ppc64: handle exception 0x1500 (soft patch)
ClosedPublic

Authored by leandro.lupori_gmail.com on Nov 1 2018, 4:44 PM.

Details

Summary

This change adds a trap handler for exception 0x1500 that normalizes all
VSX registers and returns.
This avoids kernel panics due to unknown exceptions, and programs that
trigger this exception now work in about 2 of 3 runs. In the remaining
runs they receive a segmentation fault, so apparently there is still
something wrong with the handler.
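
As a rough illustration of the idea in the summary, the following is a minimal user-space model in C, not the kernel code from this diff. Every name in it (`EXC_SOFT_PATCH_MODEL`, `normalize_vsx_model`, `handle_trap_model`) is hypothetical, and the normalization step is only a stub that counts calls. The real handler would operate on the actual VSX registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the soft patch exception vector (0x1500). */
#define EXC_SOFT_PATCH_MODEL 0x1500

/* Counts how often the stubbed normalization step has run. */
static int normalize_calls;

/* Stub standing in for "normalize all VSX registers" (assumption). */
static void
normalize_vsx_model(void)
{
	normalize_calls++;
}

/*
 * Returns true when the vector is handled, so that the caller can
 * resume the interrupted program; returns false for unknown vectors,
 * which the caller would treat as fatal.
 */
static bool
handle_trap_model(uint32_t vector)
{
	switch (vector) {
	case EXC_SOFT_PATCH_MODEL:
		normalize_vsx_model();
		return (true);
	default:
		return (false);
	}
}
```

In this model, an unknown vector is the case that previously led to a panic, while 0x1500 is now recognized and handled.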

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

swills added a subscriber: swills. Nov 16 2018, 12:50 PM

Running this on the Tyan Power8 hosts now. I'll report back in a day or two after we get some poudriere runs under our belt.

Oof. I got a different panic, that may or may not be useful.

panic: Memory modified after free 0xc000000095a0dda0(32) val=0 @ 0xc000000095a0dda0

cpuid = 31
time = 1542485903
KDB: stack backtrace:
0xe000000090207030: at .kdb_backtrace+0x5c
0xe000000090207160: at .vpanic+0x1b4
0xe000000090207220: at .panic+0x38
0xe0000000902072b0: at .trash_ctor+0x58
0xe000000090207330: at .trash_fini+0x1c
0xe0000000902073b0: at .uma_zdestroy+0x164
0xe000000090207460: at .uma_zdestroy+0x42c
0xe0000000902074f0: at .sys_swapoff+0x2c4
0xe000000090207580: at .uma_zfree_pcpu_arg+0x340
0xe000000090207610: at .zone_drain+0x18
0xe000000090207690: at .uma_avail+0x4c4
0xe000000090207720: at .zone_drain+0x410
0xe0000000902077b0: at .uma_reclaim_worker+0x20c
0xe000000090207860: at .fork_exit+0xd0
0xe000000090207900: at .fork_trampoline+0x10
0xe000000090207930: at -0x4
KDB: enter: panic
[ thread pid 15 tid 100219 ]

The segmentation fault when handling the 0x1500 exception occurred
because it was being treated as a non-hypervisor trap, which uses the
registers srr0/srr1. As this is a hypervisor trap, the values were
actually being placed in hsrr0/hsrr1.
I just changed the entry point to hypertrap, which saves the values of
hsrr0 and hsrr1 in srr0 and srr1.
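
The effect described above can be sketched with a small user-space model in C. This is an illustration under stated assumptions, not the assembly from this diff: the structures and the functions `generic_trap_entry` and `hyper_trap_entry` are hypothetical names, and the registers are plain struct fields rather than real SPRs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the save/restore registers used at trap entry. */
struct cpu_regs {
	uint64_t srr0, srr1;	/* written by ordinary interrupts */
	uint64_t hsrr0, hsrr1;	/* written by hypervisor interrupts */
};

/* Hypothetical model of the state the handler later resumes from. */
struct trapframe_model {
	uint64_t pc;	/* address at which to resume */
	uint64_t msr;	/* machine state to restore */
};

/* Generic entry path: builds the frame from srr0/srr1 only. */
static void
generic_trap_entry(const struct cpu_regs *r, struct trapframe_model *tf)
{
	assert(r != NULL && tf != NULL);
	tf->pc = r->srr0;
	tf->msr = r->srr1;
}

/*
 * Hypervisor entry path: first copies hsrr0/hsrr1 into srr0/srr1,
 * then reuses the generic path, so the frame reflects the state
 * the hypervisor interrupt actually saved.
 */
static void
hyper_trap_entry(struct cpu_regs *r, struct trapframe_model *tf)
{
	assert(r != NULL && tf != NULL);
	r->srr0 = r->hsrr0;
	r->srr1 = r->hsrr1;
	generic_trap_entry(r, tf);
}
```

If a hypervisor interrupt goes through the generic path in this model, the frame is built from whatever stale values srr0/srr1 hold, so the program resumes at the wrong address. That is consistent with the intermittent segmentation faults reported in the summary.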

Great!
Thanks for fixing the issue I left behind on this!

This revision is now accepted and ready to land. Dec 7 2018, 4:28 PM
jhibbits accepted this revision. Dec 7 2018, 4:35 PM
jhibbits added inline comments.
sys/powerpc/include/trap.h
105 ↗(On Diff #51725)

Whitespace here (nitpick): tab after #define, not space.

sbruno added a comment. Dec 8 2018, 1:54 PM

It didn't take too long to crash this morning:

panic: Memory modified after free 0xc0000001843ea8c0(32) val=0 @ 0xc0000001843ea8c0

Tracing pid 21731 tid 101169 td 0xc000000021740000
0xe0000000c647c060: at .vpanic+0x1d4
0xe0000000c647c120: at .panic+0x38
0xe0000000c647c1b0: at .trash_ctor+0x58
0xe0000000c647c230: at .uma_zalloc_arg+0x1f0
0xe0000000c647c2f0: at .uma_zalloc_pcpu_arg+0x174
0xe0000000c647c390: at .uma_zalloc_arg+0x51c
0xe0000000c647c450: at .malloc+0xc4
0xe0000000c647c500: at .zfs_kmem_alloc+0x1c
0xe0000000c647c580: at .zio_data_buf_alloc+0x88
0xe0000000c647c610: at .arc_space_consume+0x710
0xe0000000c647c6b0: at .arc_buf_access+0x7c0
0xe0000000c647c760: at .arc_alloc_buf+0xbc
0xe0000000c647c800: at .dbuf_read+0x5c8
0xe0000000c647c920: at .dmu_tx_dirty_buf+0x4ac
0xe0000000c647c9d0: at .dmu_tx_hold_sa_create+0x240
0xe0000000c647ca80: at .dmu_tx_hold_write+0xec
0xe0000000c647cb20: at .zfs_get_data+0x4c28
0xe0000000c647cdb0: at .VOP_WRITE_APV+0x1b8
0xe0000000c647cf20: at .vn_open+0x7fc
0xe0000000c647d010: at .vn_utimes_perm+0x15c
0xe0000000c647d0e0: at .vn_utimes_perm+0x3c4
0xe0000000c647d2a0: at .vn_utimes_perm+0x79c
0xe0000000c647d380: at .selrecord+0x644
0xe0000000c647d440: at .kern_writev+0x60
0xe0000000c647d4f0: at .sys_write+0x78
0xe0000000c647d5c0: at .trap+0x664
0xe0000000c647d780: at .powerpc_interrupt+0x290
0xe0000000c647d820: user SC trap by 0x104733e8: srr1=0x900000000000f032
            r1=0x3fffffffffff2a80 cr=0x28222042 xer=0x20000000 ctr=0x101ddcd4 r2=0x10660ef0
sbruno requested changes to this revision. Dec 9 2018, 6:52 PM

This panic is fairly reliable on our Power8 box. Is it related to this change or is it unrelated? I'm not convinced this panic should keep this review blocked as it resolves other things.

panic: Memory modified after free 0xc0000000850584a0(32) val=0 @ 0xc0000000850584a0

cpuid = 31
time = 1544381394
KDB: stack backtrace:
0xe000000090207030: at .kdb_backtrace+0x5c
0xe000000090207160: at .vpanic+0x1b4
0xe000000090207220: at .panic+0x38
0xe0000000902072b0: at .trash_ctor+0x58
0xe000000090207330: at .trash_fini+0x1c
0xe0000000902073b0: at .uma_zdestroy+0x164
0xe000000090207460: at .uma_zdestroy+0x42c
0xe0000000902074f0: at .sys_swapoff+0x2c4
0xe000000090207580: at .uma_zfree_pcpu_arg+0x340
0xe000000090207610: at .zone_drain+0x18
0xe000000090207690: at .uma_avail+0x4c4
0xe000000090207720: at .zone_drain+0x410
0xe0000000902077b0: at .uma_reclaim_worker+0x20c
0xe000000090207860: at .fork_exit+0xd0
0xe000000090207900: at .fork_trampoline+0x10
0xe000000090207930: at -0x4
KDB: enter: panic
This revision now requires changes to proceed. Dec 9 2018, 6:52 PM
sbruno added a comment. Dec 9 2018, 8:32 PM
FreeBSD pylon.nyi.freebsd.org 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r341766M: Sun Dec  9 18:59:51 UTC 2018     sbruno@build-13.freebsd.org:/usr/obj/powerpc.powerpc64/usr/src/sys/CLUSTER13  powerpc

I've updated pylon (Tyan P8) to the top of tree and applied this diff. I'm running a full build, so let's see if we still have this problem or if it's fixed.

sbruno added a comment. Dec 9 2018, 9:12 PM
panic: Memory modified after free 0xc0000000e2f31580(32) val=0 @ 0xc0000000e2f31580

cpuid = 28
time = 1544388155
KDB: stack backtrace:
0xe0000000c9457e10: at .kdb_backtrace+0x5c
0xe0000000c9457f40: at .vpanic+0x1b4
0xe0000000c9458000: at .panic+0x38
0xe0000000c9458090: at .trash_ctor+0x58
0xe0000000c9458110: at .uma_zalloc_arg+0x1f0
0xe0000000c94581d0: at .uma_zalloc_pcpu_arg+0x174
0xe0000000c9458270: at .uma_zalloc_arg+0x4d0
0xe0000000c9458330: at .malloc+0xc4
0xe0000000c94583e0: at .zfs_kmem_alloc+0x1c
0xe0000000c9458460: at .zio_buf_alloc+0xac
0xe0000000c94584f0: at .arc_space_consume+0x6e4
0xe0000000c9458590: at .arc_buf_access+0x1028
0xe0000000c9458640: at .arc_read+0x5bc
0xe0000000c9458770: at .dbuf_read+0xbbc
0xe0000000c9458890: at .dnode_hold_impl+0x48c
0xe0000000c94589b0: at .dnode_hold+0x24
0xe0000000c9458a30: at .dmu_bonus_hold+0x40
0xe0000000c9458ae0: at .sa_buf_hold+0x14
0xe0000000c9458b60: at .zfs_rezget+0x124
0xe0000000c9458db0: at .zfs_resume_fs+0x194
0xe0000000c9458e70: at .getzfsvfs+0x3b4
0xe0000000c9458f20: at .zfs_secpolicy_share+0x92c
0xe0000000c9459030: at .fiodgname_buf_get_ptr+0x434
0xe0000000c94590f0: at .VOP_IOCTL_APV+0x148
0xe0000000c9459180: at .vn_open+0x2f8
0xe0000000c9459310: at .devfs_unmount_final+0x5f4
0xe0000000c94593a0: at .kern_ioctl+0x31c
0xe0000000c9459470: at .sys_ioctl+0x16c
0xe0000000c94595b0: at .trap+0x664
0xe0000000c9459770: at .powerpc_interrupt+0x290
0xe0000000c9459810: user SC trap by 0x81035a568: srr1=0x900000000000f032
            r1=0x3fffffffffffb430 cr=0x24004028 xer=0 ctr=0x81035a560 r2=0x8103b1e00

This looks fairly reliable. So, I'm stopping the attempts to build pkgs.

This panic is almost certainly not related to this change.
It was just that @leonardo.bianconi_eldorado.org.br and I thought the fix for the exception 0x1500 issue might also fix the "memory modified after free" issue but, as your tests have shown, this is not the case.

As this change now fully fixes the exception 0x1500 issue, I vote for checking this in, then investigating and fixing the "memory modified after free" issue in a separate change.

sbruno accepted this revision. Dec 10 2018, 1:30 PM

Agreed. Feel free to commit this at your leisure.

This revision is now accepted and ready to land. Dec 10 2018, 1:30 PM
leandro.lupori_gmail.com marked 2 inline comments as done. Dec 10 2018, 2:38 PM
leandro.lupori_gmail.com added inline comments.
sys/powerpc/include/trap.h
105 ↗(On Diff #51725)

Changed before committing.

This revision was automatically updated to reflect the committed changes.
luporl added a subscriber: luporl. Jan 7 2019, 7:35 PM