- User Since
- Jun 4 2014, 6:42 AM (189 w, 1 d)
Another report with a slightly more relevant stack trace.
Turns out that there is another interesting bit, NbMcaToMstCpuEn (NB machine check errors to master CPU only).
MCA: Bank 4, Status 0xbe082000b5080823
MCA: Global Cap 0x0000000000000106, Status 0x0000000000000004
MCA: Vendor "AuthenticAMD", ID 0x100f43, APIC ID 0
MCA: CPU 0 UNCOR PCC BUSLG Source WR Memory
MCA: Address 0x37284000
MCA: Misc 0xc01b0fff01000000
panic: Unrecoverable machine check exception
cpuid = 0
time = 1516285616
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff81eaf740
vpanic() at vpanic+0x19c/frame 0xffffffff81eaf7c0
panic() at panic+0x43/frame 0xffffffff81eaf820
mca_intr() at mca_intr+0x9b/frame 0xffffffff81eaf840
mchk_calltrap() at mchk_calltrap+0x8/frame 0xffffffff81eaf840
--- trap 0x1c, rip = 0xffffffff80f60c91, rsp = 0xfffffe0029aa4840, rbp = 0xfffffe0029aa48a0 ---
acpi_pcib_read_config() at acpi_pcib_read_config+0x1/frame 0xfffffe0029aa48a0
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x7b/frame 0xfffffe0029aa48e0
sysctl_root() at sysctl_root+0x20e/frame 0xfffffe0029aa4960
userland_sysctl() at userland_sysctl+0x199/frame 0xfffffe0029aa4a10
sys___sysctl() at sys___sysctl+0x5f/frame 0xfffffe0029aa4ac0
amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe0029aa4bf0
fast_syscall_common() at fast_syscall_common+0x100/frame 0x7fffffffd1e0
KDB: enter: panic
[ thread pid 690 tid 100174 ]
The stack trace is missing the sysctl_proc_inject frame. I guess that's because the MCE was delivered before acpi_pcib_read_config updated rbp.
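For reference, the Bank 4 status word from the report can be decoded by hand. The sketch below is only an illustration: it assumes the architectural MCi_STATUS layout (flag bits 57 to 63, MCA error code in bits 0 to 15) and the generic bus error code format 0000 1PPT RRRR IILL. The function and table names are made up for this example, and the model-specific bits 16 to 56 are deliberately not interpreted.

```python
# Assumed architectural flag bits of MCi_STATUS (not model-specific fields).
MC_STATUS_FLAGS = {
    63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
    59: "MISCV", 58: "ADDRV", 57: "PCC",
}

# Assumed field encodings of the generic bus error code 0000 1PPT RRRR IILL.
PARTICIPATION = ["SRC", "RES", "OBS", "GEN"]
REQUEST = {0: "GEN", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TARGET = ["MEM", "RESERVED", "IO", "GEN"]
LEVEL = ["L0", "L1", "L2", "LG"]


def decode_mc_status(status):
    """Decode the flag bits and, if present, the bus error code (hypothetical helper)."""
    flags = [name
             for bit, name in sorted(MC_STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {"flags": flags, "mca_code": code, "bus": None}
    # Bits 15:11 equal to 00001 mark the bus error format.
    if (code & 0xF800) == 0x0800:
        result["bus"] = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 0x1),
            "request": REQUEST.get((code >> 4) & 0xF, "RESERVED"),
            "target": TARGET[(code >> 2) & 0x3],
            "level": LEVEL[code & 0x3],
        }
    return result


if __name__ == "__main__":
    decoded = decode_mc_status(0xBE082000B5080823)
    print(decoded["flags"])
    print(decoded["bus"])
```

Under those assumptions the flags come out as VAL, UC, EN, MISCV, ADDRV and PCC, and the error code as a bus error with source participation, a write request, a memory target and the generic level, which agrees with the "UNCOR PCC BUSLG Source WR Memory" line in the report.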
Tested on AMD using amd_ecc_inject by injecting an uncorrectable ECC DRAM error:
sysctl hw.error_injection.dram_ecc.bit_mask=0x11
sysctl hw.error_injection.dram_ecc.inject=1
Tue, Jan 16
@anish Did you have a chance to look at D13828? Would you like to do that before we go with this solution?
Mon, Jan 15
Sun, Jan 14
Fri, Jan 12
Ignore software emulated LAPIC TPR when checking for a pending vector (SVM only).
Thu, Jan 11
The SVM part looks good to me (including the cache bit manipulations).
Wed, Jan 10
remove a stray change that broke !KTR build
Please see D13828 as well.
Oh, I didn't think about that.
I think that can cause extra interrupt latency but not interrupt loss, but I'm not sure.
Tue, Jan 9
- remove references to AVIC: it disables virtual interrupt injection, so the two cannot be used together
- remove more code that was useful only for virtual interrupt injection
Mon, Jan 8
First, as a baseline, I did some tests on AMD/SVM without applying this patch. I could use up to four hardware watchpoints in a guest, seemingly without any problems.
My tests weren't extensive, so I might have missed some problems.
With this patch everything works well too. I do not see any regressions, and I am now able to inspect DR registers from the host.
Sat, Jan 6
Fri, Jan 5
Wed, Dec 20
Dec 14 2017
- ENODEV -> EROFS
- better minimum_cmd_size comparison
Dec 12 2017
I don't have anything against this patch, but I would prefer a more universal solution.
There are more problems involving FreeBSD-specific volume manipulations done in the sync context.
Please see this bug report for some details: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864
These comments have my analysis:
@imp Warner, do you have any further questions or requests?
Closing this review as now I see many shortcomings in it.
First, 'NO RETRY' is indeed too specific a name and too specific an instruction compared to 'FAIL FAST'.
Second, suppressing retries could be a [barely] good enough first approximation for a certain class of storage devices (spinning HDDs), but for other device types it might be rather wrong.
Third, the naive implementation didn't take into account things like frozen CAM queues, etc.
Fourth, there are other problems pointed out by reviewers.
So, finally, this change wouldn't fly anyway given the opposition.
Dec 7 2017
Is there still interest in moving this forward?
I think that it should be relatively easy to address my comments and get this into a committable shape.
Dec 6 2017
Dec 5 2017
Move mode header querying from scsi_xpt to scsi_da.
Dec 4 2017
It's not a no, but it's push back that this isn't POLA based on what people expect.
I'd rather focus on the convenience of a modern user than on the historic Unix behavior.
This is a proof of concept; it needs discussion and refinement.
Dec 1 2017
Nov 30 2017
LGTM. Thank you!
Nov 25 2017
@ing.gila_gmail.com thank you very much for the explanation! It's both interesting and useful. Much appreciated.
Nov 24 2017
BIO_NORETRY has nothing to do with timeouts
@imp a couple of links on what's going on in illumos: