Dec 5 2017
Errr, my cleanup is runtime, not ddb time, sorry for the confusion. I've not grabbed this code.
I've cleaned up this code somewhat, but not completely. I need to reconnect it to an interrupt and then test it in machines that generate these errors...
Jun 30 2017
Feb 11 2017
Some more comments based on first trying to use this code, and then trying to rework it.
While the code is decently written, it makes a number of fundamental assumptions that don't match how AER is defined in the PCIe spec, and those will require changes.
I've not looked at all at the userland test suite, so have no opinion on that.
Feb 6 2017
After talking with John, these two reports are self-consistent. The bridge sees one kind of thing going on (a timeout) while the card sees something else (a bad header, so the request is ignored).
pciconf -bBlaec pcib5
pcib5@pci0:0:2:2: class=0x060400 card=0x083315d9 chip=0x6f068086 rev=0x01 hdr=0x01
    bus range = 4-4
    window[1c] = type I/O Port, range 16, addr 0xf000-0xfff, disabled
    window = type Memory, range 32, addr 0xfb500000-0xfb5fffff, enabled
    window = type Prefetchable Memory, range 64, addr 0xfff00000-0xfffff, disabled
    cap 0d = PCI Bridge card=0x083315d9
    cap 05 = MSI supports 2 messages, vector masks
    cap 10 = PCI-Express 2 root port max data 256(256) ARI disabled
             link x4(x4) speed 2.5(8.0)
             slot 0 power limit 25000 mW surprise
    cap 01[e0] = powerspec 3 supports D0 D3 current D0
    ecap 000b = Vendor 1 ID 2
    ecap 000d = ACS 1
    ecap 0001 = AER 1 0 fatal 1 non-fatal 2 corrected
    ecap 000b[1d0] = Vendor 1 ID 3
    ecap 0019 = PCIe Sec 1 lane errors 0
    ecap 000b = Vendor 1 ID 5
    ecap 000b = Vendor 1 ID 8
    PCI-e errors = Correctable Error Detected
                   Non-Fatal Error Detected
       Non-fatal = Completion Timeout
       Corrected = Replay Timer Timeout
                   Advisory Non-Fatal Error
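For anyone reading along, here is a rough sketch of how the "Non-fatal" and "Corrected" lines above map onto the AER Uncorrectable and Correctable Error Status registers. The bit positions are the ones I understand the PCIe spec to define. The struct, table and function names (aer_bit, aer_decode, and so on) are made up for illustration and are not taken from this patch or from pciconf, and the uncorrectable table is deliberately partial.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one status bit and its printable name. */
struct aer_bit {
	uint32_t	mask;
	const char	*name;
};

/* AER Uncorrectable Error Status bits (subset, for illustration). */
const struct aer_bit aer_uncor_bits[] = {
	{ 1u << 4,  "Data Link Protocol Error" },
	{ 1u << 12, "Poisoned TLP" },
	{ 1u << 14, "Completion Timeout" },
	{ 1u << 15, "Completer Abort" },
	{ 1u << 16, "Unexpected Completion" },
	{ 1u << 18, "Malformed TLP" },
	{ 1u << 20, "Unsupported Request" },
};

/* AER Correctable Error Status bits. */
const struct aer_bit aer_cor_bits[] = {
	{ 1u << 0,  "Receiver Error" },
	{ 1u << 6,  "Bad TLP" },
	{ 1u << 7,  "Bad DLLP" },
	{ 1u << 8,  "REPLAY_NUM Rollover" },
	{ 1u << 12, "Replay Timer Timeout" },
	{ 1u << 13, "Advisory Non-Fatal Error" },
};

/*
 * Walk a table and store the name of every bit set in 'status' into
 * 'out' (at most 'max' entries).  Returns the number of names stored.
 * Bits that are not in the table are silently skipped.
 */
size_t
aer_decode(const struct aer_bit *tbl, size_t n, uint32_t status,
    const char **out, size_t max)
{
	size_t i, found = 0;

	for (i = 0; i < n && found < max; i++) {
		if ((status & tbl[i].mask) != 0)
			out[found++] = tbl[i].name;
	}
	return (found);
}

/* Convenience wrappers around the two tables above. */
size_t
aer_decode_uncor(uint32_t status, const char **out, size_t max)
{
	return (aer_decode(aer_uncor_bits,
	    sizeof(aer_uncor_bits) / sizeof(aer_uncor_bits[0]),
	    status, out, max));
}

size_t
aer_decode_cor(uint32_t status, const char **out, size_t max)
{
	return (aer_decode(aer_cor_bits,
	    sizeof(aer_cor_bits) / sizeof(aer_cor_bits[0]),
	    status, out, max));
}
```

With the bridge state shown above, an uncorrectable status with only bit 14 set would decode to "Completion Timeout", and a correctable status with bits 12 and 13 set would decode to "Replay Timer Timeout" and "Advisory Non-Fatal Error". The Advisory Non-Fatal bit is what you'd expect when a non-fatal error is being signalled through the correctable path.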
For the commit, I'd suggest committing the injection tool (both the userland and kernel parts) separately from the kernel changes to pcib.