- User Since
- Jun 4 2014, 6:42 AM (211 w, 4 d)
Fri, Jun 22
By the way, here is a "purified" variant of the interrupt handling code without all the complications of the interrupt threads, software interrupts, interactions with other subsystems, etc.
This is purely a list of handlers that can be iterated in a wait-free manner.
I think that it could be used, for example, for installing NMI handlers in a more regular fashion than what we have now.
I hope that that code makes the idea behind this change a little bit more clear.
Thu, Jun 21
Rework the proposed change.
Wed, Jun 20
Tue, Jun 19
Fri, Jun 15
Oops, I thought I approved this, but apparently it was "only" on IRC.
I hope that your commit message will not contain "assume" and instead will state that we indeed pass a pointer via means of uintmax_t (that must not be narrower than uintptr_t) as we discussed.
Well, we haven't seen any problems from this code. Except for that oddball system you mentioned earlier.
Not sure if this much churn is really warranted.
Tue, Jun 12
Mon, Jun 11
fix inline documentation of suspend_intr and resume_intr bus methods
I will appreciate any comments and suggestions on the overall approach, the interrupt code changes and the bus code changes (their design, API extension).
I suspect that there can be a more elegant way to get the desired functionality.
I've just posted D15755.
Fri, Jun 8
Thank you for pointing out those problems.
I have an alternative WIP where I added support for marking interrupts as suspended.
Not sure if I did that right. I'll post a review on Monday so that problems with it can be pointed out and better ideas could be suggested.
In that work, I also suspend interrupts only for PCI devices and only legacy interrupts.
Interrupts are suspended and resumed in pci_suspend_child and pci_resume_child.
I also have a review request for a new kind of NMI watchdog, D15630.
In that case I do invoke a new callback to check whether the watchdog recognizes the NMI as caused by its hardware.
Not sure if that would be an overkill in this case.
Thu, Jun 7
Please consider opening an issue at https://www.illumos.org/projects/illumos-gate/issues even though illumos does not support PowerPC.
Tue, Jun 5
Sun, Jun 3
The change has actually been committed, see rS334479.
Fri, Jun 1
Thu, May 31
I guess I can split up the patch. Maybe when committing it, if that ever happens.
But while the change consists of several logical parts that can be isolated, each is useless without others.
Only the small improvements in sys/x86/x86/cpu_machdep.c (extending sysctl descriptions, etc) are completely independent of the rest of the changes.
We tested the earlier version of this patch and didn't get any regression.
We also didn't see any recurrence of the problem, but it was very rare without the patch too.
Wed, May 30
Tue, May 29
May 25 2018
Just a quick question, is this the same change that was recently presented at OpenZFS summit (http://open-zfs.org/w/images/a/a0/Saso_-_resilver_update.pdf)?
Or an alternative to it?
May 24 2018
Thanks a lot!
Unclear if updates to independent cores need to be serialized. Seems harmless — upper bound of 100M cycles = 50 milliseconds at 2GHz. Even on 32-core Epyc that's max 1.6 seconds to load all cores, with a low-clock part.
The memory pointed to can't cause a page or segmentation fault, but obviously that shouldn't happen for any valid kernel memory anyway.
We already check MSR in userland (in cpucontrol), but treat it as a global.
In any case, all three alternatives are fine by me.
Although, my (weak) preference is to use the topology information.
@markj something like that, yes.
I am not sure if logical_cpus_mask is accurate on Ryzen, but I think that we collect enough topology information to either make it correct or to provide another way of checking whether a CPU is a "primary" thread of a core.
I think that it should be relatively easy to instruct threads to do either an update or just a spin.
Having said that, I won't object against this change if it fixes a problem that has been experienced.
I am curious if both (all) hardware threads within a core need to do the update or if it would be sufficient for just one thread per core to do it.
May 21 2018
The change looks good to me. A couple of small comments inline.
May 20 2018
remove a change not intended for comitting
May 17 2018
May 16 2018
- restore formatting of a block that has no functional changes
- re-arrange the code that handles attribute errors per Rick's suggestion
fix bugs pointed out by the reviewers
I agree. Although, in practice, it seems that at present the timer devices (on x86, at least) don't have much, if any, state to save or restore.
It's sufficient that the resume code puts them in workable state and then the clocksource code just programs them for the next event.
May 15 2018
place 'handled' under KDB, fix a whitespace issue near #ifdef