
x86: Refactor kernel-mode NMI handling
Needs Review · Public

Authored by bnovkov on Fri, Aug 23, 4:34 PM.

Details

Reviewers
kib
br
markj
Summary

This patch refactors the kernel-mode NMI handler so that performance counter interrupts can be shared.
The handler now allows multiple drivers to service the same interrupt (e.g., hwpmc(4) and hwt(4)'s Intel Processor Trace backend).
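As a rough illustration of what "sharing" means here, a minimal sketch with hypothetical hook names (hwpmc_nmi_hook, hwt_pt_nmi_hook; these are not the symbols the patch uses): every registered consumer gets to inspect the NMI, and the NMI counts as handled if at least one of them claims it.

#include <sys/param.h>
#include <sys/systm.h>
#include <machine/frame.h>

/* Hypothetical consumer hooks, filled in by hwpmc(4) and hwt(4). */
static int (*hwpmc_nmi_hook)(struct trapframe *);
static int (*hwt_pt_nmi_hook)(struct trapframe *);

static bool
kernel_nmi_dispatch(struct trapframe *frame)
{
	bool handled = false;

	/* Let every consumer see the interrupt; do not stop at the first claim. */
	if (hwpmc_nmi_hook != NULL && hwpmc_nmi_hook(frame) != 0)
		handled = true;
	if (hwt_pt_nmi_hook != NULL && hwt_pt_nmi_hook(frame) != 0)
		handled = true;
	return (handled);
}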

Diff Detail

Repository
rG FreeBSD src repository

Event Timeline

sys/amd64/amd64/trap.c
216

Should we return immediately if handled = true here?

223

The "we can return immediately" fragment doesn't really make sense anymore.

bnovkov added inline comments.
sys/amd64/amd64/trap.c
216

I avoided this since it could be possible, at least in theory, for a single PC interrupt to be consumed by multiple handlers.
I'm not sure whether this situation is possible in practice, though.

I do not quite like the patch. May I suggest having a list of NMI handlers, registered dynamically?

I think registration on a singly-linked list can be done without synchronizing with NMI processing, and de-registration can simply NULL the function pointer in the list element. This way you avoid the need to synchronize with (i.e., disable) NMIs for list manipulation. It would also remove the need to patch trap.c for each NMI consumer.
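A minimal sketch of that scheme, under assumed names (struct nmi_handler, nmi_handlers_head, and the two functions are illustrative, not the API of the eventual diff); the allocation and exact atomics are the subject of the rest of the thread below.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>
#include <machine/atomic.h>
#include <machine/frame.h>

struct nmi_handler {
	int			(*func)(struct trapframe *);
	struct nmi_handler	*next;
};

static struct nmi_handler *nmi_handlers_head;

/* Registration: push a new element onto the list with a CAS on the head. */
void
nmi_register_handler(int (*handler)(struct trapframe *))
{
	struct nmi_handler *hp;

	hp = malloc(sizeof(*hp), M_TEMP, M_WAITOK | M_ZERO);
	hp->func = handler;
	do {
		hp->next = nmi_handlers_head;
	} while (atomic_cmpset_rel_ptr(
	    (volatile uintptr_t *)&nmi_handlers_head,
	    (uintptr_t)hp->next, (uintptr_t)hp) == 0);
}

/*
 * De-registration: never unlink; just clear the function pointer.  The
 * NMI side only follows ->next and skips entries with a NULL ->func,
 * so no synchronization with (i.e. disabling of) NMIs is required.
 */
void
nmi_remove_handler(int (*handler)(struct trapframe *))
{
	struct nmi_handler *hp;

	for (hp = nmi_handlers_head; hp != NULL; hp = hp->next) {
		if (hp->func == handler) {
			hp->func = NULL;
			return;
		}
	}
}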

sys/amd64/amd64/trap.c
214

This and the next line can be joined, I believe.

Is the HWT option already defined, so that this patch can land?

225

Again, join the lines.

In D46421#1058638, @kib wrote:

I do not quite like the patch. May I suggest having a list of NMI handlers, registered dynamically?

I think registration on a singly-linked list can be done without synchronizing with NMI processing, and de-registration can simply NULL the function pointer in the list element. This way you avoid the need to synchronize with (i.e., disable) NMIs for list manipulation. It would also remove the need to patch trap.c for each NMI consumer.

I like the approach, but after implementing it and thinking about the matter some more, I'm not sure how to proceed.
I'd like to hear what you think before updating the diff.

You can find the dynamic registration patch here. It implements the approach you described above, with the addition of a UMA_ZONE_NOFREE zone for allocating handler entries and of masking the PCINT LVT before invoking the handlers.
However, I'm not sure that this additional complexity is entirely justified.
Only the PCINT LVT is currently configured to use an NMI, and according to the AMD and Intel manuals the only PCINT consumers (so far) are performance counters and Intel PT.
Adding dynamic kernel-mode NMI handler registration seems like overkill when we're dealing with a maximum of two handlers.

I intended this approach to be applied to all NMI handlers, including nmi_handle_intr(). [In particular, this means that the toggling of PCINT should happen inside the handlers.] Also, with some work, parts of the NMI assembly code in exception.S can be moved to C.

Why do you need a zone for the handler list elements? Due to NOFREE? I do not think it is justifiable.

Why do you check for an existing handler registration when registering? If anything, that should be a panic, IMO.

With these changes it is quite a small addition and, most importantly, it makes the code much more modular, which is what I wanted: removing the knowledge of hwpmc, and now pt, from trap.c.

bnovkov retitled this revision from amd64: Refactor kernel-mode NMI handling to x86: Refactor kernel-mode NMI handling.

Address @kib's comments:

  • NMI handlers are now registered dynamically
  • Applied the same changes to i386/trap.c

I've updated the patch and tested this approach by having pmcstat and the PT driver run at the same time. I have not encountered any crashes while testing.

There is, however, a peculiar issue I've encountered while working on this.
If I dynamically register ipi_nmi_handler, I get dropped into ddb when running the test described above.
However, as you can see in the patch, everything works fine if ipi_nmi_handler is invoked before the dynamic handlers.
I'm not sure what's causing this issue.

In D46421#1059782, @kib wrote:

I intended this approach to be applied to all NMI handlers, including nmi_handle_intr(). [In particular, this means that the toggling of PCINT should happen inside the handlers.] Also, with some work, parts of the NMI assembly code in exception.S can be moved to C.

With these changes it is quite a small addition and, most importantly, it makes the code much more modular, which is what I wanted: removing the knowledge of hwpmc, and now pt, from trap.c.

That makes sense, thanks for clearing things up!

Why do you check for an existing handler registration when registering? If anything, that should be a panic, IMO.

I was on the fence about that one; I've changed it to a panic.

Why do you need a zone for the handler list elements? Due to NOFREE? I do not think it is justifiable.

Right, the handler list elements are never deallocated, which is why I created a NOFREE zone.

Please upload diffs with full context; otherwise the Phabricator feature for suggesting changes to the diff does not work.

I still suggest not using a zone; NOFREE for a handful of elements is hardly a justification for a whole zone. Plain malloc would be fine there.
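For reference, the plain-malloc variant would only need a malloc(9) type in place of the zone (the type name here is illustrative):

MALLOC_DEFINE(M_NMIHANDLER, "nmihand", "kernel-mode NMI handler list entries");

	/* In the registration path, replacing the UMA zone allocation: */
	hp = malloc(sizeof(*hp), M_NMIHANDLER, M_WAITOK | M_ZERO);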

sys/x86/x86/cpu_machdep.c
896

This could be *nmi_handlers_head. The head pointer does not need to be a struct, and you do not use it as a full struct.

908

The indent is wrong. It is worth checking for a NULL func with a plain atomic_load_ptr() before issuing the CAS.

917

If the CAS failed, I do not think it would succeed on any later try: you need to re-read nmi_handlers_head.next on failure. See how atomic_fcmpset_ptr() is used to reuse the load done as part of the CAS.
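A sketch of the difference (illustrative): atomic_fcmpset_ptr() writes the value it actually observed back into the "expected" argument when it fails, so each retry compares against a freshly re-read head instead of the stale value a cmpset loop has to reload by hand.

	/*
	 * hp->next doubles as the expected old head and is refreshed
	 * by every failed CAS, so no explicit re-read is needed.
	 */
	hp->next = nmi_handlers_head;
	while (atomic_fcmpset_rel_ptr(
	    (volatile uintptr_t *)&nmi_handlers_head,
	    (uintptr_t *)&hp->next, (uintptr_t)hp) == 0)
		continue;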

927

Wrong indent.

Why use a CAS there and not just a plain store, if hp->func is equal to handler?

In D46421#1060536, @kib wrote:

Please upload diffs with full context; otherwise the Phabricator feature for suggesting changes to the diff does not work.

Sorry, I don't know how that slipped through. The updated diff has full context.

I still suggest not using a zone; NOFREE for a handful of elements is hardly a justification for a whole zone. Plain malloc would be fine there.

No problem, I've switched to plain malloc.

sys/x86/x86/cpu_machdep.c
917

You are right, atomic_fcmpset_ptr() was what I needed, thanks!

sys/x86/x86/cpu_machdep.c
905

If you use atomic_load() for accessing hp->func, it should be done consistently, I think, i.e. for each load and store.

917

Indent should be +4 spaces

You do not need the loop body if you use atomic_fcmpset(). nmi_handlers_head should be loaded into hp->next once, and then you just do fcmpset() in a while() loop.

Since the patch is almost finished, I looked at the overall structure, and I do not see a reason to use atomics for accesses to ->func. Instead, note that the CAS is used against ->next, and there we do need atomics, and perhaps a store release on assignment as well as a load acquire when traversing.
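A sketch of what that looks like on the NMI side (the function name is assumed): the release on the head/next CAS publishes a new element's fields before it becomes reachable, the acquire loads pair with that while walking, and ->func itself gets plain accesses.

static int
nmi_call_handlers(struct trapframe *frame)
{
	struct nmi_handler *hp;
	int handled;

	handled = 0;
	for (hp = (struct nmi_handler *)atomic_load_acq_ptr(
	    (uintptr_t *)&nmi_handlers_head); hp != NULL;
	    hp = (struct nmi_handler *)atomic_load_acq_ptr(
	    (uintptr_t *)&hp->next)) {
		/* Entries whose func was cleared by deregistration are skipped. */
		if (hp->func != NULL && hp->func(frame) != 0)
			handled = 1;
	}
	return (handled);
}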

sys/x86/x86/cpu_machdep.c
907
hpf = atomic_load_ptr(&hp->func);
MPASS(hpf != handler);
if (hpf == NULL &&