Page MenuHomeFreeBSD

riscv plic: Do not complete interrupts until the interrupt handler has run
ClosedPublic

Authored by kp on Jul 1 2020, 9:28 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 11, 5:51 PM
Unknown Object (File)
Fri, Jan 3, 3:16 PM
Unknown Object (File)
Dec 17 2024, 11:14 PM
Unknown Object (File)
Dec 9 2024, 1:55 PM
Unknown Object (File)
Nov 18 2024, 11:27 PM
Unknown Object (File)
Oct 24 2024, 9:00 AM
Unknown Object (File)
Oct 14 2024, 2:12 PM
Unknown Object (File)
Oct 13 2024, 2:51 AM

Details

Summary

We cannot complete the interrupt (i.e. write to the claims/complete
register until the interrupt handler has actually run. We don't run the
interrupt handler immediately from intr_isrc_dispatch(), we only
schedule it for later execution.

If we immediately complete it (i.e. before the interrupt handler proper
has run) the interrupt may be triggered again if the interrupt source
remains set. From RISC-V Instruction Set Manual: Volume II: Priviliged
Architecture, 7.4 Interrupt Gateways:

"If a level-sensitive interrupt source deasserts the interrupt after the
PLIC core accepts the request and before the interrupt is serviced, the
interrupt request remains present in the IP bit of the PLIC core and
will be serviced by a handler, which will then have to determine that
the interrupt device no longer requires service."

In other words, we may receive interrupts twice.

Avoid that by postponing the completion until after the interrupt
handler has run.

If the interrupt is handled by a filter rather than by scheduling an
interrupt thread we must also complete the interrupt, so set up a
post_filter handler (which is the same as the post_ithread handler).

Sponsored by: Axiado

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kp requested review of this revision.Jul 1 2020, 9:28 AM
kp created this revision.

Are there not issues with a device driver disabling interrupts in its handler (either synchronous or asynchronous) for whatever reason and then that being clobbered by the plic_post_filter method? That already happened with pic_post_ithread... But we shouldn't need to disable/enable any more at all? An interrupt is suppressed the entire time we have it claimed, I assume the disable/enable code is there so we don't keep getting interrupts when using an ithread.

Some more comments:

  1. Is there any guarantee pic_post_ithread gets run on the same hart? Enabling/disabling can occur from any hart as that's just configuring the _source_ priority, but claiming/completing is per-hart.
  1. The MIPS PIC driver does what the PLIC driver currently does. But there are other PIC drivers in the tree that condition when to do an EOI based on the type of interrupt source. I assume we can't get that information about the platform and so have to pessimistically assume everything could be level-triggered?

Are there not issues with a device driver disabling interrupts in its handler (either synchronous or asynchronous) for whatever reason and then that being clobbered by the plic_post_filter method? That already happened with pic_post_ithread... But we shouldn't need to disable/enable any more at all? An interrupt is suppressed the entire time we have it claimed, I assume the disable/enable code is there so we don't keep getting interrupts when using an ithread.

I'm not sure I follow. Device drivers can presumably enable/disable their own interrupt as they see fit, but they don't enable/disable interrupts in the PLIC.

I believe I've tested not disabling the interrupt, and kept trapping back into plic_intr(), so I don't think that claiming an interrupt actually suppresses it.

Is there any guarantee pic_post_ithread gets run on the same hart? Enabling/disabling can occur from any hart as that's just configuring the _source_ priority, but claiming/completing is per-hart.

I don't believe that there is such a guarantee, but plic_post_ithread() explicitly looks at what CPU the interrupt was supposed to go to to claim for the correct hart.

The MIPS PIC driver does what the PLIC driver currently does. But there are other PIC drivers in the tree that condition when to do an EOI based on the type of interrupt source. I assume we can't get that information about the platform and so have to pessimistically assume everything could be level-triggered?

I'd think so, yes.

Based on what the PLIC spec describes, reading/writing from the claim register is what distinguishes a claim and a completion. You are right that we should wait until after the ithread has done its work to signal completion of the interrupt.

Presumably you observed receiving the interrupt twice?

sys/riscv/riscv/plic.c
171 ↗(On Diff #73953)

It would be useful to add a comment here clarifying that reading from the claim register is how we perform the claim.

394 ↗(On Diff #73953)

Could use a comment here to clarify that by writing to the claim register we are signaling that it has been "completed".

462 ↗(On Diff #73953)

Is this right? Registering plic_post_ithread twice means we will signal completion of the interrupt twice.

it looks like linux does similar thing?
See kernel/irq/chip.c: void unmask_threaded_irq()

sys/riscv/riscv/plic.c
462 ↗(On Diff #73953)

Never mind, I had to read the very last paragraph of your commit description again.

Based on what the PLIC spec describes, reading/writing from the claim register is what distinguishes a claim and a completion. You are right that we should wait until after the ithread has done its work to signal completion of the interrupt.

Presumably you observed receiving the interrupt twice?

Yes, for some devices. We didn't see it for the UART, but that's because the UART installs a filter. It does handle its interrupt between the claim and completion.

FWIW, the general strategy on x86 and other platforms is to mask the interrupt in the PIC in the "pre_ithread" hook and later re-enable it in the PIC again in the "post_ithread" hook. Both "pre_ithread" and "post_filter" send an EOI as they are the only ones guaranteed to run synchronously on the CPU which received the interrupt. MSI alleviates the need for masking on most platforms since it is effectively edge triggered so actual level-triggered interrupts are rare on many modern systems. I'm not quite sure how that model translates to the PLIC, but on the surface reading the description of this commit seems to imply it is not doing the same thing at all. The reason we did this on x86 is that the EOI was global and could not be deferred to post_ithread. If you don't have a global EOI but instead are free to get additional interrupts (including additional lower priority interrupts) while the interrupt is not claimed, then I think your commit will work fine. However, if a high priority interrupt (PLIC priority) blocks other interrupts then you are letting that ithread starve the filters, etc. for other devices which is probably not ideal.

In D25531#564563, @jhb wrote:

FWIW, the general strategy on x86 and other platforms is to mask the interrupt in the PIC in the "pre_ithread" hook and later re-enable it in the PIC again in the "post_ithread" hook. Both "pre_ithread" and "post_filter" send an EOI as they are the only ones guaranteed to run synchronously on the CPU which received the interrupt. MSI alleviates the need for masking on most platforms since it is effectively edge triggered so actual level-triggered interrupts are rare on many modern systems. I'm not quite sure how that model translates to the PLIC, but on the surface reading the description of this commit seems to imply it is not doing the same thing at all. The reason we did this on x86 is that the EOI was global and could not be deferred to post_ithread. If you don't have a global EOI but instead are free to get additional interrupts (including additional lower priority interrupts) while the interrupt is not claimed, then I think your commit will work fine. However, if a high priority interrupt (PLIC priority) blocks other interrupts then you are letting that ithread starve the filters, etc. for other devices which is probably not ideal.

That's what the driver is currently doing in plic_pre_ithread and plic_post_ithread. The spec sucks and doesn't really say what claim/complete actually do, but at least the QEMU and Bluespec implementations of the PLIC will clear the IP bit and forbid the IP bit going high until completed, effectively masking the interrupt for _every_ target by masking on the _source_.

In D25531#564563, @jhb wrote:

FWIW, the general strategy on x86 and other platforms is to mask the interrupt in the PIC in the "pre_ithread" hook and later re-enable it in the PIC again in the "post_ithread" hook. Both "pre_ithread" and "post_filter" send an EOI as they are the only ones guaranteed to run synchronously on the CPU which received the interrupt.

If I understood that correctly that's pretty much what we're doing here in the plic code. We use the priority of the interrupt (because that's global across CPUs) to mask it while it's pending processing by the ithread. Post ithread we re-enable it.

If you don't have a global EOI but instead are free to get additional interrupts (including additional lower priority interrupts) while the interrupt is not claimed, then I think your commit will work fine. However, if a high priority interrupt (PLIC priority) blocks other interrupts then you are letting that ithread starve the filters, etc. for other devices which is probably not ideal.

We can get other interrupts while this one is queued, yes.

That's what the driver is currently doing in plic_pre_ithread and plic_post_ithread. The spec sucks and doesn't really say what claim/complete actually do, but at least the QEMU and Bluespec implementations of the PLIC will clear the IP bit and forbid the IP bit going high until completed, effectively masking the interrupt for _every_ target by masking on the _source_.

That reminds me: the qemu implementation has a bug that means it won't trigger interrupts when we re-enable them, leading to stalls (until you trigger another interrupt, e.g. by sending something on the console).

The following Qemu patch fixes that:

--- hw/riscv/sifive_plic.c.orig	2019-11-14 19:06:20.000000000 +0100
+++ hw/riscv/sifive_plic.c	2020-07-01 15:59:38.882119000 +0200
@@ -290,6 +290,7 @@
             qemu_log("plic: write priority: irq=%d priority=%d\n",
                 irq, plic->source_priority[irq]);
         }
+        sifive_plic_update(plic);
         return;
     } else if (addr >= plic->pending_base && /* 1 bit per source */
                addr < plic->pending_base + (plic->num_sources >> 3))

Add comments requested by mhorne

In D25531#564579, @kp wrote:

That's what the driver is currently doing in plic_pre_ithread and plic_post_ithread. The spec sucks and doesn't really say what claim/complete actually do, but at least the QEMU and Bluespec implementations of the PLIC will clear the IP bit and forbid the IP bit going high until completed, effectively masking the interrupt for _every_ target by masking on the _source_.

That reminds me: the qemu implementation has a bug that means it won't trigger interrupts when we re-enable them, leading to stalls (until you trigger another interrupt, e.g. by sending something on the console).

The following Qemu patch fixes that:

--- hw/riscv/sifive_plic.c.orig	2019-11-14 19:06:20.000000000 +0100
+++ hw/riscv/sifive_plic.c	2020-07-01 15:59:38.882119000 +0200
@@ -290,6 +290,7 @@
             qemu_log("plic: write priority: irq=%d priority=%d\n",
                 irq, plic->source_priority[irq]);
         }
+        sifive_plic_update(plic);
         return;
     } else if (addr >= plic->pending_base && /* 1 bit per source */
                addr < plic->pending_base + (plic->num_sources >> 3))

Ahh, I have experienced this. It might be responsible for the deadlocks @trasz has seen running the test suite in CI. Has this patch been submitted upstream?

In D25531#564579, @kp wrote:

That's what the driver is currently doing in plic_pre_ithread and plic_post_ithread. The spec sucks and doesn't really say what claim/complete actually do, but at least the QEMU and Bluespec implementations of the PLIC will clear the IP bit and forbid the IP bit going high until completed, effectively masking the interrupt for _every_ target by masking on the _source_.

That reminds me: the qemu implementation has a bug that means it won't trigger interrupts when we re-enable them, leading to stalls (until you trigger another interrupt, e.g. by sending something on the console).

The following Qemu patch fixes that:

--- hw/riscv/sifive_plic.c.orig	2019-11-14 19:06:20.000000000 +0100
+++ hw/riscv/sifive_plic.c	2020-07-01 15:59:38.882119000 +0200
@@ -290,6 +290,7 @@
             qemu_log("plic: write priority: irq=%d priority=%d\n",
                 irq, plic->source_priority[irq]);
         }
+        sifive_plic_update(plic);
         return;
     } else if (addr >= plic->pending_base && /* 1 bit per source */
                addr < plic->pending_base + (plic->num_sources >> 3))

Ahh, I have experienced this. It might be responsible for the deadlocks @trasz has seen running the test suite in CI. Has this patch been submitted upstream?

Yes, I independently discovered that and a bunch of other issues, and submitted upstream reviews that have been merged at least into the RISC-V branch, and possibly also mainline by now.

In D25531#564579, @kp wrote:

That reminds me: the qemu implementation has a bug that means it won't trigger interrupts when we re-enable them, leading to stalls (until you trigger another interrupt, e.g. by sending something on the console).

The following Qemu patch fixes that:

It appears that Qemu is aware of this, and a patch is either already in, or on its way in: https://lists.nongnu.org/archive/html/qemu-riscv/2020-06/msg00304.html
And upon re-reading that e-mail, it looks like you may already be aware of this too...

In D25531#564563, @jhb wrote:

FWIW, the general strategy on x86 and other platforms is to mask the interrupt in the PIC in the "pre_ithread" hook and later re-enable it in the PIC again in the "post_ithread" hook. Both "pre_ithread" and "post_filter" send an EOI as they are the only ones guaranteed to run synchronously on the CPU which received the interrupt. MSI alleviates the need for masking on most platforms since it is effectively edge triggered so actual level-triggered interrupts are rare on many modern systems. I'm not quite sure how that model translates to the PLIC, but on the surface reading the description of this commit seems to imply it is not doing the same thing at all. The reason we did this on x86 is that the EOI was global and could not be deferred to post_ithread. If you don't have a global EOI but instead are free to get additional interrupts (including additional lower priority interrupts) while the interrupt is not claimed, then I think your commit will work fine. However, if a high priority interrupt (PLIC priority) blocks other interrupts then you are letting that ithread starve the filters, etc. for other devices which is probably not ideal.

That's what the driver is currently doing in plic_pre_ithread and plic_post_ithread. The spec sucks and doesn't really say what claim/complete actually do, but at least the QEMU and Bluespec implementations of the PLIC will clear the IP bit and forbid the IP bit going high until completed, effectively masking the interrupt for _every_ target by masking on the _source_.

Also, are we guaranteed to have the *post_ithread* hook run on the same hart? If not we'll complete for the wrong hart (though in practice all implementations I know of only track claims on a global basis so don't care which hart's claim/complete register you write to).

Also, are we guaranteed to have the *post_ithread* hook run on the same hart? If not we'll complete for the wrong hart (though in practice all implementations I know of only track claims on a global basis so don't care which hart's claim/complete register you write to).

I don't think that's guaranteed, but it shouldn't matter. We don't complete for the current CPU, but for the CPU the interrupt is bound to:

        cpu = CPU_FFS(&isrc->isrc_cpu) - 1;

	/* Complete the interrupt. */
	WR4(sc, PLIC_CLAIM(sc, cpu), src->irq);

I've re-read the PLIC spec, the discussions here, and did some spelunking in the MI interrupt code. As far as I can see this change is correct.

I will see about getting the PLIC spec clarified in terms of the exact behaviour of claim/complete.

My remaining question is this: if the IP bit is cleared by the claim, and cannot be set again until the completion has been done for that context, then is there any reason to continue clearing/setting the enable bit for the interrupt we are servicing? Doesn't the claim/completion process provide the correct behaviour on its own?

This revision is now accepted and ready to land.Jul 3 2020, 8:43 PM

I've re-read the PLIC spec, the discussions here, and did some spelunking in the MI interrupt code. As far as I can see this change is correct.

I will see about getting the PLIC spec clarified in terms of the exact behaviour of claim/complete.

My remaining question is this: if the IP bit is cleared by the claim, and cannot be set again until the completion has been done for that context, then is there any reason to continue clearing/setting the enable bit for the interrupt we are servicing? Doesn't the claim/completion process provide the correct behaviour on its own?

Indeed. My assumption is it's there because that's what other PIC drivers in the tree do to deal with asynchronous interrupts, and it currently stops a constant interrupt storm, but is both inadequate to entirely stop spurious interrupts and now unnecessary.

I've re-read the PLIC spec, the discussions here, and did some spelunking in the MI interrupt code. As far as I can see this change is correct.

I will see about getting the PLIC spec clarified in terms of the exact behaviour of claim/complete.

My remaining question is this: if the IP bit is cleared by the claim, and cannot be set again until the completion has been done for that context, then is there any reason to continue clearing/setting the enable bit for the interrupt we are servicing? Doesn't the claim/completion process provide the correct behaviour on its own?

Indeed. My assumption is it's there because that's what other PIC drivers in the tree do to deal with asynchronous interrupts, and it currently stops a constant interrupt storm, but is both inadequate to entirely stop spurious interrupts and now unnecessary.

As the person who added it, I can tell you it is because other PIC drivers did so. This code was (and continues to be) above my pay grade, so there was a lot of inference from how other drivers achieve things. Looking at what GICv3 does...

static void
gic_v3_pre_ithread(device_t dev, struct intr_irqsrc *isrc)
{
	struct gic_v3_irqsrc *gi = (struct gic_v3_irqsrc *)isrc;

	gic_v3_disable_intr(dev, isrc);
	gic_icc_write(EOIR1, gi->gi_irq);
}

...it's obvious that this is what @jhb described, where the interrupt is masked before sending an EOI.

I think if we move forward with this change as-is we can come back to consider removing the enable/disable calls.

After updating to a recent HEAD I discovered I can no longer boot inside QEMU. I know that the aforementioned bugs in the QEMU's PLIC are to blame, but I did not realize that they would cause a complete deadlock with this change applied. CI didn't catch it because the RISC-V testvm job has been broken for some time.

@kp, can you see that qemu-devel gets updated with the required fixes, or back out this change until the port is updated to 5.0? I will make sure that they get included with that update.

These are the required fixes by @jrtc27: aa4d30f and 5576582

After updating to a recent HEAD I discovered I can no longer boot inside QEMU. I know that the aforementioned bugs in the QEMU's PLIC are to blame, but I did not realize that they would cause a complete deadlock with this change applied. CI didn't catch it because the RISC-V testvm job has been broken for some time.

@kp, can you see that qemu-devel gets updated with the required fixes, or back out this change until the port is updated to 5.0? I will make sure that they get included with that update.

These are the required fixes by @jrtc27: aa4d30f and 5576582

I'll try to post patches for the qemu-devel port today, or before Monday at the latest.

In D25531#568832, @kp wrote:

After updating to a recent HEAD I discovered I can no longer boot inside QEMU. I know that the aforementioned bugs in the QEMU's PLIC are to blame, but I did not realize that they would cause a complete deadlock with this change applied. CI didn't catch it because the RISC-V testvm job has been broken for some time.

@kp, can you see that qemu-devel gets updated with the required fixes, or back out this change until the port is updated to 5.0? I will make sure that they get included with that update.

These are the required fixes by @jrtc27: aa4d30f and 5576582

I'll try to post patches for the qemu-devel port today, or before Monday at the latest.

FYI, I ran into this again and just updated qemu-devel to the latest (at the time) snapshot: 5.1.0 rc2 at D25991.

In D25531#576420, @jhb wrote:

FYI, I ran into this again and just updated qemu-devel to the latest (at the time) snapshot: 5.1.0 rc2 at D25991.

Sorry, I should have updated this too. There was an e-mail thread (or IRC convo, I forget) about it. There's been a qemu5 port for a while that also works.