Page MenuHomeFreeBSD

ARM64 GICv3 Cache bits
Needs ReviewPublic

Authored by phk on Jan 4 2025, 1:45 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Feb 2, 11:42 AM
Unknown Object (File)
Mon, Jan 27, 9:25 AM
Unknown Object (File)
Fri, Jan 24, 2:13 AM
Unknown Object (File)
Thu, Jan 23, 4:20 PM
Unknown Object (File)
Wed, Jan 22, 10:25 PM
Unknown Object (File)
Wed, Jan 22, 6:31 AM
Unknown Object (File)
Thu, Jan 16, 5:49 PM
Unknown Object (File)
Thu, Jan 16, 5:49 PM
Subscribers

Details

Reviewers
andrew
jrtc27
manu
Summary

This is the minimal subset of D478819 relating only to the cache bits.

Attempt to set the cache bits to GITS_BASER_CACHE_RAWAWB, but respect that sub-field may be read-only.

This works on my T14s Snapdragon machine.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

phk requested review of this revision.Jan 4 2025, 1:45 PM
sys/arm64/arm64/gicv3_its.c
672

I don't understand this

sys/arm64/arm64/gicv3_its.c
672

As I explained in D478819: ARM's spec does not mandate/guarantee that the cache bits can be modified.

Therefore the value we write, be it zero in case of ITS_FLAGS_ERRATA_CAVIUM_22375 or RAWAWB otherwise, may not be what we read back.

QED: We should not ENXIO if the cache bits we read differ from the cache bits we wrote.

sys/arm64/arm64/gicv3_its.c
672

No, Andy’s point is the parentheses are in the wrong place so this is nonsensical.

Changed != to ^ in write-read-back check.

sys/arm64/arm64/gicv3_its.c
672

Ohh, duH! I forgot to change the != to ^.

new patch in a sec.

sys/arm64/arm64/gicv3_its.c
672

Ok, where in the spec do you see that these bits can be read-only? I see that *OuterCache* is documented as potentially fixed, but there is no such note for *InnerCache* in my copy of IHI0069H.

What's the full output of acpidump -dt?

sys/arm64/arm64/gicv3_its.c
672

First and most importantly, I see it on this hardware where I cannot change the value of that field.

The spec doesn't say either way, all it says is that the field will have an "architecturally UNKNOWN value" on reset, but that does not actually promise that you can or should change that value.

For reference, Linux makes no attempt to modify that field, and I honestly dont see why we do ?

For OuterCache they had to explicitly talk about, because they changed their mind and have deprecated that field having a fixed value.

acpidump output at https://phk.freebsd.dk/misc/snapdragon_acpidump.txt (NB: impermanent link)

sys/arm64/arm64/gicv3_its.c
672

Linux does set it to the value it chooses and will fail to attach if that does not persist. https://github.com/torvalds/linux/blob/7f5b6a8ec18e3add4c74682f60b90c31bdf849f2/drivers/irqchip/irq-gic-v3-its.c#L2451-L2475

The implication is normally that all values are usable unless specified otherwise.

sys/arm64/arm64/gicv3_its.c
672

(Re Linux: See the ITS_FLAGS_FORCE_NON_SHAREABLE hack in line 2676, which explicitly triggers the exception in the code you quote.)

And I agree, I would expect the document to warn if a field or register might or might not be R/O or have restricted values.

And yet here we are:

GITS_BASER_CACHE_WAWB is not a value which can be written to that field on this hardware.

So we have evidence that either the documentation is not comprehensive, or there are silicon designers who read it differently than we do.

In either case, the hard failure is a bad idea, this machine would just have booted, if not for our unfounded hard failure.

And mind you, this was not easy to diagnose for me: The crucial printf did not even register on a 120fps camera, and it was not until I artificially slowed the console output down, that I was even able to see it.

I can live with that, I've debugged kernels with a single flashing LED as only output, but I believe in mending such potholethe, because the next unlucky pointer is probably not like me,

But I guess ... whatever... ?

I just need the first chunk of this patch and can we at least agree to commit that ?

I am going to time out this review due to obvious lack of interest on the part of the reviewers and commit it as proposed.

The first hunk should not be necessary according to the specification, but if it helps your platform, does not regress others and is believed to be better anyway according to Andy then that's fine. The second hunk I do not believe should be committed.

I also wrote the following on IRC a couple of weeks back:

<@jrtc27> phk: because (a) I don't think it's right to just ignore errors that the spec reads as not permitting (b) given this is about caching are there not potentially implications for using some random setting that the hardware or firmware decided to use?
<@jrtc27> e.g. it could be something like GITS_CTLR.Enabled is set on entry to the kernel, we clear it, but we don't poll to wait until GITS_CTLR.Quiescent is 1
<@jrtc27> and writing to GITS_BASER is UNPREDICTABLE when either GITS_CTLR.Enabled is 1 or GITS_CTLR.Quiescent is 0
<@jrtc27> so without a proper root cause analysis I don't want to be risking papering over legitimate bugs in our driver
<@jrtc27> (technically even if we don't clear Enabled I guess we should poll just in case it's not yet quiesced from something before, but that's probably rather unlikely)

I would be interested to know if this is the case here.

First: I didn't propose the first chunk, Andrew did, and I adopted it to speed things up.

Second: Since this platform refuses to write what has hitherto been our default cache bits, which proves beyond dispute that the cache bits cannot be freely written to - no matter what the specification says or who reads it. It follows that the second chunk is prudent, as there absolutely no point in preventing a new platform from booting, just because something is unclear to us in some specification.

Third: This saga started on the 9th of december last year in D48819.

For these reasons I am going to note your reservations, to which we can return if they ever turn into any kind of trouble, but I am not going to sit here twiddling my thumbs, waiting for you to find conclusive answers to questions which depend on how random Verilog-hackers interpret a third party specification differently from you.

Your unwillingness to try and actually understand the problem is your own choice, not mine. Adding a platform-specific quirk to ignore the inability to set something that the specification does not give implementations license to restrict is one thing, but doing it unilaterally for every platform is quite another thing and I wholeheartedly object to just disabling checks that get in the way with our current implementation on your hardware. We could quite easily have a real bug in our driver here, and in doing so we're just ignoring that. I also have absolutely no desire to encounter cache coherency bugs in future because of a change like this which just decides that cache coherency is something that can be ignored and everything will be fine, don't worry about it. I do not know what the implications of picking up a different firmware-chosen value here is and whether FreeBSD can in fact cope with it. Do you?

What you could have instead done is try something like the following patch, as I hinted at on IRC two weeks ago:

diff --git a/sys/arm64/arm64/gic_v3_reg.h b/sys/arm64/arm64/gic_v3_reg.h
index 792b532196a9..bba97fca914e 100644
--- a/sys/arm64/arm64/gic_v3_reg.h
+++ b/sys/arm64/arm64/gic_v3_reg.h
@@ -286,7 +286,8 @@
 #define	GITS_PIDR2_ARCH_GICv4	GICR_PIDR2_ARCH_GICv4
 
 #define	GITS_CTLR		(0x0000)
-#define		GITS_CTLR_EN	(1 << 0)
+#define		GITS_CTLR_EN		(1 << 0)
+#define		GITS_CTLR_QUIESCENT	(1U << 31)
 
 #define	GITS_IIDR		(0x0004)
 #define	 GITS_IIDR_PRODUCT_SHIFT	24
diff --git a/sys/arm64/arm64/gicv3_its.c b/sys/arm64/arm64/gicv3_its.c
index 4b554f2dc30a..f11e97a68a20 100644
--- a/sys/arm64/arm64/gicv3_its.c
+++ b/sys/arm64/arm64/gicv3_its.c
@@ -66,6 +66,7 @@
 #include <arm/arm/gic_common.h>
 #include <arm64/arm64/gic_v3_reg.h>
 #include <arm64/arm64/gic_v3_var.h>
+#include "gic_v3_reg.h"
 
 #ifdef FDT
 #include <dev/ofw/openfirm.h>
@@ -1061,6 +1062,8 @@ gicv3_its_attach(device_t dev)
 		ctlr &= ~GITS_CTLR_EN;
 		gic_its_write_4(sc, GITS_CTLR, ctlr);
 	}
+	while (((ctlr = gic_its_read_4(sc, GITS_CTLR)) & GITS_CTLR_QUIESCENT) == 0)
+		;
 
 	/* Allocate the private tables */
 	err = gicv3_its_table_init(dev, sc);

That may not be the problem, but it is *a* flaw in the driver I found by reading the spec. We would be in a much better place had less time been spent by trying to railroad hacky workarounds into the tree and more time spent on actually investigating the problem to understand what exactly is going on. There is a good chance it is more complicated than "this field cannot be changed to this value". I'm not saying it's not that, but there is nowhere near enough evidence to conclusively state that.

I also note, once again, that Linux does *not* ignore failure to set the cache coherency, and they, like us, probably have good reason for that.

TL;DR I am annoyed at your desire to insert hacks into a core driver for all modern arm64 platforms and do not want to see good engineering practice thrown out the window in the name of making a specific piece of hardware "work" in the quickest way possible

(To be clear, the hack is ignoring the error; using RaWaWb is fine, it's supposed to improve performance anyway, so that can be committed, even if ideally we'd also understand why it only works when we use that configuration on this platform)

I do not know what the implications of picking up a different firmware-chosen value here is and whether FreeBSD can in fact cope with it. Do you?

Yes, as it happens I do know: This machine does not needlessly panic.

Feel free to open a review on the unrelated patch you mention above.

Who knows, I might even be able to test it this month or something... :-)

Your unwillingness to try and actually understand the problem is your own choice, not mine. Adding a platform-specific quirk to ignore the inability to set something that the specification does not give implementations license to restrict is one thing, but doing it unilaterally for every platform is quite another thing and I wholeheartedly object to just disabling checks that get in the way with our current implementation on your hardware. We could quite easily have a real bug in our driver here, and in doing so we're just ignoring that. I also have absolutely no desire to encounter cache coherency bugs in future because of a change like this which just decides that cache coherency is something that can be ignored and everything will be fine, don't worry about it. I do not know what the implications of picking up a different firmware-chosen value here is and whether FreeBSD can in fact cope with it. Do you?

What you could have instead done is try something like the following patch, as I hinted at on IRC two weeks ago:

diff --git a/sys/arm64/arm64/gic_v3_reg.h b/sys/arm64/arm64/gic_v3_reg.h
index 792b532196a9..bba97fca914e 100644
--- a/sys/arm64/arm64/gic_v3_reg.h
+++ b/sys/arm64/arm64/gic_v3_reg.h
@@ -286,7 +286,8 @@
 #define	GITS_PIDR2_ARCH_GICv4	GICR_PIDR2_ARCH_GICv4
 
 #define	GITS_CTLR		(0x0000)
-#define		GITS_CTLR_EN	(1 << 0)
+#define		GITS_CTLR_EN		(1 << 0)
+#define		GITS_CTLR_QUIESCENT	(1U << 31)
 
 #define	GITS_IIDR		(0x0004)
 #define	 GITS_IIDR_PRODUCT_SHIFT	24
diff --git a/sys/arm64/arm64/gicv3_its.c b/sys/arm64/arm64/gicv3_its.c
index 4b554f2dc30a..f11e97a68a20 100644
--- a/sys/arm64/arm64/gicv3_its.c
+++ b/sys/arm64/arm64/gicv3_its.c
@@ -66,6 +66,7 @@
 #include <arm/arm/gic_common.h>
 #include <arm64/arm64/gic_v3_reg.h>
 #include <arm64/arm64/gic_v3_var.h>
+#include "gic_v3_reg.h"
 
 #ifdef FDT
 #include <dev/ofw/openfirm.h>
@@ -1061,6 +1062,8 @@ gicv3_its_attach(device_t dev)
 		ctlr &= ~GITS_CTLR_EN;
 		gic_its_write_4(sc, GITS_CTLR, ctlr);
 	}
+	while (((ctlr = gic_its_read_4(sc, GITS_CTLR)) & GITS_CTLR_QUIESCENT) == 0)
+		;
 
 	/* Allocate the private tables */
 	err = gicv3_its_table_init(dev, sc);

That may not be the problem, but it is *a* flaw in the driver I found by reading the spec. We would be in a much better place had less time been spent by trying to railroad hacky workarounds into the tree and more time spent on actually investigating the problem to understand what exactly is going on. There is a good chance it is more complicated than "this field cannot be changed to this value". I'm not saying it's not that, but there is nowhere near enough evidence to conclusively state that.

I also note, once again, that Linux does *not* ignore failure to set the cache coherency, and they, like us, probably have good reason for that.

We should land this bit of code too Jessica. I noticed it last night when going back over the ITS spec. I'm glad you also caught it. :-)