- User Since
- Feb 3 2015, 4:54 AM (324 w, 3 d)
Mar 19 2021
Mar 18 2021
Mar 7 2021
Mar 4 2021
Mar 3 2021
I don't think this change is necessary (it only mitigates another bug), so I prefer to unconditionally initialize both registers in attach.
write is expected (moreover, these registers are write-only (WO) and thus not suitable for a SYSCON_MODIFY operation).
I have the initial part of this fixup. At minimum, it should allow the board to boot. Unfortunately, the interrupt handling still has a locking problem: the solution needs a change in the syscon provided by the simple_mfd driver, so I need more time for this (one or two days).
But it would be nice if you could test this in your environment. It should fix the hang-up issue in the PCIe driver.
What an ignominy...
Mar 2 2021
Noooo, I'm stupid :( The root of the problem is trivial: gpio_write() doesn't modify only the required bit, but the whole 32-bit register. HW init should use a new gpio_modify() (implemented with SYSCON_MODIFY_4(), not SYSCON_WRITE_4()). Big, big sorry for the trouble. I'll fix it tomorrow.
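A minimal userspace model of the bug, assuming a plain variable stands in for the 32-bit GPIO register. The helpers model_write_4(), model_modify_4() and gpio_set_pin_*() are hypothetical; they only mimic the semantics of SYSCON_WRITE_4() (replace the whole register) and SYSCON_MODIFY_4() (clear some bits, set some bits, keep the rest), not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Like SYSCON_WRITE_4(): the whole register is replaced by 'val'. */
static void
model_write_4(uint32_t *reg, uint32_t val)
{
	*reg = val;
}

/* Like SYSCON_MODIFY_4(): clear 'clear_bits', set 'set_bits', keep the rest. */
static void
model_modify_4(uint32_t *reg, uint32_t clear_bits, uint32_t set_bits)
{
	*reg = (*reg & ~clear_bits) | set_bits;
}

/* Buggy pattern: drives one pin, but clobbers every other pin. */
static void
gpio_set_pin_buggy(uint32_t *reg, unsigned int pin, bool value)
{
	assert(pin < 32);
	model_write_4(reg, value ? (UINT32_C(1) << pin) : 0);
}

/* Fixed pattern: only the bit that belongs to 'pin' changes. */
static void
gpio_set_pin_fixed(uint32_t *reg, unsigned int pin, bool value)
{
	uint32_t mask;

	assert(pin < 32);
	mask = UINT32_C(1) << pin;
	model_modify_4(reg, mask, value ? mask : 0);
}
```

With pins 0 and 2 already high (register value 0x5), setting pin 4 through the buggy path leaves 0x10, while the read-modify-write path gives 0x15. As noted earlier for the write-only registers, the modify variant only works when the register can be read back.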
Oops. Thanks for fixing my bugs...
Mar 1 2021
Thanks, perfect. Please let me know if you need help. I think that there will be more and more similar interrupt controllers/systems, so it's important to create a clean and flexible interface.
I'm sorry but I don't like this approach.
The PIC_MAP_INT method is designed to map the external data of an exact source to a given irqsrc. It should not be misused to transfer any additional, platform-specific data back to the requester, especially when this external data (purely synthetic, without any relation to FDT) is passed to the mapping layer as INTR_MAP_DATA_FDT.
IMHO, we should convert GICP to a standard MSI-based interrupt controller. The communication between ICU and GICP should be split into a normal MSI request, and the standard method for getting the MSI mapping (msi_map_msi()) should be used.
It's a little hard for me to express all the nuances, so I can prepare a skeleton for this solution, if you want.
With the above objection.
Can you please try the test again with https://cgit.FreeBSD.org/src/commit/?id=ce5a4083de2d79bc44d209c9e355a09ede47346c applied? I hope that it also fixes this problem. Thanks.
My original idea was to do as much as possible for armv7, primarily because it has been working on this platform for a long time. My mistake was forgetting that flush-to-zero mode was chosen because the smaller armv7 VFP implementations may need software emulation for rounding to denormal values, so we chose the IEEE 754-incompatible mode.
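To make the difference between the two modes concrete, here is a small software model of flush-to-zero (FTZ): a subnormal value is replaced by a zero of the same sign, whereas default IEEE 754 arithmetic keeps it. This is only an illustration of the semantics; on armv7 VFP, FTZ is a mode selected by the FZ bit in FPSCR, and the helper name flush_to_zero() is hypothetical.

```c
#include <float.h>
#include <math.h>

/*
 * Model of FTZ on one value: subnormals become a zero of the same sign.
 * Normal numbers, zeros, infinities and NaNs pass through unchanged.
 */
static double
flush_to_zero(double x)
{
	if (fpclassify(x) == FP_SUBNORMAL)
		return (signbit(x) ? -0.0 : 0.0);
	return (x);
}
```

For example, DBL_MIN / 2.0 is a nonzero subnormal under IEEE 754 rules, but flush_to_zero() turns it into 0.0, while DBL_MIN itself (the smallest normal double) is preserved.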
Feb 18 2021
Feb 16 2021
Ahh, right, I forgot that Neoverse has (from this point of view) shifted cache levels: see slide 5 of https://www.slideshare.net/linaroorg/getting-the-most-out-of-dynamiq-enabling-support-of-dynamiq-sfo17104
So, for the purpose of OS optimization, we can treat the real L1 + L2 caches as L1 in the pre-Neoverse sense, the real L3 in the DynamIQ Shared Unit (DSU) block as L2 in the pre-Neoverse sense, and the CMN as L3.
I think we can still handle all the cases using a two-level hierarchy, in which NUMA domains are exported as CG_SHARE_L3 groups and clusters as CG_SHARE_L2 groups. It should work on a server system, on a big.LITTLE SoC (RK3399), and also on a mid-range SoC (LX2160A, which has 8 dual-core clusters). What do you think?
I'm not sure if you want to implement this "extension", and I don't want to block you. The code in this review looks fine to me and doesn't block anything, so push it as needed.
Feb 15 2021
I've spent some time digging through the ARM documentation, but unfortunately, it seems we cannot determine the exact cache topology. However, I think we can estimate it with a reasonable degree of accuracy. For the rest, I assume that bit  bits is set (otherwise, the affinity fields are shifted).
Feb 13 2021
I agree with Andrew. We should use MPIDR to build the core topology. Nowadays it's easy: MPIDR is stored in pcpu for all enumerated cores.
But there is another problem: the cpuid should be treated as an arbitrarily chosen value without any connection to the core topology, because nobody can guarantee that cores are numbered sequentially within a NUMA domain. Also, the assumption that NUMA domains are always symmetric (have the same number of cores) looks too optimistic. Ampere, as well as LX2160, uses multiple clusters of dual cores with a per-cluster L2 cache, so I think that L2 cache locality should also be included in the initial implementation.
By that, I mean that I offer help with the implementation and testing on FDT-based systems (unfortunately, ACPI is outside my scope and my setup).
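A sketch of grouping cores by cluster from MPIDR alone, ignoring the cpuid. The field layout (Aff0 bits [7:0], Aff1 [15:8], Aff2 [23:16], MT bit 24, Aff3 [39:32]) is the architectural MPIDR_EL1 layout; the cluster rule is an assumption, since the exact cache topology cannot be derived from the architecture alone: with MT == 0, Aff0 is taken as the core and the higher fields as the cluster; with MT == 1, Aff0 is a thread, Aff1 the core, and Aff2 and above the cluster. mpidr_cluster_key() and count_clusters() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	MPIDR_AFF0(m)	((uint32_t)((m) & 0xff))
#define	MPIDR_AFF1(m)	((uint32_t)(((m) >> 8) & 0xff))
#define	MPIDR_AFF2(m)	((uint32_t)(((m) >> 16) & 0xff))
#define	MPIDR_AFF3(m)	((uint32_t)(((m) >> 32) & 0xff))
#define	MPIDR_MT(m)	(((m) >> 24) & 1)

/* Key naming the cluster of a core: MPIDR with the core (and thread) level dropped. */
static uint32_t
mpidr_cluster_key(uint64_t mpidr)
{
	if (MPIDR_MT(mpidr))
		return ((MPIDR_AFF3(mpidr) << 8) | MPIDR_AFF2(mpidr));
	return ((MPIDR_AFF3(mpidr) << 16) | (MPIDR_AFF2(mpidr) << 8) |
	    MPIDR_AFF1(mpidr));
}

/* Count distinct clusters; the array index (cpuid) carries no meaning. */
static size_t
count_clusters(const uint64_t *mpidr, size_t ncpu)
{
	size_t n = 0;

	for (size_t i = 0; i < ncpu; i++) {
		bool seen = false;

		for (size_t j = 0; j < i && !seen; j++)
			seen = mpidr_cluster_key(mpidr[j]) ==
			    mpidr_cluster_key(mpidr[i]);
		if (!seen)
			n++;
	}
	return (n);
}
```

For example, six cores enumerated in the shuffled order 0x000, 0x101, 0x001, 0x100, 0x201, 0x200 still form three dual-core clusters, because the grouping never looks at the enumeration order.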
Feb 11 2021
Feb 9 2021
Feb 3 2021
Jan 27 2021
Jan 21 2021
Kib, this error handling doesn't make sense to me, so I must have missed something important. Can you please explain the context/reason for this strange kind of error hiding? Moreover, EBIG is also allowed by the man page.
Jan 19 2021
The problem is that you cannot expect (and, moreover, you cannot determine) that the kernel can access memory described by an FDT reservation.
The FDT reservation node is used for various purposes. It may be used as an advisory (e.g. for the location of DMA buffers), as shared memory (e.g. for a framebuffer), or as a hard exclusion area (e.g. for memory used by the secure world), which is thus inaccessible from the kernel. Moreover, in the last case, the implementation is allowed to generate (in rare cases) an imprecise exception as the result of an attempt to access this memory, which is protected by TrustZone hardware.
FDT reservations are typically added dynamically by U-Boot or ATF; see fdt_add_reserved_memory(). There is some chance that you can use the “no-map” attribute to determine whether the reservation may be accessible by the kernel. For the other cases, you must be sure that the memory is not mapped by the kernel with different attributes than in U-Boot or the firmware: this is architecturally undefined behavior, which sooner or later leads to a loss of coherency.
For all these reasons, you cannot blindly treat reserved memory as accessible by the kernel and exportable to userspace via /dev/mem.
I can only recommend trying the opposite approach: explicitly map the ACPI tables into the kernel. I assume that you can determine the memory range where these tables are located, and also that you can determine their memory attributes if they are mapped by another party (ACPI).
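The conservative consequence can be sketched as a range check: refuse any physical range that touches a reserved region at all, rather than trying to guess which reservations are safe. This does not parse the FDT; struct resv_region, ranges_overlap() and devmem_range_allowed() are hypothetical, and the regions would in practice come from the /reserved-memory nodes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one reserved-memory region. */
struct resv_region {
	uint64_t base;
	uint64_t size;
};

/* True if [a, a + alen) and [b, b + blen) intersect, without overflow. */
static bool
ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	if (alen == 0 || blen == 0)
		return (false);
	return (a <= b ? b - a < alen : a - b < blen);
}

/* Deny any /dev/mem range that touches a reserved region. */
static bool
devmem_range_allowed(uint64_t pa, uint64_t len,
    const struct resv_region *resv, size_t nresv)
{
	for (size_t i = 0; i < nresv; i++)
		if (ranges_overlap(pa, len, resv[i].base, resv[i].size))
			return (false);
	return (true);
}
```

A range ending exactly where a reservation starts is still allowed, while a range crossing into it by a single page is refused.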
Jan 15 2021
It is a bit questionable whether ctfconvert should generate an error in this case (I don't think so), but calling it is clearly unnecessary in this case.
Jan 14 2021
I'm afraid that this approach has a problem on FDT-based systems. FDT typically also uses reserved memory regions for the secure portion of base memory (memory used by the secure monitor or PSCI). And reserved memory is handled using EXFLAG_NOALLOC; see https://cgit.freebsd.org/src/tree/sys/arm64/arm64/machdep.c#n1193
And of course this kind of memory cannot be accessed by /dev/mem.
Jan 9 2021
Unfortunately, a side effect of this is a large reduction of the VA space available for mmap: the available range for (non-MAP_FIXED) mmap is only the interval <start of data segment + MAXDSIZ, end of user VA space>.
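The arithmetic behind that interval, as a sketch: everything below data_start + maxdsiz is kept free for the data segment, so only the remainder up to the end of the user VA space is left for non-MAP_FIXED mappings. The helper mmap_window_size() and the sample numbers are hypothetical, chosen only to show the size of the effect.

```c
#include <stdint.h>

/*
 * Bytes left for non-MAP_FIXED mmap() when [0, data_start + maxdsiz)
 * is reserved for data segment growth. Returns 0 if nothing is left.
 */
static uint64_t
mmap_window_size(uint64_t data_start, uint64_t maxdsiz, uint64_t va_end)
{
	uint64_t lo = data_start + maxdsiz;

	if (lo < data_start || lo >= va_end)	/* overflow or no room */
		return (0);
	return (va_end - lo);
}
```

For instance, with the data segment at 0x01000000, a 512 MiB MAXDSIZ (0x20000000) and a 3 GiB user VA space ending at 0xC0000000, the window starts at 0x21000000 and is 0x9F000000 bytes long, so more than half a gigabyte of VA space is unavailable to ordinary mmap().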
Jan 2 2021
Ohh, right, my next mistake. It seems today is not my lucky day :(
Thanks for fixing my stupid bugs.
I just committed the fix for arm.
Thanks for the cooperation.
Yes, it is also broken. ARM uses its own implementation, which does not reduce the bit position modulo the word width.
I will commit a proper fix in the next hour or so...
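A portable sketch of the expected semantics using C11 atomics: the bit position is first reduced modulo the word width (32 here), then the bit is set atomically and its previous value returned, which is how I read the testandset description in atomic(9). testandset_32() is a hypothetical name; this is not FreeBSD's arm implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically set bit (v % 32) of *p; return its previous value (0 or 1). */
static int
testandset_32(_Atomic uint32_t *p, unsigned int v)
{
	uint32_t mask = UINT32_C(1) << (v % 32);
	uint32_t old = atomic_fetch_or(p, mask);

	return ((old & mask) != 0);
}
```

With the modulo, a bit position of 33 addresses bit 1; without it, the shift by 33 would be undefined in C and, in hardware terms, would touch the wrong bit or none at all.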
Hmm, right. It would be better to read the manual before coding. My bad.
Good catch. It never occurred to me that we could have a bug in atomics on two architectures at the same time, so I wasn't looking for a problem in my own garden :) I will try to fix arm myself.
Tested on real HW, everything OK.
Dec 30 2020
Dec 28 2020
Dec 27 2020
This is not entirely true: for every SoC, we know where DRAM is located and its maximum size. The same is true for all MMIO peripherals.
Tested with LINUX_BOOT_ABI, everything works fine.
But why did you also change the behavior for SOCDEV_PA/VA? Removing the option to choose SOCDEV_PA looks like a step backward to me. This breaks existing code (albeit code disabled by #if 0), for example in uart_dev_snps.c. At the very least, this change should be mentioned in the commit message, but I think that the old way is better.
Dec 26 2020
Can you please rebase this onto a fresh tree?
The patch fails for me with:
git apply -p0 ../../git/D27765.diff
error: patch failed: sys/arm64/arm64/locore.S:497
error: sys/arm64/arm64/locore.S: patch does not app
Dec 25 2020
Dec 17 2020
Dec 16 2020
Dec 14 2020
Dec 7 2020
Unfortunately, this fails on armv7 and on aarch64.
Works for me and it's much better than my initial proposal.
Dec 5 2020
Dec 4 2020
Dec 3 2020
I want to be stronger :) To the best of my knowledge, either wmb() is not needed at all, or bus_dmamap_sync() is broken on x86. Nothing in between.
In the current revision, the memory barrier should be ensured by calling bus_dmamap_sync() (it's an external function and the buffer is not a local variable, so the compiler must assume that the function accesses this data).
I think that the x86 architecture ensures correct store ordering also for external observers, so I think that the use of wmb() (or any other explicit synchronization) in any driver is nothing but a bug.
Dec 2 2020
Bah, Warner was much faster :)
This wmb() existed in the initial version of the nvme driver, which did not use the standard FreeBSD busdma functions.
See https://svnweb.freebsd.org/base/head/sys/dev/nvme/nvme_qpair.c?view=markup&pathrev=240616#l415 .
I'm nearly sure it was used as a barrier to ensure that the memcpy() a few lines above is visible to NVMe DMA before the real command is fired (by nvme_mmio_write_4()).
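A sketch of that submission pattern without wmb(), assuming the x86 store ordering argued above. atomic_signal_fence() emits no CPU instruction; GCC and Clang treat it as a compiler barrier, which stands in here for the opaque bus_dmamap_sync() call that keeps the compiler from moving the memcpy() stores past the doorbell write. struct sq_entry and submit_command() are hypothetical; a real driver uses bus_dmamap_sync() and bus_space writes, and on weakly ordered CPUs such as arm64 a real store barrier is still required.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte submission queue entry. */
struct sq_entry {
	uint8_t bytes[64];
};

/* Copy the command into the DMA-visible slot, then ring the doorbell. */
static void
submit_command(struct sq_entry *slot, const struct sq_entry *cmd,
    volatile uint32_t *doorbell, uint32_t tail)
{
	memcpy(slot, cmd, sizeof(*slot));
	/* Compiler-only barrier: the stores above stay before the doorbell. */
	atomic_signal_fence(memory_order_seq_cst);
	*doorbell = tail;
}
```

The ordering of the two steps is the whole point: the device may fetch the entry as soon as it sees the new tail value, so the entry must already be in memory by then.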
Dec 1 2020