I've tested this latest iteration as well.
May 16 2019
Swapped aarch64 for arm64 per emaste's suggestion and addressed rlibby's nits.
May 15 2019
In D20097#436851, @zeising wrote:
> What is remaining in this review?
May 13 2019
Add support for arm64, plus rework _bus_dmamap_pagesneeded() to short-circuit when we don't need an exact count but only need to know whether *any* pages are needed.
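That short-circuit might look roughly like this (a hedged sketch, not the committed code; `exact` and `page_needs_bounce()` are illustrative names):

```c
#include <sys/param.h>
#include <sys/systm.h>
#include <machine/bus.h>

/*
 * Placeholder predicate: the real test consults the tag's exclusion
 * window; an illustrative 4G cutoff stands in for it here.
 */
static bool
page_needs_bounce(bus_dma_tag_t dmat __unused, vm_paddr_t paddr)
{
	return (paddr >= ((vm_paddr_t)1 << 32));
}

/*
 * Hedged sketch of the short-circuit: callers that only need a yes/no
 * answer stop at the first page that would bounce instead of walking
 * the whole buffer for an exact count.
 */
static int
pagesneeded_sketch(bus_dma_tag_t dmat, vm_paddr_t buf, bus_size_t buflen,
    bool exact)
{
	bus_size_t sgsize;
	int count;

	count = 0;
	while (buflen != 0) {
		sgsize = MIN(buflen, PAGE_SIZE - (buf & PAGE_MASK));
		if (page_needs_bounce(dmat, buf)) {
			count++;
			if (!exact)
				break;	/* one hit answers "any needed?" */
		}
		buf += sgsize;
		buflen -= sgsize;
	}
	return (count);
}
```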
May 7 2019
In D20181#434840, @zeising wrote:
> I'll give this a spin and look for regressions in graphics land, but it might take a day or two. Can you add x11 as group reviewer?
May 6 2019
Joining a few threads on this back together: below is a fleshed-out version (minus arm64) of what I am thinking. It removes some penalty (the additional LOOKUP) when dmar is enabled and significantly reduces overhead in the bounce-without-bounce case.
May 5 2019
Update to a diff with full context.
May 4 2019
In D20154#434058, @kib wrote:
> I think that 4G value for BUS_SPACE_MAXSIZE still chomped the PCIe max DMA transfers into 4G chunks.
May 3 2019
In D20097#433904, @zeising wrote:
> This is still running fine from a graphics perspective.
> DRM has been broken for some time now, what's needed to get this in?
May 2 2019
In D20097#433503, @kib wrote:
> Why do you suggest that the length check is not needed? Putting the discussion of a possible race aside, why must the lengths of the two loads be the same?
In D20097#433442, @kib wrote:
>> In D20097#433411, @tychon wrote: I don't see how multiple mappings aren't a bug. The Linux API doesn't do any ref-counting; there, the first unmap would wipe out both. If there is sharing of an underlying resource, that needs to be coordinated at a higher level; this isn't the place. Tolerating it to provide some cover is one thing, and I'm not sure I even agree that is the right approach. IMHO, fixing the driver is.
> The double mappings should not be bugs, the same way they are fine for our native busdma KPI. Consider:
> - for bounce, without bouncing, bus address == phys address and nothing happens on either map or unmap
> - for bounce, with bouncing, a second map allocates another set of bounce pages, so the bus address is different
> - for DMAR, a guest mapping is created anew for each map request, so again it is fine.
> One of my points is that the physical address is user-controllable in some situations, so the KPI must handle duplicates.
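The busdma property kib describes can be sketched as a usage example (illustrative; error handling elided): each bus_dmamap_t carries its own bounce/DMAR state, not the address, so two loads of the same physical range coexist and unload independently.

```c
#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

static void
load_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
	if (error == 0)
		*(bus_addr_t *)arg = segs[0].ds_addr;
}

/*
 * Illustrative only: per-mapping state (bounce pages, DMAR
 * translations) hangs off each bus_dmamap_t, so the two loads below
 * coexist and unload independently.
 */
static void
double_map_demo(bus_dma_tag_t tag, vm_paddr_t paddr, bus_size_t len)
{
	bus_dmamap_t map1, map2;
	bus_addr_t ba1, ba2;

	bus_dmamap_create(tag, 0, &map1);
	bus_dmamap_create(tag, 0, &map2);
	bus_dmamap_load_phys(tag, map1, paddr, len, load_cb, &ba1,
	    BUS_DMA_NOWAIT);
	bus_dmamap_load_phys(tag, map2, paddr, len, load_cb, &ba2,
	    BUS_DMA_NOWAIT);
	/* ba1 and ba2 may differ (bounce/DMAR) or be equal (identity). */
	bus_dmamap_unload(tag, map1);	/* map2 remains valid */
	bus_dmamap_unload(tag, map2);
	bus_dmamap_destroy(tag, map1);
	bus_dmamap_destroy(tag, map2);
}
```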
In D20097#433274, @kib wrote:
>> In D20097#433273, @hselasky wrote: @kib: I was thinking it would be better, in a follow-up commit, to add a debug knob to print the backtrace when double mappings happen or when the unmap cannot find the DMA address, and trace this down in the ibcore code. It is apparently a bug. Right now DRM-next and IBCORE work with this patch, and I think the current behaviour of killing old mappings on the same address is fine.
> I disagree completely. Suppose that two RDMA clients mmap the same shared memory, and then use the same buffer for transfers.
May 1 2019
Would it be prudent to decouple #2 and #3? The fix for #1 raises some interesting implementation questions and may ultimately be better fixed in the driver anyway.
Apr 30 2019
Ideally #2 and #3 would be discrete commits, but I see the value in having a single place to point folks to.
Apr 29 2019
I really like what you did with the locking. Thanks for pitching in here!!
Apr 18 2019
In D19845#428768, @greg_unrelenting.technology wrote:
> Some more i915 GPU testing (w/o the latest update here): after using Firefox (opengl layers, xwayland) for some time, GPU resets start happening:

```
drmn0: Resetting chip for stuck wait on rcs0
drmn0: Resetting chip for stuck wait on rcs0
drmn0: Resetting chip for stuck wait on rcs0
…
DMAR0: Fault Overflow
DMAR0: vgapci0: pci:0:2:0 sid 10 fault acc 0 adt 0x0 reason 0x5 addr 2e09000
DMAR0: Fault Overflow
DMAR0: vgapci0: pci:0:2:0 sid 10 fault acc 0 adt 0x0 reason 0x5 addr 2e09000
```

> and eventually the whole system freezes if I don't quit the compositor / switch to the vt console.
Apr 17 2019
Bump __FreeBSD_version and serialize required bus_dma(9) calls.
Apr 16 2019
Fix the most trivial of trivial whitespace issues. I just want to avoid the toolchain complaining about any divergences, so I'm updating the diff.
Apr 14 2019
In D19845#427478, @greg_unrelenting.technology wrote:
> Also tested on an AMD Ryzen + Vega system, no regressions. (No IOMMU there because no one wrote a dmar equivalent for the AMD IOMMU…)
> btw, amdgpu touches dma_mask in one place; I had to do this to fix the build:

```
--- i/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ w/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -366,7 +366,7 @@ void amdgpu_amdkfd_get_local_mem_info(struct kgd_dev *kgd,
 					      struct kfd_local_mem_info *mem_info)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
-	uint64_t address_mask = adev->dev->dma_mask ? ~*adev->dev->dma_mask :
+	uint64_t address_mask = adev->dev->dma_priv ? ~*((uint64_t*)adev->dev->dma_priv) :
 					     ~((1ULL << 32) - 1);
 	resource_size_t aper_limit = adev->gmc.aper_base + adev->gmc.aper_size;
```
In D19845#427477, @greg_unrelenting.technology wrote:
> Tested on my Haswell laptop with drm-v5.0, everything works (both with DMAR on and off); this line is new in dmesg:

```
vgapci0: dmar0 pci0:0:2:0 rid 10 domain 0 mgaw 48 agaw 48 re-mapped
```

> so it seems like the GPU is IOMMU'd. (full dmesg)
Apr 12 2019
Incorporate a few more review comments: add missing BUS_DMA_NOWAIT flags to _bus_dmamap_load_phys() and optimize linux_dma_map_sg_attrs() to coalesce physically contiguous scatter list entries.
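The coalescing can be sketched like this (a minimal sketch with illustrative names, not the actual linux_dma_map_sg_attrs() change):

```c
#include <sys/param.h>
#include <machine/bus.h>

/*
 * Hedged sketch of the coalescing: when the next chunk starts exactly
 * where the previous DMA segment ends, grow that segment instead of
 * emitting a new one.
 */
static void
append_coalesced(bus_dma_segment_t *segs, int *nseg, bus_addr_t addr,
    bus_size_t len)
{
	bus_dma_segment_t *prev;

	if (*nseg > 0) {
		prev = &segs[*nseg - 1];
		if (prev->ds_addr + prev->ds_len == addr) {
			/* Physically contiguous with the previous entry. */
			prev->ds_len += len;
			return;
		}
	}
	segs[*nseg].ds_addr = addr;
	segs[*nseg].ds_len = len;
	(*nseg)++;
}
```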
Might as well make this as good as it can be. I combined the tests into one.
Apr 11 2019
Use markj@'s suggestion for a more overt/intuitive fix.
And just a heads-up: these patches uncovered an issue with the cache-only zone destructor trying to destroy a non-existent keg. That's being worked on in D19835.
Address further code review feedback: use non-sleepable allocations, fix weird formatting, and add KASSERT(nseg == 1).
Apr 10 2019
Addressed a few code review comments. There is a bit more work to do: glancing at the feedback, I forgot to 'assert that nseg == 1', and I've got the optimization of gluing adjacent sg segments in the works too.
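For the record, the assertion belongs in the load callback; a hedged sketch, assuming the mapping path loads a single physically contiguous buffer:

```c
#include <sys/param.h>
#include <sys/systm.h>
#include <machine/bus.h>

/*
 * Sketch of the load callback with the forgotten assertion added: the
 * mapping path loads one physically contiguous buffer, so exactly one
 * segment must come back.
 */
static void
map_phys_cb_sketch(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
	bus_addr_t *baddr = arg;

	if (error != 0)
		return;
	KASSERT(nseg == 1, ("%s: nseg %d != 1", __func__, nseg));
	*baddr = segs[0].ds_addr;
}
```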
Get rid of PCI_DMA_BOUNDARY entirely.
Apr 9 2019
No locking is provided internally by the path-compressed radix trie implementation. It worked shockingly well without it, until it didn't. Add locking to make the LinuxKPI DMA routines MT-safe.
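Roughly, the locking looks like this (a hedged sketch with illustrative names; the real structure and field names may differ):

```c
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/pctrie.h>

/*
 * Illustrative layout: a per-device mutex guards the (internally
 * unlocked) pctrie, and every insert/lookup/remove on ptree happens
 * between mtx_lock() and mtx_unlock() on this lock.
 */
struct sketch_dma_priv {
	struct mtx	lock;		/* guards ptree */
	struct pctrie	ptree;		/* mappings keyed by bus address */
};

static void
sketch_priv_init(struct sketch_dma_priv *priv)
{
	mtx_init(&priv->lock, "lkpi-dma", NULL, MTX_DEF);
	pctrie_init(&priv->ptree);
}
```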
Apr 8 2019
While trying to explain the design, I realized that the UMA "linux_dma objects" zone doesn't need to be per-device. I can make it global.
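A hedged sketch of the global-zone arrangement (the zone name is kept from the comment above; the object layout and other identifiers are illustrative):

```c
#include <sys/param.h>
#include <sys/kernel.h>
#include <vm/uma.h>

/* Illustrative object layout for the per-mapping tracking entries. */
struct sketch_dma_obj {
	uint64_t	dma_addr;
	size_t		len;
};

static uma_zone_t sketch_dma_zone;	/* one zone, shared by all devices */

static void
sketch_dma_zone_init(void *arg __unused)
{
	sketch_dma_zone = uma_zcreate("linux_dma",
	    sizeof(struct sketch_dma_obj), NULL, NULL, NULL, NULL,
	    UMA_ALIGN_PTR, 0);
}
SYSINIT(sketch_dma, SI_SUB_DRIVERS, SI_ORDER_SECOND, sketch_dma_zone_init,
    NULL);
```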
Apr 2 2019
I fixed the issues in the previous version, which I should have taken more time to review myself before posting :-(
In D19753#424464, @kib wrote:
> Hm, sorry for following up immediately.
> Can you change the patch slightly to use the result of PHYS_TO_VM_PAGE() if it is usable? In other words, only fill the missed slots in ma[].
I think I've addressed, or at least made an attempt to address, all outstanding feedback now. The diff is updated.
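For reference, kib's fill-only-the-missed-slots suggestion could be sketched as follows (hedged; `fake` is a hypothetical caller-provided array, one slot per page):

```c
#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_page.h>

/*
 * Hedged sketch: prefer the real vm_page for each physical address and
 * substitute a fake page only for the slots PHYS_TO_VM_PAGE() misses.
 */
static void
fill_pages_sketch(vm_page_t *ma, struct vm_page *fake, vm_paddr_t base,
    int npages)
{
	vm_paddr_t pa;
	int i;

	for (i = 0; i < npages; i++) {
		pa = base + ptoa((vm_paddr_t)i);
		ma[i] = PHYS_TO_VM_PAGE(pa);
		if (ma[i] == NULL) {
			/* No entry in vm_page_array; use a fake page. */
			vm_page_initfake(&fake[i], pa, VM_MEMATTR_DEFAULT);
			ma[i] = &fake[i];
		}
	}
}
```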
Add || TEST_DMA_8K_PB to pre-condition verification.
In D19780#424304, @cem wrote:
>> arc-copy modes
> typo: "crc-copy" (in the summary)
> Looks mostly good to me. A few remarks:
This revision sets up the '48-bit' DMA address constraint for the 'data' tag and the '40-bit' DMA address constraint for the 'crc data' tag.
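Both constraints come down to the lowaddr argument of bus_dma_tag_create(9); a hedged sketch with the masks written out (all other argument values are illustrative):

```c
#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

/*
 * Hedged sketch of the two tags: only the lowaddr masks reflect the
 * change described above; every other argument value is illustrative.
 */
static int
sketch_create_tags(device_t dev, bus_size_t maxsize,
    bus_dma_tag_t *data_tag, bus_dma_tag_t *crc_tag)
{
	int error;

	/* 'data' tag: payload addresses are constrained to 48 bits. */
	error = bus_dma_tag_create(bus_get_dma_tag(dev), 1, 0,
	    (1ULL << 48) - 1, BUS_SPACE_MAXADDR, NULL, NULL,
	    maxsize, 1, maxsize, 0, NULL, NULL, data_tag);
	if (error != 0)
		return (error);

	/* 'crc data' tag: CRC buffers are constrained to 40 bits. */
	return (bus_dma_tag_create(bus_get_dma_tag(dev), 1, 0,
	    (1ULL << 40) - 1, BUS_SPACE_MAXADDR, NULL, NULL,
	    maxsize, 1, maxsize, 0, NULL, NULL, crc_tag));
}
```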
Apr 1 2019
In D19753#423602, @kib wrote:
> I do not like it. If we always pass fictitious pages, we do not need to pass pages at all; we can get away with the physical address only. But I do want to have pages there, for several reasons.
> I do have the same problem on my bhyve integration branch, but there I do the following:

```
diff --git a/sys/vm/vm_page.c b/sys/vm/vm_page.c
index a90a6f805b7..1f9d266ba7e 100644
--- a/sys/vm/vm_page.c
+++ b/sys/vm/vm_page.c
@@ -557,7 +557,7 @@ vm_page_startup(vm_offset_t vaddr)
 #ifdef WITNESS
 	int witness_size;
 #endif
-#if defined(__i386__) && defined(VM_PHYSSEG_DENSE)
+#if (defined(__i386__) || defined(__amd64__)) && defined(VM_PHYSSEG_DENSE)
 	long ii;
 #endif
@@ -800,7 +800,11 @@
 	 * Initialize the page structures and add every available page to the
 	 * physical memory allocator's free lists.
 	 */
-#if defined(__i386__) && defined(VM_PHYSSEG_DENSE)
+#if (defined(__i386__) || defined(__amd64__)) && defined(VM_PHYSSEG_DENSE)
+	/*
+	 * i386 needs this for copyout(9) calling vm_fault_quick_hold_pages().
+	 * amd64 requires that for DMAR busdma and bhyve IOMMU.
+	 */
 	for (ii = 0; ii < vm_page_array_size; ii++) {
 		m = &vm_page_array[ii];
 		vm_page_init_page(m, (first_page + ii) << PAGE_SHIFT, 0);
```

> As a temporary solution, you might consider using fake pages only for addresses where PHYS_TO_VM_PAGE() failed.