Jul 14 2021
Update comments.
Jul 13 2021
The demotion code does not require superpage mappings to have their accessed bit set, so there is no reason not to clear it in pmap_copy().
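A hedged sketch of the idea, using the arm64 pte accessor names (pmap_load(), pmap_store(), ATTR_AF); l2 and dst_l2 are placeholder variable names, and this is illustrative, not the committed diff:

    /*
     * When pmap_copy() replicates a 2MB (L2 block) mapping into the
     * destination pmap, the accessed bit can be cleared because the
     * demotion code does not depend on it being set.
     */
    pd_entry_t srcptepaddr;

    srcptepaddr = pmap_load(l2);                    /* source L2 entry */
    srcptepaddr &= ~(ATTR_AF | ATTR_SW_WIRED);      /* unaccessed, unwired */
    pmap_store(dst_l2, srcptepaddr);                /* destination L2 slot */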
Jul 4 2021
In D31014#697898, @andrew wrote:
Is it possible for the hardware to change the AF or DBM flags between loading the pte and the fcmpset in the arm64 pmap?
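For context, the load/fcmpset pattern being asked about looks roughly like the sketch below (illustrative, not a quote of the pmap source). If the MMU sets AF or flips the DBM-managed permission bits between pmap_load() and atomic_fcmpset_64(), the fcmpset fails, oldpte is refreshed with the current pte contents, and the loop recomputes newpte and retries, so no hardware update is lost:

    pt_entry_t oldpte, newpte;

    oldpte = pmap_load(pte);
    do {
        /* Example update: write-protect the mapping (mark it clean). */
        newpte = oldpte | ATTR_S1_AP(ATTR_S1_AP_RO);
    } while (!atomic_fcmpset_64(pte, &oldpte, newpte));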
Jul 3 2021
On arm64, the new pmap code is smaller by 60 bytes.
Jun 28 2021
A curious difference between these functions is that pmap_remove_all() clears PGA_WRITEABLE before releasing the PV list lock but pmap_remove_write() does so after. I don't remember any reason for this difference.
In D30931#695853, @kib wrote:
amd64 code is identical, do you plan to provide the same patch for it?
A couple thoughts:
- I'm curious to know if the nested page table pmap is active on the current processor when bhyve is destroying the guest physical address space. If so, it ought to be calling pmap_remove_pages(), whether it is running on arm64 or any other architecture. That said, pmap_remove_pages() won't help if the guest physical address space is backed by wired memory to allow direct access to, e.g., SR-IOV devices.
- I wonder whether, if pmap_remove_pages() can't be used, a better and more general solution would be to maintain a current-activation count in the pmap and, when that count is zero, reset the pmap's ASID rather than perform TLB invalidations. A rough sketch follows.
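A hypothetical sketch of that second idea; pm_active_count and pmap_reset_asid() are invented names, not existing FreeBSD interfaces:

    if (atomic_load_int(&pmap->pm_active_count) == 0) {
        /*
         * No CPU has this pmap active and no new activations can
         * occur, so allocating a fresh ASID implicitly invalidates
         * all of the pmap's stale TLB entries.
         */
        pmap_reset_asid(pmap);
    } else
        pmap_invalidate_range(pmap, sva, eva);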
Jun 24 2021
In D30845#694140, @kib wrote:
In D30845#694134, @andrew wrote:
You can see the use in bhyve in https://github.com/CTSRD-CHERI/freebsd-morello/commit/d12af6f53e9c9ba08d80974be2fbce7fa9381f95#diff-1c77bb82662186164af0461f49f764096e3c3bd3d3f324a2db019d242cbdb3de
Is there anywhere in the vm code where it would make sense to call pmap_pre_destroy? If so I can look at making it an MI interface.
I do not think the dead state in the proposed interpretation can be set earlier than in pmap_remove_pages(), but there it is already useless. We cannot set it earlier because pmap_remove_pages() is the first call into the pmap on the vmspace destruction path. On the other hand, we do not destroy the vmspace until only the current thread has this pmap active, and no new activations of this pmap can occur. So nothing should need to invalidate the pmap on remote CPUs.
Jun 22 2021
In D30845#694140, @kib wrote:
In D30845#694134, @andrew wrote:
You can see the use in bhyve in https://github.com/CTSRD-CHERI/freebsd-morello/commit/d12af6f53e9c9ba08d80974be2fbce7fa9381f95#diff-1c77bb82662186164af0461f49f764096e3c3bd3d3f324a2db019d242cbdb3de
Is there anywhere in the vm code where it would make sense to call pmap_pre_destroy? If so I can look at making it an MI interface.
I do not think the dead state in the proposed interpretation can be set earlier than in pmap_remove_pages(), but there it is already useless. We cannot set it earlier because pmap_remove_pages() is the first call into the pmap on the vmspace destruction path. On the other hand, we do not destroy the vmspace until only the current thread has this pmap active, and no new activations of this pmap can occur. So nothing should need to invalidate the pmap on remote CPUs.
For the bhyve patch above, why is it useful? As I understand it, this vmspace/pmap is never going to be activated at all, so how does claiming that TLB invalidation is not needed help? Also, for ARMv8, SMP TLB invalidations do not require IPIs, which is why I was surprised that such an optimization is ever helpful.
Jun 6 2021
In D30644#688473, @jrtc27 wrote:
Wes, could you please check that this does indeed fix the panic you were seeing with snmalloc, not just your hand-written test?
In D30643#688425, @kib wrote:
Don't we mark the copied mapping as clean to avoid unnecessary writes? Suppose that the source mapping is destroyed, and backing pages are marked dirty and written to the storage. Now, if the copied mapping is destroyed without ever being written to, we would re-dirty and write them again.
Jun 5 2021
You wrote, "Modify pmap_copy() to make new 2MB mappings read-only, like we do on amd64. I am not sure though why we shouldn't simply copy the dirty bit over to the child."
As an aside, I would probably replicate the amd64 comments that we do not need to perform a TLB invalidation in this case.
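An amd64-flavored sketch of the combined suggestion; src_pde and dst_pde are placeholder names, and this is the shape of the idea under the amd64 PG_* flag names, not the actual pmap_copy() code:

    pd_entry_t srcptepaddr;

    /*
     * Copy the 2MB mapping unwired, clean, unaccessed, and read-only,
     * so that a later write fault, not the copy itself, re-dirties
     * the backing pages.
     */
    srcptepaddr = *src_pde;
    srcptepaddr &= ~(PG_W | PG_M | PG_A | PG_RW);
    pde_store(dst_pde, srcptepaddr);
    /*
     * No TLB invalidation is needed: the destination pmap has never
     * been active, so it can hold no stale TLB entries.
     */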
May 30 2021
In D30442#683738, @kib wrote:
In D30442#683737, @jhb wrote:
Won't we demote large pages if the protections of sub-pages differ? I don't think we (yet) have a flag to make mappings use a minimum page size (e.g. to force the use of 2MB pages and not permit demoting to 4K pages) for which the effect would then be as Brooks' new sentence describes.
superpages != largepages. Largepages are a relatively new thing; for them we guarantee that (a userspace usage sketch follows the list):
- the backing object is populated with contiguous pages suitable for superpage mapping
- the userspace mapping is always done with superpage PTEs
- we do not allow them to be clipped except at a superpage boundary
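To make the guarantee concrete, here is a minimal userspace example assuming FreeBSD 13's shm_create_largepage(3); the object name "/lp_example" and the choice of psind 1 (2MB on amd64) are arbitrary:

    #include <sys/param.h>
    #include <sys/mman.h>
    #include <err.h>
    #include <fcntl.h>
    #include <unistd.h>

    int
    main(void)
    {
        size_t ps[MAXPAGESIZES];
        void *p;
        int fd;

        /* psind indexes the system's page-size array. */
        if (getpagesizes(ps, MAXPAGESIZES) < 2)
            errx(1, "no superpage size available");
        fd = shm_create_largepage("/lp_example", O_CREAT | O_RDWR, 1,
            SHM_LARGEPAGE_ALLOC_DEFAULT, 0600);
        if (fd < 0)
            err(1, "shm_create_largepage");
        /* The size must be a multiple of the superpage size. */
        if (ftruncate(fd, ps[1]) != 0)
            err(1, "ftruncate");
        /* This mapping is guaranteed to use superpage PTEs. */
        p = mmap(NULL, ps[1], PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            err(1, "mmap");
        shm_unlink("/lp_example");
        return (0);
    }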
Mar 16 2021
In D28805#646315, @markj wrote:
In D28805#645337, @alc wrote:
I really have mixed feelings about this change. If anything, we should try to discourage page zeroing while the object lock is held. On the other hand, for VM_ALLOC_NOOBJ allocations, I think that this change makes perfect sense, which brings me to the following proposal: Remove VM_ALLOC_NOOBJ and VM_ALLOC_ZERO from vm_page_alloc{,_contig}(), and provide separate allocation functions to replace VM_ALLOC_NOOBJ, akin to vm_page_alloc_freelist(). Virtually all VM_ALLOC_NOOBJ call sites are being changed, so we may as well change the function being called. This will slightly simplify vm_page_alloc(), e.g., it can assume that the object is always non-NULL and that it will only allocate from the default pool. Also, when I say, "Remove ... VM_ALLOC_ZERO from vm_page_alloc()", I would probably let PG_ZERO pass through vm_page_alloc() unchanged. Right now, we clear that flag unless VM_ALLOC_ZERO was specified. As for the functions that replace VM_ALLOC_NOOBJ, I would drop not only the object parameter, but also the pindex. The few callers that (ab)use the pindex can set it themselves.
So to be clear, your proposal is to add a vm_page_alloc_anon() (or _noobj()?) which accepts a VM_ALLOC_ZERO flag and implements it by zeroing the page, whereas vm_page_alloc(_contig)() should stop taking VM_ALLOC_ZERO and instead return PG_ZERO unchanged?
Yes, and vm_page_alloc_noobj() makes sense to me. (I think that vm_page_alloc_anon() could be too easily confused with allocation of pages to OBJ_ANON vm objects. In other words, I would avoid using any derivatives of the word anonymous.)
I like the idea of splitting the allocator functions and preserving PG_ZERO. It feels a bit odd to have inconsistent handling with respect to VM_ALLOC_ZERO though. There are situations where allocations are rare enough that zeroing under the object lock is not a problem (or it is required), and splitting the allocator entry points would make it easier to spot calls where it is likely to be a problem. The pti_obj object used to manage userspace-visible "holes" in the kernel address space is an example of this.
I'm not going to argue strenuously for leaving VM_ALLOC_ZERO support out of vm_page_alloc{,_contig}(). I agree that splitting the allocator entry points will make it easier to spot calls that shouldn't use VM_ALLOC_ZERO.
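For reference, a sketch of the interface shape being proposed in this exchange; at the time of writing it is a proposal, not a committed API:

    /* Proposed: no object, no pindex; allocates from the default pool. */
    vm_page_t vm_page_alloc_noobj(int req);

    /* Caller side: the allocator zeroes the page when VM_ALLOC_ZERO is set. */
    vm_page_t m;

    m = vm_page_alloc_noobj(VM_ALLOC_WIRED | VM_ALLOC_ZERO);
    if (m == NULL)
        return (ENOMEM);        /* handle allocation failure */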
Mar 2 2021
In D28924#649287, @markj wrote:
In D28924#649276, @mav wrote:
In D28924#649265, @markj wrote:
it takes about 8s to allocate 100 clusters on a system with 64GB, vs. 2-2.5s with the patch applied.
It is good to hear, but it still does not sound realistic for networking purposes. Plus, my systems often have 256GB or more of memory. Have you tried it together with your original optimization patch?
Right, this is not expected to be a full solution to the problem. I will look more at preferentially reclaiming from the phys_segs corresponding to the default freelists, and at ending the scan earlier.
I am wondering if the intent behind the current implementation is to provide a consistent runtime for reclamation. Suppose we started scanning from the beginning of physical memory and over time reclaimed more and more runs. Subsequent scans will take longer and longer since they always start from the same place. Perhaps we could maintain some cursor that gets updated after a scan and is used to mark the beginning of subsequent scans.
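A hypothetical sketch of the cursor idea; vm_phys_search_cursor and scan_for_run() are invented names standing in for the real scan logic:

    static vm_paddr_t vm_phys_search_cursor;

    vm_paddr_t
    reclaim_run(u_long npages)
    {
        vm_paddr_t pa;

        /* Resume scanning where the previous reclamation stopped... */
        pa = scan_for_run(vm_phys_search_cursor, npages);
        if (pa == 0)
            /* ...wrapping around once to cover the skipped prefix. */
            pa = scan_for_run(0, npages);
        if (pa != 0)
            vm_phys_search_cursor = pa + ptoa(npages);
        return (pa);
    }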
Feb 28 2021
mav@, could you please give your hack/test case to markj@? There should be a significant reduction in the amount of scanning with this patch.
Feb 25 2021
The rounding up should be capped at the largest supported buddy list order. (For allocation requests that are larger than the largest supported order, we do the following: For each block in the largest order list, we look at its successors in the vm_page array to see if a sufficient number of them are free to satisfy the request.)
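A small sketch of the capping rule; VM_NFREEORDER is the real constant for the number of buddy-list orders, while alloc_order() itself is illustrative:

    static int
    alloc_order(u_long npages)
    {
        int order;

        /* Smallest order such that (1UL << order) >= npages. */
        order = flsl(npages - 1);
        /*
         * Oversized request: cap at the largest supported order; the
         * allocator then scans successor pages in the vm_page array.
         */
        if (order >= VM_NFREEORDER)
            order = VM_NFREEORDER - 1;
        return (order);
    }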
Feb 20 2021
I really have mixed feelings about this change. If anything, we should try to discourage page zeroing while the object lock is held. On the other hand, for VM_ALLOC_NOOBJ allocations, I think that this change makes perfect sense, which brings me to the following proposal: Remove VM_ALLOC_NOOBJ and VM_ALLOC_ZERO from vm_page_alloc{,_contig}(), and provide separate allocation functions to replace VM_ALLOC_NOOBJ, akin to vm_page_alloc_freelist().
Virtually all VM_ALLOC_NOOBJ call sites are being changed, so we may as well change the function being called. This will slightly simplify vm_page_alloc(), e.g., it can assume that the object is always non-NULL and that it will only allocate from the default pool.
Also, when I say, "Remove ... VM_ALLOC_ZERO from vm_page_alloc()", I would probably let PG_ZERO pass through vm_page_alloc() unchanged. Right now, we clear that flag unless VM_ALLOC_ZERO was specified. As for the functions that replace VM_ALLOC_NOOBJ, I would drop not only the object parameter, but also the pindex. The few callers that (ab)use the pindex can set it themselves.
In D28807#644991, @markj wrote:
In D28807#644985, @kib wrote:
Still, you might add a vm_page_alloc() flag that would ask to not clear PG_ZERO on return, making it the duty of the caller. Then vm_fault() could utilize it to preserve the optimization. Could it be useful for 32bit machines?
I considered it and was looking for i386 systems in the cluster so I can check the v_zfod and v_ozfod counter values. I couldn't find any though, so I will look at a VM soon and try some simple loads to see if it is worth preserving. My suspicion is that it is still a minor optimization even on 32-bit systems since we are relying on the pmap to provide pre-zeroed pages, and it will not provide very many relative to typical application usage. Pages allocated from superpage reservations are unlikely to be pre-zeroed. Finally, in principle the page will be warm in the data caches if it is zeroed on demand, while with a pre-zeroed page this is less likely.
Jan 18 2021
Similarly, there is no reason to define VM_KMEM_SIZE_MIN.
Jan 15 2021
In D27956#627791, @kib wrote:
In D27956#627736, @markj wrote:
Do riscv or arm64 have the same bug?
No, this bug was introduced with the LA57 rewrite of pmap_allocpte(). riscv and arm64 forked pmap.c before LA57.
Jan 6 2021
That's all that I have.