Jul 14 2021
Update comments.
Jul 13 2021
The demotion code does not require superpage mappings to have their accessed bit set, so there is no reason not to clear it in pmap_copy().
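A hedged sketch of the idea, using the arm64 pte accessor names (pmap_load(), pmap_store(), ATTR_AF); l2 and dst_l2 are placeholder variable names, and this is illustrative, not the committed diff:

    /*
     * When pmap_copy() replicates a 2MB (L2 block) mapping into the
     * destination pmap, the accessed bit can be cleared because the
     * demotion code does not depend on it being set.
     */
    pd_entry_t srcptepaddr;

    srcptepaddr = pmap_load(l2);                    /* source L2 entry */
    srcptepaddr &= ~(ATTR_AF | ATTR_SW_WIRED);      /* unaccessed, unwired */
    pmap_store(dst_l2, srcptepaddr);                /* destination L2 slot */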
Jul 4 2021
In D31014#697898, @andrew wrote:
Is it possible for the hardware to change the AF or DBM flags between loading the pte and the fcmpset in the arm64 pmap?
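For context, the load/fcmpset pattern being asked about looks roughly like the sketch below (illustrative, not a quote of the pmap source). If the MMU sets AF or flips the DBM-managed permission bits between pmap_load() and atomic_fcmpset_64(), the fcmpset fails, oldpte is refreshed with the current pte contents, and the loop recomputes newpte and retries, so no hardware update is lost:

    pt_entry_t oldpte, newpte;

    oldpte = pmap_load(pte);
    do {
        /* Example update: write-protect the mapping (mark it clean). */
        newpte = oldpte | ATTR_S1_AP(ATTR_S1_AP_RO);
    } while (!atomic_fcmpset_64(pte, &oldpte, newpte));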
Jul 3 2021
On arm64, the new pmap code is smaller by 60 bytes.
Jun 28 2021
A curious difference between these functions is that pmap_remove_all() clears PGA_WRITEABLE before releasing the PV list lock but pmap_remove_write() does so after. I don't remember any reason for this difference.
In D30931#695853, @kib wrote:
amd64 code is identical, do you plan to provide the same patch for it?
A couple thoughts:
- I'm curious to know if the nested page table pmap is active on the current processor when bhyve is destroying the guest physical address space. If so, it ought to be calling pmap_remove_pages(), whether it is running on arm64 or any other architecture. That said, pmap_remove_pages() won't help if the guest physical address space is backed by wired memory to allow direct access to, e.g., SR-IOV devices.
- I wonder whether, if pmap_remove_pages() can't be used, a better and more general solution would be to maintain a current-activation count in the pmap and, when that count is zero, reset the pmap's ASID rather than perform TLB invalidations. A rough sketch follows.
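A hypothetical sketch of that second idea; pm_active_count and pmap_reset_asid() are invented names, not existing FreeBSD interfaces:

    if (atomic_load_int(&pmap->pm_active_count) == 0) {
        /*
         * No CPU has this pmap active and no new activations can
         * occur, so allocating a fresh ASID implicitly invalidates
         * all of the pmap's stale TLB entries.
         */
        pmap_reset_asid(pmap);
    } else
        pmap_invalidate_range(pmap, sva, eva);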
Jun 24 2021
In D30845#694140, @kib wrote:
In D30845#694134, @andrew wrote:
You can see the use in bhyve in https://github.com/CTSRD-CHERI/freebsd-morello/commit/d12af6f53e9c9ba08d80974be2fbce7fa9381f95#diff-1c77bb82662186164af0461f49f764096e3c3bd3d3f324a2db019d242cbdb3de
Is there anywhere in the vm code where it would make sense to call pmap_pre_destroy? If so I can look at making it an MI interface.
I do not think the dead state in the proposed interpretation can be set earlier than in pmap_remove_pages(), but there it is already useless. We cannot set it earlier because pmap_remove_pages() is the first call into the pmap on the vmspace destruction path. On the other hand, we do not destroy the vmspace until only the current thread has this pmap active, and no new activations of this pmap can occur. So nothing should need to invalidate the pmap on remote CPUs.
Jun 22 2021
In D30845#694140, @kib wrote:
In D30845#694134, @andrew wrote:
You can see the use in bhyve in https://github.com/CTSRD-CHERI/freebsd-morello/commit/d12af6f53e9c9ba08d80974be2fbce7fa9381f95#diff-1c77bb82662186164af0461f49f764096e3c3bd3d3f324a2db019d242cbdb3de
Is there anywhere in the vm code where it would make sense to call pmap_pre_destroy? If so I can look at making it an MI interface.
I do not think the dead state in the proposed interpretation can be set earlier than in pmap_remove_pages(), but there it is already useless. We cannot set it earlier because pmap_remove_pages() is the first call into the pmap on the vmspace destruction path. On the other hand, we do not destroy the vmspace until only the current thread has this pmap active, and no new activations of this pmap can occur. So nothing should need to invalidate the pmap on remote CPUs.
For the bhyve patch above, why is it useful? As I understand it, this vmspace/pmap is never going to be activated at all, so how does claiming that TLB invalidation is not needed help? Also, for ARMv8, SMP TLB invalidations do not require IPIs, which is why I was surprised that such an optimization is ever helpful.
Jun 6 2021
In D30644#688473, @jrtc27 wrote:
Wes, could you please check that this does indeed fix the panic you were seeing with snmalloc, not just your hand-written test?
In D30643#688425, @kib wrote:
Don't we mark the copied mapping as clean to avoid unnecessary writes? Suppose that the source mapping is destroyed, and backing pages are marked dirty and written to the storage. Now, if the copied mapping is destroyed without ever being written to, we would re-dirty and write them again.
Jun 5 2021
You wrote, "Modify pmap_copy() to make new 2MB mappings read-only, like we do on amd64. I am not sure though why we shouldn't simply copy the dirty bit over to the child."
As an aside, I would probably replicate the amd64 comments that we do not need to perform a TLB invalidation in this case.
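An amd64-flavored sketch of the combined suggestion; src_pde and dst_pde are placeholder names, and this is the shape of the idea under the amd64 PG_* flag names, not the actual pmap_copy() code:

    pd_entry_t srcptepaddr;

    /*
     * Copy the 2MB mapping unwired, clean, unaccessed, and read-only,
     * so that a later write fault, not the copy itself, re-dirties
     * the backing pages.
     */
    srcptepaddr = *src_pde;
    srcptepaddr &= ~(PG_W | PG_M | PG_A | PG_RW);
    pde_store(dst_pde, srcptepaddr);
    /*
     * No TLB invalidation is needed: the destination pmap has never
     * been active, so it can hold no stale TLB entries.
     */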
May 30 2021
In D30442#683738, @kib wrote:
In D30442#683737, @jhb wrote:
Won't we demote large pages if the protections of sub-pages differ? I don't think we (yet) have a flag to make mappings use a minimum page size (e.g. to force the use of 2MB pages and not permit demoting to 4K pages) for which the effect would then be as Brooks' new sentence describes.
superpages != largepages. Largepages are a relatively new thing; for them we guarantee that (a userspace usage sketch follows the list):
- the backing object is populated with contiguous pages suitable for superpage mapping
- the userspace mapping is always done with superpage PTEs
- we do not allow them to be clipped except at a superpage boundary
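To make the guarantee concrete, here is a minimal userspace example assuming FreeBSD 13's shm_create_largepage(3); the object name "/lp_example" and the choice of psind 1 (2MB on amd64) are arbitrary:

    #include <sys/param.h>
    #include <sys/mman.h>
    #include <err.h>
    #include <fcntl.h>
    #include <unistd.h>

    int
    main(void)
    {
        size_t ps[MAXPAGESIZES];
        void *p;
        int fd;

        /* psind indexes the system's page-size array. */
        if (getpagesizes(ps, MAXPAGESIZES) < 2)
            errx(1, "no superpage size available");
        fd = shm_create_largepage("/lp_example", O_CREAT | O_RDWR, 1,
            SHM_LARGEPAGE_ALLOC_DEFAULT, 0600);
        if (fd < 0)
            err(1, "shm_create_largepage");
        /* The size must be a multiple of the superpage size. */
        if (ftruncate(fd, ps[1]) != 0)
            err(1, "ftruncate");
        /* This mapping is guaranteed to use superpage PTEs. */
        p = mmap(NULL, ps[1], PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            err(1, "mmap");
        shm_unlink("/lp_example");
        return (0);
    }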
Mar 16 2021
In D28805#646315, @markj wrote:
In D28805#645337, @alc wrote:
I really have mixed feelings about this change. If anything, we should try to discourage page zeroing while the object lock is held. On the other hand, for VM_ALLOC_NOOBJ allocations, I think that this change makes perfect sense, which brings me to the following proposal: Remove VM_ALLOC_NOOBJ and VM_ALLOC_ZERO from vm_page_alloc{,_contig}(), and provide separate allocation functions to replace VM_ALLOC_NOOBJ, akin to vm_page_alloc_freelist(). Virtually all VM_ALLOC_NOOBJ call sites are being changed, so we may as well change the function being called. This will slightly simplify vm_page_alloc(), e.g., it can assume that the object is always non-NULL and that it will only allocate from the default pool. Also, when I say, "Remove ... VM_ALLOC_ZERO from vm_page_alloc()", I would probably let PG_ZERO pass through vm_page_alloc() unchanged. Right now, we clear that flag unless VM_ALLOC_ZERO was specified. As for the functions that replace VM_ALLOC_NOOBJ, I would drop not only the object parameter, but also the pindex. The few callers that (ab)use the pindex can set it themselves.
So to be clear, your proposal is to add a vm_page_alloc_anon() (or _noobj()?) which accepts a VM_ALLOC_ZERO flag and implements it by zeroing the page, whereas vm_page_alloc(_contig)() should stop taking VM_ALLOC_ZERO and instead return PG_ZERO unchanged?
Yes, and vm_page_alloc_noobj() makes sense to me. (I think that vm_page_alloc_anon() could be too easily confused with allocation of pages to OBJ_ANON vm objects. In other words, I would avoid using any derivatives of the word anonymous.)
I like the idea of splitting the allocator functions and preserving PG_ZERO. It feels a bit odd to have inconsistent handling with respect to VM_ALLOC_ZERO though. There are situations where allocations are rare enough that zeroing under the object lock is not a problem (or it is required), and splitting the allocator entry points would make it easier to spot calls where it is likely to be a problem. The pti_obj object used to manage userspace-visible "holes" in the kernel address space is an example of this.
I'm not going to argue strenuously for leaving VM_ALLOC_ZERO support out of vm_page_alloc{,_contig}(). I agree that splitting the allocator entry points will make it easier to spot calls that shouldn't use VM_ALLOC_ZERO.
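For reference, a sketch of the interface shape being proposed in this exchange; at the time of writing it is a proposal, not a committed API:

    /* Proposed: no object, no pindex; allocates from the default pool. */
    vm_page_t vm_page_alloc_noobj(int req);

    /* Caller side: the allocator zeroes the page when VM_ALLOC_ZERO is set. */
    vm_page_t m;

    m = vm_page_alloc_noobj(VM_ALLOC_WIRED | VM_ALLOC_ZERO);
    if (m == NULL)
        return (ENOMEM);        /* handle allocation failure */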
Mar 2 2021
In D28924#649287, @markj wrote:
In D28924#649276, @mav wrote:
In D28924#649265, @markj wrote:
it takes about 8s to allocate 100 clusters on a system with 64GB, vs. 2-2.5s with the patch applied.
It is good to hear, but it still does not sound realistic for networking purposes. Plus, my systems often have 256GB or more of memory. Have you tried it together with your original optimization patch?
Right, this is not expected to be a full solution to the problem. I will look more at preferentially reclaiming from the phys_segs corresponding to the default freelists, and at ending the scan earlier.
I am wondering if the intent behind the current implementation is to provide a consistent runtime for reclamation. Suppose we started scanning from the beginning of physical memory and over time reclaimed more and more runs. Subsequent scans will take longer and longer since they always start from the same place. Perhaps we could maintain some cursor that gets updated after a scan and is used to mark the beginning of subsequent scans.
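A hypothetical sketch of the cursor idea; vm_phys_search_cursor and scan_for_run() are invented names standing in for the real scan logic:

    static vm_paddr_t vm_phys_search_cursor;

    vm_paddr_t
    reclaim_run(u_long npages)
    {
        vm_paddr_t pa;

        /* Resume scanning where the previous reclamation stopped... */
        pa = scan_for_run(vm_phys_search_cursor, npages);
        if (pa == 0)
            /* ...wrapping around once to cover the skipped prefix. */
            pa = scan_for_run(0, npages);
        if (pa != 0)
            vm_phys_search_cursor = pa + ptoa(npages);
        return (pa);
    }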
Feb 28 2021
mav@, could you please give your hack/test case to markj@? There should be a significant reduction in the amount of scanning with this patch.
Feb 25 2021
The rounding up should be capped at the largest supported buddy list order. (For allocation requests that are larger than the largest supported order, we do the following: For each block in the largest order list, we look at its successors in the vm_page array to see if a sufficient number of them are free to satisfy the request.)
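A small sketch of the capping rule; VM_NFREEORDER is the real constant for the number of buddy-list orders, while alloc_order() itself is illustrative:

    static int
    alloc_order(u_long npages)
    {
        int order;

        /* Smallest order such that (1UL << order) >= npages. */
        order = flsl(npages - 1);
        /*
         * Oversized request: cap at the largest supported order; the
         * allocator then scans successor pages in the vm_page array.
         */
        if (order >= VM_NFREEORDER)
            order = VM_NFREEORDER - 1;
        return (order);
    }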
Feb 20 2021
I really have mixed feelings about this change. If anything, we should try to discourage page zeroing while the object lock is held. On the other hand, for VM_ALLOC_NOOBJ allocations, I think that this change makes perfect sense, which brings me to the following proposal: Remove VM_ALLOC_NOOBJ and VM_ALLOC_ZERO from vm_page_alloc{,_contig}(), and provide separate allocation functions to replace VM_ALLOC_NOOBJ, akin to vm_page_alloc_freelist().
Virtually all VM_ALLOC_NOOBJ call sites are being changed, so we may as well change the function being called. This will slightly simplify vm_page_alloc(), e.g., it can assume that the object is always non-NULL and that it will only allocate from the default pool.
Also, when I say, "Remove ... VM_ALLOC_ZERO from vm_page_alloc()", I would probably let PG_ZERO pass through vm_page_alloc() unchanged. Right now, we clear that flag unless VM_ALLOC_ZERO was specified. As for the functions that replace VM_ALLOC_NOOBJ, I would drop not only the object parameter, but also the pindex. The few callers that (ab)use the pindex can set it themselves.
In D28807#644991, @markj wrote:
In D28807#644985, @kib wrote:
Still, you might add a vm_page_alloc() flag that would ask to not clear PG_ZERO on return, making it the duty of the caller. Then vm_fault() could utilize it to preserve the optimization. Could it be useful for 32bit machines?
I considered it and was looking for i386 systems in the cluster so I can check the v_zfod and v_ozfod counter values. I couldn't find any though, so I will look at a VM soon and try some simple loads to see if it is worth preserving. My suspicion is that it is still a minor optimization even on 32-bit systems since we are relying on the pmap to provide pre-zeroed pages, and it will not provide very many relative to typical application usage. Pages allocated from superpage reservations are unlikely to be pre-zeroed. Finally, in principle the page will be warm in the data caches if it is zeroed on demand, while with a pre-zeroed page this is less likely.
Jan 18 2021
Similarly, there is no reason to define VM_KMEM_SIZE_MIN.
Jan 15 2021
In D27956#627791, @kib wrote:
In D27956#627736, @markj wrote:
Do riscv or arm64 have the same bug?
No, this bug was introduced with the LA57 rewrite of pmap_allocpte(). riscv and arm64 forked pmap.c before LA57.
Jan 6 2021
That's all that I have.