
alc (Alan Cox)
User

User Details

User Since
Dec 14 2014, 5:52 AM (340 w, 4 h)

Recent Activity

Tue, Jun 8

alc added inline comments to D30442: mprotect.2: Update text for largepages.
Tue, Jun 8, 11:07 PM

Sun, Jun 6

alc added a comment to D30644: riscv: Handle hardware-managed dirty bit updates in pmap_promote_l2().

Wes, could you please check this does indeed fix the panic you were seeing with snmalloc, not just your hand-written test?

Sun, Jun 6, 7:15 PM
alc added a comment to D30643: arm64: Fix pmap_copy()'s handling of 2MB mappings.
In D30643#688425, @kib wrote:

Don't we mark the copied mapping as clean to avoid unnecessary writes? Suppose that the source mapping is destroyed, and the backing pages are marked dirty and written to storage. Now, if the copied mapping is destroyed without ever being written to, we would re-dirty the pages and write them again.

Sun, Jun 6, 7:02 PM
alc accepted D30643: arm64: Fix pmap_copy()'s handling of 2MB mappings.
Sun, Jun 6, 6:20 PM
alc accepted D30642: arm64: Use the right PTE when downgrading perms in pmap_promote_l2().
Sun, Jun 6, 6:15 PM
alc accepted D30644: riscv: Handle hardware-managed dirty bit updates in pmap_promote_l2().
Sun, Jun 6, 7:57 AM

Sat, Jun 5

alc added a comment to D30643: arm64: Fix pmap_copy()'s handling of 2MB mappings.

You wrote, "Modify pmap_copy() to make new 2MB mappings read-only, like we do on amd64. I am not sure though why we shouldn't simply copy the dirty bit over to the child."

Sat, Jun 5, 7:09 PM
alc added a comment to D30642: arm64: Use the right PTE when downgrading perms in pmap_promote_l2().

As an aside, I would probably replicate the amd64 comments that we do not need to perform a TLB invalidation in this case.

Sat, Jun 5, 6:51 PM
alc accepted D30642: arm64: Use the right PTE when downgrading perms in pmap_promote_l2().
Sat, Jun 5, 6:46 PM

Sun, May 30

alc added a comment to D30442: mprotect.2: Update text for largepages.
In D30442#683738, @kib wrote:
In D30442#683737, @jhb wrote:

Won't we demote large pages if the protections of sub-pages differ? I don't think we (yet) have a flag to make mappings use a minimum page size (e.g. to force the use of 2MB pages and not permit demoting to 4K pages) for which the effect would then be as Brooks' new sentence describes.

superpages != largepages. Largepages are a relatively new thing; for them we guarantee that:

  1. the backing object is populated with contiguous pages suitable for superpage mapping
  2. the userspace mapping is always done with superpage PTEs
  3. we do not allow clipping them except at superpage boundaries
Sun, May 30, 7:20 AM
alc accepted D30442: mprotect.2: Update text for largepages.
Sun, May 30, 6:41 AM

Mar 25 2021

alc added inline comments to D29417: amd64: Implement a KASAN shadow map.
Mar 25 2021, 5:28 AM

Mar 16 2021

alc added a comment to D28805: vm: Handle VM_ALLOC_ZERO in the page allocator.
In D28805#645337, @alc wrote:

I really have mixed feelings about this change. If anything, we should try to discourage page zeroing while the object lock is held. On the other hand, for VM_ALLOC_NOOBJ allocations, I think that this change makes perfect sense, which brings me to the following proposal: Remove VM_ALLOC_NOOBJ and VM_ALLOC_ZERO from vm_page_alloc{,_contig}(), and provide separate allocation functions to replace VM_ALLOC_NOOBJ, akin to vm_page_alloc_freelist().
Virtually all VM_ALLOC_NOOBJ call sites are being changed, so we might as well change the function being called. This will slightly simplify vm_page_alloc(), e.g., it can assume that the object is always non-NULL and that it will only allocate from the default pool. Also, when I say, "Remove ... VM_ALLOC_ZERO from vm_page_alloc()", I would probably let PG_ZERO pass through vm_page_alloc() unchanged. Right now, we clear that flag unless VM_ALLOC_ZERO was specified. As for the functions that replace VM_ALLOC_NOOBJ, I would drop not only the object parameter, but also the pindex. The few callers that (ab)use the pindex can set it themselves.

So to be clear, your proposal is to add a vm_page_alloc_anon() (or _noobj()?) which accepts a VM_ALLOC_ZERO flag and implements it by zeroing the page, whereas vm_page_alloc(_contig)() should stop taking VM_ALLOC_ZERO and instead return PG_ZERO unchanged?

Yes, and vm_page_alloc_noobj() makes sense to me. (I think that vm_page_alloc_anon() could be too easily confused with allocation of pages to OBJ_ANON vm objects. In other words, I would avoid using any derivatives of the word anonymous.)

I like the idea of splitting the allocator functions and preserving PG_ZERO. It feels a bit odd to have inconsistent handling with respect to VM_ALLOC_ZERO though. There are situations where allocations are rare enough that zeroing under the object lock is not a problem (or it is required), and splitting the allocator entry points would make it easier to spot calls where it is likely to be a problem. The pti_obj object used to manage userspace-visible "holes" in the kernel address space is an example of this.

I'm not going to argue strenuously for leaving VM_ALLOC_ZERO support out of vm_page_alloc{,_contig}(). I agree that splitting the allocator entry points will make it easier to spot calls that shouldn't use VM_ALLOC_ZERO.
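For illustration, a minimal sketch of the split being proposed here, with hypothetical signatures (not the committed interface):

    /* Object-backed allocation: the object is always non-NULL, and PG_ZERO
     * passes through to the caller instead of being cleared. */
    vm_page_t vm_page_alloc(vm_object_t object, vm_pindex_t pindex, int req);

    /* Replacement for VM_ALLOC_NOOBJ: no object and no pindex; VM_ALLOC_ZERO
     * is implemented here by zeroing the page before returning it.  Callers
     * that previously (ab)used the pindex set it themselves. */
    vm_page_t vm_page_alloc_noobj(int req);
    vm_page_t vm_page_alloc_noobj_contig(int req, u_long npages, vm_paddr_t low,
        vm_paddr_t high, u_long alignment, vm_paddr_t boundary,
        vm_memattr_t memattr);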

Mar 16 2021, 1:35 AM

Mar 11 2021

alc accepted D29203: vm_reserv: Fix list locking in vm_reserv_reclaim_contig().
Mar 11 2021, 4:28 AM
alc added inline comments to D29203: vm_reserv: Fix list locking in vm_reserv_reclaim_contig().
Mar 11 2021, 4:02 AM
alc accepted D29203: vm_reserv: Fix list locking in vm_reserv_reclaim_contig().
Mar 11 2021, 12:15 AM

Mar 2 2021

alc added a comment to D28924: vm: Round up npages and alignment for contig reclamation.
In D28924#649276, @mav wrote:

it takes about 8s to allocate 100 clusters on a system with 64GB, vs. 2-2.5s with the patch applied.

It is good to hear, but it still does not sound realistic for networking purposes. Plus, my systems often have 256GB or more of memory. Have you tried it together with your original optimization patch?

Right, this is not expected to be a full solution to the problem. I will look more at preferentially reclaiming from the phys_segs corresponding to the default freelists, and at ending the scan earlier.

I am wondering if the intent behind the current implementation is to provide a consistent runtime for reclamation. Suppose we started scanning from the beginning of physical memory and over time reclaimed more and more runs. Subsequent scans will take longer and longer since they always start from the same place. Perhaps we could maintain some cursor that gets updated after a scan and is used to mark the beginning of subsequent scans.
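A rough sketch of the cursor idea, using made-up names; locking and wraparound handling are omitted:

    /* Hypothetical scan cursor, assumed to be protected by the same lock as
     * the reclamation scan itself. */
    static vm_paddr_t vm_reclaim_cursor;

    /* Start the next scan where the previous one left off, if the cursor
     * falls inside the segment being scanned. */
    static vm_paddr_t
    reclaim_scan_start(vm_paddr_t seg_start, vm_paddr_t seg_end)
    {
            if (vm_reclaim_cursor > seg_start && vm_reclaim_cursor < seg_end)
                    return (vm_reclaim_cursor);
            return (seg_start);
    }

    /* Record where the scan stopped so that later scans skip the ranges that
     * were already reclaimed instead of rescanning them. */
    static void
    reclaim_scan_done(vm_paddr_t last_pa)
    {
            vm_reclaim_cursor = last_pa;
    }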

Mar 2 2021, 8:23 PM

Feb 28 2021

alc added a comment to D28924: vm: Round up npages and alignment for contig reclamation.

mav@, could you please give your hack/test case to markj@? There should be a significant reduction in the amount of scanning with this patch.

Feb 28 2021, 9:52 PM
alc accepted D28924: vm: Round up npages and alignment for contig reclamation.
Feb 28 2021, 9:48 PM
alc added inline comments to D28924: vm: Round up npages and alignment for contig reclamation.
Feb 28 2021, 9:08 PM

Feb 25 2021

alc added a comment to D28924: vm: Round up npages and alignment for contig reclamation.

The rounding up should be capped at the largest supported buddy list order. (For allocation requests that are larger than the largest supported order, we do the following: For each block in the largest order list, we look at its successors in the vm_page array to see if a sufficient number of them are free to satisfy the request.)
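A small illustration of that capping, assuming npages > 0 and that VM_NFREEORDER bounds the buddy lists (the helper name is made up):

    /* Round a request up to a power-of-two buddy block, but never past the
     * largest buddy list order.  Larger requests stay as-is and are satisfied
     * by checking the successors of a max-order block in the vm_page array. */
    static u_long
    contig_round_up(u_long npages)
    {
            u_long order;

            order = flsl(npages - 1);          /* smallest order holding npages */
            if (order > VM_NFREEORDER - 1)
                    order = VM_NFREEORDER - 1; /* cap at the largest order */
            return (MAX(npages, 1UL << order));
    }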

Feb 25 2021, 7:17 PM

Feb 20 2021

alc added a comment to D28805: vm: Handle VM_ALLOC_ZERO in the page allocator.

I really have mixed feelings about this change. If anything, we should try to discourage page zeroing while the object lock is held. On the other hand, for VM_ALLOC_NOOBJ allocations, I think that this change makes perfect sense, which brings me to the following proposal: Remove VM_ALLOC_NOOBJ and VM_ALLOC_ZERO from vm_page_alloc{,_contig}(), and provide separate allocation functions to replace VM_ALLOC_NOOBJ, akin to vm_page_alloc_freelist(). Virtually all VM_ALLOC_NOOBJ call sites are being changed, so we might as well change the function being called. This will slightly simplify vm_page_alloc(), e.g., it can assume that the object is always non-NULL and that it will only allocate from the default pool. Also, when I say, "Remove ... VM_ALLOC_ZERO from vm_page_alloc()", I would probably let PG_ZERO pass through vm_page_alloc() unchanged. Right now, we clear that flag unless VM_ALLOC_ZERO was specified. As for the functions that replace VM_ALLOC_NOOBJ, I would drop not only the object parameter, but also the pindex. The few callers that (ab)use the pindex can set it themselves.

Feb 20 2021, 11:05 PM
alc added inline comments to D28810: Let the VM page allocator handle page zeroing.
Feb 20 2021, 10:48 PM
alc added a comment to D28807: vm fault: Adapt to new VM_ALLOC_ZERO semantics.
In D28807#644985, @kib wrote:

Still, you might add a vm_page_alloc() flag that would ask to not clear PG_ZERO on return, making it the duty of the caller. Then vm_fault() could utilize it to preserve the optimization. Could it be useful for 32bit machines?

I considered it and was looking for i386 systems in the cluster so that I could check the v_zfod and v_ozfod counter values. I couldn't find any, though, so I will look at a VM soon and try some simple loads to see if it is worth preserving. My suspicion is that it is still only a minor optimization even on 32-bit systems, since we are relying on the pmap to provide pre-zeroed pages, and it will not provide very many relative to typical application usage. Pages allocated from superpage reservations are unlikely to be pre-zeroed. Finally, in principle the page will be warm in the data caches if it is zeroed on demand, while with a pre-zeroed page this is less likely.
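For reference, the optimization being weighed looks roughly like this (a sketch, assuming the allocator can return pages with PG_ZERO still set):

    /* Zero-fill-on-demand with the pre-zeroed-page optimization: only zero
     * the page if the allocator did not hand back one already known to be
     * zeroed. */
    static void
    zero_fill(vm_page_t m)
    {
            if ((m->flags & PG_ZERO) == 0)
                    pmap_zero_page(m);   /* zero on demand; warm in the cache */
            else
                    VM_CNT_INC(v_ozfod); /* optimized: page was pre-zeroed */
            VM_CNT_INC(v_zfod);          /* all zero-fill-on-demand faults */
    }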

Feb 20 2021, 8:19 PM
alc added inline comments to D28805: vm: Handle VM_ALLOC_ZERO in the page allocator.
Feb 20 2021, 7:12 PM
alc added inline comments to D28806: x86/iommu: Update following VM_ALLOC_ZERO semantic change.
Feb 20 2021, 7:01 PM

Feb 9 2021

alc accepted D28555: vm: Honour the "noreuse" flag to vm_page_unwire_managed().
Feb 9 2021, 5:40 PM

Jan 21 2021

alc added inline comments to D28219: Restrict supported alignment for malloc_domainset_aligned(9) to PAGE_SIZE..
Jan 21 2021, 7:13 PM
alc accepted D28219: Restrict supported alignment for malloc_domainset_aligned(9) to PAGE_SIZE..
Jan 21 2021, 7:08 PM
alc added inline comments to D28219: Restrict supported alignment for malloc_domainset_aligned(9) to PAGE_SIZE..
Jan 21 2021, 7:45 AM

Jan 20 2021

alc added inline comments to D28219: Restrict supported alignment for malloc_domainset_aligned(9) to PAGE_SIZE..
Jan 20 2021, 8:10 AM
alc accepted D28219: Restrict supported alignment for malloc_domainset_aligned(9) to PAGE_SIZE..
Jan 20 2021, 2:54 AM

Jan 19 2021

alc accepted D28225: Set VM_KMEM_SIZE_SCALE to 1 on riscv and arm64.
Jan 19 2021, 6:54 AM

Jan 18 2021

alc added inline comments to D28219: Restrict supported alignment for malloc_domainset_aligned(9) to PAGE_SIZE..
Jan 18 2021, 11:10 PM
alc accepted D28225: Set VM_KMEM_SIZE_SCALE to 1 on riscv and arm64.

Similarly, there is no reason to define VM_KMEM_SIZE_MIN.

Jan 18 2021, 11:00 PM
alc accepted D28219: Restrict supported alignment for malloc_domainset_aligned(9) to PAGE_SIZE..
Jan 18 2021, 10:26 PM

Jan 15 2021

alc added a comment to D27956: amd64 pmap: do not sleep in _pmap_allocpte() with zero referenced page table page..
In D27956#627791, @kib wrote:

Do riscv or arm64 have the same bug?

No, this bug was introduced with the LA57 rewrite of pmap_allocpte(); riscv and arm64 forked pmap.c before LA57.

Jan 15 2021, 8:07 PM

Jan 14 2021

alc added inline comments to D28117: vm_map_protect: allow to set prot and max_prot in one go..
Jan 14 2021, 7:54 AM

Jan 13 2021

alc added inline comments to D28117: vm_map_protect: allow to set prot and max_prot in one go..
Jan 13 2021, 2:16 AM

Jan 12 2021

alc added inline comments to D28050: Implement enforcing write XOR execute mapping policy..
Jan 12 2021, 8:17 AM

Jan 11 2021

alc accepted D28102: amd64: compare TLB shootdown target to all_cpus.
Jan 11 2021, 10:13 PM

Jan 10 2021

alc committed R10:5a181b8bce99: Prefer the use of vm_page_domain() to vm_phys_domain(). (authored by alc).
Jan 10 2021, 7:29 PM
alc closed D28005: Prefer the use of vm_page_domain() to vm_phys_domain().
Jan 10 2021, 7:29 PM

Jan 9 2021

alc accepted D25815: amd64 pmap: add comment explaining TLB invalidation modes..
Jan 9 2021, 11:56 PM
alc added inline comments to D25815: amd64 pmap: add comment explaining TLB invalidation modes..
Jan 9 2021, 11:36 PM
alc added inline comments to D25815: amd64 pmap: add comment explaining TLB invalidation modes..
Jan 9 2021, 11:28 PM
alc added inline comments to D25815: amd64 pmap: add comment explaining TLB invalidation modes..
Jan 9 2021, 8:48 PM
alc added inline comments to D25815: amd64 pmap: add comment explaining TLB invalidation modes..
Jan 9 2021, 7:55 PM

Jan 6 2021

alc requested review of D28005: Prefer the use of vm_page_domain() to vm_phys_domain().
Jan 6 2021, 7:10 PM
alc added a comment to D25815: amd64 pmap: add comment explaining TLB invalidation modes..

That's all that I have.

Jan 6 2021, 6:40 PM

Jan 5 2021

alc added inline comments to D25815: amd64 pmap: add comment explaining TLB invalidation modes..
Jan 5 2021, 7:32 PM

Jan 4 2021

alc added inline comments to D27956: amd64 pmap: do not sleep in _pmap_allocpte() with zero referenced page table page..
Jan 4 2021, 11:57 PM
alc committed R10:7beeacb27b27: Honor the vm page's PG_NODUMP flag on arm and i386. (authored by alc).
Jan 4 2021, 10:17 PM
alc closed D27949: Honor the PG_NODUMP flag in is_dumpable() on arm and i386.
Jan 4 2021, 10:17 PM
alc requested review of D27949: Honor the PG_NODUMP flag in is_dumpable() on arm and i386.
Jan 4 2021, 7:34 AM

Jan 2 2021

alc accepted D27885: uma: Avoid unmapping ranges in the direct map.
Jan 2 2021, 1:42 AM

Jan 1 2021

alc accepted D27885: uma: Avoid unmapping ranges in the direct map.
Jan 1 2021, 11:17 PM

Dec 17 2020

alc accepted D27607: Fix some errors in the page busying code.
Dec 17 2020, 7:21 PM

Dec 14 2020

alc accepted D27588: amd64 pmap: fix pcid invalidations.
Dec 14 2020, 10:20 PM
alc accepted D27588: amd64 pmap: fix pcid invalidations.

I have nothing more to add. Once you've dealt with Mark's latest comment, commit it. I don't see the need for me to look at this change again.

Dec 14 2020, 8:03 PM

Dec 13 2020

alc added inline comments to D27588: amd64 pmap: fix pcid invalidations.
Dec 13 2020, 9:22 PM
alc added inline comments to D27588: amd64 pmap: fix pcid invalidations.
Dec 13 2020, 8:14 PM
alc committed R9:a9e040146ef6: Oops. Add the semicolon that was missing from the (one) entry (authored by alc).
Dec 13 2020, 7:55 PM
alc committed R9:071d5b732853: Finally, ... add myself to handbook. (The old handbook was frozen (authored by alc).
Dec 13 2020, 7:55 PM
alc committed R9:5c13ddd0f846: The conversion of vm_map's lockmgr()-based locks from a shared/exclusive (authored by alc).
Dec 13 2020, 7:23 PM

Dec 12 2020

alc added inline comments to D27588: amd64 pmap: fix pcid invalidations.
Dec 12 2020, 10:58 PM
alc added inline comments to D27588: amd64 pmap: fix pcid invalidations.
Dec 12 2020, 9:01 PM

Dec 2 2020

alc accepted D27082: minidumps: Always use 64-bit physical addresses for dump_avail[].
Dec 2 2020, 6:40 PM

Dec 1 2020

alc added a comment to D27368: Bump MAXMEMDOM value to 8 to match amd64.

I'd be interested in seeing the output of sysctl vm.phys_segs on one of these machines. Can you post it here?

Dec 1 2020, 8:18 PM
alc added inline comments to D27082: minidumps: Always use 64-bit physical addresses for dump_avail[].
Dec 1 2020, 7:52 PM

Nov 29 2020

alc accepted D27409: bio aio: Destroy ephemeral mapping before unwiring page..
Nov 29 2020, 12:03 AM

Nov 28 2020

alc added inline comments to D27225: Make MAXPHYS tunable..
Nov 28 2020, 8:06 PM

Nov 19 2020

alc accepted D27207: vm_phys: Try to clean up NUMA KPIs.
Nov 19 2020, 12:24 AM
alc added inline comments to D27207: vm_phys: Try to clean up NUMA KPIs.
Nov 19 2020, 12:18 AM
alc added inline comments to D27207: vm_phys: Try to clean up NUMA KPIs.
Nov 19 2020, 12:03 AM

Nov 13 2020

alc added a comment to D27207: vm_phys: Try to clean up NUMA KPIs.

I ran into a similar problem with respect to vm_phys_segs[] when eliminating iteration from the vm_dumpset operations. I wonder if we shouldn't separate the struct vm_phys_seg and array declarations into their own header file, included by vm_param.h. Then, vm_page.h could implement vm_page_domain().
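A sketch of the split being floated, with a hypothetical header name and an abbreviated field list:

    /* A minimal header (say, vm/_vm_phys_seg.h) holding only the segment type
     * and array, so that both vm_param.h and vm_page.h can include it. */
    struct vm_phys_seg {
            vm_paddr_t      start;
            vm_paddr_t      end;
            int             domain;
            /* ... */
    };
    extern struct vm_phys_seg vm_phys_segs[];
    extern int vm_phys_nsegs;

    /* vm_page.h could then implement vm_page_domain() inline without pulling
     * in the rest of the vm_phys KPI: */
    static inline int
    vm_page_domain(vm_page_t m)
    {
            return (vm_phys_segs[m->segind].domain);
    }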

Nov 13 2020, 6:53 PM

Nov 8 2020

alc added a comment to D25815: amd64 pmap: add comment explaining TLB invalidation modes..

Here is my first batch of comments.

Nov 8 2020, 8:26 PM

Nov 3 2020

alc accepted D27057: vmspace: Convert to refcount(9).
Nov 3 2020, 6:02 PM

Nov 2 2020

alc closed D27052: Tidy up the #includes in uma_machdep.c.
Nov 2 2020, 7:20 PM
alc committed rS367281: Tidy up the #includes. Recent changes, such as the introduction of.
Nov 2 2020, 7:20 PM
alc requested review of D27052: Tidy up the #includes in uma_machdep.c.
Nov 2 2020, 8:33 AM

Oct 27 2020

alc closed D26908: mmap(2): Clarify that guard is taken from the stack region..
Oct 27 2020, 6:09 PM
alc committed rS367087: Revise the description of MAP_STACK. In particular, describe the guard.
Oct 27 2020, 6:09 PM
alc added inline comments to D26908: mmap(2): Clarify that guard is taken from the stack region..
Oct 27 2020, 5:40 PM

Oct 26 2020

alc added inline comments to D26910: Make pmap_invalidate_ept() wait synchronously for guest exits.
Oct 26 2020, 6:48 PM
alc updated the diff for D26908: mmap(2): Clarify that guard is taken from the stack region..
Oct 26 2020, 4:52 AM
alc commandeered D26908: mmap(2): Clarify that guard is taken from the stack region..
Oct 26 2020, 4:51 AM

Oct 25 2020

alc added a comment to D26908: mmap(2): Clarify that guard is taken from the stack region..

After mulling this over for the past day or so, I'd like to propose the following alternative description:

Oct 25 2020, 12:06 AM

Oct 23 2020

alc added a comment to D26923: vm_map: Add fences around pmap rundown.

I'm afraid that I can't make sense of the summary. Is this a problem with the KASSERT that precedes the pmap lock acquisition in arm64's pmap_remove_pages()? Once pmap_remove_pages() has acquired the lock, prior changes to the pmap should be visible to it.

Oct 23 2020, 10:46 PM
alc added inline comments to D26908: mmap(2): Clarify that guard is taken from the stack region..
Oct 23 2020, 5:29 PM
alc closed D26907: Conditionally compile struct vm_phys_seg's md_first field.
Oct 23 2020, 6:25 AM
alc committed rS366960: Conditionally compile struct vm_phys_seg's md_first field. This field is.
Oct 23 2020, 6:25 AM

Oct 22 2020

alc added inline comments to D26894: mmap(2): Document guard size and related EINVAL..
Oct 22 2020, 6:16 PM
alc requested review of D26907: Conditionally compile struct vm_phys_seg's md_first field.
Oct 22 2020, 5:59 PM
alc closed D26876: Micro-optimize arm64's uma_small_alloc().
Oct 22 2020, 5:48 PM
alc committed rS366944: Micro-optimize uma_small_alloc(). Replace bzero(..., PAGE_SIZE) by.
Oct 22 2020, 5:48 PM

Oct 20 2020

alc added a comment to D26876: Micro-optimize arm64's uma_small_alloc().

Most other implementations also use bzero(), though on them pagezero() is nothing more than bzero(va, PAGE_SIZE) anyway.
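The change under review amounts to the following (paraphrased from the context above, not the exact committed diff):

    /* Zero the freshly allocated page with the pmap's page-sized primitive
     * rather than the generic bzero(). */
    if ((wait & M_ZERO) != 0)
            pagezero((void *)va);        /* was: bzero((void *)va, PAGE_SIZE) */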

Oct 20 2020, 2:54 AM

Oct 19 2020

alc requested review of D26876: Micro-optimize arm64's uma_small_alloc().
Oct 19 2020, 11:38 PM

Oct 17 2020

alc accepted D26802: link_elf_obj: Colour VM objects.

A while back, Intel quietly made it possible to measure address translation overhead for instruction accesses. I say, "quietly," because the manual provides a low-level description of the counter that doesn't really explain what it effectively measures. However, some Intel people gave a presentation at an HPC workshop about 2 years ago that explained the counter's meaning, and Intel published those slides here: https://software.intel.com/content/www/us/en/develop/download/how-top-down-microarchitecture-analysis-tma-addresses-challenges-in-modern-servers.html

Oct 17 2020, 7:04 PM

Oct 16 2020

alc accepted D26772: uma: Respect uk_reserve in keg_drain().
Oct 16 2020, 9:09 PM