
Sep 26 2023

alc committed rG902ed64fecbe: i386 pmap: Adapt recent amd64/arm64 superpage improvements (authored by alc).
i386 pmap: Adapt recent amd64/arm64 superpage improvements
Sep 26 2023, 5:42 PM

Sep 25 2023

alc added a comment to D38852: vm: improve kstack_object pindex calculation scheme to avoid pindex holes.

From Ryzen 3000 onward, AMD's MMU automatically promotes four contiguous 16KB-aligned virtual pages mapping four contiguous 16KB-aligned physical pages, creating a single 16KB TLB entry. With this change, the physical contiguity and alignment are guaranteed. Could we dynamically adjust the guard size at runtime to achieve the required virtual alignment?
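
To make the condition above concrete, here is a minimal sketch of a check for whether four 4KB mappings form a 16KB-aligned, contiguous run in both the virtual and the physical address space. The names `struct mapping` and `coalescable_16k` are hypothetical, and any further criteria the hardware may apply (for example, matching page attributes) are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K	((uint64_t)4096)
#define COALESCE_SIZE	(4 * PAGE_SIZE_4K)	/* 16KB */

/* Hypothetical description of one 4KB mapping: virtual -> physical. */
struct mapping {
	uint64_t va;
	uint64_t pa;
};

/*
 * Return true if the four mappings start on 16KB-aligned virtual and
 * physical addresses and are contiguous in both address spaces.
 */
static bool
coalescable_16k(const struct mapping m[4])
{
	if ((m[0].va & (COALESCE_SIZE - 1)) != 0 ||
	    (m[0].pa & (COALESCE_SIZE - 1)) != 0)
		return (false);
	for (size_t i = 1; i < 4; i++) {
		if (m[i].va != m[0].va + i * PAGE_SIZE_4K ||
		    m[i].pa != m[0].pa + i * PAGE_SIZE_4K)
			return (false);
	}
	return (true);
}
```

Under this model, a four-page kernel stack whose virtual base is not a multiple of 16KB fails the first test even when its physical pages are perfectly aligned, which is why the virtual alignment matters.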

Sep 25 2023, 6:38 AM
alc added a comment to D38852: vm: improve kstack_object pindex calculation scheme to avoid pindex holes.
In D38852#956753, @kib wrote:

So what is the main purpose of this change? To get linear, without gaps, pindexes for the kernel stack obj?

Sep 25 2023, 6:17 AM

Sep 24 2023

alc updated the diff for D41944: i386 pmap: catch up with amd64 superpage improvements.

Retire PMAP_INLINE. It's unused.

Sep 24 2023, 9:14 PM

Sep 23 2023

alc updated the diff for D41944: i386 pmap: catch up with amd64 superpage improvements.

pmap: optimize MADV_WILLNEED on existing superpages

Sep 23 2023, 7:25 AM

Sep 22 2023

alc requested review of D41944: i386 pmap: catch up with amd64 superpage improvements.
Sep 22 2023, 6:36 PM
alc added a comment to D41635: i386 pmap: allocate leaf page table page for wired userspace superpages.
In D41635#955565, @bojan.novkovic_fer.hr wrote:

Address @alc 's comments.

Sep 22 2023, 6:21 PM

Sep 18 2023

alc added a comment to D41635: i386 pmap: allocate leaf page table page for wired userspace superpages.

@markj I think that you should apply 34eeabff5a8636155bb02985c5928c1844fd3178 to i386 and riscv. Otherwise, I suspect that there will be assertion failures in places like pmap_remove_pde():

KASSERT(vm_page_all_valid(mpte),
    ("pmap_remove_pde: pte page not promoted"));
Sep 18 2023, 8:11 AM
alc accepted D41634: arm64 pmap: allocate leaf page table page for wired userspace superpages.
Sep 18 2023, 7:58 AM
alc requested changes to D41635: i386 pmap: allocate leaf page table page for wired userspace superpages.
Sep 18 2023, 7:27 AM

Sep 13 2023

alc accepted D41846: powerpc pmap: initialize kernel pmap radix trie.
Sep 13 2023, 6:16 PM

Sep 11 2023

alc accepted D41344: radix_trie: have vm_radix use pctrie code.
Sep 11 2023, 6:51 AM
alc added a comment to D41344: radix_trie: have vm_radix use pctrie code.
In D41344#946751, @kib wrote:

I would appreciate feedback here. I'm afraid that the only (unwritten, verbal) feedback I have so far is that this change is only acceptable if I strip all the code out of subr_pctrie.c and inline it in pctrie.h so that all that code can be generated over again for every user of pctrie.h. Is that a consensus position (of everyone but me)?

Could you elaborate on the motivation of that answer, please? For me it sounds fine to share pctrie code as functions.

Sep 11 2023, 12:05 AM

Sep 9 2023

alc added a comment to D39845: VM anonymous clustering: be more persistent.

Let me say up front that the worst of the address space creep has been addressed by the changes in place. However, I still see some behavior that I think should be changed. I'll elaborate on that later, but first I have a question: @kib, it has never been clear to me why we update anon_loc. Is it meant solely as an optimization for finding free address space, or is it done for some other reason?

Sep 9 2023, 7:35 PM
alc accepted D41132: amd64 pmap: allocate leaf page table page for wired userspace 2M pages.
Sep 9 2023, 6:19 PM
alc added a comment to D41132: amd64 pmap: allocate leaf page table page for wired userspace 2M pages.

This looks correct now. Thanks!

Sep 9 2023, 5:59 PM

Sep 3 2023

alc added a comment to D35709: fix pmcstat .
In D35709#950480, @alc wrote:

One of my graduate students found that this change had a seriously bad side effect. Specifically, on a Ryzen processor, instead of being able to collect data from 6 counters simultaneously, he could only configure 3 counters. So, we backed out this change locally.

Thanks Alan, I have found the same, and I have a fix for it. The problem is that we now allocate the requested event twice on CPU 0, thus reducing the total number of available counters by two.

I will put the fix up for review within the next week, and make sure it is present in 14.0.

Sep 3 2023, 9:14 PM
alc added a comment to D35709: fix pmcstat .

One of my graduate students found that this change had a seriously bad side effect. Specifically, on a Ryzen processor, instead of being able to collect data from 6 counters simultaneously, he could only configure 3 counters. So, we backed out this change locally.

Sep 3 2023, 8:52 PM

Aug 12 2023

alc accepted D41435: vm: Allow MAP_32BIT for all architectures.
Aug 12 2023, 8:48 PM
alc added a comment to D41344: radix_trie: have vm_radix use pctrie code.

I can foresee using the vm_page's order field to encode an order, and thereby eliminate a level in the radix trie. Specifically, I believe that we could encode the order for allocated vm_pages as VM_NFREEORDER + the order

Then you would want to change the implementation of vm_radix_lookup to call pctrie_lookup_le, and use the order field and pindex of the found rnode to determine whether your lookup succeeded or not. You might also want to change vm_radix_insert, so that when you inserted that 16th page, vm_radix_insert could remove all the little pages that would be covered by the new big page, before inserting the new big page. And you might want to change vm_radix_remove too, but all those changes would involve calling pctrie_insert and pctrie_remove as necessary. And maybe the pctrie people could implement some kind of 'reclaim everything in this range' call, so that vm_radix could do it faster, somehow. I'm sure the fine people who work on the pctrie project would be happy to help, once you define what you want from them.
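
The lookup idea above can be sketched as follows: find the entry with the largest pindex that is less than or equal to the requested index, then use that entry's pindex and order to decide whether the index is actually covered. This is only an illustration under stated assumptions. A sorted array stands in for the trie, and `struct big_page`, `lookup_le`, and `lookup` are hypothetical names, not the pctrie or vm_radix API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vm_page: first pindex plus a size order. */
struct big_page {
	uint64_t pindex;	/* index of the first 4KB page covered */
	unsigned order;		/* covers 2^order consecutive indices */
};

/*
 * Stand-in for a "less than or equal" trie lookup: return the entry with
 * the largest pindex <= index, or NULL.  The array must be sorted by pindex.
 */
static const struct big_page *
lookup_le(const struct big_page *pages, size_t n, uint64_t index)
{
	const struct big_page *found = NULL;

	for (size_t i = 0; i < n && pages[i].pindex <= index; i++)
		found = &pages[i];
	return (found);
}

/* An exact lookup built on lookup_le: succeed only if index is covered. */
static const struct big_page *
lookup(const struct big_page *pages, size_t n, uint64_t index)
{
	const struct big_page *p = lookup_le(pages, n, index);

	if (p != NULL && index - p->pindex < ((uint64_t)1 << p->order))
		return (p);
	return (NULL);
}
```

With an order-4 entry at pindex 16, indices 16 through 31 all resolve to that one entry, while index 32 misses unless another entry covers it.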

Aug 12 2023, 7:59 PM
alc added inline comments to D41132: amd64 pmap: allocate leaf page table page for wired userspace 2M pages.
Aug 12 2023, 7:19 PM
alc added a comment to D41344: radix_trie: have vm_radix use pctrie code.

Suppose that we have 16 physically contiguous pages on amd64/arm64. I can foresee using the vm_page's order field to encode an order, and thereby eliminate a level in the radix trie. Specifically, I believe that we could encode the order for allocated vm_pages as VM_NFREEORDER + the order, and tweak the assertions in vm_phys.c to test for >= VM_NFREEORDER, rather than equality.
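
A minimal sketch of the encoding proposed here, assuming an 8-bit order field and an illustrative value of 13 for VM_NFREEORDER (the real constant is defined per architecture). The helper names are hypothetical and do not exist in vm_phys.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration only; the real constant is per-architecture. */
#define VM_NFREEORDER	13

/* Encode the order of an allocated run as VM_NFREEORDER + order. */
static uint8_t
encode_allocated_order(unsigned order)
{
	assert(order < VM_NFREEORDER);
	return ((uint8_t)(VM_NFREEORDER + order));
}

/*
 * The relaxed test: any value >= VM_NFREEORDER means "not in a free list",
 * where an equality test would only accept VM_NFREEORDER itself.
 */
static bool
order_is_allocated(uint8_t field)
{
	return (field >= VM_NFREEORDER);
}

/* Recover the order of an allocated run from the encoded field. */
static unsigned
decode_allocated_order(uint8_t field)
{
	assert(order_is_allocated(field));
	return (field - VM_NFREEORDER);
}
```

For 16 physically contiguous pages the order is 4, so the field would hold VM_NFREEORDER + 4, and an ordinary allocated page (order 0) keeps the value VM_NFREEORDER.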

Aug 12 2023, 5:52 PM
alc added inline comments to D41435: vm: Allow MAP_32BIT for all architectures.
Aug 12 2023, 5:35 PM
alc committed rG37e5d49e1e5e: vm: Fix address hints of 0 with MAP_32BIT (authored by alc).
vm: Fix address hints of 0 with MAP_32BIT
Aug 12 2023, 7:57 AM
alc closed D41397: vm: Fix address hints of 0 with MAP_32BIT.
Aug 12 2023, 7:57 AM
alc added inline comments to D41397: vm: Fix address hints of 0 with MAP_32BIT.
Aug 12 2023, 7:33 AM

Aug 10 2023

alc accepted D41099: More fixes for stacks.
Aug 10 2023, 4:54 PM
alc accepted D41099: More fixes for stacks.
Aug 10 2023, 7:23 AM
alc added inline comments to D41099: More fixes for stacks.
Aug 10 2023, 6:03 AM
alc added inline comments to D41397: vm: Fix address hints of 0 with MAP_32BIT.
Aug 10 2023, 12:22 AM

Aug 9 2023

alc updated the summary of D41397: vm: Fix address hints of 0 with MAP_32BIT.
Aug 9 2023, 8:01 PM
alc updated the summary of D41397: vm: Fix address hints of 0 with MAP_32BIT.
Aug 9 2023, 8:00 PM
alc requested review of D41397: vm: Fix address hints of 0 with MAP_32BIT.
Aug 9 2023, 6:14 PM

Aug 1 2023

alc accepted D41249: vm_map: Add a macro to fetch a map entry's split boundary index.
Aug 1 2023, 5:38 AM

Jul 30 2023

alc accepted D41235: radix_tree: compute slot from keybarr.
Jul 30 2023, 7:26 PM
alc added inline comments to D41099: More fixes for stacks.
Jul 30 2023, 6:06 PM

Jul 29 2023

alc accepted D41230: amd64: Fix TLB invalidation routines in !SMP kernels.
Jul 29 2023, 8:01 PM

Jul 28 2023

alc committed rG3d7c37425ee0: amd64 pmap: Catch up with pctrie changes (authored by alc).
amd64 pmap: Catch up with pctrie changes
Jul 28 2023, 8:18 PM

Jul 27 2023

alc accepted D41171: radix_trie: use null leaves to speed searches.
Jul 27 2023, 6:54 PM
alc added a comment to D41171: radix_trie: use null leaves to speed searches.

As expected, this reduces the number of instructions in _vm_page_lookup by two. Before:

1f0: 48 89 f2                      movq    %rsi, %rdx
1f3: 48 d3 ea                      shrq    %cl, %rdx
1f6: 83 e2 0f                      andl    $0xf, %edx
1f9: 48 8b 44 d0 10                movq    0x10(%rax,%rdx,8), %rax
1fe: 48 85 c0                      testq   %rax, %rax
201: 74 34                         je      0x237 <vm_radix_lookup+0x57>
203: a8 01                         testb   $0x1, %al
205: 75 26                         jne     0x22d <vm_radix_lookup+0x4d>
207: 0f b6 50 0a                   movzbl  0xa(%rax), %edx
20b: 48 8d 0c 95 00 00 00 00       leaq    (,%rdx,4), %rcx
213: 48 83 fa 0e                   cmpq    $0xe, %rdx
217: 77 d7                         ja      0x1f0 <vm_radix_lookup+0x10>
219: 48 c7 c2 f0 ff ff ff          movq    $-0x10, %rdx
220: 48 d3 e2                      shlq    %cl, %rdx
223: 48 21 f2                      andq    %rsi, %rdx
226: 48 3b 10                      cmpq    (%rax), %rdx
229: 74 c5                         je      0x1f0 <vm_radix_lookup+0x10>

After:

2e0: 48 89 f2                      movq    %rsi, %rdx
2e3: 48 d3 ea                      shrq    %cl, %rdx
2e6: 83 e2 0f                      andl    $0xf, %edx
2e9: 48 8b 44 d0 10                movq    0x10(%rax,%rdx,8), %rax
2ee: a8 01                         testb   $0x1, %al
2f0: 75 26                         jne     0x318 <vm_radix_lookup+0x48>
2f2: 0f b6 50 0a                   movzbl  0xa(%rax), %edx
2f6: 48 8d 0c 95 00 00 00 00       leaq    (,%rdx,4), %rcx
2fe: 48 83 fa 0e                   cmpq    $0xe, %rdx
302: 77 dc                         ja      0x2e0 <vm_radix_lookup+0x10>
304: 48 c7 c2 f0 ff ff ff          movq    $-0x10, %rdx
30b: 48 d3 e2                      shlq    %cl, %rdx
30e: 48 21 f2                      andq    %rsi, %rdx
311: 48 3b 10                      cmpq    (%rax), %rdx
314: 74 ca                         je      0x2e0 <vm_radix_lookup+0x10>

The effect on arm64 is similar.
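
The two instructions that disappear are the NULL test (testq/je) on the loaded child. A simplified sketch of why that test becomes unnecessary: if every empty slot holds a "null leaf" (the leaf tag bit with no pointer behind it) instead of NULL, the loop needs only the leaf test. The code below is not the FreeBSD implementation. It omits the path-compression checks visible in the disassembly, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define WIDTH		4			/* key bits consumed per level */
#define COUNT		(1 << WIDTH)		/* children per node */
#define LEAF_FLAG	((uintptr_t)1)		/* tag bit marking a leaf */
#define NULL_LEAF	LEAF_FLAG		/* a leaf that points at nothing */

struct item {
	uint64_t key;
};

struct node {
	unsigned shift;			/* key bits to shift away at this node */
	uintptr_t child[COUNT];		/* tagged: node pointer or leaf */
};

/* Initialize a node with every slot empty, i.e. holding a null leaf. */
static void
node_init(struct node *n, unsigned shift)
{
	n->shift = shift;
	for (size_t i = 0; i < COUNT; i++)
		n->child[i] = NULL_LEAF;
}

/* Tag an item pointer as a leaf. */
static uintptr_t
leaf_of(struct item *it)
{
	return ((uintptr_t)it | LEAF_FLAG);
}

/* Descend until a leaf is reached; empty slots are leaves too. */
static struct item *
trie_lookup(uintptr_t root, uint64_t key)
{
	uintptr_t p = root;
	struct item *it;

	while ((p & LEAF_FLAG) == 0) {
		struct node *n = (struct node *)p;

		p = n->child[(key >> n->shift) & (COUNT - 1)];
	}
	it = (struct item *)(p & ~LEAF_FLAG);
	return (it != NULL && it->key == key ? it : NULL);
}
```

The NULL check has not vanished entirely; it has moved out of the loop and is paid once, after the descent ends.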

Jul 27 2023, 6:12 PM
alc committed rG5ec2d94ade51: vm_mmap_object: Update the spelling of true/false (authored by alc).
vm_mmap_object: Update the spelling of true/false
Jul 27 2023, 5:27 AM

Jul 26 2023

alc added inline comments to D41132: amd64 pmap: allocate leaf page table page for wired userspace 2M pages.
Jul 26 2023, 5:50 PM
alc added inline comments to D41159: mmap(2): fix MAP_32BIT when ASLR is disabled.
Jul 26 2023, 5:45 PM
alc added inline comments to D41132: amd64 pmap: allocate leaf page table page for wired userspace 2M pages.
Jul 26 2023, 5:32 PM
alc added inline comments to D41159: mmap(2): fix MAP_32BIT when ASLR is disabled.
Jul 26 2023, 5:56 AM
alc committed rGa98a0090b2ba: arm64 pmap: Eliminate unnecessary TLB invalidations (authored by alc).
arm64 pmap: Eliminate unnecessary TLB invalidations
Jul 26 2023, 5:38 AM
alc closed D41159: mmap(2): fix MAP_32BIT when ASLR is disabled.
Jul 26 2023, 5:26 AM
alc committed rG50d663b14b31: vm: Fix vm_map_find_min() (authored by alc).
vm: Fix vm_map_find_min()
Jul 26 2023, 5:26 AM

Jul 25 2023

alc added a comment to D41159: mmap(2): fix MAP_32BIT when ASLR is disabled.

I'd like to suggest this patch instead:

diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
index 444e09986d4e..eb607d519247 100644
--- a/sys/vm/vm_map.c
+++ b/sys/vm/vm_map.c
@@ -2255,10 +2255,10 @@ vm_map_find_min(vm_map_t map, vm_object_t object, vm_ooffset_t offset,
        int rv;
Jul 25 2023, 8:08 AM

Jul 24 2023

alc committed rG7b1e606c7222: arm64 pmap: Retire PMAP_INLINE (authored by alc).
arm64 pmap: Retire PMAP_INLINE
Jul 24 2023, 6:19 PM
alc committed rG0aebcfc9f4d6: arm64 pmap: Eliminate some duplication of code (authored by alc).
arm64 pmap: Eliminate some duplication of code
Jul 24 2023, 6:19 PM

Jul 22 2023

alc closed D41118: arm64/riscv pmap: Initialize the pmap's pm_pvchunk field.
Jul 22 2023, 4:59 AM
alc committed rG29edff0dea0f: arm64/riscv pmap: Initialize the pmap's pm_pvchunk field (authored by alc).
arm64/riscv pmap: Initialize the pmap's pm_pvchunk field
Jul 22 2023, 4:59 AM

Jul 20 2023

alc requested review of D41118: arm64/riscv pmap: Initialize the pmap's pm_pvchunk field.
Jul 20 2023, 5:29 PM
alc accepted D41089: mmap(MAP_STACK): on stack grow, use original protection.

I'm okay with this. I just reread the mmap(2) description of MAP_STACK, and I don't think that it needs to change to explicitly mention the behavior implemented here. This change implements the behavior that I believe that a reader would most likely expect.

Jul 20 2023, 6:38 AM

Jul 19 2023

alc added a comment to D41089: mmap(MAP_STACK): on stack grow, use original protection.

I think that the overloading of the vm map entry fields should be mentioned where the structure is defined.

Jul 19 2023, 3:48 PM

Jul 18 2023

alc accepted D40936: radix_trie: simplify ge, le lookups.

original timing:
52450.199u 1452.606s 59:09.38 1518.6% 73478+3095k 121089+33120io 110807pf+0w
modified timing:
52599.911u 1454.881s 59:17.88 1519.2% 73479+3096k 121017+34297io 110758pf+0w

Do you have any idea why the timings are so different here? Is the difference consistent across multiple runs?

I ran 3 more buildworld tests on the modified and unchanged kernels immediately after reboots, with no instrumentation added to count performance stats for any particular function.
The results:

Modified:
root@108-254-203-203:/usr/src # time make -j16 buildworld > & /dev/null
52357.285u 1448.444s 59:11.33 1515.0% 73477+3095k 120995+33438io 110815pf+0w
root@108-254-203-203:/usr/src # time make -j16 buildworld > & /dev/null
52553.118u 1456.257s 59:19.67 1517.2% 73472+3096k 121217+33609io 110821pf+0w
root@108-254-203-203:/usr/src # time make -j16 buildworld > & /dev/null
52547.476u 1452.889s 59:17.54 1517.9% 73470+3095k 121178+33245io 110870pf+0w

Original:
root@108-254-203-203:/usr/src # time make -j16 buildworld > & /dev/null
52602.552u 1452.130s 59:20.49 1518.1% 73477+3096k 121068+33157io 110855pf+0w
root@108-254-203-203:/usr/src # time make -j16 buildworld > & /dev/null
52656.392u 1451.635s 59:27.48 1516.7% 73471+3096k 121118+33354io 110845pf+0w
root@108-254-203-203:/usr/src # time make -j16 buildworld > & /dev/null
52601.014u 1445.216s 59:23.55 1516.6% 73478+3095k 121121+33611io 110844pf+0w
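
For comparing runs like these, a small helper can turn the elapsed-time column (the third field, in MM:SS.ss form) into seconds and average it. This is a minimal sketch with hypothetical function names, and it only reads the elapsed time, not the user or system times.

```c
#include <stddef.h>
#include <stdio.h>

/* Convert an elapsed time of the form "MM:SS.ss" to seconds; -1.0 on error. */
static double
elapsed_seconds(const char *s)
{
	int minutes;
	double seconds;

	if (sscanf(s, "%d:%lf", &minutes, &seconds) != 2)
		return (-1.0);
	return (minutes * 60.0 + seconds);
}

/* Average the elapsed times of n runs, each given as "MM:SS.ss". */
static double
mean_elapsed(const char *const runs[], size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += elapsed_seconds(runs[i]);
	return (sum / (double)n);
}
```

Applied to the six elapsed times above, the means come to roughly 3556.2 seconds for the modified kernel and 3563.8 seconds for the original, a gap of about 0.2%. Three runs per kernel are too few to conclude much more than that.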

Jul 18 2023, 5:34 PM

Jul 17 2023

alc added inline comments to D40936: radix_trie: simplify ge, le lookups.
Jul 17 2023, 5:28 PM

Jul 16 2023

alc accepted D41055: riscv pmap: another vm_radix_init.
Jul 16 2023, 8:43 PM
alc added inline comments to D40936: radix_trie: simplify ge, le lookups.
Jul 16 2023, 6:38 PM
alc added inline comments to D40936: radix_trie: simplify ge, le lookups.
Jul 16 2023, 7:15 AM
alc added inline comments to D40936: radix_trie: simplify ge, le lookups.
Jul 16 2023, 6:49 AM

Jul 15 2023

alc added inline comments to D40936: radix_trie: simplify ge, le lookups.
Jul 15 2023, 8:02 PM
alc added inline comments to D40936: radix_trie: simplify ge, le lookups.
Jul 15 2023, 7:59 PM
alc added inline comments to D40936: radix_trie: simplify ge, le lookups.
Jul 15 2023, 7:57 PM
alc added inline comments to D40936: radix_trie: simplify ge, le lookups.
Jul 15 2023, 7:42 PM

Jul 14 2023

alc accepted D40971: vm_radix_init: use initializer.
Jul 14 2023, 5:11 AM
alc added inline comments to D40971: vm_radix_init: use initializer.
Jul 14 2023, 5:10 AM
alc added inline comments to D40971: vm_radix_init: use initializer.
Jul 14 2023, 4:32 AM

Jul 12 2023

alc committed rG294c52d969df: amd64 pmap: Fix compilation when superpage reservations are disabled (authored by yufeng.zhou_rice.edu).
amd64 pmap: Fix compilation when superpage reservations are disabled
Jul 12 2023, 5:10 PM

Jul 9 2023

alc accepted D40807: radix_trie: avoid code duplication in insert.
Jul 9 2023, 8:00 PM

Jul 7 2023

alc accepted D40775: radix_trie: replace node count with popmap.
Jul 7 2023, 4:01 PM
alc added inline comments to D40775: radix_trie: replace node count with popmap.
Jul 7 2023, 8:16 AM

Jul 5 2023

alc added inline comments to D40775: radix_trie: replace node count with popmap.
Jul 5 2023, 5:45 PM
alc added inline comments to D40775: radix_trie: replace node count with popmap.
Jul 5 2023, 5:43 PM

Jul 4 2023

alc added inline comments to D40775: radix_trie: replace node count with popmap.
Jul 4 2023, 9:12 PM

Jul 2 2023

alc added inline comments to D40775: radix_trie: replace node count with popmap.
Jul 2 2023, 9:55 PM

Jun 29 2023

alc committed rGe59d202312f9: arm64: make VM_NFREEORDER and the comment describing it match (authored by alc).
arm64: make VM_NFREEORDER and the comment describing it match
Jun 29 2023, 6:07 PM
alc closed D40782: arm64: make the setting of VM_NFREEORDER and the comment describing it match.
Jun 29 2023, 6:07 PM
alc updated the diff for D40782: arm64: make the setting of VM_NFREEORDER and the comment describing it match.

Address Andrew's comment.

Jun 29 2023, 6:59 AM

Jun 28 2023

alc added a comment to D40782: arm64: make the setting of VM_NFREEORDER and the comment describing it match.
In D40782#928081, @kib wrote:

Does TLB miss cause cache miss?

Jun 28 2023, 6:59 PM
alc committed rG3767de839742: arm64 pmap: Tidy up pmap_promote_l2() calls (authored by alc).
arm64 pmap: Tidy up pmap_promote_l2() calls
Jun 28 2023, 5:53 PM
alc closed D40781: arm64 pmap: Tidy up pmap_promote_l2() calls.
Jun 28 2023, 5:53 PM
alc requested review of D40782: arm64: make the setting of VM_NFREEORDER and the comment describing it match.
Jun 28 2023, 8:19 AM
alc requested review of D40781: arm64 pmap: Tidy up pmap_promote_l2() calls.
Jun 28 2023, 7:38 AM

Jun 27 2023

alc accepted D40746: radix_trie: skip needless compare in lookup_le, lookup_ge.
Jun 27 2023, 5:33 AM
alc committed rGd8e6f4946cec: vm: Fix anonymous memory clustering under ASLR (authored by alc).
vm: Fix anonymous memory clustering under ASLR
Jun 27 2023, 4:44 AM
alc closed D40743: vm: Fix anonymous memory clustering under ASLR.
Jun 27 2023, 4:44 AM

Jun 26 2023

alc added a comment to D40743: vm: Fix anonymous memory clustering under ASLR.

This change reduces the number of L1 DTLB misses on a Ryzen 5900X during a "make buildkernel" by 14.3%, in large part, because it makes their TLB coalescing feature more effective.

Jun 26 2023, 7:14 PM

Jun 25 2023

alc added inline comments to D40746: radix_trie: skip needless compare in lookup_le, lookup_ge.
Jun 25 2023, 7:36 PM
alc accepted D40722: radix_trie: simplify trimkey functions.
Jun 25 2023, 5:48 PM
alc added inline comments to D40722: radix_trie: simplify trimkey functions.
Jun 25 2023, 12:21 AM

Jun 24 2023

alc committed rG0d2f98c2f092: amd64 pmap: Tidy up pmap_promote_pde() calls (authored by alc).
amd64 pmap: Tidy up pmap_promote_pde() calls
Jun 24 2023, 7:09 PM
alc closed D40744: amd64 pmap: Tidy up pmap_promote_pde() calls.
Jun 24 2023, 7:08 PM
alc added a comment to D40744: amd64 pmap: Tidy up pmap_promote_pde() calls.

Is there any reason not to make the same change on arm64? (Or riscv for that matter.)

Jun 24 2023, 6:01 PM
alc updated the summary of D40744: amd64 pmap: Tidy up pmap_promote_pde() calls.
Jun 24 2023, 6:49 AM
alc requested review of D40744: amd64 pmap: Tidy up pmap_promote_pde() calls.
Jun 24 2023, 6:47 AM
alc updated the summary of D40743: vm: Fix anonymous memory clustering under ASLR.
Jun 24 2023, 4:29 AM
alc updated the summary of D40743: vm: Fix anonymous memory clustering under ASLR.
Jun 24 2023, 4:27 AM

Jun 23 2023

alc requested review of D40743: vm: Fix anonymous memory clustering under ASLR.
Jun 23 2023, 11:36 PM