Page MenuHomeFreeBSD

Non-transparent superpages support.
ClosedPublic

Authored by kib on May 1 2020, 3:53 PM.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Could you upload with context?

sys/amd64/amd64/pmap.c
6070 ↗(On Diff #75222)

I am confused, earlier you wrote:

I just kept the wiring code as is, it bumps the entry user wiring already, and the call to vm_fault() populates the page tables. On unwiring, we need to skip pmap_unwire() and vm_object_unwire().

and vm_map_entry_unwire() still skips pmap_unwire() for large pages.

sys/vm/vm_map.c
3096 ↗(On Diff #75805)

I don't see the point of assigning rv here, it is assigned again a few lines below.

3103 ↗(On Diff #75805)

I think it should look up the entry containing end - 1. Otherwise the vm_map_clip_end() call might try to clip lentry to a length of 0.

sys/vm/vm_map.c
3113 ↗(On Diff #75805)

I think we need to prohibit VM_INHERIT_COPY and perhaps _ZERO for large pages. Currently vmspace_fork() copies the boundary eflags from the old entry. With _COPY inheritance the page fault handler may busy individual 4KB pages in the large page, but because we avoid busying all 4KB pages in a large page there is no synchronization between them.

sys/kern/uipc_shm.c
1053 ↗(On Diff #75805)

We set shmfd->shm_flags = shmflags after every shm_open2(), not just the one that created the object. So, if an object is created with SHM_LARGEPAGE, a subsequent shm_open2() of the object without SHM_LARGEPAGE will clobber the flag.

sys/kern/uipc_shm.c
1053 ↗(On Diff #75805)

Ugh, this is my bug. =(

kib marked 4 inline comments as done.

Switch back to checking shm_object type to detect largepage shm.
Disallow non-shared mappings.
Provide usermode API (shm_open_largepage).

sys/amd64/amd64/pmap.c
6070 ↗(On Diff #75222)

This is still not done, I will integrate your tests with my patch, then I want to think about this some more.

sys/vm/vm_map.c
3113 ↗(On Diff #75805)

Then MAP_PRIVATE should be disabled as well.

sys/amd64/amd64/pmap.c
6070 ↗(On Diff #75222)

I will convert the tests to use the new interface and push some new tests, but it might not be done before tomorrow.

sys/vm/vm_map.c
3113 ↗(On Diff #75805)

Yes, I think so.

There might be a useful semantic for MAP_PRIVATE. Suppose I want to reserve a pool of large pages at boot, before physical memory is fragmented. I can create a set of named shm objects for that purpose, but I can't see a way for applications to atomically "claim" a particular object. Some malicious application could open all of them and map them, so there is no way to ensure that a legitimate consumer gets a private set of large pages from the pool. It would be good to provide some interface that guarantees that a largepage object is not mapped by any other process.

I haven't thought a lot about it so far so I don't want to make any suggestions yet. Maybe I am missing something.

sys/vm/vm_map.c
3103 ↗(On Diff #75805)

I meant that it should be vm_map_lookup_entry(map, end - 1, &lentry).

sys/vm/vm_map.c
3106 ↗(On Diff #75838)

Why not allow VM_INHERIT_NONE?

Fix buildworld.
Minor fixes.
Wiring still not handled.

kib marked 6 inline comments as done.

Allow INHERIT_NONE and INHERIT_ZERO.
Make wiring functional.
CoW mappings are still disabled

sys/kern/uipc_shm.c
788 ↗(On Diff #75892)

This function is called with the object locked, but here we're returning with the object unlocked.

sys/vm/vm_map.c
3106 ↗(On Diff #75892)

I believe VM_INHERIT_ZERO should also be prohibited.

kib marked 2 inline comments as done.Aug 22 2020, 5:26 PM
kib added inline comments.
sys/vm/vm_map.c
3106 ↗(On Diff #75892)

Can you explain why ?

I clear MAP_ENTRY_SPLIT_BOUNDARY_MASK on fork for INHERIT_ZERO.

kib marked an inline comment as done.

Relock on error.

lib/libc/sys/shm_open.c
74 ↗(On Diff #76089)

So in order to open an existing largepage object with shm_open_largepage(), one has to reproduce the original psind, alloc_policy parameters (otherwise FIOSHMLPGCNF will fail), or open it with shm_open(). Does it suggest that shm_open_largepage() should really be shm_create_largepage(), i.e., O_CREAT is implicit, and shm_open() is always used to open existing largepage objects?

sys/vm/vm_map.c
3106 ↗(On Diff #75892)

It is just kind of weird to me - we are mapping and using a shm object because its mappings have some special properties, and after fork() it is replaced by a plain anonymous mapping. I guess INHERIT_ZERO is permitted for vnodes as well, so there is some precedent for this. I'll just modify my test for this case.

kib marked 2 inline comments as done.Aug 23 2020, 9:12 PM
kib added inline comments.
sys/vm/vm_map.c
3106 ↗(On Diff #75892)

But this *is* the semantic of VM_INHERIT_ZERO.

kib marked an inline comment as done.

Merge with LA57 commit.
Rename shm_open_largepage() to shm_create_largepage().
Rename FIOSHMLPGCNF to FIOSSHMLPGCNF.
Add FIOGSHMLPGCNF to fetch shm largepage configuration.

Use convenience macros to calculate pindexes, introduced with LA57 commit.

sys/vm/vm_map.c
3106 ↗(On Diff #75892)

Right, and I think the semantic makes no sense for shared mappings. We already have INHERIT_NONE for them.

Modulo the new comments, I think the diff is reasonable. I would prefer to commit the tests separately. I have a WIP arm64 patch but have not tested yet and it should not block the initial commit. I'm also happy to write the man page modifications if you like. Aside from documentation of the shm interface we should document the new source of EINVAL for various memory system calls.

sys/kern/uipc_shm.c
792 ↗(On Diff #76122)

Need to relock here as well.

sys/vm/vm_mmap.c
426 ↗(On Diff #76122)

The check does not work properly in some cases because shm_flags gets clobbered on every open() as I pointed out before. So either SHM_LARGEPAGE needs to be preserved in kern_shm_open2() or some other method of identifying largepage objects is needed.

usr.bin/posixshmcontrol/posixshmcontrol.c
83 ↗(On Diff #76122)

This is still using the old ioctl. I guess it can use shm_create_largepage() now.

This revision is now accepted and ready to land.Aug 30 2020, 5:52 PM
kib marked 4 inline comments as done.Aug 30 2020, 6:59 PM

I do not intend to commit tests, this is your code.

sys/kern/uipc_shm.c
798 ↗(On Diff #76122)

This should be VM_OBJECT_WLOCK().

This revision now requires review to proceed.Aug 30 2020, 7:35 PM

Fix mis-merge in amd64 pmap_mincore().
Switch posixshmcontrol(1) to libc API.

Tested by: pho

This revision was not accepted when it landed; it landed in state Needs Review.Sep 8 2020, 11:28 PM
This revision was automatically updated to reflect the committed changes.

Rest of the patch, with the following fixes:

  1. handle PDP superpages in pmap_extract_and_hold(). actually restructure the function to make if() nesting bearable.
  2. Only allow SHM_LARGEPAGE on amd64, no other pmaps were handled.
  3. Fix tests compilation on mips, where MAXPAGESIZES == 1.
sys/amd64/amd64/pmap.c
3616 ↗(On Diff #76815)

This could be restructured a bit to avoid testing m != NULL after the check_page label:

if ((pdpe & PG_PS) != 0) {
    if ((pdpe & PG_RW) == 0 && (prot & VM_PROT_WRITE) != 0)
        goto out;
    m = PHYS_TO_VM_PAGE(...);
    goto check_page;
}
...
if ((pte & PG_RW) == 0 && (prot & VM_PROT_WRITE) != 0)
    goto out;
m = PHYS_TO_VM_PAGE(pte & PG_FRAME);
check_page:
if (!vm_page_wire_mapped(m))
    m = NULL;
out:
...
kib marked an inline comment as done.Sep 9 2020, 4:19 PM
kib added inline comments.
sys/amd64/amd64/pmap.c
3616 ↗(On Diff #76815)

I restructured as you suggested, but I do not believe that m!=NULL check can be dropped. PHYS_TO_VM_PAGE() can return NULL, for instance imagine that userspace mapped a PCI BAR outside DMAP.

kib marked an inline comment as done.

Restructure pmap_extract_and_hold() more.

I verified that the pmap_extract_and_hold() change fixes my test case.

sys/amd64/amd64/pmap.c
3616 ↗(On Diff #76815)

I see, thanks. I didn't know about PCIOCBARMMAP.

This revision is now accepted and ready to land.Sep 9 2020, 4:50 PM