
Non-transparent superpages support.
Needs ReviewPublic

Authored by kib on May 1 2020, 3:53 PM.

Details

Reviewers
markj
alc
Summary

WIP

Test program, also the API illustration: https://reviews.freebsd.org/P384

Diff Detail

Repository
rS FreeBSD src repository
Lint
Lint Skipped
Unit
Unit Tests Skipped
Build Status
Buildable 32231

Event Timeline

kib created this revision.May 1 2020, 3:53 PM
kib requested review of this revision.May 1 2020, 3:53 PM
markj added a comment.May 1 2020, 5:42 PM

Just some thoughts on the MI components, I have only skimmed so far:

  • If we are not going to be API compatible with Linux, I don't really like the reuse of the "hugetlb" name. This can be finalized later of course.
  • Rather than having a separate pmap_enter_hugetlb(), why not extend pmap_enter() to handle psind=2? I think we will want this eventually anyway.
  • I assumed that we would use mmap() rather than having a HUGETLB_MMAP ioctl. mmap() can specify the required alignment with the MAP_ALIGNED() flag. The hugetlb device object can implement the populate method and return a page with m->psind == 2, so the device does not have to create the mapping. What are the disadvantages to using mmap() instead of HUGETLB_MMAP?
kib retitled this revision from Nn-trasparent superpage support. to Non-trasparent superpage support..May 1 2020, 5:53 PM
kib added a comment.EditedMay 1 2020, 6:02 PM

Just some thoughts on the MI components, I have only skimmed so far:

  • If we are not going to be API compatible with Linux, I don't really like the reuse of the "hugetlb" name. This can be finalized later of course.

Sure, but please propose the name.

  • Rather than having a separate pmap_enter_hugetlb(), why not extend pmap_enter() to handle psind=2? I think we will want this eventually anyway.

pmap_enter_hugetlb() is not equivalent to pmap_enter(psind = 1). For a start, pmap_enter_hugetlb() asserts that there is no existing mapping and that the requested mapping is very special; second, I do not want to slow down the existing pmap_enter() or make its logic even more convoluted. I can see an argument that I should add a new PMAP_ENTER_HUGETLB flag and call pmap_enter(), with pmap_enter() internally jumping to pmap_enter_hugetlb().

  • I assumed that we would use mmap() rather than having a HUGETLB_MMAP ioctl. mmap() can specify the required alignment with the MAP_ALIGNED() flag. The hugetlb device object can implement the populate method and return a page with m->psind == 2, so the device does not have to create the mapping. What are the disadvantages to using mmap() instead of HUGETLB_MMAP?

One problem there is that I need to set some special map entry flags; another is that I need to avoid populating the object with any pages from any other syscalls until my population code does it. This is part of the problem in the API design, and it will be even more severe for e.g. POSIX shm, which is one of the reasons why I went with /dev/hugetlb for now, to get things going in the lower layers.

No, m->psind = 2 cannot work now; this would be a very invasive change (it is different from pmap_enter(psind = 1)).

markj added a comment.May 1 2020, 6:26 PM
In D24652#542724, @kib wrote:

Just some thoughts on the MI components, I have only skimmed so far:

  • If we are not going to be API compatible with Linux, I don't really like the reuse of the "hugetlb" name. This can be finalized later of course.

Sure, but please propose the name.

Maybe "largepage" or "largemap"?

  • Rather than having a separate pmap_enter_hugetlb(), why not extend pmap_enter() to handle psind=2? I think we will want this eventually anyway.

pmap_enter_hugetlb() is not equivalent to pmap_enter(psind = 1). For a start, pmap_enter_hugetlb() asserts that there is no existing mapping and that the requested mapping is very special; second, I do not want to slow down the existing pmap_enter() or make its logic even more convoluted. I can see an argument that I should add a new PMAP_ENTER_HUGETLB flag and call pmap_enter(), with pmap_enter() internally jumping to pmap_enter_hugetlb().

We already have PMAP_ENTER_NOREPLACE to indicate the desired semantics. I do not really see why there would be any performance impact: pmap_enter() handles the psind == 1 case in exactly one place, where it calls pmap_enter_pde() and skips the rest of the function. So we could simply extend it to handle psind > 0 and call pmap_enter_pde() or pmap_enter_pdpe() depending on the request.

  • I assumed that we would use mmap() rather than having a HUGETLB_MMAP ioctl. mmap() can specify the required alignment with the MAP_ALIGNED() flag. The hugetlb device object can implement the populate method and return a page with m->psind == 2, so the device does not have to create the mapping. What are the disadvantages to using mmap() instead of HUGETLB_MMAP?

One problem there is that I need to set some special map entry flags; another is that I need to avoid populating the object with any pages from any other syscalls until my population code does it. This is part of the problem in the API design, and it will be even more severe for e.g. POSIX shm, which is one of the reasons why I went with /dev/hugetlb for now, to get things going in the lower layers.

I agree that we should have a /dev/hugetlb to provide a configuration interface (e.g., to specify reclamation policy), but it looks like HUGETLB_MMAP is really just duplicating a subset of the mmap() interface, while lacking some centralized logic like capability rights checks. It should be possible to extend d_mmap_single() a bit to specify the required map entry flags.

No, m->psind = 2 cannot work now; this would be a very invasive change (it is different from pmap_enter(psind = 1)).

Ok, but do you agree that it is the right long-term direction? Even if the initial implementation is more specialized.

markj added a comment.May 1 2020, 6:34 PM
In D24652#542724, @kib wrote:

One problem there is that I need to set some special map entry flags; another is that I need to avoid populating the object with any pages from any other syscalls until my population code does it. This is part of the problem in the API design, and it will be even more severe for e.g. POSIX shm, which is one of the reasons why I went with /dev/hugetlb for now, to get things going in the lower layers.

I agree that we should have a /dev/hugetlb to provide a configuration interface (e.g., to specify reclamation policy), but it looks like HUGETLB_MMAP is really just duplicating a subset of the mmap() interface, while lacking some centralized logic like capability rights checks. It should be possible to extend d_mmap_single() a bit to specify the required map entry flags.

To be a bit more specific, when I tried to design this I imagined that /dev/hugetlb would perform the page allocation (and defrag, or whatever) at mmap() time, and create the mapping upon the fault using the populate interface. I think this would address the possibility of other system calls populating the object.

kib added a comment.May 1 2020, 7:11 PM
In D24652#542724, @kib wrote:

Just some thoughts on the MI components, I have only skimmed so far:

  • If we are not going to be API compatible with Linux, I don't really like the reuse of the "hugetlb" name. This can be finalized later of course.

Sure, but please propose the name.

Maybe "largepage" or "largemap"?

largemap is already taken; largepage might be. One slight advantage of 'hugetlb' is that somebody with a Linux background would find it, even though the FreeBSD feature is different. largepage might work, but the best name would indicate that it is non-transparent and non-demoting.

  • Rather than having a separate pmap_enter_hugetlb(), why not extend pmap_enter() to handle psind=2? I think we will want this eventually anyway.

pmap_enter_hugetlb() is not equivalent to pmap_enter(psind = 1). For a start, pmap_enter_hugetlb() asserts that there is no existing mapping and that the requested mapping is very special; second, I do not want to slow down the existing pmap_enter() or make its logic even more convoluted. I can see an argument that I should add a new PMAP_ENTER_HUGETLB flag and call pmap_enter(), with pmap_enter() internally jumping to pmap_enter_hugetlb().

We already have PMAP_ENTER_NOREPLACE to indicate the desired semantics. I do not really see why there would be any performance impact: pmap_enter() handles the psind == 1 case in exactly one place, where it calls pmap_enter_pde() and skips the rest of the function. So we could simply extend it to handle psind > 0 and call pmap_enter_pde() or pmap_enter_pdpe() depending on the request.

No, I do not believe that PMAP_ENTER_NOREPLACE has similar semantics; it bails out if a mapping exists. And I cannot reuse pmap_enter_pde() there either. Just for example, there must be no existing mapping at all for the new flag, which PMAP_ENTER_NOREPLACE does not indicate.

I still think that branching out for PMAP_ENTER_LARGEPAGE (or some other name) is best.

  • I assumed that we would use mmap() rather than having a HUGETLB_MMAP ioctl. mmap() can specify the required alignment with the MAP_ALIGNED() flag. The hugetlb device object can implement the populate method and return a page with m->psind == 2, so the device does not have to create the mapping. What are the disadvantages to using mmap() instead of HUGETLB_MMAP?

One problem there is that I need to set some special map entry flags; another is that I need to avoid populating the object with any pages from any other syscalls until my population code does it. This is part of the problem in the API design, and it will be even more severe for e.g. POSIX shm, which is one of the reasons why I went with /dev/hugetlb for now, to get things going in the lower layers.

I agree that we should have a /dev/hugetlb to provide a configuration interface (e.g., to specify reclamation policy), but it looks like HUGETLB_MMAP is really just duplicating a subset of the mmap() interface, while lacking some centralized logic like capability rights checks. It should be possible to extend d_mmap_single() a bit to specify the required map entry flags.

No, m->psind = 2 cannot work now; this would be a very invasive change (it is different from pmap_enter(psind = 1)).

Ok, but do you agree that it is the right long-term direction? Even if the initial implementation is more specialized.

m->psind is tightly tied to transparent promotion. I am not sure, but my intuition strongly says that transparent promotion to 1G could never work. I think it would just cause 10-20%-populated reservations to hang around, never completing.

Also, for 2M promotions we have to check 512 PTEs. For 1G, it is either 512x512 entries for 4K pages, or 512 for 2M. This is impractical. If a program does create very large memory allocations that benefit from contiguity and from PG_PS, then it typically expresses that explicitly, e.g. postgres.

In D24652#542724, @kib wrote:

One problem there is that I need to set some special map entry flags; another is that I need to avoid populating the object with any pages from any other syscalls until my population code does it. This is part of the problem in the API design, and it will be even more severe for e.g. POSIX shm, which is one of the reasons why I went with /dev/hugetlb for now, to get things going in the lower layers.

I agree that we should have a /dev/hugetlb to provide a configuration interface (e.g., to specify reclamation policy), but it looks like HUGETLB_MMAP is really just duplicating a subset of the mmap() interface, while lacking some centralized logic like capability rights checks. It should be possible to extend d_mmap_single() a bit to specify the required map entry flags.

Right, /dev/hugetlb is a dev-time artifact that does not need to survive into the final committable version. The configuration interface, even if named /dev/hugetlb, would be something different.
This device would only exist until the proper usermode API is designed, and most likely will never be committed.

To be a bit more specific, when I tried to design this I imagined that /dev/hugetlb would perform the page allocation (and defrag, or whatever) at mmap() time, and create the mapping upon the fault using the populate interface. I think this would address the possibility of other system calls populating the object.

I think part of the realistic use-case requirements is that there are no faults, not even soft ones. Also, I believe it is impossible to cleanly handle the situation where a page fault is unable to satisfy the request for contiguous memory. There is no way to react other than to send a signal, but that is probably a non-starter for consumers. Either a commitment to success on later accesses, or upfront failure at mapping creation time, is required for practical applications.

It is both userspace and some kernel drivers that would need to get the contiguous memory under the mapping, and vm_fault_quick_hold() is not adequate either.

I considered your question, which can be reformulated as: could this be plugged more naturally into the existing lazy VM mapping and faulting approach? I decided that lazy instantiation of the mapping is not suitable for the planned DPDK and OFED uses. It does not contradict other uses, e.g. a large shared region for postgres, so I do not see why we should try to overcome such problems instead of avoiding them from the beginning. HUGETLB on Linux does it similarly; they even preallocate all superpage memory at boot.

markj added a comment.May 1 2020, 8:23 PM
In D24652#542741, @kib wrote:
In D24652#542724, @kib wrote:

Just some thoughts on the MI components, I have only skimmed so far:

  • If we are not going to be API compatible with Linux, I don't really like the reuse of the "hugetlb" name. This can be finalized later of course.

Sure, but please propose the name.

Maybe "largepage" or "largemap"?

largemap is already taken; largepage might be. One slight advantage of 'hugetlb' is that somebody with a Linux background would find it, even though the FreeBSD feature is different. largepage might work, but the best name would indicate that it is non-transparent and non-demoting.

Ok. It can be left for now anyway.

  • Rather than having a separate pmap_enter_hugetlb(), why not extend pmap_enter() to handle psind=2? I think we will want this eventually anyway.

pmap_enter_hugetlb() is not equivalent to pmap_enter(psind = 1). For a start, pmap_enter_hugetlb() asserts that there is no existing mapping and that the requested mapping is very special; second, I do not want to slow down the existing pmap_enter() or make its logic even more convoluted. I can see an argument that I should add a new PMAP_ENTER_HUGETLB flag and call pmap_enter(), with pmap_enter() internally jumping to pmap_enter_hugetlb().

We already have PMAP_ENTER_NOREPLACE to indicate the desired semantics. I do not really see why there would be any performance impact: pmap_enter() handles the psind == 1 case in exactly one place, where it calls pmap_enter_pde() and skips the rest of the function. So we could simply extend it to handle psind > 0 and call pmap_enter_pde() or pmap_enter_pdpe() depending on the request.

No, I do not believe that PMAP_ENTER_NOREPLACE has similar semantics; it bails out if a mapping exists. And I cannot reuse pmap_enter_pde() there either. Just for example, there must be no existing mapping at all for the new flag, which PMAP_ENTER_NOREPLACE does not indicate.

I still think that branching out for PMAP_ENTER_LARGEPAGE (or some other name) is best.

  • I assumed that we would use mmap() rather than having a HUGETLB_MMAP ioctl. mmap() can specify the required alignment with the MAP_ALIGNED() flag. The hugetlb device object can implement the populate method and return a page with m->psind == 2, so the device does not have to create the mapping. What are the disadvantages to using mmap() instead of HUGETLB_MMAP?

One problem there is that I need to set some special map entry flags; another is that I need to avoid populating the object with any pages from any other syscalls until my population code does it. This is part of the problem in the API design, and it will be even more severe for e.g. POSIX shm, which is one of the reasons why I went with /dev/hugetlb for now, to get things going in the lower layers.

I agree that we should have a /dev/hugetlb to provide a configuration interface (e.g., to specify reclamation policy), but it looks like HUGETLB_MMAP is really just duplicating a subset of the mmap() interface, while lacking some centralized logic like capability rights checks. It should be possible to extend d_mmap_single() a bit to specify the required map entry flags.

No, m->psind = 2 cannot work now; this would be a very invasive change (it is different from pmap_enter(psind = 1)).

Ok, but do you agree that it is the right long-term direction? Even if the initial implementation is more specialized.

m->psind is tightly tied to transparent promotion. I am not sure, but my intuition strongly says that transparent promotion to 1G could never work. I think it would just cause 10-20%-populated reservations to hang around, never completing.

I really do not see why it is tightly coupled. Sure, vm_reserv sets it when a reservation is fully populated, but it is really only used by the fault handler. AFAIK there is no reason why a pager populate handler cannot set it either. Else why does vm_fault_populate() even try to handle m->psind != 0? The page fault handler does not set OBJ_COLORED when calling vm_pager_populate(). In fact I think Alan at one point mentioned that the populate interface was designed partly to accommodate an implementation of non-transparent superpages.

I agree that transparent promotion to 1GB is probably not important to userspace, at least for anon memory, but kernel_object could certainly make use of it on large systems.

Also, for 2M promotions we have to check 512 PTEs. For 1G, it is either 512x512 entries for 4K pages, or 512 for 2M. This is impractical. If a program does create very large memory allocations that benefit from contiguity and from PG_PS, then it typically expresses that explicitly, e.g. postgres.

I do not really see why checking 512 2MB mappings is impractical, but it could be optimized. Anyway, there is not always a binary distinction between transparent and non-transparent superpages. Suppose one application maps a 2MB shared object and faults on every 4KB page, resulting in a promotion to a 2MB mapping. Then m->psind == 1 for the first page in the range, and *all* processes which map the object will automatically get a 2MB mapping upon the first fault (assuming the mapping is suitably aligned). In particular, there is no promotion involved. m->psind = 1 just says, "page m begins a run of 512 physically contiguous pages, all contiguous within the same object." /dev/hugetlb can provide the same hint, and the VM will automatically use it.

In D24652#542724, @kib wrote:

One problem there is that I need to set some special map entry flags, another is that I need to avoid populating the object with any pages from any other syscalls until my population code do it. This is part of the problem in the API design, and it will be even more severe for e.g. posix shm, which is one of the reason why I went with /deb/hugetlb now to get things going in lower layers.

I agree that we should have a /dev/hugetlb to provide a configuration interface (e.g., to specify reclamation policy), but it looks like HUGETLB_MMAP is really just duplicating a subset of the mmap() interface, while lacking some centralized logic like capability rights checks. It should be possible to extend d_mmap_single() a bit to specify the required map entry flags.

Right, /dev/hugetlb is a dev-time artifact that does not need to survive into the final committable version. The configuration interface, even if named /dev/hugetlb, would be something different.
This device would only exist until the proper usermode API is designed, and most likely will never be committed.

Well, I suspect it would be useful to keep /dev/hugetlb, unless you are thinking of introducing a new file type. I just dislike HUGETLB_MMAP.

To be a bit more specific, when I tried to design this I imagined that /dev/hugetlb would perform the page allocation (and defrag, or whatever) at mmap() time, and create the mapping upon the fault using the populate interface. I think this would address the possibility of other system calls populating the object.

I think part of the realistic use-case requirements is that there are no faults, not even soft ones.

Why? The userspace allocator can trivially provide this guarantee if it is required, either by touching the mapping returned by mmap(), or with mlock().

Also, I believe it is impossible to cleanly handle the situation where a page fault is unable to satisfy the request for contiguous memory. There is no way to react other than to send a signal, but that is probably a non-starter for consumers. Either a commitment to success on later accesses, or upfront failure at mapping creation time, is required for practical applications.

I agree. The device mmap() handler should perform the contig allocation, or signal failure if it is unable. Then the fault handler does not perform any allocations except potentially allocating PTPs. In fact I think you do not need a populate method if hugetlb sets m->psind = 2 and inserts contig pages into the object at mmap() time: vm_fault_soft_fast() will see the resident page and automatically create a 1GB mapping, assuming that pmap_enter() can handle psind == 2.

It is both userspace and some kernel drivers that would need to get the contiguous memory under the mapping, and vm_fault_quick_hold() is not adequate either.

I considered your question, which can be reformulated as: could this be plugged more naturally into the existing lazy VM mapping and faulting approach? I decided that lazy instantiation of the mapping is not suitable for the planned DPDK and OFED uses. It does not contradict other uses, e.g. a large shared region for postgres, so I do not see why we should try to overcome such problems instead of avoiding them from the beginning. HUGETLB on Linux does it similarly; they even preallocate all superpage memory at boot.

I believe that with hugetlbfs reservations, even Linux only instantiates the PG_PS mapping at fault time. The contig allocation is done earlier, it can be done at boot or during mmap() depending on the policy.

kib added a comment.May 1 2020, 8:52 PM

I do not really see why checking 512 2MB mappings is impractical, but it could be optimized. Anyway, there is not always a binary distinction between transparent and non-transparent superpages. Suppose one application maps a 2MB shared object and faults on every 4KB page, resulting in a promotion to a 2MB mapping. Then m->psind == 1 for the first page in the range, and *all* processes which map the object will automatically get a 2MB mapping upon the first fault (assuming the mapping is suitably aligned). In particular, there is no promotion involved. m->psind = 1 just says, "page m begins a run of 512 physically contiguous pages, all contiguous within the same object." /dev/hugetlb can provide the same hint, and the VM will automatically use it.

I do not see how I could safely set m->psind = 1 (never mind 2) for these kinds of objects/pages. First, the code currently assumes that psind = 1 implies the existence of a reservation. Conversely, if there is no reservation, then the fault handler does not try pmap_enter(psind = 1). Second, I believe that it is currently implicitly assumed that userspace mappings with psind == 1 are managed.

I think that instantiating PDEs/PDPEs can be moved to the fault handler if you prefer it that way, but it still needs a special path both in vm_fault() and in pmap_enter(_hugetlb).

markj added a comment.May 1 2020, 9:30 PM
In D24652#542766, @kib wrote:

I do not really see why checking 512 2MB mappings is impractical, but it could be optimized. Anyway, there is not always a binary distinction between transparent and non-transparent superpages. Suppose one application maps a 2MB shared object and faults on every 4KB page, resulting in a promotion to a 2MB mapping. Then m->psind == 1 for the first page in the range, and *all* processes which map the object will automatically get a 2MB mapping upon the first fault (assuming the mapping is suitably aligned). In particular, there is no promotion involved. m->psind = 1 just says, "page m begins a run of 512 physically contiguous pages, all contiguous within the same object." /dev/hugetlb can provide the same hint, and the VM will automatically use it.

I do not see how I could safely set m->psind = 1 (never mind 2) for these kinds of objects/pages. First, the code currently assumes that psind = 1 implies the existence of a reservation. Conversely, if there is no reservation, then the fault handler does not try pmap_enter(psind = 1).

I forgot that vm_fault_soft_fast() uses vm_reserv_to_superpage(). Is that what you are referring to? I believe this is just for convenience. In principle you could use this instead:

static vm_page_t
vm_fault_page_to_superpage(vm_page_t m)
{
    vm_page_t m_super;
    vm_paddr_t pa;
    int i;

    pa = m->phys_addr;
    /* pagesizes[] has valid indices 0..MAXPAGESIZES - 1; check larger sizes first. */
    for (i = MAXPAGESIZES - 1; i > 0; i--) {
        m_super = m - atop(pa & (pagesizes[i] - 1));
        if (m_super->psind == i)
            return (m_super);
    }
    return (NULL);
}

This doesn't quite work because m - atop(pa & (pagesizes[i] - 1)) might not be an element of vm_page_array, but it could be fixed.

If you are referring to vm_fault_populate(), then I do not see why there is a dependency on reservations.

Second, I believe that it is currently implicitly assumed that userspace mappings with psind == 1 are managed.

Why?

I think that instantiating PDEs/PDPEs can be moved to the fault handler if you prefer it that way, but it still needs a special path both in vm_fault() and in pmap_enter(_hugetlb).

My belief is that m->psind is not (or, should not be) specific to vm_reserv, and that anything which manages contiguous memory reservations, like hugetlb or vm_reserv, should be able to pass it as a hint to the fault handler without any special MI handling. Up until now vm_reserv is the only subsystem which provides such reservations, so there may be some coupling, but it should be fixed.

This rather long block is a summary of the discussion that Kostik and I had in email, dropped in here so that it is part of the public record. Regular font is my commentary; italic font marks Kostik's responses to my comments.

Looking at Linux, it appears that their superpage support is done
through mmap:

BEGIN LINUX USER MANUAL

MAP_HUGETLB (since Linux 2.6.32)
Allocate the mapping using "huge pages." See the Linux kernel
source file Documentation/admin-guide/mm/hugetlbpage.rst for
further information, as well as NOTES, below.

MAP_HUGE_2MB, MAP_HUGE_1GB (since Linux 3.8)
Used in conjunction with MAP_HUGETLB to select alternative
hugetlb page sizes (respectively, 2 MB and 1 GB) on systems
that support multiple hugetlb page sizes.

More generally, the desired huge page size can be configured
by encoding the base-2 logarithm of the desired page size in
the six bits at the offset MAP_HUGE_SHIFT. (A value of zero
in this bit field provides the default huge page size; the
default huge page size can be discovered via the Hugepagesize
field exposed by /proc/meminfo.) Thus, the above two
constants are defined as:

#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)

The range of huge page sizes that are supported by the system
can be discovered by listing the subdirectories in /sys/kernel/mm/hugepages.

NOTES:

Huge page (Huge TLB) mappings
For mappings that employ huge pages, the requirements for the arguments
of mmap() and munmap() differ somewhat from the requirements for mappings
that use the native system page size.

For mmap(), offset must be a multiple of the underlying huge page
size. The system automatically aligns length to be a multiple of the
underlying huge page size.

For munmap(), addr and length must both be a multiple of the underlying huge page size.

Certain flags constants are defined only if suitable feature test
macros are defined (possibly by default): _DEFAULT_SOURCE with glibc
2.19 or later; or _BSD_SOURCE or _SVID_SOURCE in glibc 2.19 and
earlier. (Employing _GNU_SOURCE also suffices, and requiring that
macro specifically would have been more logical, since these flags
are all Linux-specific.) The relevant flags are: MAP_32BIT,
MAP_ANONYMOUS (and the synonym MAP_ANON), MAP_DENYWRITE,
MAP_EXECUTABLE, MAP_FILE, MAP_GROWSDOWN, MAP_HUGETLB, MAP_LOCKED,

MAP_NONBLOCK, MAP_NORESERVE, MAP_POPULATE, and MAP_STACK.

END LINUX USER MANUAL

We might as well use the same interface rather than inventing our own
and then having to add software to map from the Linux interface to ours.

Your interface lacks the ability to have the region backed by a file (the fd and offset parameters of mmap). In looking through it, the only functionality it adds that is not easily provided by mmap (extended with a few extra flags) is the ability to select the domain to be used. It is not clear to me that this extra bit of flexibility warrants creating a whole new and different interface.

Our mmap(2) interface currently has eight flag bits reserved for superpage details to be specified via the MAP_ALIGNED(n) flags (which are rather poorly described in the manual page). At the moment it appears that only one of the bits is used, where the MAP_ALIGNED_SUPER flag is defined as MAP_ALIGNED(1). The Linux choices could be added:

#define MAP_ALIGNED_SUPER MAP_ALIGNED(1)
#define MAP_HUGE_2MB MAP_ALIGNED(1)
#define MAP_HUGE_1GB MAP_ALIGNED(2)

Some of the bits reserved for page alignment could be repurposed
to pass in the domain to use, or at least a hint on which domain
to use. Or the domain information could be passed in after the
mmap() using madvise(2).

So bottom line is I think we can do this with mmap(2).

This interface is definitely not final. It is the fastest way for me
to get into the internal parts of the implementation.

I somewhat agree with the note about 'fd', but I do not have any intent to allow mmapping files with superpages; it is too drastic a change for both the VFS/VM and filesystem code. What I do agree with is that users probably want a way to share the hugetlb mappings between unrelated processes, so some namespace for such mappings would be required.

So does this mean that existing mmap mappings of files are never promoted to superpages? Even if the initial mapping requested to be aligned? I assume that as the pages of the file fault, they are brought in to the correct place in the superpage reservation, so they will be properly set up to be promoted.

Current transparent superpages are best-effort. If the allocator cannot claim all 512 pages for a given superpage, then promotion does not happen. There is a mechanism called reservations which should help the allocator avoid stealing a 4K page from a 2M range, basically giving a hint that the page might become part of a superpage. But a reservation cannot veto the stealing.

You can see for yourself how often this mechanism works and does not: look at the vm.pmap.pde.* sysctl values after a buildworld. You will see that it works, but not fantastically. Promotion failures are typically more frequent than successful promotions.

As I briefly noted in my previous mail, I plan to define the API using
the posix shm interface, perhaps it will start with some flag for shm_open2().

Let me explain some design issues there; in particular, why I do not plan to provide hugetlb file mappings and, less directly, why I do not want to use normal mmap(2) (though this is not final).

For superpage mappings to be possible, two things must match:

  1. The backing physical memory must be contiguous, and the contiguous blocks must start on the natural boundary. I.e., if we have a file with cached pages in vp->v_object and want to eventually map it with 2M superpages, the pages in each 2M range must have contiguous physical addresses, and each page on a 2M boundary must have the low 21 bits of its physical address zero. For 1G pages, we must have 1G contiguous runs, with each 1G page having the low 30 bits of its physical address zero.

Per above, if the mapping is initially aligned and pages are placed
properly as they are brought in, then it should work.

  2. The virtual address used for the mapping must be naturally aligned on the selected superpage boundary. This is relatively easy.

For item 1, there are multiple problems. First, trying to allocate
contig physical memory is quite time-consuming and failure-prone, especially
on a fragmented machine. For transparent superpages, Alan implemented
reservations, which are very 'non-enforcing', so to say. If the slot for
some page in a 2M run is not available, no additional measures are taken.

I concur that we should add code to forcibly move out pages that are
blocking a superpage from being formed. Obviously there needs to be a
threshold before this sort of action is taken. Also, a file that has
been previously loaded into memory (either through reading or initially
writing it) may not be well laid out. But, if it gets mapped with a
request for superpages, its existing cache pages could be flushed, so
that it can be read into an appropriate superpage reservation.

Even though there is no effort to reconstruct superpages, we do in fact
get them and they do work well. I don't know how well 1G superpages will
work under the current constructs. It may be necessary to guard them more
closely and/or make more effort to recreate them.

No, 1G cannot work in a transparent way for any practical values.
And my project is not about it.

For transparent pages, promotion and demotion are relatively costly,
because of the need to touch all 512 constituent 4k pages, e.g. to remove
or to allocate and add pv list entries, and to change ptes. I suspect that this
could steal a significant part of the not too large performance gain of using
2m TLB entries.

We configure x86 CPUs to use cached accesses to page table structures,
and a TLB miss is not too awful from a performance PoV if the page table itself
is cached. Moreover, I suspect that most of the gain from superpages comes from
the fact that we are able to use more TLB entries, not because one entry
can serve more VA. When I developed PCID support, it was non-trivial to
find a good benchmark to show its value because of the low cost of a TLB miss.

See some very recent measures of the superpages effect on database
benchmarks, e.g.
https://wiki.postgresql.org/images/7/7d/PostgreSQL_and_Huge_pages_-_PGConf.2019.pdf

IMO the conclusion is that it helps some, but the performance gains are
not high. This is all for normal applications' use of superpages, which
is not the case for the DPDK and Juniper uses of it.

Second, we do not know which files a user would want to map with superpages.
Enforcing contig allocation for each file is a non-starter, see above.
At mmap(2) time, it is too late to try to fix the allocation.

Per above, we could just flush a file from the cache if it is scattered
in physical memory and becomes requested for a superpage mapping. It
will be a high cost, but if it is going to make a significant improvement
in how the application will run, then worth that overhead.

Oh, I probably see one part of the misunderstanding.

I believe that Linux only allows the non-transparent superpages mount
option on something that is similar to our tmpfs, not for real storage
filesystems. In other words, there is no need to flush anything (for them),
but there is some administrative overhead to set up the hugetlb mount and
communicate the location of the mount to the applications.

This is why I intend to use posix shm for final API.

Another problem I see is that our VM is too lazy, which causes
interesting consequences for the API. Assume that we use normal mmap(2)
and do some minimal reasonable modifications to support the flags you
mentioned. Mmap() ends with creating a vm_map_entry_t in the vm_map for the
current address space; the page tables are not filled, and typically
there are no pages in the queue of the backing object. On fault, the page
is allocated and inserted both into the object and the page table.

For non-transparent superpages this is clearly an issue, because the existing
code would allocate some random physical 4k page that cannot satisfy the
alignment and contig requirements. So I need to instantiate everything
at mmap time, and moreover, I must prevent userspace from faulting in the
mmaped region until the page tables are fully populated.

No, the vm_fault.c code could be adapted for non-transparent superpages, but
it is such a large rewrite that it is simply not worth it IMO.

I think that the common case will be that a file will not be randomly
accessed before being mmap'ed with a request for superpages. So, flushing
existing pages is not likely to incur much cost. Another way to mitigate
this would be to look at the size of a file. If its size is less than
the size of a superpage (which is the vast majority of files), then it can
be read in randomly as now. If it is bigger than a superpage, then give it
a superpage reservation so that it is read in to appropriate physical pages.

So my API, either the current device ioctl or something that I will do
with posix shm, intends to pre-allocate phys memory _and_ fill the page tables
at request time. Since this is so different from normal mmap(2)
operation, it is really inconvenient to try to plug it into the existing
syscall. And then it allows me to add semantics not compatible with mmap.

You can document places where you do not provide mmap semantics. So, if
you are not going to allow mmap of files using superpages, you can return
ENOTSUPP when that request is made. That is going to be a lot easier for
application writers to understand than trying to figure out when they have
to use mmap versus shm.

kib updated this revision to Diff 71346.May 3 2020, 10:04 PM
kib edited the summary of this revision. (Show Details)

Use phys pager and some modification of the populate method.
Switch to shm/mmap API.

markj added a comment.May 4 2020, 1:45 PM

Just adding some notes here. Some are not really relevant to the topic or have already been discussed on IRC.

Your interface lacks the ability to have the region backed by a
file (fd, offset parameters of mmap). In looking through it, the
only functionality that it adds that is not easily provided by
mmap (extended with a few extra flags) is the ability to select
the domain to be used. It is not clear to me that this extra bit
of flexibility warrants creating a whole new and different interface.

Our mmap(2) interface currently has eight flag bits reserved for
superpage details to be specified via the MAP_ALIGNED(n) flags
(which are rather poorly described in the manual page). At the moment
it appears that only one of the bits is used where the MAP_ALIGNED_SUPER
flag is defined as MAP_ALIGNED(1). The Linux choices could be added:

#define MAP_ALIGNED_SUPER MAP_ALIGNED(1)
#define MAP_HUGE_2MB MAP_ALIGNED(1)
#define MAP_HUGE_1GB MAP_ALIGNED(2)

Some of the bits reserved for page alignment could be repurposed
to pass in the domain to use, or at least a hint on which domain
to use. Or the domain information could be passed in after the
mmap() using madvise(2).

Jeff proposed a new system call for this purpose a while ago, see D14891.

If the mmap()ing thread has its domain policy set by cpuset, then it should
automatically get memory from the correct domain. I'd prefer to see a reason
that this mechanism cannot be used before adding a new way to request a
specific domain.

This interface is definitely not final. It is the fastest way for me
to get into the internal parts of the implementation.

I somewhat agree with the note about 'fd', but I do not have any intent
to allow mmap of files with superpages; it is too drastic a change for both
VFS/VM and filesystems code.

I believe we also noted somewhere that Linux has the same restriction on
non-transparent superpage use, i.e., they cannot be used to back a regular file.
Implementing this in FreeBSD looks difficult.

So does this mean that existing mmap mappings of files are never promoted
to superpages? Even if the initial mapping requested to be aligned?
I assume that as the pages of the file fault in, they are brought into the
correct place in the superpage reservation, so they will be properly
set up to be promoted.

Current transparent superpages are best-effort. If the allocator cannot claim
all 512 pages for a given superpage, then promotion does not happen.
There is a mechanism called reservations which should help the allocator
not steal a 4k page from the 2m range, basically giving a hint that
the page might become part of a superpage. But a reservation cannot
veto the stealing.

You can see for yourself how often this mechanism works and how often it
does not: look at the vm.pmap.pde.* sysctl values after a buildworld. You
would see that it works, but not fantastically. Promotion failures are
typically more frequent than successful promotions.

Part of the issue is a conservative promotion policy. If clang faults on only 511 4KB
pages in an aligned 2MB region in .text, no promotion occurs. One trick to observe
this difference is to run

$ clang --version # force clang binary to be mapped, so its object is colored
$ dd if=$(which clang) of=/dev/null bs=1M # force reservations to become fully populated
$ make buildkernel

For me this provides a 3-4% reduction in wall clock time on an amd64 system. I measured
a significantly larger speedup on an arm64 server system some time ago.

Before:
real 2m15.761s
user 63m8.811s
sys 3m21.556s

After:
real 2m10.862s
user 60m46.336s
sys 3m16.858s

As I briefly noted in my previous mail, I plan to define the API using
the posix shm interface, perhaps it will start with some flag for shm_open2().

Let me explain some design issues there, in particular, why I do not plan
to provide hugetlb file mappings, and less directly, why I do not want to
use normal mmap(2) (but this is not final).

For superpage mappings to be possible, two things must match:

  1. The backing physical memory must be contiguous, and the contig blocks must start on the natural boundary. I.e., if we have a file with cached pages in vp->v_object and want to eventually map it with 2M superpages, the pages in each 2M range must have contig phys addresses, and each page on a 2M boundary must have the low 21 bits of its phys address zero. For 1G pages, we must have 1G contig runs, with each 1G page having the low 30 bits of its physical address zero.

Per above, if the mapping is initially aligned and pages are placed
properly as they are brought in, then it should work.

  2. The virtual address used for the mapping must be naturally aligned on the selected superpage boundary. This is relatively easy.

For item 1, there are multiple problems. First, trying to allocate
contig physical memory is quite time-consuming and failure-prone, especially
on a fragmented machine. For transparent superpages, Alan implemented
reservations, which are very 'non-enforcing', so to say. If the slot for
some page in a 2M run is not available, no additional measures are taken.

I concur that we should add code to forcibly move out pages that are
blocking a superpage from being formed. Obviously there needs to be a
threshold before this sort of action is taken. Also, a file that has
been previously loaded into memory (either through reading or initially
writing it) may not be well laid out. But, if it gets mapped with a
request for superpages, its existing cache pages could be flushed, so
that it can be read into an appropriate superpage reservation.

Even though there is no effort to reconstruct superpages, we do in fact
get them and they do work well. I don't know how well 1G superpages will
work under the current constructs. It may be necessary to guard them more
closely and/or make more effort to recreate them.

No, 1G cannot work in a transparent way for any practical values.
And my project is not about it.

For transparent pages, promotion and demotion are relatively costly,
because of the need to touch all 512 constituent 4k pages, e.g. to remove
or to allocate and add pv list entries, and to change ptes. I suspect that this
could steal a significant part of the not too large performance gain of using
2m TLB entries.

Since r321378 this cost is well-amortized for objects that stay resident in
memory for a long time, as in the clang example above.

We configure x86 CPUs to use cached accesses to page table structures,
and a TLB miss is not too awful from a performance PoV if the page table itself
is cached. Moreover, I suspect that most of the gain from superpages comes from
the fact that we are able to use more TLB entries, not because one entry
can serve more VA. When I developed PCID support, it was non-trivial to
find a good benchmark to show its value because of the low cost of a TLB miss.

With r321378 superpages can provide additional benefits beyond reduced TLB
pressure. In particular, a single soft fault on a resident page belonging to a fully populated
reservation allows immediate instantiation of a 2MB mapping. So this has the benefit
of reducing the number of page faults as well.

Of course this is not relevant for all types of applications. I want to make the
argument that the reservation system enables multiple optimizations, not just a reduction
in TLB usage.

sys/amd64/amd64/pmap.c
5817

The MI layer only calls pmap_protect() when restricting permissions. I believe this change is not sufficient to avoid soft faults after a protection change.

sys/vm/vm_fault.c
529

/* should be on its own line.

kib marked an inline comment as done.May 4 2020, 3:16 PM
kib added inline comments.
sys/amd64/amd64/pmap.c
5817

Right, I remembered it as a thing to do during the vm_map.c work, but failed to do it. In fact, I also should disable changing the inheritance mode for largepage entries, because CoW cannot work there.

markj added inline comments.May 4 2020, 4:20 PM
sys/sys/mman.h
296

I prefer not to have a custom domain policy if possible. We already have general mechanisms for defining this policy; we should use them unless there is some specific use-case to motivate a custom interface, and in this case the policy should be expressed using a domainset. It is also kind of fragile to specify policy using a single domain ID; it is possible for a system to have empty NUMA domains, and this actually happens in practice.

kib updated this revision to Diff 71372.May 4 2020, 4:45 PM
kib marked 2 inline comments as done.

Allow pmap_enter() on valid largepage entry assuming that only protection changes.
Remove domain arg from the ioctl, assume that contig_alloc uses current thread policy (needs to be considered for 12).

kib updated this revision to Diff 71585.May 9 2020, 2:41 PM

pass over pmap adding handling of 1G superpages in user space
check pkru when entering large page
stop storing shmfd * into phys object, they have different lifetimes
added accounting of the total allocated largepages per sizes
handle requests for wire/unwire largepages
disallow writes to unpopulated largepage shmfd
filled assert messages
add support to posixshmcontrol
remove all vestiges of /dev/hugetlb

kib retitled this revision from Non-trasparent superpage support. to Non-transparent superpage support..May 9 2020, 2:42 PM
kib added a reviewer: alc.
alc added a comment.May 10 2020, 7:21 AM
In D24652#542741, @kib wrote:

m->psind is tightly tied to the transparent promotion. I am not sure; my intuition strongly suggests that transparent promotion to 1G could never work. I think it would just cause 10-20% populated reservations to hang around, never completing.

I really do not see why it is tightly coupled. Sure, vm_reserv sets it when a reservation is fully populated, but it is really only used by the fault handler. AFAIK there is no reason why a pager populate handler cannot set it either. Else why does vm_fault_populate() even try to handle m->psind != 0? The page fault handler does not set OBJ_COLORED when calling vm_pager_populate(). In fact I think Alan at one point mentioned that the populate interface was designed partly to accommodate an implementation of non-transparent superpages.

For what it's worth, Mark is correctly recalling my intentions. First, m->psind should not be "tightly coupled" with reservations. Second, I had hoped that populate would provide a pathway to non-transparent pages.

alc added a comment.May 10 2020, 7:24 AM

My belief is that m->psind is not (or, should not be) specific to vm_reserv, and that anything which manages contiguous memory reservations, like hugetlb or vm_reserv, should be able to pass it as a hint to the fault handler without any special MI handling. Up until now vm_reserv is the only subsystem which provides such reservations, so there may be some coupling, but it should be fixed.

Yes, please.

gbe added a subscriber: gbe.May 10 2020, 7:54 AM
alc added a comment.May 10 2020, 8:28 AM

We configure x86 CPUs to use cached accesses to page table structures,
and TLB miss is not too awful from performance PoV if page table itself
is cached. More, I suspect that most gain from the superpages comes from
the fact that we are able to use more TLB entries, not because one entry
can serve more VA.

For several generations of their microarchitecture, Intel's 2nd level TLB has been unified in two senses of that word: (1) it holds both 4KB and 2MB mappings and (2) it holds both data and instruction mappings. So, you don't really gain very many additional TLB entries by using 4KB and 2MB pages simultaneously. Specifically, you only gain extra entries in the 1st level data and instruction TLBs, where 4KB and 2MB entries are stored in separate structures. A result of this organization that doesn't get enough attention is the effect of competition between data and instruction mappings. In particular, suppose that you are using 2MB data mappings but 4KB instruction mappings for a relatively large program, like PostgreSQL. We've seen cases, e.g., clang, where forcing the use of 2MB instruction mappings reduced 2nd level TLB misses caused by data accesses by half.

kib added a comment.May 10 2020, 9:29 AM
In D24652#545448, @alc wrote:

My belief is that m->psind is not (or, should not be) specific to vm_reserv, and that anything which manages contiguous memory reservations, like hugetlb or vm_reserv, should be able to pass it as a hint to the fault handler without any special MI handling. Up until now vm_reserv is the only subsystem which provides such reservations, so there may be some coupling, but it should be fixed.

Yes, please.

This is in fact v3.0 (or even 3.1) of the patch. After the above discussion with Mark, and some on-line talk, I tried to use m->psind to communicate the level of the pte that needs to be installed by pmap_enter(). Unfortunately, it causes the vm_reserv subsystem to break. The most serious reason was that I cannot clear m->psind in the existing structure of pagers before vm_object_terminate() frees all of the object's pages (I cannot move vm_pager_dealloc() earlier; I tried that at mgmt cdev pager times, and it is worse than the problems it tries to solve). So there are free pages with either m->psind == 1 or, worse, m->psind == 2 left in the free pools.

Then either asserts in vm_reserv, like in vm_reserv_populate(), fire, or vm_fault_soft_fast() misbehaves in mysterious ways. This was patch v2.0. So I switched to half of what was discussed. What is left from v2.0 is the rework to allow faults on the superpage mappings, and to process faults with populate.

Note that I still cannot get away from the special flag to pmap_enter() indicating that a non-level 0 pte must be installed; this is true even for transparent superpages, which is the reason why we explicitly pass psind = 1. For instance, for transparent superpages, psind = 1 with non-consistent PKRU keys is only a soft error, which causes both the populate and fast fault cases to retract to psind = 0 ptes. But for non-transparent superpages, non-consistent PKRU means a user error and must not shred the large pte into smaller ones.

That said, I do not quite see what m->psind > 0 would give for this patch. I need special behavior at all levels:

  • for vm_map, clip must not split superpage entries, so vm_map_entry_t needs a flag to indicate this and avoid splitting
  • for vm_fault/populate, we need to know that the whole superpage entry must be installed (see above), but there are curious additional details. Consider a 1G superpage, which consists of 256k struct vm_pages. If we busy all of them in pager->populate(), and then unbusy them after pmap_enter(), this causes visibly long (many seconds, close to a minute) processing of the page fault. So I coded the interface where a vm_map_entry with the no-clip flag (mask) only requires the first page in the run to be busied.
  • for pmap_enter(), special cases are PKRU and strict avoidance of demotion for pdes.

My point is that additional information besides m->psind > 0 is needed to ensure special behavior in all layers, which might not (easily) get to the m (vm_map) or need to distinguish why psind is set (transparent vs. non-transparent). Moreover, pmap currently cannot operate by looking at m->psind > 0 either; pmap_enter() needs the psind argument.

kib added a comment.May 10 2020, 4:06 PM

One thing that I realize I failed to state sufficiently clearly in my response is that Mark's notes about the patch are about v1.0, where the syscall (actually ioctl) code ran to completion, i.e., it established the vm_map_entry, allocated the backing contig pages, and installed the pdpe/pde entries.

kib added a comment.May 10 2020, 4:09 PM
In D24652#545452, @alc wrote:

For several generations of their microarchitecture, Intel's 2nd level TLB has been unified in two senses of that word: (1) it holds both 4KB and 2MB mappings and (2) it holds both data and instruction mappings. So, you don't really gain very many additional TLB entries by using 4KB and 2MB pages simultaneously. Specifically, you only gain extra entries in the 1st level data and instruction TLBs, where 4KB and 2MB entries are stored in separate structures. A result of this organization that doesn't get enough attention is the effect of competition between data and instruction mappings. In particular, suppose that you are using 2MB data mappings but 4KB instruction mappings for a relatively large program, like PostgreSQL. We've seen cases, e.g., clang, where forcing the use of 2MB instruction mappings reduced 2nd level TLB misses caused by data accesses by half.

Do you remember the results of the PCID testing? There was no measurable reduction of the context switch latency, although hwpmc demonstrated a 10x reduction of the TLB miss rate. (On the other hand, when PTI is enabled and PCID is used to cache kernel-mode ptes, syscall entry latency does shrink by ~50%.)

alc added a comment.May 10 2020, 6:37 PM
In D24652#545459, @kib wrote:
In D24652#545448, @alc wrote:

My belief is that m->psind is not (or, should not be) specific to vm_reserv, and that anything which manages contiguous memory reservations, like hugetlb or vm_reserv, should be able to pass it as a hint to the fault handler without any special MI handling. Up until now vm_reserv is the only subsystem which provides such reservations, so there may be some coupling, but it should be fixed.

Yes, please.

This is in fact v3.0 (or even 3.1) of the patch. After the above discussion with Mark, and some on-line talk, I tried to use m->psind to communicate the level of the pte that needs to be installed by pmap_enter(). Unfortunately, it causes the vm_reserv subsystem to break. The most serious reason was that I cannot clear m->psind in the existing structure of pagers before vm_object_terminate() frees all of the object's pages (I cannot move vm_pager_dealloc() earlier; I tried that at mgmt cdev pager times, and it is worse than the problems it tries to solve). So there are free pages with either m->psind == 1 or, worse, m->psind == 2 left in the free pools.

Then either asserts in vm_reserv, like in vm_reserv_populate(), fire, or vm_fault_soft_fast() misbehaves in mysterious ways. This was patch v2.0. So I switched to half of what was discussed. What is left from v2.0 is the rework to allow faults on the superpage mappings, and to process faults with populate.

Note that I still cannot get away from the special flag to pmap_enter() indicating that a non-level 0 pte must be installed; this is true even for transparent superpages, which is the reason why we explicitly pass psind = 1. For instance, for transparent superpages, psind = 1 with non-consistent PKRU keys is only a soft error, which causes both the populate and fast fault cases to retract to psind = 0 ptes. But for non-transparent superpages, non-consistent PKRU means a user error and must not shred the large pte into smaller ones.

That said, I do not quite see what m->psind > 0 would give for this patch. I need special behavior at all levels:

  • for vm_map, clip must not split superpage entries, so vm_map_entry_t needs a flag to indicate this and avoid splitting
  • for vm_fault/populate, we need to know that the whole superpage entry must be installed (see above), but there are curious additional details. Consider a 1G superpage, which consists of 256k struct vm_pages. If we busy all of them in pager->populate(), and then unbusy them after pmap_enter(), this causes visibly long (many seconds, close to a minute) processing of the page fault. So I coded the interface where a vm_map_entry with the no-clip flag (mask) only requires the first page in the run to be busied.
  • for pmap_enter(), special cases are PKRU and strict avoidance of demotion for pdes.

My point is that additional information besides m->psind > 0 is needed to ensure special behavior in all layers, which might not (easily) get to the m (vm_map) or need to distinguish why psind is set (transparent vs. non-transparent). Moreover, pmap currently cannot operate by looking at m->psind > 0 either; pmap_enter() needs the psind argument.

If I understood everything correctly, then I don't think that I would argue with anything that you wrote. The way that I would summarize your points is to say that people must not forget the distinction between virtual-to-physical mappings and physical pages. The value of "m->psind" says something about the physical page, but almost nothing about the mapping(s) to that physical page. In other words, to have a superpage mapping, you need to have a physical superpage, but in general having a physical superpage does not imply that every mapping to it is going to be a superpage mapping. (Hence, as you point out, pmap_enter() currently takes a psind value to define the size of the mapping.) Moreover, the value of "m->psind" should never be expected to say anything about the properties of a mapping.

I would like to see any new mechanism by which we are creating physical superpages set the value of "m->psind" appropriately, so that operations on physical pages might exploit the knowledge that they are working with a physical superpage, regardless of whether that physical superpage was created by the reservation system or by some other means. That said, what makes this current patch "tricky" has little to do with physical memory. This patch is creating a new type of mapping with different properties than we've had before. For example, if I understood correctly, it is not demotable for purposes of changing any mapping attributes at a 4KB granularity. So, it makes perfect sense to me that the data structures that manage mappings (as opposed to physical memory) need to be able to express this difference. (And, the size of the physical page, i.e., "m->psind", does not describe those differences.)

alc added a comment.May 10 2020, 7:43 PM
In D24652#545553, @kib wrote:
In D24652#545452, @alc wrote:

For several generations of their microarchitecture, Intel's 2nd level TLB has been unified in two senses of that word: (1) it holds both 4KB and 2MB mappings and (2) it holds both data and instruction mappings. So, you don't really gain very many additional TLB entries by using 4KB and 2MB pages simultaneously. Specifically, you only gain extra entries in the 1st level data and instruction TLBs, where 4KB and 2MB entries are stored in separate structures. A result of this organization that doesn't get enough attention is the effect of competition between data and instruction mappings. In particular, suppose that you are using 2MB data mappings but 4KB instruction mappings for a relatively large program, like PostgreSQL. We've seen cases, e.g., clang, where forcing the use of 2MB instruction mappings reduced 2nd level TLB misses caused by data accesses by half.

Do you remember the results of the PCID testing? There was no measurable reduction of the context switch latency, although hwpmc demonstrated a 10x reduction of the TLB miss rate. (On the other hand, when PTI is enabled and PCID is used to cache kernel-mode ptes, syscall entry latency does shrink by ~50%.)

I do. However, I don't recall which benchmark saw the 10x reduction in TLB misses. Was it lmbench's lat_ctx? If so, then the reason why a 10x reduction in TLB misses has little effect on that benchmark's results is easy to explain: After every context switch to a process, that process sequentially reads *every* word in the malloc()ed region whose size is determined by the command line parameter to the benchmark. In other words, for every TLB miss that PCID might eliminate you are performing 512 sequential memory accesses (on a 64-bit machine). So, it shouldn't be surprising that the savings from the avoided TLB misses after a context switch are going to appear insignificant. I don't think that I was aware of this "feature" of lat_ctx when you were testing PCID.

kib added a comment.May 10 2020, 8:37 PM
In D24652#545603, @alc wrote:

I would like to see any new mechanism by which we are creating physical superpages set the value of "m->psind" appropriately, so that operations on physical pages might exploit the knowledge that they are working with a physical superpage, regardless of whether that physical superpage was created by the reservation system or by some other means. That said, what makes this current patch "tricky" has little to do with physical memory. This patch is creating a new type of mapping with different properties than we've had before. For example, if I understood correctly, it is not demotable for purposes of changing any mapping attributes at a 4KB granularity. So, it makes perfect sense to me that the data structures that manage mappings (as opposed to physical memory) need to be able to express this difference. (And, the size of the physical page, i.e., "m->psind", does not describe those differences.)

I am saying that in this patch, setting m->psind would be a write-only operation; this data is unused because other layers must provide some additional controls anyway. And then, if setting m->psind, I get problems with reservations, whose resolution I do not see as providing any value either to this work or to the vm architecture. I think that to resolve this problem, some notion of psind 'ownership' should be developed. For instance, as Mark proposed, m->object->flags & OBJ_COLORED might be the indication that m->psind is due to a populated reservation.

kib added a comment.May 10 2020, 8:41 PM
In D24652#545630, @alc wrote:
In D24652#545553, @kib wrote:
In D24652#545452, @alc wrote:

For several generations of their microarchitecture, Intel's 2nd level TLB has been unified in two senses of that word: (1) it holds both 4KB and 2MB mappings and (2) it holds both data and instruction mappings. So, you don't really gain very many additional TLB entries by using 4KB and 2MB pages simultaneously. Specifically, you only gain extra entries in the 1st level data and instruction TLBs, where 4KB and 2MB entries are stored in separate structures. A result of this organization that doesn't get enough attention is the effect of competition between data and instruction mappings. In particular, suppose that you are using 2MB data mappings but 4KB instruction mappings for a relatively large program, like PostgreSQL. We've seen cases, e.g., clang, where forcing the use of 2MB instruction mappings reduced 2nd level TLB misses caused by data accesses by half.

Do you remember the results of the PCID testing? There was no measurable reduction in context switch latency, although hwpmc demonstrated a 10x reduction in the TLB miss rate. (On the other hand, when PTI is enabled and PCID is used to cache kernel-mode PTEs, syscall entry latency does shrink by ~50%.)

I do. However, I don't recall which benchmark saw the 10x reduction in TLB misses. Was it lmbench's lat_ctx? If so, then the reason why a 10x reduction in TLB misses has little effect on that benchmark's results is easy to explain: After every context switch to a process, that process sequentially reads *every* word in the malloc()ed region whose size is determined by the command line parameter to the benchmark. In other words, for every TLB miss that PCID might eliminate you are performing 512 sequential memory accesses (on a 64-bit machine). So, it shouldn't be surprising that the savings from the avoided TLB misses after a context switch are going to appear insignificant. I don't think that I was aware of this "feature" of lat_ctx when you were testing PCID.

I remember that I tried different allocation sizes, including something ridiculously small. Anyway, what I tried to say is that I do not expect to see significant changes in the target software (DPDK) from the use of largepage mappings, in particular because they would be dominated by other memory traffic, similar to what you noted about lat_ctx.

On the other hand, having an official way to get a physically contiguous mapping in userspace would be immediately useful for DPDK, OFED, and some other HPC consumers, from what I was told.

alc added a comment.May 10 2020, 10:33 PM
In D24652#545634, @kib wrote:
In D24652#545630, @alc wrote:
In D24652#545553, @kib wrote:
In D24652#545452, @alc wrote:

For several generations of their microarchitecture, Intel's 2nd level TLB has been unified in two senses of that word: (1) it holds both 4KB and 2MB mappings and (2) it holds both data and instruction mappings. So, you don't really gain very many additional TLB entries by using 4KB and 2MB pages simultaneously. Specifically, you only gain extra entries in the 1st level data and instruction TLBs, where 4KB and 2MB entries are stored in separate structures. A result of this organization that doesn't get enough attention is the effect of competition between data and instruction mappings. In particular, suppose that you are using 2MB data mappings but 4KB instruction mappings for a relatively large program, like PostgreSQL. We've seen cases, e.g., clang, where forcing the use of 2MB instruction mappings reduced 2nd level TLB misses caused by data accesses by half.

Do you remember the results of the PCID testing? There was no measurable reduction in context switch latency, although hwpmc demonstrated a 10x reduction in the TLB miss rate. (On the other hand, when PTI is enabled and PCID is used to cache kernel-mode PTEs, syscall entry latency does shrink by ~50%.)

I do. However, I don't recall which benchmark saw the 10x reduction in TLB misses. Was it lmbench's lat_ctx? If so, then the reason why a 10x reduction in TLB misses has little effect on that benchmark's results is easy to explain: After every context switch to a process, that process sequentially reads *every* word in the malloc()ed region whose size is determined by the command line parameter to the benchmark. In other words, for every TLB miss that PCID might eliminate you are performing 512 sequential memory accesses (on a 64-bit machine). So, it shouldn't be surprising that the savings from the avoided TLB misses after a context switch are going to appear insignificant. I don't think that I was aware of this "feature" of lat_ctx when you were testing PCID.

I remember that I tried different allocation sizes, including something ridiculously small. Anyway, what I tried to say is that I do not expect to see significant changes in the target software (DPDK) from the use of largepage mappings, in particular because they would be dominated by other memory traffic, similar to what you noted about lat_ctx.

On the other hand, having an official way to get a physically contiguous mapping in userspace would be immediately useful for DPDK, OFED, and some other HPC consumers, from what I was told.

We have a paper to appear this summer that includes a detailed, head-to-head comparison of FreeBSD's current superpage support, Linux's transparent huge pages (THP), and the modifications to Linux proposed in two credible papers, Ingens and Hawkeye, that have appeared in the past few years. I should be able to share this paper with you in a few weeks. There were a few applications where FreeBSD's conservative promotion policy was significantly slower (i.e., a difference > 2%) than Linux's allocate-and-map a superpage on first access behavior, which is essentially what you are implementing here. More often than not, "HPC" applications are touching the bulk of their memory during initialization, and so promotion occurs quickly enough. You just pay the cost of a lot more page faults. Conversely, there were two applications, one being gcc, where FreeBSD outperformed Linux by 5% or more because THP doesn't handle incremental heap growth very well (unless the growth is by whole superpages).

alc added a comment.May 10 2020, 11:02 PM
In D24652#545633, @kib wrote:
In D24652#545603, @alc wrote:

I would like to see any new mechanism by which we are creating physical superpages set the value of "m->psind" appropriately, so that operations on physical pages might exploit the knowledge that they are working with a physical superpage, regardless of whether that physical superpage was created by the reservation system or by some other means. That said, what makes this current patch "tricky" has little to do with physical memory. This patch is creating a new type of mapping with different properties than we've had before. For example, if I understood correctly, it is not demotable for purposes of changing any mapping attributes at a 4KB granularity. So, it makes perfect sense to me that the data structures that manage mappings (as opposed to physical memory) need to be able to express this difference. (And, the size of the physical page, i.e., "m->psind", does not describe those differences.)

I am saying that in this patch, setting m->psind would be a write-only operation: the data is unused because the other layers must provide additional controls anyway. And if I do set m->psind, I get problems with reservations, whose resolution I do not see as providing any value either to this work or to the vm architecture. I think that to resolve this, some notion of psind 'ownership' should be developed. For instance, as Mark proposed, m->object->flags & OBJ_COLORED might indicate that m->psind is due to a populated reservation.

Okay, I will probably ask again about this later. I am still trying to wrap up my class for the semester, and I have a different paper that is due in roughly two weeks. So, I can't imagine that I will be able to spend more time looking at this change or the PV list locking change until the end of the month. However, in the time that I've spent today looking at the change and the sample application, nothing has really bothered me about it from an overall design standpoint.

alc added a comment.May 10 2020, 11:09 PM

On a barely related note, I wondered if you or Mark had observed whether the recently committed update to jemalloc has "fixed" the horrific anonymous memory fragmentation that occurs with applications like clang that mmap() a lot of files.

Looking at DPDK, it wants to determine the number of huge pages available in each domain before performing any allocation. I think the only way to get that information today is to read vm.phys_free, which is meant to be human-readable. We could add a set of sysctls vm.domain.<domidx>.largepage.<psind>.free to provide that information.

This is not quite enough. DPDK operates in legacy mode on FreeBSD, which means that it pre-allocates all of the large pages it can, based on the contigmem configuration. On Linux this is controlled by hugetlbfs reservations. I suspect it would be useful for posixshmcontrol to be able to reserve large pages, so an administrator can provide a fixed number of preallocated large pages to applications. On NUMA systems cpuset(1) could be used to specify the policy. What do you think?

sys/kern/uipc_shm.c
295

"it takes"

793

I am thinking about a function that attempts to reclaim from the domain(s) specified by the policy. The implementation should:

  1. attempt to allocate contig memory from each domain specified in the policy, then
  2. if not successful, restart the iterator and attempt to reclaim memory from each domain before trying the allocation again.
798

So in the default policy, if insufficient contiguous memory is available, we will just keep retrying in a loop?

markj added inline comments.May 12 2020, 10:25 PM
sys/kern/uipc_shm.c
296

Did you look at all at the "object busy" mechanism that Jeff added? It effectively allows one to busy all pages in an object.

kib marked an inline comment as done.May 13 2020, 4:19 PM
kib added inline comments.
sys/kern/uipc_shm.c
296

Well, vm_object_busy() prevents new busying, or rather, it allows new busy attempts but quickly backs them out. Existing busy pages are left as-is, though, so something that iterates over all pages in the superpage would be needed, like vm_page_ps_test().

But none of that is needed there: the single page is busied only to avoid a truncation while pmap_enter() is in progress, in some future where truncation is implemented for largepage shmfd.

798

Yes, this should be improved somehow. I think the improvement should be mostly local to the function that you propose above.

The detail that I do not like now is that the vm_wait() sleep is uninterruptible. I think that, e.g., a vm_wait_intr() should be added; then the thread_susp_check() call below can be removed. I restructured the loop to make it easier to change.

kib updated this revision to Diff 71724.May 13 2020, 4:22 PM

Restructure the truncate_largepage() loop to avoid freeing a page that was already inserted into the object, because the dropped object lock allows mmap() to succeed.
Add a -l option to 'posixshmcontrol create' to specify a largepage shmfd and its large page size. You still need to truncate afterward.

kib updated this revision to Diff 71758.May 14 2020, 11:08 AM

Make largepage allocation interruptible, since otherwise it is an easy DoS; add vm_wait_intr().
Change the default policy to do one contig_reclaim().

kib updated this revision to Diff 71793.May 14 2020, 7:26 PM

Rebase after conflicts resolution.

kib updated this revision to Diff 71838.May 15 2020, 9:38 PM

Fix MAP_FIXED | MAP_EXCL error return in non-error case.
Make the number of contig reclaims for the default alloc policy configurable.

kib updated this revision to Diff 71897.May 17 2020, 9:56 PM

Remove MAP_LARGEMAP. Its only purpose was to turn off the addr hint adjustment for non-MAP_FIXED mode. The same effect can be achieved by saving the original hint and passing it when we detect that the shmfd is for largepages.

Add memfd_create(MFD_HUGETLB) support, mostly untested. I think there are fine semantic differences between Linux and us there, but I believe the porting should be much easier anyway.

kib updated this revision to Diff 71899.May 17 2020, 9:58 PM
kib retitled this revision from Non-transparent superpage support. to Non-transparent superpages support..

Upload corrected version of the patch (fd != 0 -> fd == -1).

kib updated this revision to Diff 71901.May 18 2020, 12:20 AM

Fix flow control for calling vm_pager_populate.
Fix largepage accounting in pmap.
Fix shm largepage statistic.

kib updated this revision to Diff 74013.Thu, Jul 2, 8:46 AM

Refresh patch after rebase. Fix conflicts in vm_map.c.

kib updated this revision to Diff 74286.EditedFri, Jul 10, 6:35 PM
kib added a subscriber: kevans.

Rebase after recent uipc_shm.c changes.

kevans added inline comments.Fri, Jul 10, 7:16 PM
sys/vm/vm_mmap.c
424

No specific suggestion, but it'd be nice if we had a cleaner way to do this, though I suppose this won't really expand to many other file types. shmfd can now tell easily enough, from some fo_* operation for example, whether it is configured for this, since we recently started recording the opening shm_flags.