Page MenuHomeFreeBSD

vm pager: writemapping accounting for OBJT_SWAP
ClosedPublic

Authored by kevans on Aug 28 2019, 4:32 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Nov 8, 10:20 PM
Unknown Object (File)
Sat, Nov 2, 11:29 PM
Unknown Object (File)
Fri, Nov 1, 11:43 PM
Unknown Object (File)
Fri, Nov 1, 11:43 PM
Unknown Object (File)
Fri, Nov 1, 11:43 PM
Unknown Object (File)
Fri, Nov 1, 11:43 PM
Unknown Object (File)
Fri, Nov 1, 11:42 PM
Unknown Object (File)
Fri, Nov 1, 11:42 PM
Subscribers

Details

Summary

Currently writemapping accounting is only done for vnode_pager which does some accounting on the underlying vnode.

Extend this to allow accounting to be possible for any of the pager types. New pageops are added to update/release writecount that need to be implemented for any pager wishing to do said accounting, and we implement these methods now for both vnode_pager (unchanged) and swap_pager.

The primary motivation for this is to allow other systems with OBJT_SWAP objects to check if their objects have any write mappings and reject operations with EBUSY if so. posixshm will be the first to do so in order to reject adding write seals to the shmfd if any writable mappings exist.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sys/kern/uipc_shm.c
935 ↗(On Diff #61406)

This shouldn't happen for private mappings.

sys/kern/uipc_shm.c
935 ↗(On Diff #61406)

This updates the writecount for read-only mappings. Is that intentional?

sys/vm/vm_map.h
147 ↗(On Diff #61406)

The comment is still specific to vnodes.

kevans marked 2 inline comments as done.
  • shm: only do accounting on writable shared mappings
  • adjust comment
sys/vm/vm_object.h
171 ↗(On Diff #61411)

This increases each vm_object by 8 bytes. Objects are typically allocated for almost any cached vnodes, and modern systems cache millions of vnodes.

With some efforts, you can put writemappings under swp structure, I believe. Then it should not increase the vm_object' size.

Did you coded it that way because shmfd starts as OBJT_DEFAULT ? You can safely change that to OBJT_SWAP.

sys/vm/vm_pager.h
207 ↗(On Diff #61411)

I dislike this. Why not provide the 'else' part as method ?

I suggest evaluating use of special pagertab for swap objects with counted writeable mappings.

Let's give this a shot:

Move writemappings back into vnp and add another one for swp; size of vm_object should remain the same. shm is now OBJT_SWAP by default so it can take advantage of writemapping.

Approach is still mostly the same- generalize the flags for writemapping, add pageops to do the accounting. Without the pageops, {update,release}_writecount are nop. Non-OBJT_VNODE and OBJT_SWAP types can specify writecounting and nothing useful happens.

I'll rewrite title/summary when we come to a consensus.

Overall this looks good.

I would want to extend/adopt this to writecounting writeable mappings of tmpfs vnodes, which use swap objects for v_object as well. You could look at it now, or postpone for a followup.

sys/kern/uipc_shm.c
553 ↗(On Diff #61433)

I suggest to commit this change on its own, when it is time to commit all the stuff.

902 ↗(On Diff #61433)

Remove this.

912 ↗(On Diff #61433)

writecnt = (flags & MAP_SHARED) != 0 && (prot & VM_PROT_WRITE) != 0;

sys/vm/swap_pager.c
3004 ↗(On Diff #61433)

In both functions, assert that OBJ_NOSPLIT flag is set.

In D21456#467490, @kib wrote:

Overall this looks good.

I would want to extend/adopt this to writecounting writeable mappings of tmpfs vnodes, which use swap objects for v_object as well. You could look at it now, or postpone for a followup.

Hmm... I think this just kind of inherently works out for the most part, unless I'm looking at a different path than you're meaning:

vn_mmap calls vm_mmap_vnode; obj->type != OBJT_VNODE and writex = true. Previously it would have called vnode_pager_update_writecount, which would be a NOP because obj->type != OBJT_VNODE, but now it delegates to the swap_pager's update_writecount implementation -- so before we thought we were writecounting but we weren't, and it didn't really matter because that only invokes methods that didn't do anything with the OBJT_SWAP they were getting passed. Now we think we're writecounting and we are, because we're looking at OBJT_SWAP.

So for tmpfs, we really just need to use the v_object's writemappings now, no?

In D21456#467490, @kib wrote:

Overall this looks good.

I would want to extend/adopt this to writecounting writeable mappings of tmpfs vnodes, which use swap objects for v_object as well. You could look at it now, or postpone for a followup.

Hmm... I think this just kind of inherently works out for the most part, unless I'm looking at a different path than you're meaning:

vn_mmap calls vm_mmap_vnode; obj->type != OBJT_VNODE and writex = true. Previously it would have called vnode_pager_update_writecount, which would be a NOP because obj->type != OBJT_VNODE, but now it delegates to the swap_pager's update_writecount implementation -- so before we thought we were writecounting but we weren't, and it didn't really matter because that only invokes methods that didn't do anything with the OBJT_SWAP they were getting passed. Now we think we're writecounting and we are, because we're looking at OBJT_SWAP.

So for tmpfs, we really just need to use the v_object's writemappings now, no?

I do not know, great if it just works. Take a look at tmpfs_vfsops.c:tmpfs_rw_to_ro().

kevans marked 3 inline comments as done.
kevans retitled this revision from vm pager: generalize writemapping accounting to vm pager: writemapping accounting for OBJT_SWAP.
kevans edited the summary of this revision. (Show Details)

Address remaining comments from @kib, some light wordsmithing on the title/summary.

In D21456#467501, @kib wrote:
In D21456#467490, @kib wrote:

Overall this looks good.

I would want to extend/adopt this to writecounting writeable mappings of tmpfs vnodes, which use swap objects for v_object as well. You could look at it now, or postpone for a followup.

Hmm... I think this just kind of inherently works out for the most part, unless I'm looking at a different path than you're meaning:

vn_mmap calls vm_mmap_vnode; obj->type != OBJT_VNODE and writex = true. Previously it would have called vnode_pager_update_writecount, which would be a NOP because obj->type != OBJT_VNODE, but now it delegates to the swap_pager's update_writecount implementation -- so before we thought we were writecounting but we weren't, and it didn't really matter because that only invokes methods that didn't do anything with the OBJT_SWAP they were getting passed. Now we think we're writecounting and we are, because we're looking at OBJT_SWAP.

So for tmpfs, we really just need to use the v_object's writemappings now, no?

I do not know, great if it just works. Take a look at tmpfs_vfsops.c:tmpfs_rw_to_ro().

I'll defer this for a little later after wrapping up some of the other bits I'm working on

sys/kern/uipc_shm.c
553 ↗(On Diff #61433)

Alright, will do three (3) commits total: DEFAULT -> SWAP, vm/* changes, then shm_mmap changes

sys/kern/uipc_shm.c
937 ↗(On Diff #61487)

shm_map(2) can also be used to map the object - don't we need to update the writecnt there too?

sys/vm/vm_map.h
147 ↗(On Diff #61406)

Maybe, "tracked writeable mapping"? Not all writeable mappings will have this flag set.

sys/kern/uipc_shm.c
937 ↗(On Diff #61487)

This is unclear to me- I was planning to check object's writemappings == 0 and shmfd->shm_kmappings == 0 since all kernel mappings are writable and we can avoid the extra accounting overhead since we already track that, so it's at least not necessarily important for what I need to observe at least.

markj added inline comments.
sys/kern/uipc_shm.c
937 ↗(On Diff #61487)

I see, I misunderstood what shm_map() is for.

This revision is now accepted and ready to land.Aug 30 2019, 9:28 PM
kib added inline comments.
sys/kern/uipc_shm.c
937 ↗(On Diff #61487)

My opinion is that we can certainly ignore kernel mappings for the purpose of memfd sealing.

kevans added inline comments.
sys/vm/vm_map.h
147 ↗(On Diff #61406)

Sounds good to me- I've changed this locally to "tracked writeable mapping" -- wrapping remains wrapped as it pre-diff.

kevans marked an inline comment as done.

reupload/reopen... I was trying to avoid this getting closed by converting uipc_shm to OBJT_SWAP in case @alc had any comments/objections. I failed to mask it properly, apparently.

This revision now requires review to proceed.Sep 1 2019, 12:40 AM
This revision was not accepted when it landed; it landed in state Needs Review.Sep 3 2019, 8:32 PM
This revision was automatically updated to reflect the committed changes.