
Only insert objects into a shadow list if they can later be collapsed.
Closed, Public

Authored by jeff on Nov 17 2019, 1:22 AM.

Details

Summary

The object shadow list is maintained so that we can collapse anonymous objects. This is not possible when your backing object is a vnode, for example. In this case there is no sense in maintaining the shadow_list. I created a handful of convenience functions to simplify the code that manages these lists as well.

Consider a compiler binary whose data section is mapped private: every running instance shadows the same vnode object, so we construct huge shadow lists that every compiler instance contends on. We need only the backing_object pointer to be correct here so that we can traverse the chain for page faults.
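
As a rough illustration of the rule described above, here is a toy user-space model. It is not the code from this diff: `struct toy_object`, `toy_can_collapse()`, `toy_backing_insert()`, and `toy_chain_depth()` are hypothetical names invented for the sketch, and locking is left out entirely. The point it tries to show is that the backing_object pointer is always set, while the shadow list is only touched when the backing object is anonymous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; this is NOT the kernel's struct vm_object. */
enum toy_type { TOY_ANON, TOY_VNODE };

struct toy_object {
	enum toy_type type;
	struct toy_object *backing_object;	/* next object down the chain */
	struct toy_object *shadow_head;		/* first object shadowing this one */
	struct toy_object *shadow_next;		/* link in the backing object's list */
	int shadow_count;
};

/* Assumption for the sketch: only anonymous objects can be collapsed. */
static bool
toy_can_collapse(const struct toy_object *backing)
{
	return (backing->type == TOY_ANON);
}

/* Hypothetical stand-in for a backing-insert convenience function. */
static void
toy_backing_insert(struct toy_object *object, struct toy_object *backing)
{
	assert(object->backing_object == NULL);
	/* Always required, so page faults can walk down the chain. */
	object->backing_object = backing;
	/* Only worth tracking if a collapse could ever happen. */
	if (toy_can_collapse(backing)) {
		object->shadow_next = backing->shadow_head;
		backing->shadow_head = object;
		backing->shadow_count++;
	}
}

/* Number of backing objects a fault would visit below this object. */
static int
toy_chain_depth(const struct toy_object *object)
{
	int depth = 0;

	while (object->backing_object != NULL) {
		object = object->backing_object;
		depth++;
	}
	return (depth);
}
```

In this model, any number of private mappings can point at one vnode-backed object without ever modifying that object's list, which is the contention the summary is trying to avoid.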

Diff Detail

Repository: rS FreeBSD src repository - subversion
Lint: Lint Not Applicable
Unit: Tests Not Applicable

Event Timeline

jeff added reviewers: markj, kib, alc, dougm.
sys/vm/vm_object.c
1340 ↗(On Diff #64476)

Isn't this assignment redundant?

1404 ↗(On Diff #64476)

Can't you use vm_object_backing_insert(new_object, source) here?

This should be fine IMO.

The only purpose of the shadow_list seems to be to allow calling vm_object_collapse() when refcount == 1 in vm_object_deallocate(). It might be that doing more collapses in other places, or following the whole shadow chain on collapse, would make this location unimportant. We might then be able to remove shadow_list altogether.

This revision is now accepted and ready to land. Nov 19 2019, 3:28 PM
In D22423#490916, @kib wrote:

This should be fine IMO.

The only purpose of the shadow_list seems to be to allow calling vm_object_collapse() when refcount == 1 in vm_object_deallocate(). It might be that doing more collapses in other places, or following the whole shadow chain on collapse, would make this location unimportant. We might then be able to remove shadow_list altogether.

I spent some time looking at that rather than pursuing this. There are various places from which you could drive collapse scans, starting at the top-level objects, but none of them will catch as many potential cases as this. Failing to collapse can lead to memory that is held but unreachable, either because a shadow covers only part of its backing object or because the shadow completely covers the backing object but duplicates its pages. I suspect that even this system can be defeated by carefully structured mmap and fork calls. I know Dillon just made some changes in DragonFly to restructure these sharing relationships and got rid of shadow chains, but I think his new system has far more holes than this.

Maybe Alan has some thoughts. I feel like this set of patches is a welcome bandaid but I would've preferred to have done something more architectural to solve it better in the long run.

The fairly obvious idea is to switch to periodic scans for collapses, and make the scan walk the whole shadow chain for each map entry that has an anonymous object on top. I.e., instead of trying to do collapses at precise points, we would have yet another daemon.

That would have the advantage of doing less work for transient conditions, such as rapidly forking/exiting processes. It would be hard to do the scan efficiently, but if you did a sort of ranged mark/sweep you would be able to plug all the holes from a central algorithm and simplify normal operation.
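
To make the periodic-scan idea a little more concrete, here is a minimal user-space sketch of one scan pass over a single shadow chain. Everything in it is an assumption made for illustration: `struct toy_obj`, its `resident` bitmask (one bit per page index), and `toy_scan_chain()` are hypothetical, and the merge condition is far simpler than what vm_object_collapse() actually has to check (offsets, sizes, paging in progress, locking). It is not the ranged mark/sweep mentioned above, only the chain walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; field names are invented for this sketch. */
struct toy_obj {
	int anon;			/* nonzero for anonymous memory */
	int ref_count;			/* references from shadows and map entries */
	uint32_t resident;		/* bit i set: object holds a copy of page i */
	struct toy_obj *backing;	/* next object down the chain */
};

/*
 * Walk the whole chain below 'top' and fold every anonymous backing
 * object that is referenced only by its shadow into that shadow.
 * Returns the number of objects folded away.
 */
static int
toy_scan_chain(struct toy_obj *top)
{
	struct toy_obj *obj = top;
	int merged = 0;

	while (obj->backing != NULL) {
		struct toy_obj *b = obj->backing;

		if (obj->anon && b->anon && b->ref_count == 1) {
			/* Pages already in the shadow win; the rest move up. */
			obj->resident |= b->resident;
			b->resident = 0;
			/* The shadow inherits b's reference on its backing object. */
			obj->backing = b->backing;
			b->backing = NULL;
			b->ref_count = 0;
			merged++;
		} else {
			obj = b;
		}
	}
	return (merged);
}
```

A daemon built along these lines would call something like `toy_scan_chain()` for each map entry with an anonymous object on top, instead of relying on collapses triggered at precise points such as deallocation.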

In D22423#490985, @jeff wrote:

...

Maybe Alan has some thoughts. I feel like this set of patches is a welcome bandaid but I would've preferred to have done something more architectural to solve it better in the long run.

I think that we should explore the implementation of finer-grained reference counting on anonymous memory objects, specifically, associating a reference count with a range of page indices. This would subsume OBJ_ONEMAPPING, allow for a more effective vm_object_coalesce(), and eliminate the need for the horrible hack that is vm_object_split(). I think that this might actually be practical because a lot of operations that currently change the object's ref count, e.g., clipping and merging, would not need to change the finer-grained reference count because we are neither mapping nor unmapping the underlying page indices.
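
A very small sketch of what a reference count per page index could look like, purely for illustration. It uses a flat array with one counter per index, which is the naive representation and not a proposal for the kernel; `struct toy_range_refs`, `TOY_NPAGES`, and the `toy_range_*` functions are hypothetical names. The only thing it is meant to show is that counts change when a range of indices is mapped or unmapped, and not otherwise.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPAGES 16	/* arbitrary object size for the sketch */

/* Naive representation: one counter per page index. */
struct toy_range_refs {
	unsigned int count[TOY_NPAGES];
};

/* A new mapping of page indices [start, end) appears. */
static void
toy_range_ref(struct toy_range_refs *r, size_t start, size_t end)
{
	assert(start <= end && end <= TOY_NPAGES);
	for (size_t i = start; i < end; i++)
		r->count[i]++;
}

/* A mapping of page indices [start, end) goes away. */
static void
toy_range_unref(struct toy_range_refs *r, size_t start, size_t end)
{
	assert(start <= end && end <= TOY_NPAGES);
	for (size_t i = start; i < end; i++) {
		assert(r->count[i] > 0);
		r->count[i]--;
	}
}

/* How many indices in [start, end) have exactly n references? */
static size_t
toy_range_count_eq(const struct toy_range_refs *r, size_t start, size_t end,
    unsigned int n)
{
	size_t matches = 0;

	assert(start <= end && end <= TOY_NPAGES);
	for (size_t i = start; i < end; i++)
		if (r->count[i] == n)
			matches++;
	return (matches);
}
```

In this model, clipping a map entry covering [0, 16) into [0, 8) and [8, 16) requires no call at all, because the same indices stay mapped. Asking for indices with a count of 1 is the per-range version of a "mapped only once" question, and indices with a count of 0 are ones nobody maps any longer.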

We would have to come up with an efficient representation for the page-granular references, efficient not just to store but to search. Now that we're not keeping shadow lists on vnode objects you would hope that there wouldn't be a huge number of references, but it may be possible with an application that uses fork for concurrency. Like postgres, maybe? I haven't looked at the consumers enough to say what the normal bounds are.

Referencing this way would also almost serve as a PV replacement if the top-level shadow pointed back to the pmap. I saw that Dillon recently replaced per-page pv with per-object. This of course makes searching more expensive if you have a lot of sparse maps. I'm not sure whether it's the right thing or not, but if you could kill two birds with one stone...