
superpages, rtld, and the 1-page mmap
Closed (Public)

Authored by alc on Mar 5 2015, 7:05 PM.

Details

Summary

At one point last year, we were discussing rtld and the non-use of superpage mappings for the code sections of large libraries. What we found was that the way that rtld was loading the library effectively blocked the use of superpage mappings. Specifically, the non-use of superpage mappings was the result of two things coming together. First, rtld performs a 1-page mmap call to read the header and this leads to an arbitrary color setting on the vm object backing the shared library. Second, rtld reserves virtual address space for the shared library using an anonymous mmap, so it has no idea what color was assigned to the vm object. (Currently, we force the anonymous mapping to be superpage aligned if it is sufficiently large.) Unfortunately, the arbitrary color assignment to the object results in the allocation of physical memory that is not aligned with the selected virtual address range, so superpage mappings are impossible.
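The alignment argument can be made concrete with a minimal user-space model. It assumes 4 KB pages and 2 MB superpages, as on amd64, and that an uncolored object takes its color from the virtual address of its first mapping. The helper names are hypothetical, and this is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 4 KB base pages, 2 MB superpages (amd64). */
#define MODEL_PAGE_SHIFT	12
#define MODEL_SP_NPAGES		512u

/*
 * Hypothetical helper: the color an object would receive from its
 * first mapping, i.e. the superpage-relative position of the virtual
 * page that maps object page 'pindex'.
 */
static uint32_t
color_from_mapping(uint64_t va, uint64_t pindex)
{
	uint64_t vpn = va >> MODEL_PAGE_SHIFT;

	return ((uint32_t)((vpn - pindex) & (MODEL_SP_NPAGES - 1)));
}

/*
 * Hypothetical helper: object page 'pindex' is placed at offset
 * (color + pindex) mod MODEL_SP_NPAGES within its physical reservation.
 * A superpage mapping needs the virtual page at 'va' to sit at that
 * same offset within its virtual superpage.
 */
static bool
sp_alignment_matches(uint64_t va, uint64_t pindex, uint32_t color)
{
	uint64_t vpn = va >> MODEL_PAGE_SHIFT;

	assert((MODEL_SP_NPAGES & (MODEL_SP_NPAGES - 1)) == 0);
	return (((vpn - pindex - color) & (MODEL_SP_NPAGES - 1)) == 0);
}
```

Using example addresses (not taken from a real trace): if the header page is first mapped at 0x800612000, the object gets color 18. A later 2 MB-aligned range at 0x801000000 then cannot line up with the object's reservations, whereas with color 0 it can.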

We discussed switching from the 1-page mmap call to a read call, but that has a problem. We only set a color on a vm object when it is mapped. By the time that we mapped the shared library and set its color, the read call would already have allocated a few individual pages at the start of the vm object. That blocks the creation of a reservation, and the eventual superpage mapping, on the first possible superpage of the code section. (It's worth keeping in mind here that libc on arm is one and a fraction superpages in size, so we would lose the ability to create that one possible superpage mapping.)
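The read() problem can be modeled with a per-page residency map. This toy sketch makes several assumptions: color 0, an arm-style section of 256 4 KB pages, and a simplified rule that a reservation can be created only if no page in its aligned range is already resident and the whole range fits inside the object. `reservation_possible()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (hypothetical): resident[i] is true if object page i was
 * already allocated as an ordinary page, outside any reservation.
 * With color 0, reservation boundaries fall at multiples of sp_npages.
 */
static bool
reservation_possible(const bool *resident, size_t npages, size_t pindex,
    size_t sp_npages)
{
	size_t first, i;

	assert(sp_npages != 0 && (sp_npages & (sp_npages - 1)) == 0);
	assert(pindex < npages);
	first = pindex & ~(sp_npages - 1);
	/* Simplification: the whole aligned range must fit in the object. */
	if (first + sp_npages > npages)
		return (false);
	/* Any ordinary page already allocated in the range blocks it. */
	for (i = first; i < first + sp_npages; i++)
		if (resident[i])
			return (false);
	return (true);
}
```

In this model, reading the header with read() marks pages 0 and 1 resident. In a 300-page object, that leaves the only full superpage-sized range, pages 0 through 255, without any chance of a reservation.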

Alternatively, we could force the 1-page mmap call to be superpage aligned so that the color is set correctly. However, I wanted to see if there was a way to handle this case automatically.

To that end, I propose the attached patch. The key idea is that named vm objects will always have their color set to zero before they are mapped. In general, I think that this makes sense from the standpoint of maximizing the creation of superpage mappings, not just as a solution for the rtld problem. In the case of rtld, the 1-page mmap will still occur at an arbitrary virtual address, but the physical memory allocated upon reference will nonetheless be superpage aligned and backed by a reservation.
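The effect of the proposal can be sketched with a set-once color. In this hypothetical model, `model_object_color()` fixes the color the first time it is called and ignores later calls. `model_map()` colors named (file-backed) objects 0 before applying the usual address-derived color. The names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy vm object (hypothetical), after the pg_color idea. */
struct model_object {
	bool	 named;		/* backed by a file */
	bool	 colored;	/* has the color been fixed yet? */
	uint32_t color;
};

/* Set the color only if it has not been set before. */
static void
model_object_color(struct model_object *obj, uint32_t color)
{
	if (!obj->colored) {
		obj->color = color;
		obj->colored = true;
	}
}

/*
 * Map object page 'pindex' at virtual page number 'vpn'. Under the
 * proposal, a named object is colored 0 first, so the address chosen
 * for this mapping no longer determines the color.
 */
static void
model_map(struct model_object *obj, uint64_t vpn, uint64_t pindex,
    uint32_t sp_npages)
{
	assert(sp_npages != 0 && (sp_npages & (sp_npages - 1)) == 0);
	if (obj->named)
		model_object_color(obj, 0);
	model_object_color(obj, (uint32_t)((vpn - pindex) & (sp_npages - 1)));
}
```

A named object mapped first at an arbitrary page keeps color 0. An anonymous object in the same position still takes its color from that first address.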

With the recent change to the page clustering policy on read faults and this change, I now see some superpage mappings created for the code section of the giant shared library used by the JVM:

1281 0x801c00000 0x80269e000 r-x 2185 2375 2 1 CNS- vn /usr/local/openjdk7/jre/lib/amd64/server/libjvm.so
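Some quick arithmetic on the mapping above shows how much of it could be superpage-mapped at all. The sketch below uses a hypothetical helper, `sp_slots()`, to count the 2 MB-aligned, 2 MB-long chunks that lie entirely within [start, end). It ignores residency and only bounds the possible number of superpage mappings.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the sp_size-aligned chunks of sp_size bytes that lie entirely
 * within [start, end). Only these could be mapped by superpages; the
 * count is an upper bound that ignores which pages are resident.
 */
static uint64_t
sp_slots(uint64_t start, uint64_t end, uint64_t sp_size)
{
	uint64_t first, last;

	assert(sp_size != 0 && (sp_size & (sp_size - 1)) == 0);
	first = (start + sp_size - 1) & ~(sp_size - 1);	/* round up */
	last = end & ~(sp_size - 1);			/* round down */
	return (last > first ? (last - first) / sp_size : 0);
}
```

For the libjvm.so text above, end minus start is 0xa9e000 bytes (2718 4 KB pages, about 10.6 MB), and the start address is 2 MB aligned. At most five 2 MB superpages could therefore cover it.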

At the same time, I'm not seeing a huge increase in the number of reservations created and destroyed. So, I don't think that this change is too aggressive.

Test Plan

Test on armv6 to verify that a superpage mapping is being created for the first 1 MB of libc.
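One way to script this check is to scan "procstat -v" output for libc's text mapping and look for the S flag. The sketch assumes each line has the same eleven whitespace-separated fields as the libjvm.so line in the summary, in the order PID, start, end, protection, RES, PRES, REF, SHD, flags, type, path. Other procstat versions may format lines differently. `line_uses_superpages()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if a "procstat -v" line maps a path containing
 * 'path_substr' and its flags field contains 'S' (superpages in use).
 * Assumed field order, as in the libjvm.so line in the summary:
 * PID START END PRT RES PRES REF SHD FLAGS TYPE PATH.
 */
static bool
line_uses_superpages(const char *line, const char *path_substr)
{
	int pid, res, pres, ref, shd;
	unsigned long long start, end;
	char prt[8], flags[16], type[8], path[1024];

	assert(line != NULL && path_substr != NULL);
	/* %llx accepts the "0x" prefix, as strtoull() with base 16 does. */
	if (sscanf(line, "%d %llx %llx %7s %d %d %d %d %15s %7s %1023s",
	    &pid, &start, &end, prt, &res, &pres, &ref, &shd, flags, type,
	    path) != 11)
		return (false);
	return (strstr(path, path_substr) != NULL &&
	    strchr(flags, 'S') != NULL);
}
```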

Diff Detail

Repository: rS FreeBSD src repository - subversion
Lint: Lint Skipped
Unit: Tests Skipped

Event Timeline

alc retitled this revision to "superpages, rtld, and the 1-page mmap".
alc updated this object.
alc edited the test plan for this revision.
alc added reviewers: kib, jhb.
vm/vm_mmap.c, line 1394 (on Diff #4119)

Why is this check for NULL needed?

vm_object_allocate() calls uma_zalloc(M_WAITOK). Do you plan to change this?

vm/vm_object.h, line 269 (on Diff #4119)

But from this description, wouldn't it take less code to set pg_color for named objects unconditionally upon creation? We already know the object type is OBJT_VNODE there, and for anonymous objects, OBJ_NOSPLIT is a good indicator.

Also, we might not bother constraining page allocations for objects smaller than a superpage.
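The creation-time alternative suggested here can be sketched as a predicate. This is a hypothetical model: `color_at_creation()`, the `MODEL_` type names, and the flag value are stand-ins for OBJT_VNODE and OBJ_NOSPLIT, not kernel definitions. The small-object cutoff follows the second comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the object types and flag named above. */
enum model_objtype { MODEL_OBJT_DEFAULT, MODEL_OBJT_SWAP, MODEL_OBJT_VNODE };
#define MODEL_OBJ_NOSPLIT	0x1u

/*
 * Decide at object creation whether to preset color 0:
 *  - objects smaller than one superpage are left alone, since no
 *    full superpage could ever be built from them;
 *  - vnode (named) objects are colored;
 *  - anonymous objects are colored only when marked NOSPLIT.
 */
static bool
color_at_creation(enum model_objtype type, unsigned int flags,
    uint64_t size_pages, uint64_t sp_npages)
{
	assert(sp_npages != 0);
	if (size_pages < sp_npages)
		return (false);
	if (type == MODEL_OBJT_VNODE)
		return (true);
	return ((flags & MODEL_OBJ_NOSPLIT) != 0);
}
```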

I'm not opposed to the idea of forcing all files to a color of zero. It would be good to someday fix the issue that a read() of a file before mmap() leads to non-superpage mappings even with this change in place (at least I think it did in my testing previously).

vm/vm_mmap.c, line 1394 (on Diff #4119)

Note that vm_pager_allocate is being called here, not vm_object_allocate.

vm_pager_allocate can return NULL if there isn't a valid method dispatch table for the specified object type. Also, some method implementations can return NULL, e.g., swap_pager_alloc. Removing the NULL check here would have to be based on the caller having intimate knowledge about the implementation of vnode_pager_alloc. Do we want to do that? I don't feel strongly one way or the other about this.
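The two failure paths described here can be modeled with a small dispatch table. In this hypothetical sketch, one object type has no method table, and another type's allocation method can return NULL. The names are illustrative stand-ins, not the actual vm_pager interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object types; MODEL_OBJT_NOOPS has no method table. */
enum model_objtype {
	MODEL_OBJT_VNODE,
	MODEL_OBJT_SWAP,
	MODEL_OBJT_NOOPS,
	MODEL_OBJT_NTYPES
};

struct model_object {
	enum model_objtype type;
};

struct model_pagerops {
	struct model_object *(*alloc)(void);
};

static struct model_object model_vnode_object = { MODEL_OBJT_VNODE };

/* Stands in for a method that always succeeds. */
static struct model_object *
model_vnode_alloc(void)
{
	return (&model_vnode_object);
}

/* Stands in for a method that can fail and return NULL. */
static struct model_object *
model_swap_alloc(void)
{
	return (NULL);
}

static const struct model_pagerops model_vnode_ops = { model_vnode_alloc };
static const struct model_pagerops model_swap_ops = { model_swap_alloc };

static const struct model_pagerops *const model_pagertab[MODEL_OBJT_NTYPES] = {
	[MODEL_OBJT_VNODE] = &model_vnode_ops,
	[MODEL_OBJT_SWAP] = &model_swap_ops,
	[MODEL_OBJT_NOOPS] = NULL,
};

/* Generic entry point: either failure path yields NULL. */
static struct model_object *
model_pager_allocate(enum model_objtype type)
{
	const struct model_pagerops *ops;

	assert(type < MODEL_OBJT_NTYPES);
	ops = model_pagertab[type];
	if (ops == NULL)
		return (NULL);	/* no method table for this type */
	return (ops->alloc());	/* the method itself may fail */
}
```

A caller that only ever passes the vnode type never sees NULL in this model. Dropping its check, however, would rely on knowing how that one method behaves, which is the trade-off raised above.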

vm/vm_object.h, line 269 (on Diff #4119)

I've been reluctant to unconditionally enable reservations for files because it leads to a big increase in partially populated reservations. Ultimately, I think it's the right thing to do. However, before we do it, I also think the policy for breaking reservations needs to become more aggressive so that fragmentation doesn't become even more of a problem. Right now, we wait until allocation from the buddy allocator fails before breaking a reservation. Alternatively, suppose that we had a time stamp that represented either the last activity on a partially populated reservation or, in the case of the buddy allocator, when a contiguous 2+ MB chunk of memory landed or coalesced in the buddy queues. Then, if I needed to allocate a single page but couldn't do it without breaking up a 2+ MB chunk from the buddy queues, I would look at the time stamps. If the reservation is older than the next 2+ MB chunk from the buddy queues, I would break the reservation.
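The proposed time-stamp comparison can be sketched as a single decision function. This is a hypothetical model of a policy that the comment describes but that the kernel does not implement. Time stamps are abstract tick counts, and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_victim { MODEL_SPLIT_BUDDY_CHUNK, MODEL_BREAK_RESERVATION };

/*
 * Called only when a single page cannot be allocated without splitting
 * a 2+ MB chunk from the buddy queues. Time stamps are abstract ticks
 * (larger is more recent):
 *   resv_stamp  - last activity on the oldest partially populated
 *                 reservation (meaningful only if have_resv is true)
 *   chunk_stamp - when the next 2+ MB chunk landed or coalesced in the
 *                 buddy queues
 * Break the reservation if its last activity predates the chunk's
 * arrival; otherwise split the chunk.
 */
static enum model_victim
model_choose_victim(bool have_resv, uint64_t resv_stamp, uint64_t chunk_stamp)
{
	if (have_resv && resv_stamp < chunk_stamp)
		return (MODEL_BREAK_RESERVATION);
	return (MODEL_SPLIT_BUDDY_CHUNK);
}
```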

I'm not sure that I understood the second comment, i.e., "Also, ...". I think that the correct response is that I should add a sentence to the comment in the code saying that setting the color on a small object is essentially a NOP: it doesn't constrain page allocation in any way.
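One way a color on a small object can be a no-op is sketched below. The sketch assumes that the reservation allocator refuses to create a reservation that would extend past the end of a file-backed object. That rule and the name `model_reservation_allowed()` are assumptions for illustration, not a description of vm_reserv_alloc_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of whether a new reservation may be created for
 * object page 'pindex'. Object page p sits at offset
 * (color + p) mod sp_npages within a reservation, so the reservation
 * containing 'pindex' would start at object page pindex - idx.
 */
static bool
model_reservation_allowed(uint64_t pindex, uint32_t color,
    uint64_t obj_size, bool file_backed, uint32_t sp_npages)
{
	uint64_t idx, first;

	assert(sp_npages != 0 && (sp_npages & (sp_npages - 1)) == 0);
	if (pindex >= obj_size)
		return (false);
	idx = ((uint64_t)color + pindex) & (sp_npages - 1);
	/* The reservation would have to start before object page 0. */
	if (pindex < idx)
		return (false);
	first = pindex - idx;
	/* Assumed rule: files are not expected to grow, so never reserve
	 * past their end. */
	if (file_backed && first + sp_npages > obj_size)
		return (false);
	return (true);
}
```

In this model, with 512-page superpages, every color from 0 to 511 is rejected for page 5 of a 100-page file. The same page of a 2000-page file qualifies at color 0.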

kib edited edge metadata.
kib added inline comments.
vm/vm_object.h, line 269 (on Diff #4119)

Could you please explain how the second part of your answer is realized in the code? That is, why does a small colored object not allow the creation of a reservation?

I looked at the initial test, 'Is a reservation fundamentally impossible?', in vm_reserv_alloc_page(), but it seems that the test would simply pass for color == 0.

This revision is now accepted and ready to land. Mar 7 2015, 10:57 AM

I have tested the patch on a PandaBoard (armv6) with "make -j4 buildkernel" and got these results:
vm.pmap.section.mappings: 0 - unpatched kernel
vm.pmap.section.mappings: 1517 - patched kernel

In reply to D2013#10, in which @onwahe-gmail-com reported vm.pmap.section.mappings of 0 on the unpatched kernel and 1517 on the patched kernel:

Thanks. I'd really like to see the output of "procstat -av".

Here is the output of "procstat -av" for the current pmap-v6. The only libc mapping marked with the S flag belongs to the last process, procstat itself. However, it isn't always marked: when I run the command immediately after logging in, it isn't.

Here is the output of "procstat -av" for the new pmap-v6. IMHO, this output looks more like the expected result.

I can't explain the difference between the new pmap and the old one, but I can explain why you see a mix of regular and superpage mappings for libc's code.

Consider the first program to run using libc. None of libc's code will be resident. The libc functions used by that program will be loaded into memory upon invocation. However, even with page clustering on faults, that program may only use a fraction of libc. Maybe the second, third, and fourth programs to run use the same set of functions. However, the fifth program uses some previously unused functions, loading additional parts of libc's code. Ultimately, by the time that you run "procstat -av", everything within the first 1 MB of code has been loaded, and "procstat -av" uses a superpage mapping for that 1 MB. So should all future programs that run.
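The accumulation described here can be simulated with a residency map for one 1 MB section of libc's code. This toy sketch assumes 4 KB pages (256 per section) and treats "every page resident" as the precondition for a section mapping. `run_program()` and the touched-page sets are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SECTION_PAGES	256	/* 1 MB section / 4 KB pages (armv6) */

/*
 * Mark the pages touched by one program run as resident, then report
 * whether the whole section is now resident, i.e. whether later runs
 * could map it with a single section mapping.
 */
static bool
run_program(bool resident[MODEL_SECTION_PAGES], const size_t *touched,
    size_t ntouched)
{
	size_t i;

	for (i = 0; i < ntouched; i++) {
		assert(touched[i] < MODEL_SECTION_PAGES);
		resident[touched[i]] = true;
	}
	for (i = 0; i < MODEL_SECTION_PAGES; i++)
		if (!resident[i])
			return (false);
	return (true);
}
```

Repeated runs that touch only the first half of the section never make it eligible. Only once a run touches the remaining pages does the whole section become resident.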

Again, I don't know why there was any difference between the new and old pmap.

Yep, I understand that. Regarding the difference between the old and new pmap-v6, I debugged it this morning. It turned out that many section mappings had already been created by the time I typed "procstat -av", but only a few of them belonged to currently running processes. Thus, I think that the difference is mainly caused by pmap_copy() not being implemented in the old pmap, i.e., physical mappings are not copied during fork(). And most processes in the procstat output are daemons.

Another thing is that section demotion is performed for those processes where a section mapping was created. It is done in the context of the pagedaemon. I did not look into whether this happens only with the old pmap.

I wouldn't expect the page daemon to have an effect this soon after booting. Moreover, pmap_ts_referenced() no longer demotes unconditionally.

Anyway, I'm satisfied that the machine-independent code is behaving as it should on arm.

P.S. I recall an attempt to implement pmap_copy() for the current pmap-v6.c, but I believe that it got reverted because of inexplicable crashes on the Raspberry Pi.

P.P.S. Is it feasible to move the starting address of the text section to 1 MB? Or does the ABI actually require 32 KB as the starting address? Moving the starting address would get you a bunch more superpage mappings on the text section of large executables, e.g., clang or gcc.
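The arithmetic behind this suggestion can be sketched as follows. The sketch assumes that the text segment is mapped from file offset 0 and that its file's vm object has color 0, so physical superpages start at file offsets that are multiples of 1 MB. The helper name is hypothetical, and the 40 MB text size in the check below is an illustrative figure, not a measurement of clang or gcc.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of superpages that could map a text segment, assuming it is
 * mapped from file offset 0 at virtual address 'base' and that its
 * file's object has color 0 (so physical superpages start at file
 * offsets that are multiples of sp_size). A superpage mapping needs an
 * sp_size-aligned virtual address that maps an sp_size-aligned file
 * offset; since va = base + offset, that requires an aligned base.
 */
static uint64_t
model_text_superpages(uint64_t base, uint64_t text_size, uint64_t sp_size)
{
	assert(sp_size != 0 && (sp_size & (sp_size - 1)) == 0);
	if ((base & (sp_size - 1)) != 0)
		return (0);
	return (text_size / sp_size);
}
```

With a 32 KB base, every 1 MB-aligned virtual address in the text maps a file offset that is 32 KB short of a 1 MB boundary, so the count is 0. With a 1 MB base, every full 1 MB of text qualifies.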

alc edited edge metadata.

I've revised the comment describing vm_object_color() to address one of Kostik's questions. I'm planning to commit this revision in a few hours. John, you haven't clicked "accept" so I'll list you as "Discussed with:" rather than "Reviewed by:".

This revision now requires review to proceed. Mar 20 2015, 4:55 PM
jhb edited edge metadata.
This revision is now accepted and ready to land. Mar 21 2015, 5:24 PM
alc updated this revision to Diff 4322.

Closed by commit rS280327 (authored by @alc).