
rtld-elf: Unmap unused object segments
AbandonedPublic

Authored by cse_cem_gmail_com on Dec 3 2014, 7:55 PM.

Details

Reviewers
kib
emaste
Summary

A typical library has a text (r-x code) mapping and a data (rw-) mapping. Newer binutils aligns segments at 2MB boundaries. rtld-elf "reserves" address space for the entire object by mapping all segments, including the alignment space, as PROT_NONE, before proceeding to map the specific segments with appropriate protections.

This patch unmaps the unused/PROT_NONE alignment areas of VM between valid segments of libraries, which reduces userspace VM bloat by up to 2MB per typical library.

Sponsored by: EMC / Isilon storage division

Test Plan

As applied to an internal Isilon binary that links ~68 dynamic libraries:

Before:

procstat -v $(pgrep isi_hdfs_d) | wc -l

228
After:
160

From visual inspection, there are fewer PROT_NONE segments.

And ps reports:
Before:
vCemBSD-2# ps auxww|grep hdfs
root 1941 0.0 0.5 165296 14544 - Ss 4:07PM 0:08.56 /usr/bin/isi_hdfs_d
After:
vCemBSD-2# ps auxww|grep hdfs
root 27885 1.0 0.5 26188 15360 - Ss 9:12AM 0:00.01 /usr/bin/isi_hdfs_d


(VSZ column goes from 165296 -> 26188.)

Diff Detail

Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

cse_cem_gmail_com retitled this revision from to rtld-elf: Unmap unused object segments.
cse_cem_gmail_com updated this object.
cse_cem_gmail_com edited the test plan for this revision. (Show Details)
cse_cem_gmail_com added reviewers: emaste, kib.
cse_cem_gmail_com added a subscriber: bdrewery.

Assuming this is safe I really like the results.

This behaviour is intentional. Read the explanation in r190885. In short, mappable holes between segments tend to be reused, which causes weird issues.

I have no idea what 'userspace VM bloat' is. VSZ does not consume any resources.

In short, mappable holes between segments tend to be reused, which causes weird issues.

Weird issues are bugs and can be addressed. From the revision:

Note that we cannot simply keep the holes between segments, because
other mappings may be made there. Among other issues, when the dso is
unloaded, rtld unmaps the whole region, deleting unrelated mappings.

Sounds like something that could be fixed quite easily in rtld.

I have no idea what 'userspace VM bloat' is. VSZ does not consume any resources.

That isn't quite true. Clearly there is some bookkeeping going on to track these PROT_NONE mappings. If it really is so cheap that it is effectively free, maybe PROT_NONE mappings shouldn't count towards vmem ulimits?

In short, mappable holes between segments tend to be reused, which causes weird issues.

Weird issues are bugs and can be addressed. From the revision:

Can you enumerate the issues ?

Note that we cannot simply keep the holes between segments, because
other mappings may be made there. Among other issues, when the dso is
unloaded, rtld unmaps the whole region, deleting unrelated mappings.

Sounds like something that could be fixed quite easily in rtld.

How can rtld fix the possibility of other mmap(2) calls inserting something between dso segments?
The example with the behaviour of unload_object() was just an example. As another example, rtld tracks ownership of addresses per dso by inspecting these ranges; just look at the implementation of RTLD_SELF.

I have no idea what 'userspace VM bloat' is. VSZ does not consume any resources.

That isn't quite true. Clearly there is some bookkeeping going on to track these PROT_NONE mappings. If it really is so cheap that it is effectively free, maybe PROT_NONE mappings shouldn't count towards vmem ulimits?

A PROT_NONE mapping entry consumes one struct vm_map_entry, whose sizeof on my amd64 debugging kernel is 128 bytes.

In D1263#8, @kostikbel wrote:

Can you enumerate the issues ?

I haven't experienced any issues. You suggested there might be issues.

Note that we cannot simply keep the holes between segments, because
other mappings may be made there. Among other issues, when the dso is
unloaded, rtld unmaps the whole region, deleting unrelated mappings.

Sounds like something that could be fixed quite easily in rtld.

How can rtld fix the possibility of other mmap(2) calls inserting something between dso segments?

That isn't actually a problem... it only becomes a problem if rtld tries to unmap unrelated stuff in between, which can be fixed.

The example with the behaviour of unload_object() was just an example. As another example, rtld tracks ownership of addresses per dso by inspecting these ranges; just look at the implementation of RTLD_SELF.

And? Why is that a problem?

I have no idea what 'userspace VM bloat' is. VSZ does not consume any resources.

That isn't quite true. Clearly there is some bookkeeping going on to track these PROT_NONE mappings. If it really is so cheap that it is effectively free, maybe PROT_NONE mappings shouldn't count towards vmem ulimits?

A PROT_NONE mapping entry consumes one struct vm_map_entry, whose sizeof on my amd64 debugging kernel is 128 bytes.

So, do you approve of discounting PROT_NONE entries toward ulimit -v and ps VSZ?

In D1263#8, @kostikbel wrote:

Can you enumerate the issues ?

I haven't experienced any issues. You suggested there might be issues.

I said there are issues and noted some. I do not intend to enumerate all the issues, since I do not see a need to make the change.

Note that we cannot simply keep the holes between segments, because
other mappings may be made there. Among other issues, when the dso is
unloaded, rtld unmaps the whole region, deleting unrelated mappings.

Sounds like something that could be fixed quite easily in rtld.

How can rtld fix the possibility of other mmap(2) calls inserting something between dso segments?

That isn't actually a problem... it only becomes a problem if rtld tries to unmap unrelated stuff in between, which can be fixed.

The example with the behaviour of unload_object() was just an example. As another example, rtld tracks ownership of addresses per dso by inspecting these ranges; just look at the implementation of RTLD_SELF.

And? Why is that a problem?

I have no idea what 'userspace VM bloat' is. VSZ does not consume any resources.

That isn't quite true. Clearly there is some bookkeeping going on to track these PROT_NONE mappings. If it really is so cheap that it is effectively free, maybe PROT_NONE mappings shouldn't count towards vmem ulimits?

A PROT_NONE mapping entry consumes one struct vm_map_entry, whose sizeof on my amd64 debugging kernel is 128 bytes.

So, do you approve of discounting PROT_NONE entries toward ulimit -v and ps VSZ?

No. I cannot see why this is needed, or why this would not be a bug.