One fewer syscall during binary startup. Besides, reading in a small text file using mmap seems just weird; we use the usual read(2) for ld-elf.so.hints.

In D12767#264932, @trasz wrote:

One fewer syscall during binary startup. Besides, reading in a small text file using mmap seems just weird; we use the usual read(2) for ld-elf.so.hints.

Well, you still have to get the memory to read into from somewhere, and this memory is of course obtained by mmap. Also, since we do not modify the libmap.conf content, you make two or more copies of the content.

Maybe ld-elf.so.hints should be mapped as well.

I would understand if you argued that read(2) is faster than mmap(2), but I doubt that it is possible to measure in this situation. Perhaps a buildworld with a non-empty libmap.conf and a dynamically linked toolchain would show some difference.

The memory is indeed probably allocated with mmap(2), but that's done anyway, i.e. we're not adding another mmap call to the binary startup. The read(2) might indeed be a bit faster due to not messing with memory mappings, but as you say, I don't expect this to be measurable. The additional copy won't hurt either; we're touching all that data right afterwards.

Still, it's one syscall less.
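For readers following the argument, the two approaches can be sketched roughly as below. This is a minimal illustration and not the rtld code: `slurp_read` and `slurp_mmap` are hypothetical names, and plain malloc(3) stands in for rtld's xmalloc. In this sketch the read(2) variant issues open, fstat, read and close, while the mmap(2) variant issues open, fstat, mmap and close, plus a munmap once the caller is done with the data.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the rtld code): read a whole file into a
 * heap buffer.  Syscalls issued here: open, fstat, read, close.
 * The caller releases the buffer with free(3), which is normally not
 * a syscall for a small allocation.
 */
char *
slurp_read(const char *path, size_t *lenp)
{
	struct stat st;
	char *buf;
	size_t size, off = 0;
	ssize_t n = 0;
	int fd;

	if ((fd = open(path, O_RDONLY)) == -1)
		return (NULL);
	if (fstat(fd, &st) == -1 ||
	    (buf = malloc((size_t)st.st_size + 1)) == NULL) {
		close(fd);
		return (NULL);
	}
	size = (size_t)st.st_size;
	/* Loop in case read(2) returns fewer bytes than requested. */
	while (off < size && (n = read(fd, buf + off, size - off)) > 0)
		off += (size_t)n;
	close(fd);
	if (n == -1) {
		free(buf);
		return (NULL);
	}
	buf[off] = '\0';	/* private, writable copy of the content */
	*lenp = off;
	return (buf);
}

/*
 * Hypothetical helper: map the file read-only instead of copying it.
 * Syscalls issued here: open, fstat, mmap, close.  The caller must
 * munmap(2) the region afterwards, which is one more syscall.
 * Empty files are rejected because mmap(2) fails for a zero length.
 */
const char *
slurp_mmap(const char *path, size_t *lenp)
{
	struct stat st;
	void *p;
	int fd;

	if ((fd = open(path, O_RDONLY)) == -1)
		return (NULL);
	if (fstat(fd, &st) == -1 || st.st_size == 0) {
		close(fd);
		return (NULL);
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);
	if (p == MAP_FAILED)
		return (NULL);
	*lenp = (size_t)st.st_size;
	return (p);
}
```

The mapped variant shares the file's pages with the page cache and never copies them, which is the point about extra copies raised above; the read variant gets a private copy in heap memory.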

In D12767#264951, @trasz wrote:

The memory is indeed probably allocated with mmap(2), but that's done anyway, i.e. we're not adding another mmap call to the binary startup. The read(2) might indeed be a bit faster due to not messing with memory mappings, but as you say, I don't expect this to be measurable. The additional copy won't hurt either; we're touching all that data right afterwards.

Additional copy means +1 dirty page per process. So it is not completely negligible.

Still, it's one syscall less.

Still, I am not convinced. I will not stop you, and promise to not scream if you commit this.

BTW, I have somewhere some less trivial patches that completely remove the endless stream of sigprocmask(2) syscalls from the single-threaded locking in rtld. The patch works by allowing the thread to specify a location on its stack from which the kernel reads the current mask when needed (AFAIR). If you are interested, I might try to find and rebase them. Even for a trivial do-nothing binary, it shaves off several dozen syscalls.

I just did a totally unscientific benchmark (for n in `jot 10`; do /usr/bin/time sh -c 'for f in `jot 10000`; do /usr/bin/true; done'; done 2>&1), curated the results with sed 's/,/./g', and... huh.

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                   +                                                                                                                                                          |
|                                                   +                                                                                                      x                                                   |
|                                                   +                                                                                                      x                                                   |
|                                                   +                                                                                                      x                                                   |
|                                                   +                                                                                                      x                                                   |
|+                                                  +                                                   x                                                  x                                                  x|
|+                                                  +                                                   *                                                  x                                                  x|
|                 |____________________________A____M_______________________|                                            |_________________________________A_________________________________|                 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10          4.38           4.4          4.39          4.39  0.0066666667
+  10          4.36          4.38          4.37         4.369  0.0056764621
Difference at 95.0% confidence
        -0.021 +/- 0.00581741
        -0.47836% +/- 0.132148%
        (Student's t, pooled s = 0.00619139)
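The absolute figures in this summary can be checked by hand from the table. The working below assumes the standard pooled two-sample Student's t formula and the tabulated critical value of about 2.101 for 18 degrees of freedom at 95% confidence; both are assumptions about how the tool arrives at its numbers, not something stated in the thread. The subscripts x and + refer to the two data sets as labelled in the table.

```latex
\begin{aligned}
s_p &= \sqrt{\frac{(n_{x}-1)\,s_{x}^{2} + (n_{+}-1)\,s_{+}^{2}}{n_{x}+n_{+}-2}}
     = \sqrt{\frac{9\,(0.0066666667)^{2} + 9\,(0.0056764621)^{2}}{18}}
     \approx 0.00619139 \\
\bar{x}_{+} - \bar{x}_{x} &= 4.369 - 4.39 = -0.021 \\
t_{0.975,\,18}\; s_p \sqrt{\tfrac{1}{n_{x}} + \tfrac{1}{n_{+}}}
    &\approx 2.101 \times 0.00619139 \times 0.447214 \approx 0.005817 \\
\frac{\bar{x}_{+} - \bar{x}_{x}}{\bar{x}_{x}} &= \frac{-0.021}{4.39} \approx -0.478\%
\end{aligned}
```

These agree with the reported pooled s, the -0.021 +/- 0.00581741 difference, and the -0.47836% relative difference.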

Why the one additional dirty page? We're not allocating another page just for the read buffer; from what I understand, it's carved from the existing heap, which already contains writable variables.

Regarding the signal masks - I'm definitely interested!

Closed by commit rS324949: Use xmalloc and read(2) instead of mmap(2) to read in libmap.conf(5) (authored by trasz). Oct 24 2017, 10:48 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: head/libexec/rtld-elf/libmap.c
Size: 10 lines
Diff 34276

Use xmalloc+read instead of mmap(2) to read in libmap.conf(5) (Closed, Public)
