Use xmalloc+read instead of mmap(2) to read in libmap.conf(5)
Closed, Public

Authored by trasz on Oct 23 2017, 12:52 PM.

Details

Summary

Use xmalloc+read instead of mmap(2) to read in libmap.conf(5).
This removes the need to call munmap(2) afterwards.
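
As a rough illustration of the approach described in the summary, here is a minimal sketch of reading a small file into a heap buffer with read(2). It is not the committed rtld code: `read_whole_file` is a hypothetical helper, and plain malloc(3) stands in for rtld's xmalloc.

```c
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a whole file into a malloc'd buffer.
 * On success returns the NUL-terminated buffer (caller frees) and
 * stores the number of bytes read in *lenp. Returns NULL on failure.
 */
char *
read_whole_file(const char *path, size_t *lenp)
{
	struct stat st;
	char *buf;
	size_t size, off;
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (NULL);
	if (fstat(fd, &st) == -1 || st.st_size < 0) {
		close(fd);
		return (NULL);
	}
	size = (size_t)st.st_size;

	/* One extra byte for the terminating NUL (also covers empty files). */
	buf = malloc(size + 1);
	if (buf == NULL) {
		close(fd);
		return (NULL);
	}

	/* read(2) may return short counts, so loop until the file is consumed. */
	for (off = 0; off < size; off += (size_t)n) {
		n = read(fd, buf + off, size - off);
		if (n <= 0) {
			/* Error, or the file shrank underneath us. */
			free(buf);
			close(fd);
			return (NULL);
		}
	}
	close(fd);

	buf[off] = '\0';
	*lenp = off;
	return (buf);
}
```

The buffer is released with free(3) once parsing is done; no munmap(2) call is involved on this path.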

Diff Detail

Repository
rS FreeBSD src repository - subversion

Event Timeline

Why is this useful? You did not bother to explain.

One fewer syscall during binary startup. Besides, reading in a small text file using mmap seems just weird; we use the usual read(2) for ld-elf.so.hints.

Well, you still have to get the memory to read into from somewhere, and this memory is of course obtained by mmap. Also, since we do not modify the libmap.conf content, you make two or more copies of the content.

Maybe ld-elf.so.hints should be mapped as well.

I would understand if you argued that read(2) is faster than mmap(2), but I doubt that it is possible to measure in this situation. Perhaps buildworld with non-empty libmap.conf and dynamically-linked toolchain could see some difference.
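
For comparison, the mmap(2)-based path under discussion can be sketched as follows. This is only an illustration, not the rtld source: `map_whole_file` and `unmap_whole_file` are hypothetical names. The file contents are used in place without a copy, at the cost of a later munmap(2) call.

```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a whole file read-only.
 * Returns a pointer to the contents and stores the length in *lenp,
 * or returns NULL on failure (including an empty file, which
 * cannot be mapped with a zero length).
 */
const char *
map_whole_file(const char *path, size_t *lenp)
{
	struct stat st;
	void *p;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (NULL);
	if (fstat(fd, &st) == -1 || st.st_size <= 0) {
		close(fd);
		return (NULL);
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	/* The mapping stays valid after the descriptor is closed. */
	close(fd);
	if (p == MAP_FAILED)
		return (NULL);
	*lenp = (size_t)st.st_size;
	return (p);
}

/* The extra syscall that the read(2)-based approach avoids. */
void
unmap_whole_file(const char *p, size_t len)
{
	munmap((void *)p, len);
}
```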

The allocated memory is probably indeed obtained with mmap(2), but that is done anyway, i.e. we're not adding another mmap call to the binary startup. The read(2) might indeed be a bit faster due to not messing with memory mappings, but as you say, I don't expect this to be measurable. The additional copy won't hurt either: we're touching all that data right afterwards.

Still, it's one syscall less.

An additional copy means one more dirty page per process, so it is not completely negligible.

Still, I am not convinced. I will not stop you, and I promise not to scream if you commit this.

BTW, I have some less trivial patches somewhere that completely remove the endless stream of sigprocmask(2) syscalls from the single-threaded locking in rtld. The patch works by letting the thread specify a location on its stack from which the kernel reads the current mask when needed (AFAIR). If you are interested, I might try to find and rebase them. Even for a trivial do-nothing binary, it shaves off several dozen syscalls.
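
To show where those syscalls come from, here is a minimal sketch of a lock that protects a critical section in a single-threaded process by blocking signals. It is an illustration only, not rtld's locking code: `lock_acquire`, `lock_release` and `sig_is_blocked` are hypothetical names, and nesting is not handled. Each acquire/release pair costs two sigprocmask(2) calls.

```c
/* Ask for the POSIX signal API even under strict ISO C compiler modes. */
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stddef.h>

/* Signal mask in effect before the lock was taken (hypothetical state). */
static sigset_t saved_mask;

/* Enter the critical section: block every signal (syscall #1). */
void
lock_acquire(void)
{
	sigset_t all;

	sigfillset(&all);
	sigprocmask(SIG_BLOCK, &all, &saved_mask);
}

/* Leave the critical section: restore the old mask (syscall #2). */
void
lock_release(void)
{
	sigprocmask(SIG_SETMASK, &saved_mask, NULL);
}

/* Helper for experiments: is the given signal currently blocked? */
int
sig_is_blocked(int sig)
{
	sigset_t cur;

	/* With a NULL new set, sigprocmask() only reports the current mask. */
	sigprocmask(SIG_BLOCK, NULL, &cur);
	return (sigismember(&cur, sig) == 1);
}
```

Keeping the mask in user memory that the kernel consults only when a signal is actually delivered, as the comment above describes, would avoid both calls on the common path.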

I've just run a totally unscientific benchmark (for n in `jot 10`; do /usr/bin/time sh -c 'for f in `jot 10000`; do /usr/bin/true; done'; done 2>&1), curated the results with sed 's/,/./g', and... huh.

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                   +                                                                                                                                                          |
|                                                   +                                                                                                      x                                                   |
|                                                   +                                                                                                      x                                                   |
|                                                   +                                                                                                      x                                                   |
|                                                   +                                                                                                      x                                                   |
|+                                                  +                                                   x                                                  x                                                  x|
|+                                                  +                                                   *                                                  x                                                  x|
|                 |____________________________A____M_______________________|                                            |_________________________________A_________________________________|                 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10          4.38           4.4          4.39          4.39  0.0066666667
+  10          4.36          4.38          4.37         4.369  0.0056764621
Difference at 95.0% confidence
        -0.021 +/- 0.00581741
        -0.47836% +/- 0.132148%
        (Student's t, pooled s = 0.00619139)
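
The reported figures can be reproduced by hand from the table above. The following worked derivation is a sketch that assumes the standard two-sided 95% critical value of Student's t for 18 degrees of freedom, about 2.101; the subscripts x and + refer to the two data sets as marked in the table.

```latex
\begin{align*}
s_p &= \sqrt{\frac{(n_x - 1)\,s_x^2 + (n_+ - 1)\,s_+^2}{n_x + n_+ - 2}}
     = \sqrt{\frac{9\,(0.0066666667)^2 + 9\,(0.0056764621)^2}{18}}
     \approx 0.00619139 \\
\Delta &= \bar{y}_+ - \bar{y}_x = 4.369 - 4.39 = -0.021 \\
e &= t_{0.975,\,18}\; s_p \sqrt{\frac{1}{n_x} + \frac{1}{n_+}}
   \approx 2.101 \times 0.00619139 \times \sqrt{0.2}
   \approx 0.005817 \\
\frac{\Delta}{\bar{y}_x} &= \frac{-0.021}{4.39} \approx -0.478\%
\end{align*}
```

Since |Δ| = 0.021 exceeds the half-width e ≈ 0.0058, the 95% interval excludes zero, which is why a difference is reported at all.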

Why the one additional dirty page? We're not allocating another page just for the read buffer; from what I understand, it's carved from the existing heap, which already contains writable variables.

Regarding the signal masks - I'm definitely interested!

This revision was automatically updated to reflect the committed changes.