
Add a sysctl to dump kernel mappings and their attributes.
ClosedPublic

Authored by markj on Aug 23 2019, 3:57 PM.

Details

Summary

This is useful when auditing for writeable, executable mappings. I
included several other attributes of possible interest: cache mode, U/S
mode (obviously we should never see user mappings in the kernel pmap),
global bit, and the number of 4KB, 2MB and 1GB pages in a contiguous
range.
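As a rough illustration of the kind of audit this enables, the sketch below decodes a single x86-64 page table entry and flags writable, executable entries. The bit positions are architectural, but the `PTE_*` names and both helpers are hypothetical and are not taken from the diff. It also inspects only one entry, whereas the effective permissions of a mapping combine the bits from every level of the walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural x86-64 page table entry bits (names are illustrative). */
#define PTE_V	(1ULL << 0)	/* present */
#define PTE_RW	(1ULL << 1)	/* writable */
#define PTE_U	(1ULL << 2)	/* user-accessible */
#define PTE_G	(1ULL << 8)	/* global */
#define PTE_NX	(1ULL << 63)	/* no-execute */

/* Hypothetical helper: true if a valid entry is writable and executable. */
static bool
pte_is_wx(uint64_t pte)
{
	return ((pte & PTE_V) != 0 && (pte & PTE_RW) != 0 &&
	    (pte & PTE_NX) == 0);
}

/* Hypothetical helper: render the attributes as a short "rwxsg" string. */
static void
pte_format(uint64_t pte, char buf[6])
{
	assert(buf != NULL);
	buf[0] = 'r';
	buf[1] = (pte & PTE_RW) != 0 ? 'w' : '-';
	buf[2] = (pte & PTE_NX) != 0 ? '-' : 'x';
	buf[3] = (pte & PTE_U) != 0 ? 'u' : 's';
	buf[4] = (pte & PTE_G) != 0 ? 'g' : '-';
	buf[5] = '\0';
}
```

An entry for which `pte_is_wx()` returns true, or whose formatted string contains `u`, is the kind of mapping an audit would want to investigate.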

The sysctl is readable by default. This would be unacceptable if we had
KASLR, but we don't. However, it also lets a random user lock the
kernel pmap for a non-trivial duration (20-30ms on one of my systems),
so I am considering making it root-only.

I did not include the page array as a separate "map": I think the use of
a dedicated PML4E for the page array is problematic and plan to move it
to the kernel map based on some discussion with Jeff. Among other
things, we currently don't include the page array in minidumps.

Once changes from the review settle, I will implement the sysctl for
other pmaps that I can test (i386, arm64, riscv).

Test Plan

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

markj added reviewers: alc, kib, dougm.
sys/amd64/amd64/pmap.c
10050 ↗(On Diff #61166)

Can we print the raw PAT bits in hex instead of panicking? I feel it is too evil to panic in informational code.
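One way to read this request is sketched below: known memory types get a name, and anything unrecognized is printed in hex rather than treated as fatal. The numeric encodings are the architectural PAT memory types, while `cache_mode_name()` and its buffer parameters are hypothetical and not the code from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: map an x86 PAT memory type encoding to a short
 * name.  Unknown encodings are formatted into the caller's buffer in hex
 * instead of causing a panic.
 */
static const char *
cache_mode_name(unsigned int mode, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	switch (mode) {
	case 0x00:
		return ("UC");	/* uncacheable */
	case 0x01:
		return ("WC");	/* write-combining */
	case 0x04:
		return ("WT");	/* write-through */
	case 0x05:
		return ("WP");	/* write-protected */
	case 0x06:
		return ("WB");	/* write-back */
	case 0x07:
		return ("UC-");	/* uncached, overridable by MTRRs */
	default:
		snprintf(buf, len, "%#x", mode);
		return (buf);
	}
}
```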

10101 ↗(On Diff #61166)

What is the return value?

10206 ↗(On Diff #61166)

What if you do not lock the kernel_pmap? You might need to be somewhat more careful with the page table walk, but otherwise I do not see much harm.

sys/amd64/amd64/pmap.c
10050 ↗(On Diff #61166)

Sure.

10101 ↗(On Diff #61166)

False if the PTE is invalid or a leaf node, true otherwise. I will add a comment.
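The return convention described here can be sketched as a small predicate. This is an illustration only: `entry_has_child()` and its `level` argument are hypothetical names, with level 4 meaning a PML4 entry and level 1 a 4KB PTE. The valid and page-size bit positions are architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PG_V	(1ULL << 0)	/* entry is valid */
#define PG_PS	(1ULL << 7)	/* large page, meaningful at levels 2 and 3 */

/*
 * Hypothetical predicate: returns false if the entry is invalid or maps a
 * page directly (a leaf), and true if it points to a lower-level table.
 */
static bool
entry_has_child(uint64_t entry, int level)
{
	assert(level >= 1 && level <= 4);
	if ((entry & PG_V) == 0)
		return (false);		/* invalid: nothing to descend into */
	if (level == 1)
		return (false);		/* 4KB PTEs are always leaves */
	if (level < 4 && (entry & PG_PS) != 0)
		return (false);		/* 2MB or 1GB leaf */
	return (true);
}
```

A walker built on this would descend only when the predicate is true and would otherwise account for the entry as a hole or as a 4KB, 2MB, or 1GB page.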

10206 ↗(On Diff #61166)

Hmm, will this work with the large map? pmap_large_unmap() appears to free PTPs. Otherwise I think this is doable.

sys/amd64/amd64/pmap.c
10206 ↗(On Diff #61166)

So sometimes you would dump somewhat unreasonable data for the large map. I think this will occur very rarely.

After thinking about it some more, you would only need to add one more check to the code, namely, verify that the physical address is below max memory. Or even better, verify that the physical address from any paging structure belongs to some segment.

sys/amd64/amd64/pmap.c
10206 ↗(On Diff #61166)

This doesn't quite work, since some kernel page table pages are not included in the phys segs, and it is difficult to determine whether such a verification is needed or not for a given PTE.

Simplify somewhat:

  • Avoid unnecessarily reloading PTEs on each iteration of each loop.
  • Separate the code which computes attributes of a mapping from that which traverses the tree. In particular, handle PG_V and PG_PS, which determine whether or not to descend to the next level, in the main function.
sys/amd64/amd64/pmap.c
10206 ↗(On Diff #61166)

Then perhaps directly dig into the smap or EFI map. I think it is completely fine to access the paging structures unlocked, except that we do not want to randomly read device registers.

sys/amd64/amd64/pmap.c
10206 ↗(On Diff #61166)

We also do not want to read blacklisted pages.

I agree that unlocked accesses are fine, except in the large map. One simpler possibility is to hold the pmap lock only when the PML4 index is in the large map range. In general I would expect the large map to consist mostly of 1GB pages, and the pmap lock is not required when performing writeback.
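The conditional-locking idea could look roughly like the sketch below. The PML4 index computation (bits 47:39 of the virtual address) is architectural. The helper names and the way the large map range is passed in are assumptions made for illustration, and the actual slot numbers of the large map are deliberately not hard-coded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each PML4 entry covers 512GB: the index is bits 47:39 of the address. */
static unsigned int
pml4_index(uint64_t va)
{
	return ((unsigned int)((va >> 39) & 0x1ff));
}

/*
 * Hypothetical helper: decide whether the walker should take the pmap
 * lock for this address.  lm_first and lm_count describe the PML4 slots
 * assumed to be reserved for the large map.
 */
static bool
needs_pmap_lock(uint64_t va, unsigned int lm_first, unsigned int lm_count)
{
	unsigned int idx;

	assert(lm_first + lm_count <= 512);
	idx = pml4_index(va);
	return (idx >= lm_first && idx < lm_first + lm_count);
}
```

With this shape, the lock is acquired and dropped only around the PML4 slots where page table pages may be freed concurrently, and the rest of the walk stays lockless.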

Add a couple of missing checks to sysctl_kmaps_check().

Restart lookups for the current address if VA is within the large map
and the corresponding PA is not present in any vm_phys segment.
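A minimal sketch of the check described in this update follows. `struct phys_seg` is a simplified, hypothetical stand-in for the kernel's physical segment table, and both function names are invented for illustration. The real code operates on kernel data structures that are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a physical memory segment: [start, end). */
struct phys_seg {
	uint64_t start;
	uint64_t end;
};

/* True if pa falls inside one of the given segments. */
static bool
pa_in_segs(const struct phys_seg *segs, size_t nsegs, uint64_t pa)
{
	assert(segs != NULL || nsegs == 0);
	for (size_t i = 0; i < nsegs; i++) {
		if (pa >= segs[i].start && pa < segs[i].end)
			return (true);
	}
	return (false);
}

/*
 * The lookup is restarted only when the address lies in the large map and
 * the physical address read from the paging structure is not backed by
 * any known segment, which suggests the entry was read mid-teardown.
 */
static bool
must_restart(bool va_in_large_map, const struct phys_seg *segs, size_t nsegs,
    uint64_t pa)
{
	return (va_in_large_map && !pa_in_segs(segs, nsegs, pa));
}
```

Restricting the check to the large map sidesteps the problem raised earlier in the review, namely that some kernel page table pages are not covered by the physical segments.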

This revision is now accepted and ready to land.
Sep 1 2019, 8:43 PM