Page MenuHomeFreeBSD

Add a sysctl to dump kernel mappings and their attributes.
ClosedPublic

Authored by markj on Fri, Aug 23, 3:57 PM.

Details

Summary

This is useful when auditing for writeable, executable mappings. I
included several other attributes of possible interest: cache mode, U/S
mode (obviously we should never see user mappings in the kernel pmap),
global bit, and the number of 4KB, 2MB and 1GB pages in a contiguous
range.

The sysctl is readable by default. This would be unacceptable if we had
KASLR, but we don't. However, it also lets a random user lock the
kernel pmap for a non-trivial duration (20-30ms on one of my systems),
so I am considering making it root-only.

I did not include the page array as a separate "map": I think the use of
a dedicated PML4E for the page array is problematic and plan to move it
to the kernel map based on some discussion with Jeff. Among other
things, we currently don't include the page array in minidumps.

Once changes from the review settle, I will implement the sysctl for
other pmaps that I can test (i386, arm64, riscv).

Test Plan

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

markj created this revision.Fri, Aug 23, 3:57 PM
markj edited the test plan for this revision. (Show Details)Fri, Aug 23, 4:00 PM
markj added reviewers: alc, kib, dougm.
kib added inline comments.Sat, Aug 24, 9:00 PM
sys/amd64/amd64/pmap.c
10050 ↗(On Diff #61166)

Can we print the raw pat bits in hex instead of panicing ? I feel it is too evil to panic in informational code.

10101 ↗(On Diff #61166)

What is the return value ?

10206 ↗(On Diff #61166)

What if you do not lock the kernel_pmap ? You might need to be somewhat more careful with page table walk, but otherwise I do not see much harm.

markj added inline comments.Sun, Aug 25, 1:53 AM
sys/amd64/amd64/pmap.c
10050 ↗(On Diff #61166)

Sure.

10101 ↗(On Diff #61166)

False if the PTE is invalid or a leaf node, true otherwise. I will add a comment.

10206 ↗(On Diff #61166)

Hmm, will this work with the large map? pmap_large_unmap() appears to free PTPs. Otherwise I think this is doable.

kib added inline comments.Sun, Aug 25, 7:26 AM
sys/amd64/amd64/pmap.c
10206 ↗(On Diff #61166)

So sometimes you would dump not quite reasonable data for the large map. I think this will occur very rarely.

After thinking about it some more, you would only need to add one more check to the code, namely, verify that the physical address is below max memory. Or even better, verify that the physical address from any paging structure belongs to some segment.

markj added inline comments.Thu, Aug 29, 1:55 AM
sys/amd64/amd64/pmap.c
10206 ↗(On Diff #61166)

This doesn't quite work, since some kernel page table pages are not included in the phys segs, and it is difficult to determine whether such a verification is needed or not for a given PTE.

markj updated this revision to Diff 61439.Thu, Aug 29, 2:05 AM

Simplify somewhat:

  • Avoid unnecessarily reloading PTEs on each iteration of each loop.
  • Separate the code which computes attributes of a mapping from that which traverses the tree. In particular handle PG_V and PG_PS, which determine whether or not to descend to the next level, in the main function.
kib added inline comments.Thu, Aug 29, 11:35 AM
sys/amd64/amd64/pmap.c
10206 ↗(On Diff #61166)

Then perhaps directly dig into smap or efi map. I think it is completely fine to access the paging structures unlocked, except we do not want randomly read device registers.

markj added inline comments.Thu, Aug 29, 6:19 PM
sys/amd64/amd64/pmap.c
10206 ↗(On Diff #61166)

We also do not want to read blacklisted pages.

I agree that unlocked accesses are fine, except in the large map. One simpler possibility is to hold the pmap lock only when the PML4 index is in the large map range. In general I would expect the large map to consist mostly of 1GB pages, and the pmap lock is not required when performing writeback.

markj updated this revision to Diff 61465.Thu, Aug 29, 6:32 PM

Add a couple of missing checks to sysctl_kmaps_check().

markj updated this revision to Diff 61536.Sun, Sep 1, 8:05 PM

Restart lookups for the current address if VA is within the large map
and the corresponding PA is not present in any vm_phys segment.

kib accepted this revision.Sun, Sep 1, 8:43 PM
This revision is now accepted and ready to land.Sun, Sep 1, 8:43 PM