
Add a sysctl to dump kernel mappings and their attributes.
ClosedPublic

Authored by markj on Aug 23 2019, 3:57 PM.

Details

Summary

This is useful when auditing for writable, executable mappings. I
included several other attributes of possible interest: cache mode, U/S
mode (obviously we should never see user mappings in the kernel pmap),
global bit, and the number of 4KB, 2MB and 1GB pages in a contiguous
range.

The sysctl is readable by default. This would be unacceptable if we had
KASLR, but we don't. However, it also lets an unprivileged user lock
the kernel pmap for a non-trivial duration (20-30ms on one of my
systems), so I am considering making it root-only.

I did not include the page array as a separate "map": I think the use of
a dedicated PML4E for the page array is problematic and plan to move it
to the kernel map based on some discussion with Jeff. Among other
things, we currently don't include the page array in minidumps.

Once changes from the review settle, I will implement the sysctl for
other pmaps that I can test (i386, arm64, riscv).

Test Plan

Diff Detail


Event Timeline

markj added reviewers: alc, kib, dougm.
sys/amd64/amd64/pmap.c
10063

Can we print the raw PAT bits in hex instead of panicking? I feel it is too evil to panic in informational code.

10114

What is the return value?

10219

What if you do not lock the kernel_pmap? You might need to be somewhat more careful with the page table walk, but otherwise I do not see much harm.

sys/amd64/amd64/pmap.c
10063

Sure.

10114

False if the PTE is invalid or a leaf node, true otherwise. I will add a comment.

10219

Hmm, will this work with the large map? pmap_large_unmap() appears to free PTPs. Otherwise I think this is doable.

sys/amd64/amd64/pmap.c
10219

So sometimes you would dump not quite reasonable data for the large map. I think this will occur very rarely.

After thinking about it some more, you would only need to add one more check to the code, namely, verify that the physical address is below max memory. Or even better, verify that the physical address from any paging structure belongs to some segment.

sys/amd64/amd64/pmap.c
10219

This doesn't quite work, since some kernel page table pages are not included in the phys segs, and it is difficult to determine whether such a verification is needed or not for a given PTE.

Simplify somewhat:

  • Avoid unnecessarily reloading PTEs on each iteration of each loop.
  • Separate the code that computes the attributes of a mapping from the code that traverses the tree. In particular, handle PG_V and PG_PS, which determine whether to descend to the next level, in the main function.
sys/amd64/amd64/pmap.c
10219

Then perhaps dig directly into the SMAP or EFI memory map. I think it is completely fine to access the paging structures unlocked, except that we do not want to randomly read device registers.

sys/amd64/amd64/pmap.c
10219

We also do not want to read blacklisted pages.

I agree that unlocked accesses are fine, except in the large map. One simpler possibility is to hold the pmap lock only when the PML4 index is in the large map range. In general I would expect the large map to consist mostly of 1GB pages, and the pmap lock is not required when performing writeback.

Add a couple of missing checks to sysctl_kmaps_check().

Restart lookups for the current address if VA is within the large map
and the corresponding PA is not present in any vm_phys segment.

This revision is now accepted and ready to land. Sep 1 2019, 8:43 PM