
Add support for freeing preloaded data.
ClosedPublic

Authored by markj on Jul 18 2018, 6:24 PM.

Details

Summary

At present we have no mechanism to free pages that were used for
preloaded data. For example, if one unloads a kernel module that was
loaded by the loader, its memory cannot be reused. For early
microcode updates, I would like to load all updates as a single file,
select the correct one, and free the rest of the memory. This
simplifies the distribution of microcode updates.

This change adds kmem_bootstrap_free(), which takes an address range in
the kernel map and frees the KVA and physical pages to the VM. For this
to work, platforms which support this mechanism must create a vm_phys
segment for the physical memory used to store the kernel and preloaded
data. Note that the initial population of the physical memory allocator
is done using phys_avail[] rather than vm_phys_segs[].
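
As a rough illustration of the range handling, the sketch below is a userland model, not the committed kmem_bootstrap_free(). It assumes 4 KB pages and assumes that only pages lying entirely inside the range may be freed, so the start is rounded up and the end is rounded down. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KB base pages */

/* Round an address up to the next page boundary. */
static uintptr_t
model_round_page(uintptr_t a)
{
	return ((a + MODEL_PAGE_SIZE - 1) & ~(uintptr_t)(MODEL_PAGE_SIZE - 1));
}

/* Round an address down to the previous page boundary. */
static uintptr_t
model_trunc_page(uintptr_t a)
{
	return (a & ~(uintptr_t)(MODEL_PAGE_SIZE - 1));
}

/*
 * Count the pages that lie entirely inside [start, start + size).
 * Partially covered pages at either edge are not counted, because
 * they may still hold data that is in use.
 */
static size_t
model_freeable_pages(uintptr_t start, size_t size)
{
	uintptr_t end, first;

	/* Reject ranges whose arithmetic would wrap around. */
	assert(size <= UINTPTR_MAX - start);
	assert(start <= UINTPTR_MAX - (MODEL_PAGE_SIZE - 1));

	first = model_round_page(start);
	end = model_trunc_page(start + size);
	if (end <= first)
		return (0);
	return ((end - first) / MODEL_PAGE_SIZE);
}
```

With this model, a range that starts one byte past a page boundary gives up that first page, and a range shorter than a page that straddles a boundary frees nothing.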

The change also fixes a minor bug: the kernel linker was calling
preload_delete_name() with the basename of the specified kernel module
rather than the full path, so no matches occurred and the calls were
no-ops.
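
The effect of that bug can be sketched with a small userland model. It assumes that preloaded entries are recorded under their full paths and matched by exact string comparison; the table contents and the model_* names are hypothetical and are not taken from the kernel linker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the loader's preload metadata. */
static const char *model_preloaded[] = {
	"/boot/kernel/kernel",
	"/boot/kernel/zfs.ko",
	"/boot/firmware/ucode.bin",
};

/* Return the index of the entry whose name matches exactly, or -1. */
static int
model_preload_find(const char *name)
{
	size_t i, n;

	assert(name != NULL);
	n = sizeof(model_preloaded) / sizeof(model_preloaded[0]);
	for (i = 0; i < n; i++) {
		if (strcmp(model_preloaded[i], name) == 0)
			return ((int)i);
	}
	return (-1);
}
```

In this model a lookup by basename such as "zfs.ko" finds nothing, while the full path "/boot/kernel/zfs.ko" matches, which is why a delete keyed on the basename silently does nothing.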

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I am trying to convince myself that creating such a segment and initializing the corresponding pages in vm_page_array[] does not have other consequences.
The pages from the kernel text segment are not added to the free lists because phys_avail[] does not include the kernel text. The pages are not processed by vm_page_scan_contig() because their order is VM_NFREEORDER.

Is there anything else which should be avoided?
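
The free-list argument can be illustrated with a userland model of a phys_avail[]-style array: (start, end) pairs terminated by a pair of zeroes. The addresses below are invented, and the gap between the two ranges stands in for the memory holding the kernel and preloaded data; the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical memory map in the style of phys_avail[].  The hole
 * from 0x200000 to 0x1000000 models the kernel and preloaded data,
 * which the boot code leaves out of the available ranges.
 */
static const uint64_t model_phys_avail[] = {
	0x0000000000010000, 0x0000000000200000,
	0x0000000001000000, 0x0000000040000000,
	0, 0,
};

/* Report whether pa falls inside one of the available ranges. */
static bool
model_in_phys_avail(uint64_t pa)
{
	int i;

	for (i = 0; model_phys_avail[i + 1] != 0; i += 2) {
		/* Each pair must describe a non-empty range. */
		assert(model_phys_avail[i] < model_phys_avail[i + 1]);
		if (pa >= model_phys_avail[i] && pa < model_phys_avail[i + 1])
			return (true);
	}
	return (false);
}
```

An allocator populated only from ranges like these never hands out an address inside the hole, even if a segment describing the hole exists for bookkeeping purposes.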

When auditing, the only potential problem I could see is with page blacklisting, which works by removing pages from the vm_phys allocator in vm_page_startup() after the initial population. However, we currently cannot prevent a blacklisted page from being used for the kernel image in the first place.

There is nothing we can do about blacklisting a page from the kernel image, at least until we adopt loading the kernel at an arbitrary physical address. This is required to accept an arbitrary EFI memmap, which is arguably more important than blacklisting.

Do you agree with my claims about the free lists and scan_contig()?

Yes, I believe they're correct.

LGTM.

sys/amd64/amd64/machdep.c
181 ↗(On Diff #45490)

Unrelated change?

This revision is now accepted and ready to land. Jul 19 2018, 2:59 PM

You should be aware of the code in sys/x86/iommu/busdma_dmar.c:dmar_bus_dmamap_load_buffer(); see the comment in the if (dumping) block. I do not think that this change is enough to correct the block.

Indeed, it's not sufficient. For example, amd64's pmap.c:create_pagetables() seems to allocate physical memory without adding it to a vm_phys segment.

sys/amd64/amd64/machdep.c
181 ↗(On Diff #45490)

Left over from an earlier version of this change. I'll remove it before committing.

This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state Needs Review. Jul 19 2018, 8:00 PM
head/sys/vm/vm_kern.c
710

Hmm. Up until two weeks ago, the below code would have crashed because vm_map_remove() unconditionally called pmap_remove(). Then, I remembered that change (and the fact that we don't set up the map entry to reference the kernel object). In short, this strikes me as being fragile.

head/sys/vm/vm_kern.c
710

Oops, right, I was assuming that vm_map_remove() would update the kernel page tables (in which case the call should come after the loop). Instead of using vm_map_remove(), how about we instead call pmap_remove() and add the newly freed KVA to kernel_arena directly?

head/sys/vm/vm_kern.c
710

Hm, this doesn't really work - one can't demote the large mappings created for preloaded data.

head/sys/vm/vm_kern.c
710

I believe that the problem is that pmap_remove() is looking for a page table page (PTP) in the kernel pmap's radix tree of spare PTPs and not finding it. create_pagetables() is actually allocating the required PTPs, and pmap_init() is initializing them. However, pmap_init() needs additional code that figures out which ones are unused (because create_pagetables() set up a 2 MB page mapping over the top of them) and that inserts those unused PTPs into the radix tree.
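
To make the missing step concrete, here is a userland model of the scan described above. It is not the code in pmap_init(). It assumes one preallocated PTP per page-directory slot and treats a slot as having an unused PTP when its entry is a valid 2 MB mapping. The flag values follow the x86 valid and page-size bits, the flat output array stands in for the radix tree, and the model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PG_V	0x001u	/* entry is valid */
#define MODEL_PG_PS	0x080u	/* entry maps a 2 MB page directly */

/*
 * Scan npde page-directory entries and record the index of each one
 * that is a valid 2 MB mapping.  In this model, the PTP preallocated
 * for such a slot is not referenced by the hardware, so it is a
 * candidate for the set of spare PTPs.  The caller must supply a
 * spare[] array with room for npde indices.
 */
static size_t
model_collect_spare_ptps(const uint64_t *pde, size_t npde, size_t *spare)
{
	const uint64_t superpage = MODEL_PG_V | MODEL_PG_PS;
	size_t i, n;

	assert(pde != NULL && spare != NULL);
	n = 0;
	for (i = 0; i < npde; i++) {
		if ((pde[i] & superpage) == superpage)
			spare[n++] = i;
	}
	return (n);
}
```

A later demotion of one of those 2 MB mappings could then take its PTP from the recorded set instead of failing to find one.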

head/sys/vm/vm_kern.c
710

I attempted to address this in D16426.