
amd64: fix PKRU and swapout interaction
ClosedPublic

Authored by kib on Apr 13 2023, 11:46 AM.

Details

Summary
When vm_map_remove() is called from vm_swapout_map_deactivate_pages()
due to swapout, PKRU attributes for the removed range must be kept
intact.  Provide a variant of pmap_remove(), pmap_remove_swapout(), to
allow pmap to distinguish between swapout and real removes.

For non-amd64, pmap_remove_swapout() is stubbed out as a define to pmap_remove().
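
As a rough illustration of the shape described above (a sketch only, not the literal diff; the header placement and macro form are assumptions), the non-amd64 stub could be just:

```c
/*
 * Sketch: architectures without per-range protection-key state do not
 * need to distinguish swapout-driven teardown from a real unmap, so the
 * new entry point can simply alias pmap_remove() in the MD pmap.h.
 */
#define	pmap_remove_swapout(pmap, sva, eva)	\
	pmap_remove((pmap), (sva), (eva))
```

On amd64, pmap_remove_swapout() would instead be a real function that tears down the page table entries but leaves the PKRU ranges covering the span untouched.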

Reported by:    andrew

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kib requested review of this revision. Apr 13 2023, 11:46 AM

Does this suggest that PKRU metadata should be maintained at the vm_map layer? Right now we have the strange semantics that covering a mapping with mprotect(PROT_NONE) or msync(MS_INVALIDATE) will destroy PKRU metadata. Rather than forcing each caller of pmap_remove() to decide whether it is PKRU-preserving, it seems cleaner to tie (non-"persist") keys to the lifetime of a vm_map entry.

I am not sure that PKRU should be at the vm_map layer. The current semantics are that sysarch(SET_PKRU) just sets the PKRU key for the UVM range. Whatever mappings appear in the range get the specified key. It is closer to segments than to map entries.

Attaching PKRU to vm_map_entry would potentially increase map fragmentation without any benefit to the API. I do wonder whether PKRU key removal on unmap is wise, but at least it seems convenient. So if anything, I would agree to removing keys only via an explicit sysarch(CLEAR_PKRU) (if we can change the API at this point).
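
To make the "segment-like" semantics concrete, here is a minimal userland sketch. It assumes the x86_pkru_protect_range()/x86_pkru_set_perm() wrappers around sysarch(2) (names and argument order as I recall them; consult pkru(3) and <machine/sysarch.h> before relying on them):

```c
#include <sys/mman.h>
#include <machine/sysarch.h>
#include <err.h>
#include <stddef.h>

int
main(void)
{
	size_t len = 16 * 4096;

	/* Reserve an address range to act as the keyed region. */
	void *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (region == MAP_FAILED)
		err(1, "mmap");

	/*
	 * Tag [region, region + len) with key 1.  The key is attached to
	 * the address range itself, so mappings later created or replaced
	 * inside it pick up key 1 as well; that is the segment-like
	 * behavior described above.  (Wrapper name assumed, see lead-in.)
	 */
	if (x86_pkru_protect_range(region, len, 1, 0) != 0)
		err(1, "x86_pkru_protect_range");

	/*
	 * Drop write permission for key 1 in this thread's PKRU
	 * (argument meaning assumed: access allowed, modify denied).
	 */
	if (x86_pkru_set_perm(1, 1, 0) != 0)
		err(1, "x86_pkru_set_perm");

	return (0);
}
```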

In D39556#900386, @kib wrote:

I am not sure that PKRU should be at the vm_map layer. The current semantics are that sysarch(SET_PKRU) just sets the PKRU key for the UVM range. Whatever mappings appear in the range get the specified key. It is closer to segments than to map entries.

I think you could maintain the existing scheme, i.e., keep a separate range tree for PKRU ranges rather than making them a "key" of vm_map entries. In fact, this is required in order to implement AMD64_PKRU_PERSIST. The vm_map layer is just a better place to manage PKRU range lifetimes. And it would simplify implementation of a similar feature for other platforms; e.g., I think POWER has a similar mechanism.

Attaching PKRU to vm_map_entry would potentially increase map fragmentation without any benefit to the API. I do wonder whether PKRU key removal on unmap is wise, but at least it seems convenient. So if anything, I would agree to removing keys only via an explicit sysarch(CLEAR_PKRU) (if we can change the API at this point).

Hmm, I'm not sure why it would be unwise. I would expect semantics similar to mprotect(): once a protected mapping is destroyed, a new mapping at that location should not inherit the protections.

On arm64 we also need a way to manage extra information for a range of memory. In that case the information determines whether to set a flag in the last-level page table entry, and it should stay in effect until the range is unmapped. I added a flag to vm_map_entry in D39449, but I am happy to change it if there's a better way to handle this.

In D39556#900386, @kib wrote:

I am not sure that PKRU should be at the vm_map layer. The current semantics are that sysarch(SET_PKRU) just sets the PKRU key for the UVM range. Whatever mappings appear in the range get the specified key. It is closer to segments than to map entries.

I think you could maintain the existing scheme, i.e., keep a separate range tree for PKRU ranges rather than making them a "key" of vm_map entries. In fact, this is required in order to implement AMD64_PKRU_PERSIST. The vm_map layer is just a better place to manage PKRU range lifetimes. And it would simplify implementation of a similar feature for other platforms; e.g., I think POWER has a similar mechanism.

I tried to look at the PKRU refactoring from this angle, and I do not see what the change would be. To maintain PKRU-like state (BTI-like, and perhaps Intel CFI, though there are more complications there AFAIR), pmap needs hooks to execute at specific vm_map events. But I cannot turn it into a global per-map rangeset maintained by the vm_map layer, because: a) the rangeset is controlled by the user via sysarch(2); b) vm_map events do not directly mirror rangeset events, they require some interpretation by MD code; c) there might be more than one rangeset (e.g. PKRU and CFI).

The end result looks identical to the present amd64 pmap organization, where PKRU ops are performed at the places where vm_map informs pmap about various events, like vmspace copy (fork), total destruction (exec/exit), or a specific unmap. The patch in this review only fixes a bug where a specific unmap could be reported for a situation that is not an application unmap, so pmap needs to be able to distinguish the two.
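
For illustration only, the hook surface described here (already present in spirit in the amd64 pmap; names below are purely hypothetical) amounts to something like:

```c
/*
 * Hypothetical hook set: the vm_map layer already notifies pmap at
 * these points, and the MD code updates its own rangeset(s) (PKRU
 * today, possibly CFI or BTI-like state later) in response.
 */
void pmap_md_on_vmspace_copy(pmap_t dst, pmap_t src);   /* fork */
void pmap_md_on_vmspace_destroy(pmap_t pmap);           /* exec/exit */
void pmap_md_on_unmap(pmap_t pmap, vm_offset_t sva, vm_offset_t eva);
                                                        /* application unmap */
```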

If you have something else in mind, could you explain it in more detail?

Attaching PKRU to vm_map_entry would potentially increase map fragmentation without any benefit to the API. I do wonder whether PKRU key removal on unmap is wise, but at least it seems convenient. So if anything, I would agree to removing keys only via an explicit sysarch(CLEAR_PKRU) (if we can change the API at this point).

Hmm, I'm not sure why it would be unwise. I would expect semantics similar to mprotect(): once a protected mapping is destroyed, a new mapping at that location should not inherit the protections.

One of the models where PKRU is intended to be used is, e.g., private per-thread heaps. You assign the thread's key to some arena, and then mappings created and destroyed (mmap/munmap) in that arena are accessible only to the owner of that specific key.
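
A hedged sketch of that per-thread heap model, reusing the hypothetical wrapper names (and headers) from the earlier example: each thread tags its own arena with a distinct key and revokes access to every other thread's key in its own PKRU.

```c
/*
 * Sketch: create an arena tagged with this thread's key, and drop
 * access to all other keys in the calling thread's PKRU so only the
 * owner can touch mappings made inside its arena.  Wrapper names and
 * argument meanings are assumptions, not a documented API reference.
 */
static void *
thread_arena_create(size_t len, unsigned int my_key, unsigned int nkeys)
{
	void *arena = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (arena == MAP_FAILED)
		return (NULL);
	/* Mappings later placed inside this range inherit my_key. */
	if (x86_pkru_protect_range(arena, len, my_key, 0) != 0) {
		(void)munmap(arena, len);
		return (NULL);
	}
	/* Deny this thread read/write access to every other key. */
	for (unsigned int k = 1; k < nkeys; k++) {
		if (k != my_key)
			(void)x86_pkru_set_perm(k, 0, 0);
	}
	return (arena);
}
```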

In D39556#900591, @kib wrote:
In D39556#900386, @kib wrote:

I am not sure that PKRU should be at the vm_map layer. The current semantics are that sysarch(SET_PKRU) just sets the PKRU key for the UVM range. Whatever mappings appear in the range get the specified key. It is closer to segments than to map entries.

I think you could maintain the existing scheme, i.e., keep a separate range tree for PKRU ranges rather than making them a "key" of vm_map entries. In fact, this is required in order to implement AMD64_PKRU_PERSIST. The vm_map layer is just a better place to manage PKRU range lifetimes. And it would simplify implementation of a similar feature for other platforms; e.g., I think POWER has a similar mechanism.

I tried to look at the PKRU refactoring from this angle, and I do not see what the change would be. To maintain PKRU-like state (BTI-like, and perhaps Intel CFI, though there are more complications there AFAIR), pmap needs hooks to execute at specific vm_map events. But I cannot turn it into a global per-map rangeset maintained by the vm_map layer, because: a) the rangeset is controlled by the user via sysarch(2)

Sorry, why does this matter?

b) vm_map events do not directly mirror rangeset events, they require some interpretation by MD code

That's true of the pmap as well. In this patch, MI code needs to know (indirectly) about MD features.

c) there might be more than one rangeset (e.g. PKRU and CFI).

The end result looks identical to the present amd64 pmap organization, where PKRU ops are performed at the places where vm_map informs pmap about various events, like vmspace copy (fork), total destruction (exec/exit), or a specific unmap. The patch in this review only fixes a bug where a specific unmap could be reported for a situation that is not an application unmap, so pmap needs to be able to distinguish the two.

I don't like the patch because it exposes a layering violation. Every pmap_remove() caller needs to consider whether it needs to preserve keys, but they do not need to do the same for, e.g., regular memory protections. For example, is it ok for pmap_protect(PROT_NONE) to remove keys?

If you have something else in mind, could you explain it in more detail?

My earlier notion of maintaining key metadata at the vm_map layer seems hard to implement.

I think it might be sufficient to invert your patch: introduce a new pmap_unmap() or pmap_delete() method which is supposed to be called any time a logical mapping is removed, e.g., in vm_map_delete(). On amd64, it would clear keys for the range and call pmap_remove(). pmap_remove() would no longer modify protection keys.

In particular, shouldn't most existing callers of pmap_remove() preserve keys? The only exception I can see is in vm_map_delete().

Attaching PKRU to vm_map_entry would potentially increase map fragmentation without any benefit to the API. I do wonder whether PKRU key removal on unmap is wise, but at least it seems convenient. So if anything, I would agree to removing keys only via an explicit sysarch(CLEAR_PKRU) (if we can change the API at this point).

Hmm, I'm not sure why it would be unwise. I would expect semantics similar to mprotect(): once a protected mapping is destroyed, a new mapping at that location should not inherit the protections.

One of the models where PKRU is intended to be used is, e.g., private per-thread heaps. You assign the thread's key to some arena, and then mappings created and destroyed (mmap/munmap) in that arena are accessible only to the owner of that specific key.

Flipped to pmap_map_delete(). [IMO plain pmap_delete() is too confusing alongside pmap_remove(), etc.]
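
A minimal sketch of the inverted shape, assuming the MD stub lives in each architecture's pmap.h and that vm_map_delete() is the only MI call site, as suggested above (the helper function name in the MI fragment is illustrative):

```c
/*
 * Non-amd64 stub (sketch): with no per-range metadata to clear, a
 * logical unmap degenerates to pmap_remove().
 */
#define	pmap_map_delete(pmap, sva, eva)		\
	pmap_remove((pmap), (sva), (eva))

/*
 * MI side (sketch): only the logical-unmap path in vm_map_delete()
 * goes through the new hook; swapout-driven deactivation keeps
 * calling pmap_remove() and therefore leaves the PKRU ranges intact.
 */
static void
vm_map_delete_entry_pages(vm_map_t map, vm_map_entry_t entry)
{
	pmap_map_delete(map->pmap, entry->start, entry->end);
}
```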

markj added inline comments.
sys/amd64/amd64/pmap.c
6451

I think some comment here would be useful, along the lines of, "Remove the given range of addresses as part of a logical unmap operation. This has the effect of calling pmap_remove(), but also clears any metadata that should persist for the lifetime of a logical mapping."
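
Combining that comment with the clear-then-remove behavior discussed above gives roughly the following amd64-side sketch (the PKRU-clearing helper name is a stand-in for the existing internal PKRU routines, not necessarily the committed code):

```c
/*
 * Remove the given range of addresses as part of a logical unmap
 * operation.  This has the effect of calling pmap_remove(), but also
 * clears any metadata that should persist for the lifetime of a
 * logical mapping.
 */
void
pmap_map_delete(pmap_t pmap, vm_offset_t sva, vm_offset_t eva)
{
	/* Drop the PKRU ranges covering [sva, eva) first (illustrative name). */
	pmap_pkru_clear_range(pmap, sva, eva);

	/* Then tear down the page table entries as usual. */
	pmap_remove(pmap, sva, eva);
}
```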

This revision is now accepted and ready to land. Apr 14 2023, 5:18 PM
kib marked an inline comment as done. Apr 14 2023, 5:33 PM
This revision was automatically updated to reflect the committed changes.