Add pg_nx flag in pmap_qenter to make kernel pages non-executable.
Details
BEFORE CHANGE:
db> show threads
. . . 101380 (0xfffff8011b1c2000) (stack 0xfffffe01ae761000) sched_switch() at sched_switch+0x98e/frame 0xfffffe01ae76a440 . . .
db> show pte 0xfffffe01ae761000
VA 0xfffffe01ae761000 pml4e 0x0000000434c027 pdpe 0x00000005df9063 pde 0x00000110f03063 pte 0x000000012c742103
pte & pg_nx (0x8000000000000000) == 0
AFTER CHANGE:
db> show threads
. . . 100917 (0xfffff80098259000) (stack 0xfffffe01ae709000) sched_switch() at sched_switch+0x98e/frame 0xfffffe01ae712440 . . .
db> show pte 0xfffffe01ae709000
VA 0xfffffe01ae709000 pml4e 0x0000000434f027 pdpe 0x00000005df9063 pde 0x000001153e0063 pte 0x800000002cfe8103
pte & pg_nx (0x8000000000000000) != 0
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
This change is probably fine but it requires re-thinking, I suspect. pmap_qenter() is used for much more things than only mapping the thread stacks, so each use must be inspected and confirmed as safe WRT no-executable mapping.
I think bpf jit would need a change to work with this — a kernel equivalent to this mprotect() call for the userspace case.
#ifndef _KERNEL
if (stream.ibuf != NULL && mprotect(stream.ibuf, *size, PROT_READ | PROT_EXEC) != 0) { munmap(stream.ibuf, *size); stream.ibuf = NULL; }
#endif
That mapping is created with pmap_enter(VM_PROT_ALL), so I think it is fine even with this change.
We seem to be missing a pmap_qenter_nx implementation for amd64.
sys/arm64/arm64/pmap.c | ||
---|---|---|
1242 ↗ | (On Diff #39109) | Hm, I find it a little difficult to believe 32-bit armv6 has an NX bit, but arm64 does not. |
sys/i386/i386/pmap.c | ||
1692 ↗ | (On Diff #39109) | It's unclear to me why these flags were lifted out. It doesn't make sense to qenter without V (valid) or RW (readable), IMO. (I don't know what pgeflag does, but I'd just leave it in place.) |
sys/riscv/riscv/pmap.c | ||
1031 ↗ | (On Diff #39109) | Like x86, seems like this could be: entry = PTE_V | PTE_R | PTE_W | flags; (With ordinary qenter then passing PTE_X and qenter_nx passing 0.) |
It looks like the feedback was just, "audit the pmap_qenter() callers in the tree and verify that they don't require an executable mapping." In particular, the review title and description are misleading, since the change does more than just make kernel stacks non-executable. Are there in fact any callers that require an executable mapping? I didn't find any with a quick look.
sys/arm64/arm64/pmap.c | ||
---|---|---|
1242 ↗ | (On Diff #39109) | arm64 has two NX bits, one each for userspace and the kernel. |
- I think markj@'s point is true: are there any pmap_qenter() callers that need X or should we just change the API to assume RW mappings only?
- If we were to refactor pmap_qenter() I think the better approach would be to rename the existing pmap implementations to 'pmap_qenter_prot' that takes an additional 'vm_prot_t prot' argument and make 'pmap_qenter()' a wrapper that just does 'pmap_qenter(...., VM_PROT_ALL)'.
Like Mark, I did a "grep" for pmap_qenter(), and none of the use cases appear to require execute permission. I think that we can simply redefine the access permissions provided by pmap_qenter(), and be done with this.
Upon inspection of callers of pmap_qenter, I was unable to find any that required X. Most callers either made a temporary mapping for memcpy, or were allocating objects to be used by the vmem system itself. The only one raised concern was the ELF header load called as part of the exec path. From what I can tell only the elf header is mapped, none of the code within that gets executed. I will resubmit the original diff.