Page MenuHomeFreeBSD

amd64 pmap: do not sleep in _pmap_allocpte() with zero referenced page table page.
ClosedPublic

Authored by kib on Jan 4 2021, 7:38 PM.

Details

Summary

Otherwise parallel _pmap_allocpte() for nearby va might also fail allocating page table page and free the page under us. The end result is that we could dereference unmapped pte when doing cleanup after sleep.

Instead, on allocation failure, first free everything, then sleep right before returning to caller. Due to recursive nature of _pmap_allocpte(), additional argument indicates is this the first call level, or deeper recursive call.

Reported by: pho

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib requested review of this revision.Jan 4 2021, 7:38 PM
kib created this revision.
sys/amd64/amd64/pmap.c
4402

Why is va labeled __unused?

kib marked an inline comment as done.Jan 5 2021, 12:11 AM
kib added inline comments.
sys/amd64/amd64/pmap.c
4402

Right now it should be not. It came from some intermediate version of LA57 patch where va was only used to improve asserts.

kib marked an inline comment as done.

Remove __unused

I ran a full stress2 test on mercat1.

sys/amd64/amd64/pmap.c
4402

Why not split it into two functions, one for handling the base case of the recursion (try to alloc and sleep if that fails), and a second recursive function that never sleeps?

kib marked an inline comment as done.Jan 8 2021, 7:16 PM
kib added inline comments.
sys/amd64/amd64/pmap.c
4402

Done, with the note that alloc cannot go into non-recursive part, it must occur in the recursive helper when needed.

kib marked an inline comment as done.Jan 8 2021, 7:17 PM

Instead of the first_level arg to _pmap_allocpte, split non-sleepable recursive part into pmap_allocpte_nosleep().

Updated diff contains unrelated fix for pmap_mincore() dereferencing NULL pdpep, reported by Peter while testing this patch. (It is separate commit, of course).

sys/amd64/amd64/pmap.c
4367

The comment is stale now.

4402

I would call it _pmap_allocpte_nosleep() or __pmap_allocpte(), to make it clearer that it is an internal helper for _pmap_allocpte().

kib marked 2 inline comments as done.Jan 9 2021, 8:30 PM
kib added inline comments.
sys/amd64/amd64/pmap.c
4367

I also moved first two paragraphs to the pmap_allocpte_alloc() herald. It is more logical.

4402

I renamed it to pmap_allocpte_alloc(). I quite dislike the name that pretends on belonging to the language implementation namespace.

kib marked 2 inline comments as done.

_pmap_allocpte()->pmap_allocpte_alloc()
Rearrange and split comment.

Do riscv or arm64 have the same bug?

sys/amd64/amd64/pmap.c
4542

s/from/to/

This revision is now accepted and ready to land.Jan 11 2021, 3:26 PM
kib marked an inline comment as done.Jan 11 2021, 4:52 PM

Do riscv or arm64 have the same bug?

No, this bug was introduced with LA57 rewrite of pmap_allocpte(). riscv and arm64 forked pmap.c before LA57.

I ran a two-hour test on two different hosts with D27956.81970.diff added.
No problems seen.

In D27956#627791, @kib wrote:

Do riscv or arm64 have the same bug?

No, this bug was introduced with LA57 rewrite of pmap_allocpte(). riscv and arm64 forked pmap.c before LA57.

Previously, our "strategy" was to allocate all required page table pages before linking any of them into the page table. My impression from a brief examination is that with the LA57 changes that is no longer true, which led to this problem. Specifically, under LA57, a pml4 page might be allocated and linked into the page table before we had successfully allocated all of the required lower-level page table pages, yes? More generally, what was the reason for the change in strategy?

In D27956#629875, @alc wrote:
In D27956#627791, @kib wrote:

Do riscv or arm64 have the same bug?

No, this bug was introduced with LA57 rewrite of pmap_allocpte(). riscv and arm64 forked pmap.c before LA57.

Previously, our "strategy" was to allocate all required page table pages before linking any of them into the page table. My impression from a brief examination is that with the LA57 changes that is no longer true, which led to this problem. Specifically, under LA57, a pml4 page might be allocated and linked into the page table before we had successfully allocated all of the required lower-level page table pages, yes? More generally, what was the reason for the change in strategy?

The biggest issue was that rollback becomes too unwieldy with 5-level (in fact when simultaneously 4 and 5 levels should be handled). Making allocpte() reasonably structured was the most time-consuming part of the whole work. In the current allocpte, rollback is 'local' so to say (local to each page tree level).

Then I realized that we do not reuse higher pages if lower vm_page_alloc() fails. This makes the sleep rude, since we do not free pages that probably needed by somebody else, while sleeping. After that realization, a new structure for allocpte() was naturally written, which I found satisfactory.

The missed detail was the location of the sleep, but with it moved out of the recursive part it is even more clean IMO.