Page MenuHomeFreeBSD

Fix handling of page wirings exec_map_first_page().
ClosedPublic

Authored by markj on Sep 23 2019, 5:24 PM.

Details

Summary

If the first page is not fully valid, or the attempt to page in
subsequent pages fails, we currently free the pages. However, the pages
may have already been resident and wired by mlock(). This can happen if
execve() attempts to activate an image which has been mlock()ed and
truncated.

Test Plan

syzkaller found the bug and generated a reproducer. I verified that this
patch fixes the crash triggered by that reproducer.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

markj created this revision.Sep 23 2019, 5:24 PM
markj edited the summary of this revision. (Show Details)Sep 23 2019, 5:25 PM
markj edited the test plan for this revision. (Show Details)
markj added reviewers: alc, kib.
kib added a comment.Sep 23 2019, 6:05 PM

What is the point of deactivating them ? Inactive scan frees invalid pages upon encountering them.

markj added a comment.Sep 23 2019, 6:07 PM
In D21767#474999, @kib wrote:

What is the point of deactivating them ? Inactive scan frees invalid pages upon encountering them.

As opposed to doing nothing? vm_page_deactivate_noreuse() ensures that the pages skip LRU, so the page daemon (which handles races with concurrent wirings) will free them sooner.

kib added a comment.Sep 23 2019, 6:10 PM
In D21767#474999, @kib wrote:

What is the point of deactivating them ? Inactive scan frees invalid pages upon encountering them.

As opposed to doing nothing? vm_page_deactivate_noreuse() ensures that the pages skip LRU, so the page daemon (which handles races with concurrent wirings) will free them sooner.

Instead of freeing them. If unwire returned true.

markj added a comment.Sep 23 2019, 6:17 PM
In D21767#475003, @kib wrote:
In D21767#474999, @kib wrote:

What is the point of deactivating them ? Inactive scan frees invalid pages upon encountering them.

As opposed to doing nothing? vm_page_deactivate_noreuse() ensures that the pages skip LRU, so the page daemon (which handles races with concurrent wirings) will free them sooner.

Instead of freeing them. If unwire returned true.

Indeed, in the first block we could simply free the page if vm_page_unwire_noq() returns true.

I thought the second block was trickier, but now I see it is sufficient to simply check vm_page_wired().

markj updated this revision to Diff 62479.Sep 23 2019, 6:18 PM

Instead of deactivating pages, check for wirings before freeing them.

kib accepted this revision.Sep 23 2019, 6:22 PM
This revision is now accepted and ready to land.Sep 23 2019, 6:22 PM
markj updated this revision to Diff 62481.Sep 23 2019, 6:26 PM

Unbusy pages if they are not freed.

This revision now requires review to proceed.Sep 23 2019, 6:26 PM
kib accepted this revision.Sep 23 2019, 6:38 PM
This revision is now accepted and ready to land.Sep 23 2019, 6:38 PM
markj added inline comments.Sep 23 2019, 6:56 PM
sys/kern/kern_exec.c
1014 ↗(On Diff #62481)

Obviously this should be ma[0], I did not commit the typo fix before uploading.

alc added a comment.Sep 24 2019, 6:23 PM

Do we actually allow truncation on a running executable file?

kib added a comment.Sep 24 2019, 6:35 PM
In D21767#475356, @alc wrote:

Do we actually allow truncation on a running executable file?

No for truncation, yes for invalidation.

markj added a comment.Sep 24 2019, 7:10 PM
In D21767#475357, @kib wrote:
In D21767#475356, @alc wrote:

Do we actually allow truncation on a running executable file?

No for truncation, yes for invalidation.

My review description is confusing. A concurrent truncation is not needed to trigger the problem. The key is that an mlock() of a read-only mapping of a file forces the backing pages to stay resident after a truncation. A subsequent execve() of the file will trigger the assertion, since the pages will be invalid and wired.

markj added a comment.Sep 24 2019, 9:17 PM

Here is a program that reproduces the panic: https://reviews.freebsd.org/P319

The program that syzkaller generated uses fewer system calls but is multi-threaded.

kib added a comment.Sep 25 2019, 12:33 PM

Here is a program that reproduces the panic: https://reviews.freebsd.org/P319
The program that syzkaller generated uses fewer system calls but is multi-threaded.

Yes, and I think that a similar effect can be achieved with msync(MS_INVALIDATE). BTW, we can disable MS_INVALIDATE on text mappings now.

That said, what is the use of msync(MS_INVALIDATE) except for writing test cases ?

markj added a comment.Sep 25 2019, 3:38 PM
In D21767#475487, @kib wrote:

Here is a program that reproduces the panic: https://reviews.freebsd.org/P319
The program that syzkaller generated uses fewer system calls but is multi-threaded.

Yes, and I think that a similar effect can be achieved with msync(MS_INVALIDATE). BTW, we can disable MS_INVALIDATE on text mappings now.

Probably, though note that msync(MS_INVALIDATE) fails if part of the region is user-wired.

Are you suggesting to check vn_writechk() in vm_object_sync()?

That said, what is the use of msync(MS_INVALIDATE) except for writing test cases ?

I'm not sure. I wonder if it made more sense before the VM and buffer cache unification. Maybe it is still useful for some types of device mappings where userspace must manage coherence.

markj retitled this revision from Fix a race in exec_map_first_page(). to Fix handling of page wirings exec_map_first_page()..Sep 25 2019, 6:00 PM
markj edited the summary of this revision. (Show Details)
This revision was automatically updated to reflect the committed changes.