Page MenuHomeFreeBSD

Copyout(9) on 4/4 i386 needs correct vm_page_array[].
ClosedPublic

Authored by kib on Jul 1 2018, 3:13 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 25, 7:18 PM
Unknown Object (File)
Fri, Jan 24, 5:30 PM
Unknown Object (File)
Thu, Jan 23, 7:01 PM
Unknown Object (File)
Thu, Jan 23, 6:45 PM
Unknown Object (File)
Tue, Jan 21, 10:31 AM
Unknown Object (File)
Sat, Jan 18, 10:13 PM
Unknown Object (File)
Sat, Jan 18, 5:49 PM
Unknown Object (File)
Sat, Jan 18, 5:43 PM
Subscribers

Details

Summary

On the 4/4 i386, copyout(9) may need to call pmap_extract_and_hold() on arbitrary userspace mapping. If the mapping is backed by the non-managed cdev pager or by the sg pager, on dense configs we might access arbitrary element of vm_page_array[], in particular, not corresponding to a page from the memory segment. Initialize such pages as fictitious with the corresponding physical address.

Also, if the address falls out of the vm_page_array[] boundaries, PHYS_TO_VM_PAGE() returns NULL, which should be handled by pmap_extract_and_hold().

N.B. More elegant and probably reasonable patch would only touch uninitialized gaps in the vm_page_array[], then i386 ifdef can be removed, but I do not see an easy way to enumerate gaps.

Test Plan

This was demonstrated by bde, using acpidump -dt.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

This revision is now accepted and ready to land.Jul 2 2018, 9:36 AM
sys/i386/i386/pmap.c
1687–1688 ↗(On Diff #44722)

Is this a universal problem, i.e., a problem on all architectures? For example, couldn't unmanaged pages be the source data for the fast path in the pipe subsystem?

sys/i386/i386/pmap.c
1687–1688 ↗(On Diff #44722)

At least for amd64 yes. For other pmaps its needs an inspection.

Add amd64 pmap_extract_and_hold() fix.

I propose to go with this patch for now, and I will look at other pmaps when I am more operational.

This revision now requires review to proceed.Jul 3 2018, 3:10 PM
This revision is now accepted and ready to land.Jul 4 2018, 8:31 PM
In D16085#341502, @kib wrote:

Add amd64 pmap_extract_and_hold() fix.

I propose to go with this patch for now, and I will look at other pmaps when I am more operational.

No objections.

sys/amd64/amd64/pmap.c
2321 ↗(On Diff #44806)

I speculate that passing "pa" to PHYS_TO_VM_PAGE() here (and in the similar snippet on i386) will result in smaller, faster code because the compiler won't need to keep "pte" live across a function call.

sys/i386/i386/pmap.c
1687–1688 ↗(On Diff #44722)

Can't the same problem arise at the PDE level because of pmap_object_init_pt() creating a fictitious, unmanaged mapping? I would suggest adding the conditional there too.

This applies to amd64 too.

sys/vm/vm_page.c
798 ↗(On Diff #44806)

Isn't this "off-by-one" on the typical i386 machine? I thought that the first entry in the array typically had the physical address 4096.

This revision was automatically updated to reflect the committed changes.

Alan, I received your comments right after I committed the change. Sorry.

Use pa for PHYS_TO_VM_PAGE().
Check m != NULL for superpages as well.
Add first_page to correct phys_addr for the fictitious pages in vm_page_array (strange that it worked for bde).
Only initialize for DENSE configs on i386, PAE is SPARCE.

In D16085#342041, @kib wrote:

Alan, I received your comments right after I committed the change. Sorry.

No apology necessary.

Use pa for PHYS_TO_VM_PAGE().
Check m != NULL for superpages as well.
Add first_page to correct phys_addr for the fictitious pages in vm_page_array (strange that it worked for bde).

Isn't that because we always initialize the physical address of a fictitious page?

Only initialize for DENSE configs on i386, PAE is SPARCE.

sys/amd64/amd64/pmap.c
2310–2311 ↗(On Diff #44875)

You can use "pa" here too. (I checked the assembly, and the obvious difference is that the new stack frame is 16 bytes smaller.)

sys/i386/i386/pmap.c
1675–1676 ↗(On Diff #44875)

Ditto.

sys/vm/vm_page.c
554 ↗(On Diff #44875)

&& defined(VM_PHYSSEG_DENSE)

In D16085#342120, @alc wrote:
In D16085#342041, @kib wrote:

Add first_page to correct phys_addr for the fictitious pages in vm_page_array (strange that it worked for bde).

Isn't that because we always initialize the physical address of a fictitious page?

We initialize the physical address in the dynamically allocated vm_page structure from the fakepg_zone, and that address gets used when we establish a virtual-to-physical mapping. But, I don't think that we ever touch the physical address field in the vm_page structure that we obtain from vm_page_array through pmap_extract_and_hold().

kib marked 3 inline comments as done.Jul 5 2018, 10:17 AM
In D16085#342129, @alc wrote:

We initialize the physical address in the dynamically allocated vm_page structure from the fakepg_zone, and that address gets used when we establish a virtual-to-physical mapping. But, I don't think that we ever touch the physical address field in the vm_page structure that we obtain from vm_page_array through pmap_extract_and_hold().

Please look at the i386/copyout.c:cp_slow0(). I do vm_fault_quick_hold_pages() there, and then I use the physical address of the returned pages to establish ephemeral kernel mapping for the copyout. The pages returned from _quick_hold are from vm_page_array[].

In fact, the same note is valid for most other users of _quick_hold, e.g. pipe buffer mapping or kern_physio.c.

Use pa for superpages as well. Guard ii declaration properly.

In D16085#342214, @kib wrote:
In D16085#342129, @alc wrote:

We initialize the physical address in the dynamically allocated vm_page structure from the fakepg_zone, and that address gets used when we establish a virtual-to-physical mapping. But, I don't think that we ever touch the physical address field in the vm_page structure that we obtain from vm_page_array through pmap_extract_and_hold().

Please look at the i386/copyout.c:cp_slow0(). I do vm_fault_quick_hold_pages() there, and then I use the physical address of the returned pages to establish ephemeral kernel mapping for the copyout. The pages returned from _quick_hold are from vm_page_array[].

In fact, the same note is valid for most other users of _quick_hold, e.g. pipe buffer mapping or kern_physio.c.

Then, I haven't a clue as to how the earlier version worked for bde@. :-)

P.S. Can't we use vm_page_unhold_pages() at the end of cp_slow()?

This revision is now accepted and ready to land.Jul 5 2018, 3:59 PM
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state Needs Review.Jul 5 2018, 4:39 PM
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state Needs Review.Jul 5 2018, 4:43 PM
This revision was automatically updated to reflect the committed changes.