Page MenuHomeFreeBSD

arm64: Fixing user space boudary checking in copyinout.S
ClosedPublic

Authored by der_semihalf.com on Mar 23 2016, 2:55 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jan 8, 10:19 PM
Unknown Object (File)
Wed, Dec 25, 12:58 PM
Unknown Object (File)
Nov 30 2024, 9:08 PM
Unknown Object (File)
Nov 18 2024, 7:47 PM
Unknown Object (File)
Oct 30 2024, 1:44 PM
Unknown Object (File)
Oct 29 2024, 8:13 PM
Unknown Object (File)
Oct 21 2024, 4:26 AM
Unknown Object (File)
Oct 20 2024, 8:16 AM
Subscribers

Details

Summary

Big buffer size could cause integer overflow and as a result
attempt to copy beyond VM_USERMAX_ADDRESS.

Fixing copyinstr boundary checking where compared value has been
overwritten by accident when setting fault handler.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

der_semihalf.com retitled this revision from to arm64: Fixing user space boudary checking in copyinout.S.
der_semihalf.com updated this object.
der_semihalf.com edited the test plan for this revision. (Show Details)
der_semihalf.com added reviewers: wma, zbb, andrew, kib.
der_semihalf.com set the repository for this revision to rS FreeBSD src repository - subversion.

Yes, there is the issue. But I would prefer a different solution. I think that the proposed change works because user/kernel split is by half of the userspace, and max-min appear to be less than half. I.e. by checking that the length is <= max-min, you actually ensure that there is no overflow when calculating the end address for copy.

So why not check that there is no unsigned overflow when calculating the end address, instead ? Like

adds x3, x1, x2
b.cs copyio_fault_nopcb

for copyout, etc.

I agree that the solution that you propose is more elegant. And takes less instructions.

How would this be triggered?

In D5719#122133, @kib wrote:

Yes, there is the issue. But I would prefer a different solution. I think that the proposed change works because user/kernel split is by half of the userspace, and max-min appear to be less than half. I.e. by checking that the length is <= max-min, you actually ensure that there is no overflow when calculating the end address for copy.

The user address space is at most 2^48 bytes, there is then an unusable region that neither userspace or the kernel can use, followed by up to 2^48 bytes for the kernel. The two regions are at the extreme of the address space, i.e. the userspace starts at the bottom of VA space, and the kernel finishes at the top of VA space.

Because of this I don't see how this would be anything other than an optimisation. If we were asked to copy such a large buffer, assuming the first user address is valid, we sill eventually hit the middle region where there are no valid virtual addresses. We will then fault & return an error. Depending on the start address this may be after a large amount of data has been copied, but the point is we will hit this invalid address well before getting to a kernel address or wrapping.

How would this be triggered?

In D5719#122133, @kib wrote:

Yes, there is the issue. But I would prefer a different solution. I think that the proposed change works because user/kernel split is by half of the userspace, and max-min appear to be less than half. I.e. by checking that the length is <= max-min, you actually ensure that there is no overflow when calculating the end address for copy.

The user address space is at most 2^48 bytes, there is then an unusable region that neither userspace or the kernel can use, followed by up to 2^48 bytes for the kernel. The two regions are at the extreme of the address space, i.e. the userspace starts at the bottom of VA space, and the kernel finishes at the top of VA space.

Because of this I don't see how this would be anything other than an optimisation. If we were asked to copy such a large buffer, assuming the first user address is valid, we sill eventually hit the middle region where there are no valid virtual addresses. We will then fault & return an error. Depending on the start address this may be after a large amount of data has been copied, but the point is we will hit this invalid address well before getting to a kernel address or wrapping.

Suppose that malicious user pass a valid kernel address as the buffer address, and huge length that causes overflow and results in the sum of addr+length to fall into valid user address range. Then e.g. copyout would successfully copy some data to userspace before faulting.

Because of this I don't see how this would be anything other than an optimisation.

We do not want this kind of optimizations? I do not see why it is worth to engage CPU cycles in action, that is certain to fail. That is sloppy from our side.

In D5719#122140, @kib wrote:

Suppose that malicious user pass a valid kernel address as the buffer address, and huge length that causes overflow and results in the sum of addr+length to fall into valid user address range. Then e.g. copyout would successfully copy some data to userspace before faulting.

This is an argument for using the ldtr & sttr instructions, or to check the user address is valid then rely on the unmapped middle section.

Because of this I don't see how this would be anything other than an optimisation.

We do not want this kind of optimizations? I do not see why it is worth to engage CPU cycles in action, that is certain to fail. That is sloppy from our side.

Not when it's an optimisation for an uncommon case at the cost of the common case.

This is an argument for using the ldtr & sttr instructions, or to check the user address is valid then rely on the unmapped middle section.

Using ldtr is harder than it worth, you are aware of D3617, the instruction requires fault handler to understand that failures still.

I also do not like relying on the hole, it makes a dependency on the hardware detail or implementation that could change and silently give the issue back.

Other arches do the correct thing, they check that end is computed as sum without an overflow, and then check that end does not extend beyond start of the kernel portion. It is my bug, and I even not sure why I was so mentally challenged to do the right thing on arm but not on arm64 almost simultaneously.

Because of this I don't see how this would be anything other than an optimisation.

We do not want this kind of optimizations? I do not see why it is worth to engage CPU cycles in action, that is certain to fail. That is sloppy from our side.

Not when it's an optimisation for an uncommon case at the cost of the common case.

The above is good argument.

In D5719#122151, @kib wrote:

I also do not like relying on the hole, it makes a dependency on the hardware detail or implementation that could change and silently give the issue back.

But this is even better argument to do the checking

In D5719#122151, @kib wrote:

Other arches do the correct thing, they check that end is computed as sum without an overflow, and then check that end does not extend beyond start of the kernel portion

I will look into your solution for arm and do it similarly.

I will look into your solution for arm and do it similarly.

I already explained it, the adds/b.cs check during the calculation of the buffer end.

der_semihalf.com edited edge metadata.

Implemented fixes suggested by kib.

kib edited edge metadata.

copyinstr fix is shame

This revision is now accepted and ready to land.Mar 24 2016, 9:49 AM
This revision was automatically updated to reflect the committed changes.