Page MenuHomeFreeBSD

Fix an off-by-one error in the VM page array on some systems.
ClosedPublic

Authored by jhb on May 30 2017, 11:21 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Oct 19, 5:02 AM
Unknown Object (File)
Fri, Oct 10, 3:21 AM
Unknown Object (File)
Thu, Oct 9, 1:21 PM
Unknown Object (File)
Thu, Oct 9, 2:03 AM
Unknown Object (File)
Wed, Oct 1, 5:18 PM
Unknown Object (File)
Sun, Sep 28, 3:52 PM
Unknown Object (File)
Sep 21 2025, 4:11 PM
Unknown Object (File)
Sep 13 2025, 8:00 AM
Subscribers

Details

Summary

r31386 changed how the size of the VM page array was calculated to be
less wasteful. For most systems, the amount of memory is divided by the
overhead required by each page (a page of data plus a struct vm_page) to
determine the maximum number of available pages. However, if the
remainder for the first non-available page was at least a page of data
(so that the only memory missing was a struct vm_page), this last page
was left in phys_avail[] but was not allocated an entry in the VM page
array. Handle this case by explicitly excluding the page from phys_avail[].

Specifically, with a MALTA64 kernel under qemu I had the following.

Initial phys_avail:

(gdb) p/x phys_avail
$2 = {0x833000, 0x10000000, 0x90000000, 0x100000000, 0x0, 0x0, 0x0, 0x0, 0x0,

0x0, 0x0, 0x0}

Total pages in system:

(gdb) p size / 4096
$19 = 522093

Calculated page_range (existing code):

(gdb) p size / (4096 + sizeof(struct vm_page))
$25 = 509164

Number of pages needed to hold VM page array:

(gdb) p (509164 * sizeof(struct vm_page) + 4095) / 4096
$102 = 12928

Which when subtracted from 'end' to compute 'new_end' left the number of
pages described in new phys_avail:

(gdb) p 522093 - 12928
$103 = 509165

Which is one larger than the size of the VM page array.

I have thought through if the DENSE case has the same issue in its
calculation.

One odd thing here is that you have to round up the size of the VM page
array (rather than round down / truncate) to ensure enough room is taken
from 'end'.

The MALTA kernel with mips32 under qemu exhibited the same panic but I
haven't yet tested the fix there.

Test Plan
  • boot MALTA64 kernel under qemu and observe panic in vm_freelist_del() on boot due to vm_phys_free_pages() finding an uninitialized 'buddy' page (all zeroes) which matches the order of the N-1 page being added, so vm_phys_free_pages() tries to pair them up.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 9708
Build 10148: arc lint + arc unit

Event Timeline

Should say "I _haven't_ thought through..."

Also, verified that the fix also lets MALTA mips 32 boot in qemu now.

I just looked at one of my amd64 machines, and there the current calculation is correct. And, in fact, the use of howmany() would be off by one. Specifically, it would allocate an unused struct vm_page.

Here is the difference between my case and yours. If we don't perform integer division, then my page range value is 8308361.508571 and your page range value is 509164.982857. If I multiply just the fractional part of my page range value by PAGE_SIZE + sizeof(struct vm_page), I get 2135.998200 bytes. However, in your case that multiplication yields 4127.9994 bytes, which exceeds the page size. This is where your extra page comes from. In other words, there was enough residual memory for an entire page, but not the struct vm_page to go with it. In short, when the sizeof(struct vm_page) = 104 and the fraction that is thrown away by integer division exceeds .975, the current code is off by one.

I think we have to err on the side of the extra vm_page_t though? Either that or we have to calculate new_end in such a way that we only use 'page_range' pages? Hmm, so I think what you are saying can be written this way instead perhaps:

page_range = (size + sizeof(struct vm_page) - 1) / (PAGE_SIZE + sizeof(struct vm_page))

This is similar to howmany() except it would only round up in the case that the "gap" is smaller than the size of 'struct vmpage' and thus >= PAGE_SIZE.

In D11000#227997, @jhb wrote:

I think we have to err on the side of the extra vm_page_t though? Either that or we have to calculate new_end in such a way that we only use 'page_range' pages?

Yes to the latter. Occasionally allocating an extra struct vm_page to consume the excess whole page as part of vm_page_array is perhaps simpler to implement but not necessarily simpler to explain. An alternative would be to compute size % (PAGE_SIZE + sizeof(struct vm_page)) and subtract PAGE_SIZE from new_end when the remainder is greater than or equal to PAGE_SIZE. Then, page_range would always exactly agree with the available pages.

Hmm, so I think what you are saying can be written this way instead perhaps:

page_range = (size + sizeof(struct vm_page) - 1) / (PAGE_SIZE + sizeof(struct vm_page))

This is similar to howmany() except it would only round up in the case that the "gap" is smaller than the size of 'struct vmpage' and thus >= PAGE_SIZE.

You need to remove the "- 1" term. It is incorrect. It just makes the off-by-one error less likely to occur.

The dense case should also be changed.

On amd64, arm, arm64, i386, and sparc64, the allocation of the reservation array was masking this problem.

I would suggest including this change in the final patch. I think it makes the DENSE case a little easier to read, and it eliminates pointless "spelling" differences in the page range calculations between the SPARSE and DENSE cases.

Index: vm/vm_page.c
===================================================================
--- vm/vm_page.c        (revision 319090)
+++ vm/vm_page.c        (working copy)
@@ -575,6 +575,8 @@ vm_page_startup(vm_offset_t vaddr)
                size += phys_avail[i + 1] - phys_avail[i];
        page_range = size / (PAGE_SIZE + sizeof(struct vm_page));
 #elif defined(VM_PHYSSEG_DENSE)
+       size = high_avail - low_avail;
+
        /*
         * In the VM_PHYSSEG_DENSE case, the number of pages can account for
         * the overhead of a page structure per page only if vm_page_array is
@@ -583,10 +585,9 @@ vm_page_startup(vm_offset_t vaddr)
         * underlying vm_page_array, even though they will not be used.
         */
        if (new_end == high_avail)
-               page_range = (high_avail - low_avail) / (PAGE_SIZE +
-                   sizeof(struct vm_page));
+               page_range = size / (PAGE_SIZE + sizeof(struct vm_page));
        else
-               page_range = high_avail / PAGE_SIZE - first_page;
+               page_range = size / PAGE_SIZE;
 #else
 #error "Either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE must be defined."
 #endif

Rework to explicitly adjust 'new_end' if it contains an extra page.

sys/vm/vm_page.c
591

This can be simplified to page_range = size / PAGE_SIZE;

sys/vm/vm_page.c
598–602

I would be explicit about there being an extra page without a corresponding struct vm_page.

sys/vm/vm_page.c
605

The "high_avail == end" test before the vm_reserv_startup() call is no longer going to work.

jhb marked 2 inline comments as done.Jun 7 2017, 4:02 PM
jhb added inline comments.
sys/vm/vm_page.c
605

Ugh. I wonder if it would be cleaner to have a 'bool' that reflects this case (that we are allocating things from the last region)? 'bool alloc_from_last' or some such and use that both for the vm_reserv case and the 'new_end != high_avail' check above?

jhb marked an inline comment as done.Jun 7 2017, 9:02 PM
jhb added inline comments.
sys/vm/vm_page.c
605

The bool ended up being messy actually since it isn't always used so the compiler warned when it was only set but not used. Instead I adjusted high_avail when adjusting 'new_end'.

Fix some more things Alan noted.

Tested on: mips, mips64, amd64

jhb retitled this revision from Properly account for the size of the VM page array on some systems. to Fix an off-by-one error in the VM page array on some systems..Jun 7 2017, 9:11 PM
jhb edited the summary of this revision. (Show Details)

At first glance, this version looks correct. I will take a careful look when I get home tonight.

This revision is now accepted and ready to land.Jun 8 2017, 6:30 AM
This revision was automatically updated to reflect the committed changes.