Page MenuHomeFreeBSD

Sparsify the vm_page_dump bitmap
ClosedPublic

Authored by scottph on Aug 20 2020, 6:53 AM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Dec 5, 5:57 PM
Unknown Object (File)
Wed, Dec 4, 11:43 PM
Unknown Object (File)
Wed, Dec 4, 11:42 PM
Unknown Object (File)
Wed, Dec 4, 11:42 PM
Unknown Object (File)
Wed, Dec 4, 11:42 PM
Unknown Object (File)
Wed, Dec 4, 11:18 PM
Unknown Object (File)
Nov 24 2024, 1:05 PM
Unknown Object (File)
Nov 24 2024, 10:44 AM

Details

Summary

On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing int for the size).

Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.

MFC after: 1 week
Sponsored by: Ampere Computing, Inc.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

In general it seems like there's an implicit assumption that nothing will ask for a PA that isn't in dump_avail[], and if it does we just return the wrong page since we just return some offset relative to the previous dump_avail entry that overflows into the next entry's bitmap? It would be annoying to fix as it means all the places that call _kvm_pa_to_bit(), and or adding some kind of _kvm_has_pa().

The '128' seems to be a magic number that is just the amount of space in the header page where the dump_avail array can be fit? If we don't abuse that page but stick it later like we do for the bitmap, etc. that would be more work, but would avoid the arbitrary number (that might also be not quite right for 32-bit?)

lib/libkvm/kvm_private.c
302 ↗(On Diff #76019)

So this means we can't examine old crash dumps? Presumably it would be possible to add a fallback for the previous version if the architecture-specific backends passed in a "0" as the offset which would leave kd->dump_avail as NULL and then the two helper routines could use the old 'pa / idx' logic. In particular, it would be kind of annoying on stable that you can't examine an old crash dump, or that kgdb on head can't examine a stable crash dump, etc. Keep in mind that kgdb can do cross-debug (so you can copy the vmcrash from an RPi running head off onto a stable/12 amd64 host to debug).

Also, given that we are bumping the version, we could also include the size of the dump_avail array in the header as well to avoid the need for the realloc loop. It would also let you just mmap() the array directly instead of using read().

scottph edited the summary of this revision. (Show Details)
In D26131#582102, @jhb wrote:

In general it seems like there's an implicit assumption that nothing will ask
for a PA that isn't in dump_avail[], and if it does we just return the wrong
page since we just return some offset relative to the previous dump_avail
entry that overflows into the next entry's bitmap? It would be annoying to
fix as it means all the places that call _kvm_pa_to_bit(), and or adding some
kind of _kvm_has_pa().

I've added an INVALID return value for _kvm_pa_to_bit.

The '128' seems to be a magic number that is just the amount of space in the
header page where the dump_avail array can be fit? If we don't abuse that
page but stick it later like we do for the bitmap, etc. that would be more
work, but would avoid the arbitrary number (that might also be not quite
right for 32-bit?)

Added a separate page in the output for dump_avail, and handle its
absence in libkvm by falling back to the old behavior.

lib/libkvm/kvm_private.c
302 ↗(On Diff #76019)

This should maintain compatibility with old dumps now, we could even possibly not bump the dump version number, but that seemed kinda wrong so I've updated here and checked for the old version by number, like amd64 was doing before.

Just one suggestion, but in general looks good. Thanks for making these changes.

lib/libkvm/kvm_private.c
309 ↗(On Diff #76446)

Nice way to handle this.

lib/libkvm/kvm_private.h
109 ↗(On Diff #76446)

Slight nit: perhaps have this be void * and require explicit casts for 64-bit as well as 32-bit?

This revision is now accepted and ready to land.Sep 17 2020, 11:25 PM
This revision was automatically updated to reflect the committed changes.

fwiw, I get a compiler error on gcc-6 on mips:

In file included from /usr/home/adrian/work/freebsd/head-embedded/src/sys/geom/geom_io.c:67:0:
/usr/home/adrian/work/freebsd/head-embedded/src/sys/vm/vm_page.h: In function 'vm_page_dump_index_to_pa':
/usr/home/adrian/work/freebsd/head-embedded/src/sys/vm/vm_page.h:641:40: error: suggest parentheses around '+' in operand of '&' [-Werror=parentheses]

return ((vm_paddr_t)bit * PAGE_SIZE +
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
    dump_avail[i] & ~PAGE_MASK);
    ~~~~~~~~~~~~~

cc1: all warnings being treated as errors

  • geom_io.o ---
  • [geom_io.o] Error code 1
head/sys/vm/vm_page.h
592

hi!

This is a duplicate definition; the original definition is in vm_phys.h and it's tripping up gcc-6 with redundant declarations warnings-as-errors enabled.

Do these functions need to be inlined nowdays?

fwiw, I get a compiler error on gcc-6 on mips:

In file included from /usr/home/adrian/work/freebsd/head-embedded/src/sys/geom/geom_io.c:67:0:
/usr/home/adrian/work/freebsd/head-embedded/src/sys/vm/vm_page.h: In function 'vm_page_dump_index_to_pa':
/usr/home/adrian/work/freebsd/head-embedded/src/sys/vm/vm_page.h:641:40: error: suggest parentheses around '+' in operand of '&' [-Werror=parentheses]

return ((vm_paddr_t)bit * PAGE_SIZE +
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
    dump_avail[i] & ~PAGE_MASK);
    ~~~~~~~~~~~~~

cc1: all warnings being treated as errors

  • geom_io.o ---
  • [geom_io.o] Error code 1

Does the following get rid of the warning ?

diff --git a/sys/vm/vm_page.h b/sys/vm/vm_page.h
index 915be661699..ff8bd0a9f21 100644
--- a/sys/vm/vm_page.h
+++ b/sys/vm/vm_page.h
@@ -639,7 +639,7 @@ vm_page_dump_index_to_pa(int bit)
 		    dump_avail[i] / PAGE_SIZE;
 		if (bit < tot)
 			return ((vm_paddr_t)bit * PAGE_SIZE +
-			    dump_avail[i] & ~PAGE_MASK);
+			    (dump_avail[i] & ~PAGE_MASK));
 		bit -= tot;
 	}
 	return ((vm_paddr_t)NULL);
head/sys/vm/vm_page.h
592

Show exact compiler message for this case.

kevans added inline comments.
head/sys/vm/vm_page.h
592
--- kern_dump.o ---                                                                                                                    In file included from /usr/src/sys/kern/kern_dump.c:41:                                                                                /usr/src/sys/vm/vm_phys.h:50:19: error: redundant redeclaration of 'dump_avail' [-Werror=redundant-decls]                                 50 | extern vm_paddr_t dump_avail[PHYS_AVAIL_COUNT];                                                                                      |                   ^~~~~~~~~~                                                                                                   In file included from /usr/src/sys/kern/kern_dump.c:40:                                                                                /usr/src/sys/vm/vm_page.h:592:19: note: previous declaration of 'dump_avail' was here                                                  
  592 | extern vm_paddr_t dump_avail[];                          
      |                   ^~~~~~~~~~
head/sys/vm/vm_page.h
592

Does the following help ?

diff --git a/sys/vm/vm_phys.h b/sys/vm/vm_phys.h
index 438f83d46e9..ec248c79b5d 100644
--- a/sys/vm/vm_phys.h
+++ b/sys/vm/vm_phys.h
@@ -47,7 +47,7 @@
 #endif
 
 extern vm_paddr_t phys_avail[PHYS_AVAIL_COUNT];
-extern vm_paddr_t dump_avail[PHYS_AVAIL_COUNT];
+extern vm_paddr_t dump_avail[];
 
 /* Domains must be dense (non-sparse) and zero-based. */
 struct mem_affinity {
head/sys/vm/vm_page.h
592

Negative:

In file included from /usr/src/sys/kern/kern_malloc.c:82:
/usr/src/sys/vm/vm_phys.h:50:19: error: redundant redeclaration of 'dump_avail' [-Werror=redundant-decls]
   50 | extern vm_paddr_t dump_avail[];
      |                   ^~~~~~~~~~
In file included from /usr/src/sys/kern/kern_malloc.c:81:
/usr/src/sys/vm/vm_page.h:592:19: note: previous declaration of 'dump_avail' was here
  592 | extern vm_paddr_t dump_avail[];
      |                   ^~~~~~~~~~

Do these routines need to be inlined? I can work on un-inlining them and submit a diff to fix things?