Page MenuHomeFreeBSD

linuxkpi: work with numpages > 1 in the set_pages_*() KPIs
ClosedPublic

Authored by kevans on Thu, May 14, 8:07 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, May 22, 8:03 AM
Unknown Object (File)
Fri, May 22, 4:15 AM
Unknown Object (File)
Fri, May 22, 3:52 AM
Unknown Object (File)
Thu, May 21, 11:22 PM
Unknown Object (File)
Thu, May 21, 10:59 PM
Unknown Object (File)
Thu, May 21, 7:16 PM
Unknown Object (File)
Thu, May 21, 6:46 PM
Unknown Object (File)
Wed, May 20, 7:43 AM
Subscribers

Details

Summary

These calls are used for buddy pages at least in drm's ttm_pool, which
leads to a panic when we invoke lowmem handlers and drm tries to shrink
the pool.

Cope with numpages > 1 by using page_to_virt() and the corresponding
set_memory_*() call for each. This also seems to be more correct, as
pmap_page_set_memattr() will only affect the DMAP mapping while the
new version updates both the DMAP and non-DMAP KVA that drm is more
likely using.

This stabilized my amdgpu laptop running two VMs, chromium and a
concurrent buildworld.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Am I correct in what you are describing is that if you had a debug kernel, you would hit the KASSERTs?

Is there a bug report for this in bugzilla or on the drm-kmod repository?

This change is interesting, as looking at set_pages_wb() this is how it once was implemented in drm-next 95f61a6ae0be (logic wise not arguments; apparently pmap_change_attr took a numpages argument back then still). I wonder when the implementation changed; it's a bit hard to track. Ah easier than I thought in that case (no file movements involved):

drm-next 0138c6751910 by @markj (link: https://github.com/FreeBSDDesktop/DEPRECATED-freebsd-base-graphics/commit/0138c6751910602bb6ffa9ab24448956e9f09736 )

commit 0138c6751910602bb6ffa9ab24448956e9f09736
Author:     Mark Johnston <markjdb@gmail.com>
AuthorDate: Sun Feb 26 00:52:34 2017 -0800
Commit:     Mark Johnston <markjdb@gmail.com>
CommitDate: Sun Feb 26 01:07:36 2017 -0800

    Fix set_pages_*
    
    The previous implementation was incorrect:
    - it only updated the kernel pmap
    - it passed the physical address of the page as the address at which the
      page is mapped
In D57004#1306284, @bz wrote:

Am I correct in what you are describing is that if you had a debug kernel, you would hit the KASSERTs?

Yes, I run debug kernels on my laptop and have hit this KASSERT every day, multiple times a day, since I first got amdgpu working on it just a few days ago.

Is there a bug report for this in bugzilla or on the drm-kmod repository?

No, the only report I'm aware of is mine from https://reviews.freebsd.org/D56971

This change is interesting, as looking at set_pages_wb() this is how it once was implemented in drm-next 95f61a6ae0be (logic wise not arguments; apparently pmap_change_attr took a numpages argument back then still). I wonder when the implementation changed; it's a bit hard to track. Ah easier than I thought in that case (no file movements involved):

Interesting- I wasn't aware of any other implementation here.

drm-next 0138c6751910 by @markj (link: https://github.com/FreeBSDDesktop/DEPRECATED-freebsd-base-graphics/commit/0138c6751910602bb6ffa9ab24448956e9f09736 )

commit 0138c6751910602bb6ffa9ab24448956e9f09736
Author:     Mark Johnston <markjdb@gmail.com>
AuthorDate: Sun Feb 26 00:52:34 2017 -0800
Commit:     Mark Johnston <markjdb@gmail.com>
CommitDate: Sun Feb 26 01:07:36 2017 -0800

    Fix set_pages_*
    
    The previous implementation was incorrect:
    - it only updated the kernel pmap
    - it passed the physical address of the page as the address at which the
      page is mapped

DMAP and non-DMAP KVA that drm is more likely using.

Which KVA do you mean? Can you share the stack trace from the problematic invocation?

You're right that if the pages are mapped into the kernel map that we'll now update both the kernel map and the direct map mappings of the page, but now we don't set m->md.pat_mode, so future userspace mappings might use the wrong attributes.

DMAP and non-DMAP KVA that drm is more likely using.

Which KVA do you mean? Can you share the stack trace from the problematic invocation?

I'm realizing that I followed the wrong branch of linux_alloc_pages and missed what the conditional was, so this didn't actually make any sense.

(kgdb) bt
#0  __curthread () at /usr/src.kbsd/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=0) at /usr/src.kbsd/sys/kern/kern_shutdown.c:399
#2  0xffffffff804bd13a in db_dump (dummy=<optimized out>, dummy2=<optimized out>,
    dummy3=<optimized out>, dummy4=<optimized out>)
    at /usr/src.kbsd/sys/ddb/db_command.c:596
#3  0xffffffff804bcfea in db_command (last_cmdp=<optimized out>,
    cmd_table=<optimized out>, dopager=false) at /usr/src.kbsd/sys/ddb/db_command.c:508
#4  0xffffffff804bd076 in db_command_script (
    command=command@entry=0xffffffff81bd774a <db_recursion_data+58> "dump")
    at /usr/src.kbsd/sys/ddb/db_command.c:573
#5  0xffffffff804c2e48 in db_script_exec (
    scriptname=scriptname@entry=0xfffffe00fe91a4e0 "kdb.enter.panic",
    warnifnotfound=warnifnotfound@entry=0) at /usr/src.kbsd/sys/ddb/db_script.c:301
#6  0xffffffff804c2d42 in db_script_kdbenter (eventname=<optimized out>)
    at /usr/src.kbsd/sys/ddb/db_script.c:323
#7  0xffffffff804c05c1 in db_trap (type=<optimized out>, code=<optimized out>)
    at /usr/src.kbsd/sys/ddb/db_main.c:266
#8  0xffffffff80c2efff in kdb_trap (type=type@entry=3, code=code@entry=0,
    tf=tf@entry=0xfffffe00fe91a830) at /usr/src.kbsd/sys/kern/subr_kdb.c:790
#9  0xffffffff81140774 in trap (frame=<optimized out>)
    at /usr/src.kbsd/sys/amd64/amd64/trap.c:697
#10 <signal handler called>
#11 kdb_enter (why=<optimized out>, msg=<optimized out>)
    at /usr/src.kbsd/sys/kern/subr_kdb.c:556
#12 0xffffffff80bdb7c9 in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe00fe91aa60)
    at /usr/src.kbsd/sys/kern/kern_shutdown.c:962
#13 0xffffffff80bdb643 in panic (
    fmt=0xffffffff81da2320 <cnputs_mtx> "R\345\"\201\377\377\377\377")
    at /usr/src.kbsd/sys/kern/kern_shutdown.c:887
#14 0xffffffff852d036f in ttm_pool_init (pool=0x1, dev=0x200, nid=512,
    use_dma_alloc=false, use_dma32=160)
    at /usr/local/sys/modules/drm-kmod/drivers/gpu/drm/ttm/ttm_pool.c:660
#15 0xffffffff852d0fe8 in ttm_pool_mgr_fini ()
    at /usr/local/sys/modules/drm-kmod/drivers/gpu/drm/ttm/ttm_pool.c:939
#16 0xffffffff80eaefdd in shrinker_shrink (s=0xffffffff852d036f <ttm_pool_init+127>)
    at /usr/src.kbsd/sys/compat/linuxkpi/common/src/linux_shrinker.c:115
#17 linuxkpi_vm_lowmem (arg=<optimized out>, flags=<optimized out>)
    at /usr/src.kbsd/sys/compat/linuxkpi/common/src/linux_shrinker.c:129
--Type <RET> for more, q to quit, c to continue without paging--c
#18 0xffffffff80fdec16 in vm_pageout_lowmem () at /usr/src.kbsd/sys/vm/vm_pageout.c:2092
#19 vm_pageout_worker (arg=arg@entry=0x0) at /usr/src.kbsd/sys/vm/vm_pageout.c:2190
#20 0xffffffff80fde837 in vm_pageout () at /usr/src.kbsd/sys/vm/vm_pageout.c:2427
#21 0xffffffff80b8aca2 in fork_exit (callout=0xffffffff80fde660 <vm_pageout>, arg=0x0,
    frame=0xfffffe00fe91af40) at /usr/src.kbsd/sys/kern/kern_fork.c:1201

Er, sorry, hold on- the trace doesn't make any sense because I re-rolled the kernel since that one. Will post a new one in a few hours.

(kgdb) bt
#0  __curthread () at /usr/src.kbsd/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=0) at /usr/src.kbsd/sys/kern/kern_shutdown.c:399
#2  0xffffffff804bd13a in db_dump (dummy=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>) at /usr/src.kbsd/sys/ddb/db_command.c:596
#3  0xffffffff804bcfea in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=false) at /usr/src.kbsd/sys/ddb/db_command.c:508
#4  0xffffffff804bd076 in db_command_script (command=command@entry=0xffffffff81bd774a <db_recursion_data+58> "dump") at /usr/src.kbsd/sys/ddb/db_command.c:573
#5  0xffffffff804c2e48 in db_script_exec (scriptname=scriptname@entry=0xfffffe007713b4e0 "kdb.enter.panic", warnifnotfound=warnifnotfound@entry=0) at /usr/src.kbsd/sys/ddb/db_script.c:301
#6  0xffffffff804c2d42 in db_script_kdbenter (eventname=<optimized out>) at /usr/src.kbsd/sys/ddb/db_script.c:323
#7  0xffffffff804c05c1 in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src.kbsd/sys/ddb/db_main.c:266
#8  0xffffffff80c2efff in kdb_trap (type=type@entry=3, code=code@entry=0, tf=tf@entry=0xfffffe007713b830) at /usr/src.kbsd/sys/kern/subr_kdb.c:790
#9  0xffffffff81140774 in trap (frame=<optimized out>) at /usr/src.kbsd/sys/amd64/amd64/trap.c:697
#10 <signal handler called>
#11 kdb_enter (why=<optimized out>, msg=<optimized out>) at /usr/src.kbsd/sys/kern/subr_kdb.c:556
#12 0xffffffff80bdb7c9 in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe007713ba60) at /usr/src.kbsd/sys/kern/kern_shutdown.c:962
#13 0xffffffff80bdb643 in panic (fmt=0xffffffff81da2320 <cnputs_mtx> "R\345\"\201\377\377\377\377") at /usr/src.kbsd/sys/kern/kern_shutdown.c:887
#14 0xffffffff852d036f in set_pages_wb (vm_page=<optimized out>, numpages=-2126791527) at /usr/src.kbsd/sys/compat/linuxkpi/common/include/asm/set_memory.h:90
#15 ttm_pool_free_page (pool=0x0, order=3, p=<optimized out>, caching=<optimized out>) at /usr/local/sys/modules/drm-kmod/drivers/gpu/drm/ttm/ttm_pool.c:162
#16 ttm_pool_shrink () at /usr/local/sys/modules/drm-kmod/drivers/gpu/drm/ttm/ttm_pool.c:385
#17 0xffffffff852d0fe8 in ttm_pool_shrinker_scan (shrink=<optimized out>, sc=0xfffffe007713baf0) at /usr/local/sys/modules/drm-kmod/drivers/gpu/drm/ttm/ttm_pool.c:726
#18 0xffffffff80eaefdd in shrinker_shrink (s=0xfffff8002035e340) at /usr/src.kbsd/sys/compat/linuxkpi/common/src/linux_shrinker.c:115
#19 linuxkpi_vm_lowmem (arg=<optimized out>, flags=<optimized out>) at /usr/src.kbsd/sys/compat/linuxkpi/common/src/linux_shrinker.c:129
#20 0xffffffff80fdec16 in vm_pageout_lowmem () at /usr/src.kbsd/sys/vm/vm_pageout.c:2092
#21 vm_pageout_worker (arg=arg@entry=0x0) at /usr/src.kbsd/sys/vm/vm_pageout.c:2190
#22 0xffffffff80fde837 in vm_pageout () at /usr/src.kbsd/sys/vm/vm_pageout.c:2427
#23 0xffffffff80b8aca2 in fork_exit (callout=0xffffffff80fde660 <vm_pageout>, arg=0x0, frame=0xfffffe007713bf40) at /usr/src.kbsd/sys/kern/kern_fork.c:1201
#24 <signal handler called>
#25 0x2e616c65722e006f in ?? ()
Backtrace stopped: Cannot access memory at address 0x6d79736e79642e00

I have a suggestion from markj staged to walk the contiguous pages instead and continue using pmap_page_set_memattr since the m->md.pat_mode assignment is likely critical, and I'll try that tomorrow morning.

#14 0xffffffff852d036f in set_pages_wb (vm_page=<optimized out>, numpages=-2126791527) at /usr/src.kbsd/sys/compat/linuxkpi/common/include/asm/set_memory.h:90

I have a suggestion from markj staged to walk the contiguous pages instead and continue using pmap_page_set_memattr since the m->md.pat_mode assignment is likely critical, and I'll try that tomorrow morning.

I could be wrong but that numpages looks like a pointer? 0xffffffff813bbc99
With order == 3, it should be 8. Can you check the dump. I only had 3 hours of sleep so excuse anything I am missing.

Which drm-kmod version are you running (and compiled/installed from where)?

In D57004#1306346, @bz wrote:
#14 0xffffffff852d036f in set_pages_wb (vm_page=<optimized out>, numpages=-2126791527) at /usr/src.kbsd/sys/compat/linuxkpi/common/include/asm/set_memory.h:90

I have a suggestion from markj staged to walk the contiguous pages instead and continue using pmap_page_set_memattr since the m->md.pat_mode assignment is likely critical, and I'll try that tomorrow morning.

I could be wrong but that numpages looks like a pointer? 0xffffffff813bbc99
With order == 3, it should be 8. Can you check the dump. I only had 3 hours of sleep so excuse anything I am missing.

Which drm-kmod version are you running (and compiled/installed from where)?

Right, you can't really trust argument values in kgdb stack trace for reasons that probably make sense to someone else. The panic message that I omitted is

panic: set_pages_wb: numpages 8

This is 6.12-lts that I build tied to my kernels with LOCAL_MODULES, but I don't see any obvious reason in the diff to 6.6-lts that it wouldn't have occurred there also.

Mark's suggestion (which I interpreted as https://paste.fbsd.dev/5k6n) seems to work as well; I'm inserting an SDT probe now to confirm my test scenario is actually hitting it right and will stress-test it more.

The problem I have with all of these things is that I don't really know what the failure mode might look like if these attrs aren't adjusted correctly here -- I'm not naive enough to call not-panicking during a lowmem event a solid win since I'm removing the assertion that I hit, but I haven't yet spent enough time understanding where the page(s) are ending up after this (returned to the pool or actually freed) to try and reason about what plausible victims might look like.

Mark's suggestion (which I interpreted as https://paste.fbsd.dev/5k6n) seems to work as well; I'm inserting an SDT probe now to confirm my test scenario is actually hitting it right and will stress-test it more.

The problem I have with all of these things is that I don't really know what the failure mode might look like if these attrs aren't adjusted correctly here -- I'm not naive enough to call not-panicking during a lowmem event a solid win since I'm removing the assertion that I hit, but I haven't yet spent enough time understanding where the page(s) are ending up after this (returned to the pool or actually freed) to try and reason about what plausible victims might look like.

Given set_pages_attr() is an internal function not part of the Linux KPI it should get an lkpi_ prefix. Can you update the review here with the change?

Update to the version that I believe incorporates suggestions and discussion
with Mark. The commit message will also be amended to caution against the same
trap that I fell into:

linuxkpi: work with numpages > 1 in the set_pages_*() KPIs

These calls are used for buddy pages at least in drm's ttm_pool, which
leads to a panic when we invoke lowmem handlers and drm tries to shrink
the pool.

Cope with numpages > 1 by traversing the contiguous pages and executing
the adjustment there, as well, as suggested by markj@.  Previous
versions have tried to use the corresponding `set_memory_*()` functions,
but it is believed that not updating `md.pat_mode` breaks subsequent
userspace mappings in ways that may result in things like screen tearing
or other artifacts when running i915kms.

This stabilized my amdgpu laptop running two VMs, chromium and a
concurrent buildworld.

Differential Revision:  https://reviews.freebsd.org/D57004

SDT probes show that it's getting hit frequently with numpages>1 when I wire
enough memory into one of the VMs, and I'm not observing any unusual behavior
from the system for whatever that's worth. Naturally, I will continue to run
the patch and this laptop is my daily driver for both work and fun, so I'll be
quick to scream if something weird happens.

I am fine with the logic, if @markj is as well. He understands the comments and the need for the MPASS better than I do.

Please make sure the commit message is updated and appropriate on the commit (I know updating them in Phab is a manual operation sadly).

sys/compat/linuxkpi/common/include/asm/set_memory.h
32

Why do you need sys/param.h? If it's just for MPASS, you can use sys/kassert.h instead.

68

I'd put this in a C file to avoid pulling more FreeBSD identifiers into the namespace.

sys/compat/linuxkpi/common/include/asm/set_memory.h
32

I'll test again, but my recollection is that it ends up being a prereq for VM headers (u_* types, specifically, iirc)

Reduce introduced pollution by moving new KPI to .c, as requested.

This revision is now accepted and ready to land.Tue, May 19, 12:25 AM

FWIW- I did also try dropping the KASSERT without this change and forcing a boatload of reclamation events with low-ish physmem available to see if I could document possible misbehavior one might have observed without INVARIANTS, but I never encountered any sort of system instability, artifacts/tearing, or anything unusual at all.

sys/compat/linuxkpi/common/include/asm/set_memory.h
32

For posterity: I was infact wrong here and it must have been the MPASS. I thought I remembered needing sys/types.h for VM headers, and infact maybe I did somewhere in drm-kmod that includes this- I decided it wasn't worth hunting further since they were going away anyways.