graphics/drm-next-kmod: Add hack to help make AMD drivers usable
Abandoned, Public

Authored by cem on Nov 24 2018, 11:13 PM.

Details

Reviewers

jmd
markj
mmacy
bdrewery

Summary

AMD drivers rely on TTM backpressure from the 'shrinker' subsystem to reduce
memory use under system pressure. Currently the only connection between the
pagedaemons and the linuxkpi shrinker is the vm_lowmem mechanism, which runs
far too late (it's a last resort mechanism, as opposed to the regular
PID-controller pagedaemons), far too infrequently (arbitrarily restricted to
0.1 Hz), and drops other more valuable caches at the same time.
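To make the 0.1 Hz restriction concrete, here is a minimal sketch of a gate that lets an event through at most once per 10 seconds. It is not the kernel's implementation: the names LowmemGate and count_fires are hypothetical, and the 100 ms pagedaemon pass interval is an assumption chosen only for illustration.

```python
class LowmemGate:
    """Allow an event at most once per `period_ms` milliseconds (0.1 Hz by default)."""

    def __init__(self, period_ms=10_000):
        self.period_ms = period_ms
        self.last_fire_ms = None  # no event has fired yet

    def try_fire(self, now_ms):
        # Fire on the first call, or once the full period has elapsed since the last fire.
        if self.last_fire_ms is None or now_ms - self.last_fire_ms >= self.period_ms:
            self.last_fire_ms = now_ms
            return True
        return False


def count_fires(duration_ms, pass_interval_ms, period_ms=10_000):
    """Count how many reclaim passes in a pressure window actually reach the hook."""
    gate = LowmemGate(period_ms)
    return sum(gate.try_fire(t) for t in range(0, duration_ms, pass_interval_ms))


# Assumed scenario: 30 s of sustained pressure, one reclaim pass every 100 ms.
passes = len(range(0, 30_000, 100))
fires = count_fires(30_000, 100)
print(f"{fires} shrinker notifications for {passes} reclaim passes")
```

Under those assumptions the shrinker hears about only a handful of the reclaim passes, which is the gap the summary describes.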

Also, TTM defaults these pool sizes to half of RAM.

As a result, AMD drivers basically always wire half of your RAM and don't
really release it under pressure. Instead, other applications get swapped,
OOM killed, and/or their allocations fail; and the buf and UMA caches are
crunched to nothing.
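As a rough illustration of what "half of RAM" means in practice, the arithmetic can be sketched as below. The function name default_ttm_limit is hypothetical and stands in for whatever TTM computes internally; the 32 GiB figure assumes the two 16 GB memory domains mentioned later in the discussion.

```python
GIB = 1024 ** 3


def default_ttm_limit(total_ram_bytes):
    """Hypothetical stand-in for the default described above: half of physical RAM."""
    return total_ram_bytes // 2


total_ram = 32 * GIB  # assumed: two NUMA domains with 16 GiB each
limit = default_ttm_limit(total_ram)
print(f"TTM may hold up to {limit // GIB} GiB of {total_ram // GIB} GiB")
```

On such a machine the default allows TTM to hold as much memory as an entire NUMA domain before any limit applies.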

In the long term, ideally we integrate the shrinker system better with the
pagedaemon system. But for now, give AMD GPU users a stop-gap measure to at
least get a usable desktop system.

Test Plan

Rebooting in a sec.

Diff Detail

Repository

rP FreeBSD ports repository

Lint

No Lint Coverage

Unit

No Test Coverage

Build Status

Buildable 21194
Build 20548: arc lint + arc unit

Event Timeline

cem created this revision. Nov 24 2018, 11:13 PM

Herald added a subscriber: mat. Nov 24 2018, 11:13 PM

Harbormaster completed remote builds in B21194: Diff 51081. Nov 24 2018, 11:13 PM

it's a last resort mechanism, as opposed to the regular PID-controller pagedaemons

It should run regularly (with a 10s period) whenever the pagedaemon is attempting to reclaim pages.

Is there a specific application you're using to trigger this problem? I've never seen a large number of pages be wired by the TTM; maybe there's a GEM object leak? Are you using amdgpu or radeonkms?

Thanks for figuring this out!

We'd prefer to have the TTM patch directly in our github to forward carry it easily and also apply it to all existing branches and ports - could you make a PR there for the TTM change? If not, I'm happy to do that.

This revision now requires changes to proceed. Nov 24 2018, 11:35 PM

In D18327#389052, @markj wrote:

it's a last resort mechanism, as opposed to the regular PID-controller pagedaemons

It should run regularly (with a 10s period) whenever the pagedaemon is attempting to reclaim pages.

Is there a specific application you're using to trigger this problem? I've never seen a large number of pages be wired by the TTM; maybe there's a GEM object leak? Are you using amdgpu or radeonkms?

He's using radeonkms. However, @jmd has repeatedly run into the same issue on his systems with amdgpu.

In D18327#389052, @markj wrote:

it's a last resort mechanism, as opposed to the regular PID-controller pagedaemons

It should run regularly (with a 10s period) whenever the pagedaemon is attempting to reclaim pages.

Well, ok, but it's still neither frequent enough nor PID-controlled.

Is there a specific application you're using to trigger this problem?

Nope. The only programs I use that interact with graphics are the Xfce desktop (GTK), claws-mail (GTK), firefox, chrome, terminology (e17), and mpv.

I've never seen a large number of pages be wired by the TTM; maybe there's a GEM object leak?

It's unclear who allocated the pages. It'd be nice if we put TTM zones and GEM objects on a list that vmstat knew about, like UMA zones and malloc. (It looks vaguely available in the kernel by walking the Linux kobject tree and looking for nodes with attribute lists including "used_memory." That's pretty ugly, though.) (Separately, it might also be nice to have a vmstat mode that grabbed all the data and sorted it.) Anyway, that's why I think it is probably the radeon driver: it does not show up in UMA or malloc memory use.

I don't know that it's TTM and not GEM; could be barking up the wrong tree. (It still seems dubious to give TTM free run of 16 GB of RAM, but if it doesn't fix the bug, it's moot.)

(Huh, now that I look at our Linuxkpi, we do build up a *sysctl* tree of Linux-equivalent sysfs nodes at sys.class? I wish I had known that while I still had the symptom. It also seems to be missing any TTM related node now, and I'm not exactly sure why.)

It wasn't a typical leak (in the sense that the growth halted at some point). It got to 14-16GB over a couple days, then hovered there. I guess it could also just be something allocating almost all of the memory off the local NUMA node (this system has two domains with 16GB on each). Except it didn't stay at exactly 16GB, it swung up and down slightly around 15GB.

Do you have any familiarity with drm / linuxkpi and suggestions for ways we could improve the visibility/integration of its memory management into FreeBSD? I would really like to be able to diagnose this better.

(Edit: one example might be, it'd be really nice if drm_debug was at least tunable instead of a compile-time constant. We don't support module_params, so it ends up being constant at compile time, which sucks. On Linux, it's runtime-adjustable. Edit2: Oh jeez, we do put module_params in the sysctl tree; they're just under compat.linuxkpi and lack any driver-name prefixing.)

Are you using amdgpu or radeonkms?

This was radeonkms — it's a Juniper chip (Evergreen class), which is a pre-GCN model unsupported by amdgpu.

In D18327#389053, @jmd wrote:

Thanks for figuring this out!

We'd prefer to have the TTM patch directly in our github to forward carry it easily and also apply it to all existing branches and ports - could you make a PR there for the TTM change? If not, I'm happy to do that.

Yes, whatever we decide on can go to github eventually. Let's keep this open for discussion/investigation and when resolved, one of us can submit something to the GH repo.

Mark might be on to something. I went back and looked at my syslog from before the system totally hung and guess what? After a bunch of swap_pager_getswapspace spam (seriously, 62567/69256 log lines between Nov 19 and 24 are this message), and buried a little (it's not in the bottom 100 lines or so prior to the shutdown due to more swap_pager lines), we see:

Nov 24 01:51:00 n kernel: [drm:radeon_gem_object_create] Failed to allocate GEM object (4096, 2, 4096, -12)
Nov 24 01:51:00 n kernel: [TTM] Unable to allocate page

(Repeated a lot; same parameters every time.) That was about ~5 minutes prior to shutdown, so I assume it was the graphics hanging that finally forced me to reboot.

Domain 2 = "GTT", -12 = -ENOMEM, size/align of 4096 is mostly boring. We can tell it was the "ioctl" path (radeon_gem_create_ioctl) because the other paths pass a non-GTT domain type explicitly. Unfortunately that just means it's exposed in the radeon_ioctl_kms ioctl table under RADEON_GEM_CREATE, which can be called from userspace via drm_ioctl. So we just let userspace leak kernel memory arbitrarily? That kinda sucks a lot.
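The decoding above can be sketched as a small log parser. This is only an illustration: decode_gem_failure is a hypothetical helper, the field order (size, domain, alignment, error) follows the reading given above, and while GTT = 2 comes from this thread, the CPU and VRAM bit values are taken from the Linux radeon_drm.h header and should be checked against the tree in use.

```python
import errno
import re

# Domain bits: GTT = 0x2 is stated in the comment above; CPU and VRAM are
# assumed from the Linux radeon UAPI header and should be verified.
RADEON_GEM_DOMAINS = {0x1: "CPU", 0x2: "GTT", 0x4: "VRAM"}

GEM_FAIL_RE = re.compile(
    r"Failed to allocate GEM object \((\d+), (\d+), (\d+), (-?\d+)\)"
)


def decode_gem_failure(line):
    """Return the decoded fields of a GEM allocation failure line, or None."""
    m = GEM_FAIL_RE.search(line)
    if m is None:
        return None
    size, domain, align, err = (int(g) for g in m.groups())
    return {
        "size": size,
        # The domain field is a bitmask, so report every bit that is set.
        "domains": [name for bit, name in sorted(RADEON_GEM_DOMAINS.items())
                    if domain & bit],
        "align": align,
        # The kernel reports negative errno values; map back to the symbolic name.
        "errno": errno.errorcode.get(-err, str(err)),
    }


line = ("Nov 24 01:51:00 n kernel: [drm:radeon_gem_object_create] "
        "Failed to allocate GEM object (4096, 2, 4096, -12)")
decoded = decode_gem_failure(line)
print(decoded)
```

Running this over the whole syslog and tallying the decoded tuples would be one way to confirm that the parameters really are the same every time.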

cem planned changes to this revision. Nov 26 2018, 5:00 PM

This isn't the leak, and radeon crashed on me too many times for non-memleak reasons. Now happily using the nvidia binary driver with the 610 GT I had lying around :-).