Page MenuHomeFreeBSD

x11/nvidia-driver: Fix too aggressive disabling of GSP firmware
Needs ReviewPublic

Authored by junchoon_dec.sakura.ne.jp on Mon, Apr 28, 3:23 PM.
Tags
None
Referenced Files
F116262773: D50053.id154407.diff
Sun, May 4, 12:40 PM
F116261887: D50053.id.diff
Sun, May 4, 12:24 PM
Unknown Object (File)
Sat, May 3, 3:11 AM
Unknown Object (File)
Fri, May 2, 7:55 AM
Unknown Object (File)
Thu, May 1, 3:42 AM
Unknown Object (File)
Thu, May 1, 3:39 AM
Unknown Object (File)
Thu, May 1, 3:19 AM
Unknown Object (File)
Wed, Apr 30, 9:28 AM
Subscribers
None

Details

Reviewers
kbowling
ashafer
Group Reviewers
x11
Summary

Currently, GSP on recent (Turing and later) nvidia GPU is disabled
as of suspend/resume issues (Differential Revision: D49828).

After D49828 lands, in Forums thread "Xorg won't start with officially
supported NVIDIA 5070 GPU?" [1], x11/nvidia-driver doesn't work but
building directly from upstream tarball (570.133 and 570.144) is
reported to work for the reporter.

After some testing on reporter's side with our request, reverting
back D49828 is reported to make x11/nvidia-driver working again.

This update changes the way disabling GSP firmware by default to allow overriding via tunable hw.nvidia.registry.EnableGpuFirmware.

[1] https://forums.freebsd.org/threads/xorg-wont-start-with-officially-supported-nvidia-5070-gpu.97659/

Reported by: foacs (displayed name on forums.freebsd.org)
Authored by: Tomoaki AOKI <junchoon@dec.sakura.ne.jp>

Diff Detail

Repository
R11 FreeBSD ports repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

I think a better idea would be to adjust the patch I made for disabling GSP. That patch was pretty heavy handed and turned off a global which enabled GSP. I think instead we could make the patch default only the sysctl value to zero, disabling GSP by default but allowing users to enable it by setting the tunable in loader.conf. That would prevent us from having to have a build option and prevent users from having to compile things themselves.

I think a better idea would be to adjust the patch I made for disabling GSP. That patch was pretty heavy handed and turned off a global which enabled GSP. I think instead we could make the patch default only the sysctl value to zero, disabling GSP by default but allowing users to enable it by setting the tunable in loader.conf. That would prevent us from having to have a build option and prevent users from having to compile things themselves.

I agree with this recommendation. This should be a tunable kind of thing since we need package users to be able to try both settings.

I think a better idea would be to adjust the patch I made for disabling GSP. That patch was pretty heavy handed and turned off a global which enabled GSP. I think instead we could make the patch default only the sysctl value to zero, disabling GSP by default but allowing users to enable it by setting the tunable in loader.conf. That would prevent us from having to have a build option and prevent users from having to compile things themselves.

I agree with this recommendation. This should be a tunable kind of thing since we need package users to be able to try both settings.

So OPTIONify is not needed, unless any case which need global disabling of GSP.
If such a case exists, keeping OPTIONifying but enable by default, with flipping the default for the tunable.
3 entries of OPTIONS_SINGLE would be wanted in that case.

Enable GSP firmware support
Disable GSP firmware support via tunable (default)

and

Globally disable GSP firmware support

to avoid confusions.

If not, simply changing patch to flip the tunable and edit pkg-message (actually, files/pkg-message.in) to tell anyone having problem with the ports default to flip it via tunable is sufficient.
In this case, if we can't edit summary of this revier, I need to abandon this and open a new review. Not sure at this moment.

I think a better idea would be to adjust the patch I made for disabling GSP. That patch was pretty heavy handed and turned off a global which enabled GSP. I think instead we could make the patch default only the sysctl value to zero, disabling GSP by default but allowing users to enable it by setting the tunable in loader.conf. That would prevent us from having to have a build option and prevent users from having to compile things themselves.

I agree with this recommendation. This should be a tunable kind of thing since we need package users to be able to try both settings.

So OPTIONify is not needed, unless any case which need global disabling of GSP.
If such a case exists, keeping OPTIONifying but enable by default, with flipping the default for the tunable.
3 entries of OPTIONS_SINGLE would be wanted in that case.

Enable GSP firmware support
Disable GSP firmware support via tunable (default)

and

Globally disable GSP firmware support

to avoid confusions.

If not, simply changing patch to flip the tunable and edit pkg-message (actually, files/pkg-message.in) to tell anyone having problem with the ports default to flip it via tunable is sufficient.
In this case, if we can't edit summary of this revier, I need to abandon this and open a new review. Not sure at this moment.

I don't think the option is worth carrying. Instead, editing extra-gsp-patch-src-nvidia_subr.c (or whatever file instead) to provide the correct tunable) is the proposal from @ashafer.

We will hope that the default can become durable and do the right thing almost all the time, but it still seems like tricky business with all the cards and driver right now.

I think a better idea would be to adjust the patch I made for disabling GSP. That patch was pretty heavy handed and turned off a global which enabled GSP. I think instead we could make the patch default only the sysctl value to zero, disabling GSP by default but allowing users to enable it by setting the tunable in loader.conf. That would prevent us from having to have a build option and prevent users from having to compile things themselves.

I agree with this recommendation. This should be a tunable kind of thing since we need package users to be able to try both settings.

So OPTIONify is not needed, unless any case which need global disabling of GSP.
If such a case exists, keeping OPTIONifying but enable by default, with flipping the default for the tunable.
3 entries of OPTIONS_SINGLE would be wanted in that case.

Enable GSP firmware support
Disable GSP firmware support via tunable (default)

and

Globally disable GSP firmware support

to avoid confusions.

If not, simply changing patch to flip the tunable and edit pkg-message (actually, files/pkg-message.in) to tell anyone having problem with the ports default to flip it via tunable is sufficient.
In this case, if we can't edit summary of this revier, I need to abandon this and open a new review. Not sure at this moment.

I don't think the option is worth carrying. Instead, editing extra-gsp-patch-src-nvidia_subr.c (or whatever file instead) to provide the correct tunable) is the proposal from @ashafer.

We will hope that the default can become durable and do the right thing almost all the time, but it still seems like tricky business with all the cards and driver right now.

Agreed. I'll drop OPTION GSP on next update. If I cannot edit summary of this review, would abandon this and open
new one.

I'm now looking for the initialization of the tunable, but it doesn't seem to be defined as straightforward as was in fbdev case. Maybe defined using combinations of multiple macros, NV_REG_STR_ENABLE_GPU_FIRMWARE and related ones.
So it would take time (days) to look into and find the correct place to patch in my spare times.

FYI, other registry tunables/sysctls used in nvidia.ko seems to be using the same (or looking alike) semantics, too.

junchoon_dec.sakura.ne.jp retitled this revision from x11/nvidia-driver: OPTIONify GSP support (disalbed by defailt) to x11/nvidia-driver: Fix too aggressive disabling of GSP firmware.
junchoon_dec.sakura.ne.jp edited the summary of this revision. (Show Details)

Re-title.
Fix to allow overriding hw.nvidia.registry.EnableGpuFirmware via /boot/loader.conf again, which stopped working after D49828 landed.

This should allow to workaround the case GPUs stopped working without GSP firmware support.

For the record:
Some findings on tracking macros handling registry hw.nvidia.registry.EnableGpuFirmware and the reason why I've chosen this way.

Registry hw.nvidia.registry.EnableGpuFirmware is initialized
in nvidia_tunable_init() implemented in src/nvidia/nvidia_os_registry.c
using TUNABLE_INT_FETCH().

Actual value is stored in one of the entry (27th) in global struct
nv_param_t nv_params[] in src/nvidia/nv_reg.c as follows.

NV_DEFINE_PARAMS_TABLE_ENTRY(__NV_ENABLE_GPU_FIRMWARE),

The above NV_DEFINE_PARAMS_TABLE_ENTRY() is defined in
src/nvidia/nv_reg.c as follows.

#define NV_DEFINE_PARAMS_TABLE_ENTRY(regkey) \

{ NV_REG_STRING(regkey), &__NV_REG_VAR(regkey) }

The above NV_REG_STRING() is defined in src/nvidia/nv_reg.c as follows.

#define NV_REG_STRING(regkey) #regkey
#define NV_REG_STRING(regkey)
NV_REG_STRING(regkey)

So the first item is stringified regkey.

And the above __NV_REG_VAR(regkey) is defined in src/nvidia/nv_reg.c
as follows.

#define __NV_REG_VAR(regkey) NVreg_##regkey

So regkey is concatenated after string "NVreg_".
And __NV_ENABLE_GPU_FIRMWARE is defined in src/nvidia/nv-reg.h
as follows.

#define __NV_ENABLE_GPU_FIRMWARE EnableGpuFirmware

Thus, the macro

NV_DEFINE_PARAMS_TABLE_ENTRY(__NV_ENABLE_GPU_FIRMWARE),

is now expanded as follows.

{ "EnableGpuFirmware", &NVreg_EnableGpuFirmware }

This entry is type struct nv_param_t defined in src/nvidia/nv.h
as follows.

typedef struct
{

char *name;
NvU32 *data;

} nv_parm_t;

So the entry means that its name is EnableGpuFirmware and
the value is stored in NvU32 type of variable NVreg_EnableGpuFirmware.

The value to be set is defined in src/common/inc/nv-firmware-registry,h
as follows.

#define NV_REG_STR_ENABLE_GPU_FIRMWARE "EnableGpuFirmware"

#define NV_REG_ENABLE_GPU_FIRMWARE_MODE_MASK 0x0000000F
#define NV_REG_ENABLE_GPU_FIRMWARE_MODE_DISABLED 0x00000000
#define NV_REG_ENABLE_GPU_FIRMWARE_MODE_ENABLED 0x00000001
#define NV_REG_ENABLE_GPU_FIRMWARE_MODE_DEFAULT 0x00000002

#define NV_REG_ENABLE_GPU_FIRMWARE_POLICY_MASK 0x000000F0
#define NV_REG_ENABLE_GPU_FIRMWARE_POLICY_ALLOW_FALLBACK 0x00000010

#define NV_REG_ENABLE_GPU_FIRMWARE_DEFAULT_VALUE 0x00000012

So modifying the last line as below defaulted registry
hw.nvidia.registry.EnableGpuFirmware to be 0.

#define NV_REG_ENABLE_GPU_FIRMWARE_DEFAULT_VALUE 0x00000000

Note that making the value to be 0x00000010 (allow fallback) caused
the registry to be non-zero (LSb still to be 0, though).
Maybe somehow auto-tuned with Lowest Significant Bit to be 0.
(Seen 0x18 or 0x16 set, but bit 3 and 4 are not defined here.)

Thanks for updating. Sorry I had it on my todo list to look up where to do this for you, but seems you've figured it out anyway.

This revision is now accepted and ready to land.Fri, May 2, 2:12 PM

Thanks!
Macro is good to ease and clarify codes "for anyone already thoroughy know it", but for anyone others, it can easily become kinda... hell to track. ;-) So I've left my previous comment for records.

The last thing I'm wondering about is the value for hw.nvidia.registry.EnableGpuFirmware to enable GSP in files/pkg-message.in.
I've noted 1 to force using GSP firmware, but possibly 17 (0x11) to allow fallback is better.

If you feel the value 1 for hw.nvidia.registry.EnableGpuFirmware is OK, please commit this.
(I cannot test whether 17 works for anyone affected or not, as I don't have GPUs with GSP.)

And also feel free to modify PORTREVISION on commit if this is NOT the first one that lands.

Rebase after D50048 landed as commit 13636d8b58f662e12d1513333d1e981a59620109.

This is rebase only update.

Recommended value for hw.nvidia.registry.EnableGpuFirmware to force GSP enabled in files/pkg-message.in is still 1 (theoretically-should-work value), as I myself can't test value 17 to allow fallback with the lack of affected GPU to test.

This revision now requires review to proceed.Sat, May 3, 11:41 PM