mlx4_en_free_resources(): fix error handling for priv->stat_ctx
ClosedPublic

Authored by hselasky on Dec 21 2016, 2:55 AM.

Details

Summary

When trying to enable Mellanox ConnectX3 VF for Hyper-V,
I got the below error without https://reviews.freebsd.org/D8868

mlx4_en: mlx4_core0: Port 1: failed reserving qp for TX ring
mlx4_en: mlx4_core0: Port 1: Failed to allocate NIC resources

and the related error handling has a bug, causing a kernel crash:

Fatal trap 12: page fault while in kernel mode
...
Stopped at sysctl_ctx_free+0x7c: movq 0x8(%rax), %rax

The cause is:

In mlx4_en_init_netdev() -> mlx4_en_alloc_resources(),
if mlx4_en_create_tx_ring() fails, mlx4_en_sysctl_stat() won't be called;
later mlx4_en_init_netdev() -> mlx4_en_destroy_netdev() ->
mlx4_en_free_resources() -> sysctl_ctx_free(&priv->stat_ctx) will
dereference the NULL stat_ctx, because sysctl_ctx_init(&priv->stat_ctx)
hasn't been called.

To fix the issue, let's move the sysctl_ctx_init() to an earlier
place.
BTW, we also move the init for priv->conf_ctx for consistency of
coding style, though it's not a must.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
decui_microsoft.com retitled this revision from to mlx4_en_free_resources(): fix error handling for priv->stat_ctx.Dec 21 2016, 2:55 AM
decui_microsoft.com updated this object.
decui_microsoft.com edited the test plan for this revision. (Show Details)

It's probably better to call sysctl_ctx_init for the stat_ctx, when priv->sysctl is assigned. Or just moving the ctx_inits and sysctl assignment to common place. TAILQ requires explicit initialization, and we normally treat sysctl_ctx as opaque struct in drivers.

addressed sephe's comment by using a better method: let's move mlx4_en_sysctl_stat() to an earlier place.

decui_microsoft.com updated this object.

Let me update the patch, because the previous version causes a sysctl NULL pointer dereferencing issue: we must make sure mlx4_en_sysctl_stat() is called after mlx4_en_create_rx/tx_ring(), otherwise the stat sysctl nodes can't reference the correct ring pointers.

@hselasky, can you please review this patch?

I'll have a look. Please give me some time (1 week).

Thanks! No hurry. This is is only a trivial fix to improve error handling.
We don't have a strong dependency on it at all. Just to make sure I don't forget this patch. :-)

I'm currently working on this patch. Sorry for the delay.

hselasky commandeered this revision.Apr 27 2017, 1:15 PM
hselasky edited reviewers, added: decui_microsoft.com; removed: hselasky.
hselasky updated this revision to Diff 27788.Apr 27 2017, 2:31 PM

Slight refactor of the patch.

This revision was automatically updated to reflect the committed changes.