ibcore: Fix use-after-free in IB mad completion handling.

Description

We encountered a use-after-free bug when unloading the driver:

BUG: KASAN: use-after-free in ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
Read of size 4 at addr ffff8882ca5aa868 by task kworker/u13:2/23862

Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
Call Trace:
dump_stack+0x9a/0xeb
print_address_description+0xe3/0x2e0
ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
kasan_report+0x15c/0x1df
ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
kasan_report+0xe/0x20
ib_mad_post_receive_mads+0xddc/0xed0 [ib_core]
find_mad_agent+0xa00/0xa00 [ib_core]
qlist_free_all+0x51/0xb0
mlx4_ib_sqp_comp_worker+0x1970/0x1970 [mlx4_ib]
quarantine_reduce+0x1fa/0x270
kasan_unpoison_shadow+0x30/0x40
ib_mad_recv_done+0xdf6/0x3000 [ib_core]
_raw_spin_unlock_irqrestore+0x46/0x70
ib_mad_send_done+0x1810/0x1810 [ib_core]
mlx4_ib_destroy_cq+0x2a0/0x2a0 [mlx4_ib]
_raw_spin_unlock_irqrestore+0x46/0x70
debug_object_deactivate+0x2b9/0x4a0
ib_process_cq+0xe2/0x1d0 [ib_core]
ib_cq_poll_work+0x45/0xf0 [ib_core]
process_one_work+0x90c/0x1860
pwq_dec_nr_in_flight+0x320/0x320
worker_thread+0x87/0xbb0
kthread_parkme+0xb6/0x180
process_one_work+0x1860/0x1860
kthread+0x320/0x3e0
kthread_park+0x120/0x120
ret_from_fork+0x24/0x30
...
Freed by task 31682:
save_stack+0x19/0x80
kasan_slab_free+0x11d/0x160
kfree+0xf5/0x2f0
ib_mad_port_close+0x200/0x380 [ib_core]
ib_mad_remove_device+0xf0/0x230 [ib_core]
remove_client_context+0xa6/0xe0 [ib_core]
disable_device+0x14e/0x260 [ib_core]
ib_unregister_device+0x79/0x150 [ib_core]
ib_unregister_device+0x21/0x30 [ib_core]
mlx4_ib_remove+0x162/0x690 [mlx4_ib]
mlx4_remove_device+0x204/0x2c0 [mlx4_core]
mlx4_unregister_interface+0x49/0x1d0 [mlx4_core]
mlx4_ib_cleanup+0xc/0x1d [mlx4_ib]
x64_sys_delete_module+0x2d2/0x400
do_syscall_64+0x95/0x470
entry_SYSCALL_64_after_hwframe+0x49/0xbe

The problem was that the MAD PD was deallocated before the MAD CQ was
freed, while completion work was still pending for that CQ.
When the MAD completion handling reached ib_mad_post_receive_mads(),
it dereferenced the already-deallocated PD in the following line of
code in that procedure:
sg_list.lkey = qp_info->port_priv->pd->local_dma_lkey;
(the pd pointer in the above line is no longer valid, because the
PD has been deallocated).

We fix this by allocating the PD before the CQ in procedure
ib_mad_port_open(), and deallocating the PD after freeing the CQ
in procedure ib_mad_port_close().

Since the CQ completion work queue is flushed during ib_free_cq(),
no completions will be pending for that CQ when the PD is later
deallocated.
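
The ordering argument above can be sketched as a small user-space model.
This is only an illustration under stated assumptions: mock_pd, mock_cq,
mock_free_cq(), mock_dealloc_pd() and simulate_port_close() are
hypothetical names, not the ibcore API, and the PD is marked dead instead
of being freed so the model itself stays memory-safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the MAD PD and CQ; not the real ibcore types. */
struct mock_pd {
	unsigned int local_dma_lkey;
	bool live;		/* false once the PD has been "deallocated" */
};

struct mock_cq {
	struct mock_pd *pd;	/* the completion handler reads the lkey via this */
	int pending;		/* completion work items still queued */
	int stale_reads;	/* completions that touched a dead PD */
	unsigned int last_lkey;
};

/* Run all queued completion work, counting reads of a dead PD. */
static void mock_cq_run_pending(struct mock_cq *cq)
{
	while (cq->pending > 0) {
		cq->pending--;
		if (!cq->pd->live)
			cq->stale_reads++;	/* would be the use-after-free */
		cq->last_lkey = cq->pd->local_dma_lkey;
	}
}

/* Assumption taken from the text: freeing the CQ flushes its pending work. */
static void mock_free_cq(struct mock_cq *cq)
{
	mock_cq_run_pending(cq);
}

static void mock_dealloc_pd(struct mock_pd *pd)
{
	pd->live = false;
}

/* Returns how many completions ran against a PD that was already gone. */
int simulate_port_close(bool free_cq_first, int pending)
{
	struct mock_pd pd = { .local_dma_lkey = 0x1234, .live = true };
	struct mock_cq cq = { .pd = &pd, .pending = pending };

	assert(pending >= 0);
	if (free_cq_first) {
		/* Fixed order: flush the CQ, then release the PD. */
		mock_free_cq(&cq);
		mock_dealloc_pd(&pd);
	} else {
		/* Old order: the PD goes away while work is still queued. */
		mock_dealloc_pd(&pd);
		mock_cq_run_pending(&cq);
	}
	return cq.stale_reads;
}
```

In this model, simulate_port_close(true, 3) returns 0 and
simulate_port_close(false, 3) returns 3, so only the old ordering lets
pending completions observe a deallocated PD.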

Note that freeing the CQ before deallocating the PD is the practice
in the ULPs.

Linux commit:
770b7d96cfff6a8bf6c9f261ba6f135dc9edf484

Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking

(cherry picked from commit 468a6b5055f0b6ea0bdb1ee8cbdf749204cb3b25)

Details

Provenance
hselasky authored on Jun 16 2021, 1:01 PM
Parents
rG7da85a0db999: ibcore: Fail early if unsupported QP is provided.