net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for PF/VFs
ClosedPublic

Authored by decui_microsoft.com on Dec 20 2016, 1:45 PM.

Details

Summary

The patch is ported from
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7ae0e400cd9396c41fe596d35dcc34feaa89a04f

This is OK because the Linux driver is dual-licensed (GPL and BSD).

Without the patch, when trying to enable SR-IOV for the Mellanox
ConnectX-3 VF with Hyper-V, I get the following error:
mlx4_core0: Failed to initialize event queue table (err=-22), aborting.

Note: I didn't port the chunks between
"@@ -2345,6 +2381,7 @@" and "@@ -2631,6 +2736,7 @@", because in FreeBSD
mlx4_load_one() doesn't exist.

However, it looks like the current FreeBSD code works; at least I didn't
find any issue in my test.

Note: even with this patch, the VF driver still doesn't work in my test:

mlx4_en: mlx4_core0: Port 1: failed reserving qp for TX ring
mlx4_en: mlx4_core0: Port 1: Failed to allocate NIC resources

Another patch will have to be ported too.

Diff Detail

Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 6329
Build 6568: arc lint + arc unit

Event Timeline

decui_microsoft.com retitled this revision to net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for PF/VFs.
decui_microsoft.com updated this object.
decui_microsoft.com edited the test plan for this revision.

Hi,

The Mellanox FreeBSD team would like to review these changes before they hit the tree. I cannot say exactly when, but it might take a month or two.

--HPS

Sure. I understand.

I hope the Mellanox FreeBSD team can help to test the patches too, especially on bare metal.

To test the patches for Hyper-V SR-IOV, i.e. assigning a Mellanox ConnectX-3 VF to a FreeBSD VM, we need:

  1. The host must be Windows Server 2016.
  2. This link (https://community.mellanox.com/docs/DOC-2242) shows how to enable a Mellanox ConnectX-3 VF for a Windows VM running on Hyper-V 2012 R2. The procedure is similar for a FreeBSD VM on Windows Server 2016.
  3. The HEAD branch must be used with the following revisions applied:

https://reviews.freebsd.org/D8867
https://reviews.freebsd.org/D8868
https://reviews.freebsd.org/D8876
https://reviews.freebsd.org/D8909
https://reviews.freebsd.org/D8962
https://reviews.freebsd.org/D8963
https://reviews.freebsd.org/D8964
Or, use https://github.com/dcui/freebsd/commits/decui/master/sriov (the branch name is decui/master/sriov).

We may commit the changes to the Hyper-V synthetic NIC driver first, and then wait for the Mellanox team's feedback on the changes to the mlx4 driver.

Found a bug in the MLX4 VF driver:
In my Hyper-V SR-IOV test with a Mellanox ConnectX-3 VF, the VF works fine when the VM has <=12 virtual CPUs, but if the VM has >=13 vCPUs, the VF driver fails to load:

mlx4_core0: <mlx4_core> at device 2.0 on pci1
mlx4_core: Initializing mlx4_core: Mellanox ConnectX VPI driver v2.1.6
vmbus0: allocated type 3 (0xfe0800000-0xfe0ffffff) for rid 18 of mlx4_core0
mlx4_core0: Lazy allocation of 0x800000 bytes rid 0x18 type 3 at 0xfe0800000
mlx4_core0: Detected virtual function - running in slave mode
mlx4_core0: Sending reset
mlx4_core0: Sending vhcr0
mlx4_core0: HCA minimum page size:512
mlx4_core0: Timestamping is not supported in slave mode.
mlx4_core0: attempting to allocate 20 MSI-X vectors (52 supported)
mlx4_core0: using IRQs 256-275 for MSI-X
mlx4_core0: Failed to allocate mtts for 1024 pages(order 10)
mlx4_core0: Failed to initialize event queue table (err=-12), aborting.

I'll dig into this.
I would appreciate any help from Mellanox!

Hi,

mlx4_core0: Failed to initialize event queue table (err=-12), aborting.

-12 means ENOMEM

It possibly means we're out of memory somehow. Can you track this down?

BTW: I plan to merge your patch AS-IS.

Can we make the fix for the error a separate issue?

--HPS
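
For reference, the `err=` values in the driver messages in this thread are negated errno codes: -12 is ENOMEM as noted above, and the earlier -22 is EINVAL. The following is a minimal sketch of that decoding, not code from the patch; `err_name` and `err_text` are hypothetical helper names introduced only for illustration.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of mlx4): map a negated errno value,
 * as printed in "err=-12", to its symbolic name. Only the two codes
 * seen in this thread are handled.
 */
static const char *
err_name(int ret)
{
	switch (ret) {
	case -ENOMEM:
		return "ENOMEM";	/* err=-12: out of memory */
	case -EINVAL:
		return "EINVAL";	/* err=-22: invalid argument */
	default:
		return "unknown";
	}
}

/*
 * Hypothetical helper: return the system's description of a negated
 * errno value, or a fixed string if the value is not in a plausible range.
 */
static const char *
err_text(int ret)
{
	if (ret >= 0 || ret <= -4096)
		return "not a negated errno";
	return strerror(-ret);
}
```

With these helpers, `err_name(-12)` yields "ENOMEM" and `err_name(-22)` yields "EINVAL"; `err_text` returns whatever description the local C library provides.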

> Hi,
>
> mlx4_core0: Failed to initialize event queue table (err=-12), aborting.
>
> -12 means ENOMEM
>
> It possibly means we're out of memory somehow. Can you track this down?

I assigned enough memory to the VM. :-)

Actually, the Linux version of the driver has the same (or a similar) issue, and I reported it here:
https://www.spinics.net/lists/netdev/msg420136.html

Our Redmond Linux team is working with the Mellanox Linux team too. :-)
Some people suspected that the driver failed to allocate a UAR, but after we increased LOG_BAR_SIZE with mlxconfig from 3 (8MB) to 5 (32MB), the issue was still there.

We'll continue to work on this.
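
The LOG_BAR_SIZE figures quoted above (3 gives 8MB, 5 gives 32MB) are consistent with the setting being a base-2 logarithm of the BAR size in megabytes. A minimal sketch of that arithmetic follows; the power-of-two relationship is an assumption inferred from those two data points, and `bar_size_mb` and `bar_size_bytes` are hypothetical names, not mlxconfig or mlx4 APIs.

```c
#include <stdint.h>

/*
 * Hypothetical helper: BAR size in megabytes, ASSUMING the size is
 * 2^LOG_BAR_SIZE MB (inferred from 3 -> 8MB and 5 -> 32MB in this thread).
 * Returns 0 if the shift would overflow a 64-bit byte count.
 */
static uint64_t
bar_size_mb(unsigned int log_bar_size)
{
	if (log_bar_size >= 44)
		return 0;
	return UINT64_C(1) << log_bar_size;
}

/* Hypothetical helper: the same size expressed in bytes (1 MB = 2^20 bytes). */
static uint64_t
bar_size_bytes(unsigned int log_bar_size)
{
	return bar_size_mb(log_bar_size) << 20;
}
```

Under that assumption, a LOG_BAR_SIZE of 3 corresponds to 0x800000 bytes, the same size as the memory range the earlier boot log reports for mlx4_core0.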

> BTW: I plan to merge your patch AS-IS.
Thanks for reviewing it!

> Can we make the fix for the error a separate issue?
>
> --HPS

Sure, I think so.

This revision is now accepted and ready to land. Feb 10 2017, 3:21 PM

@hselasky, many thanks to Mellanox for reviewing and testing the patch!