
pci_iov: Stop preventing VFs from allocating extra bus numbers
Needs Revision (Public)

Authored by erj on Tue, Oct 8, 5:49 PM.

Details

Reviewers
jhb
rstone
Summary

With the removed check in place, devices that need extra bus numbers
for their VFs (e.g. a dual-port Intel 100G card) cannot spawn them,
and are thus limited in how many VFs they can support.

I removed this check and was able to spawn VFs that needed the extra
bus numbers, and nothing appeared to be wrong with the system
or the VF functionality, but I didn't do any extensive testing.

Signed-off-by: Eric Joyner <erj@freebsd.org>

Test Plan

Figure out what this change will affect. What could go wrong?

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 26931
Build 25235: arc lint + arc unit

Event Timeline

erj created this revision. Tue, Oct 8, 5:49 PM
erj edited the summary of this revision. Tue, Oct 8, 5:50 PM
erj edited the test plan for this revision.
erj added reviewers: jhb, rstone.
erj set the repository for this revision to rS FreeBSD src repository.
erj added a subscriber: Intel Networking.
jhb accepted this revision. Tue, Oct 8, 6:05 PM

How did this work? Meaning how did you allocate bus numbers? Did the parent PCI-PCI bridge have enough bus numbers in range already? For this to work properly you'd need to ensure that the parent PCI-PCI bridge at the other end of the link has the requested bus numbers mapped into its range of valid bus numbers.

This revision is now accepted and ready to land. Tue, Oct 8, 6:05 PM
erj added a comment. Tue, Oct 8, 6:29 PM
In D21944#479344, @jhb wrote:

How did this work? Meaning how did you allocate bus numbers? Did the parent PCI-PCI bridge have enough bus numbers in range already? For this to work properly you'd need to ensure that the parent PCI-PCI bridge at the other end of the link has the requested bus numbers mapped into its range of valid bus numbers.

I can show you the pciconf output, but to summarize: the PFs are located at functions 0-7 (this card only has two functions, so functions 2-7 are reserved and unused). VFs start at 8 with a stride of 1, so creating 256 VFs (the limit of the card) causes the last 8 on the 2nd port to need a new bus number.
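The placement described here follows from the SR-IOV routing ID arithmetic: a VF's routing ID is the PF's routing ID plus the First VF Offset plus the VF index times the stride, and anything that overflows the 8-bit device/function field carries into the bus number. The following is a rough Python sketch of that calculation, not code from pci_iov. The function and variable names are made up for illustration; the offsets, strides, and VF counts are the ones pciconf reports for ice4 and ice5 in this comment.

```python
def vf_rid(pf_bus, pf_devfn, first_vf_offset, vf_stride, vf_index):
    """Return (bus, devfn) for a VF, with vf_index counted from 0.

    The routing ID is a 16-bit value: bus in the high byte, device/function
    in the low byte (with ARI, the low byte is a plain 8-bit function number).
    """
    pf_rid = (pf_bus << 8) | pf_devfn
    rid = pf_rid + first_vf_offset + vf_index * vf_stride
    return rid >> 8, rid & 0xFF


# Values copied from the pciconf output for the two PFs on bus 134.
PFS = [
    {"name": "ice4", "bus": 134, "devfn": 0, "offset": 0x0008, "stride": 1, "num_vfs": 128},
    {"name": "ice5", "bus": 134, "devfn": 1, "offset": 0x0087, "stride": 1, "num_vfs": 128},
]


def vfs_off_pf_bus(pf):
    """List (vf_index, bus, devfn) for VFs that land on a bus other than the PF's."""
    spilled = []
    for i in range(pf["num_vfs"]):
        bus, devfn = vf_rid(pf["bus"], pf["devfn"], pf["offset"], pf["stride"], i)
        if bus != pf["bus"]:
            spilled.append((i, bus, devfn))
    return spilled


if __name__ == "__main__":
    for pf in PFS:
        print(pf["name"], vfs_off_pf_bus(pf))
```

Under these assumptions, none of ice4's VFs leave bus 134, while the last 8 VFs of ice5 (indexes 120 through 127 on that port) land on bus 135 at functions 0 through 7, which lines up with iavf248 through iavf255 in the output.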

Here's some output. ice4 is the first port:

ice4@pci0:134:0:0:      class=0x020000 card=0x00028086 chip=0x15928086 rev=0x01 hdr=0x00          
    vendor     = 'Intel Corporation'                                                              
    device     = 'Ethernet Controller E810-C for QSFP'                                            
    class      = network                                                                          
    subclass   = ethernet                                                                         
    cap 01[40] = powerspec 3  supports D0 D3  current D0                                          
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks                                     
    cap 11[70] = MSI-X supports 1024 messages, enabled                                            
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x8000]                                  
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO                                  
                 link x16(x16) speed 8.0(16.0)                                                    
    cap 03[e0] = VPD                                                                              
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected                                        
    ecap 000e[148] = ARI 1                                                                        
    ecap 0010[160] = SR-IOV 1 IOV enabled, Memory Space enabled, ARI enabled                      
                     128 VFs configured out of 128 supported                                      
                     First VF RID Offset 0x0008, VF RID Stride 0x0001                             
                     VF Device ID 0x1889                                                          
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304            
    iov bar  [184] = type Prefetchable Memory, range 64, base 0x397ff8000000, size 262144, enabled
    iov bar  [190] = type Prefetchable Memory, range 64, base 0x397ffa000000, size 16384, enabled 
    ecap 0017[1a0] = TPH Requester 1                                                              
    ecap 000d[1b0] = ACS 1                                                                        
    ecap 0019[1d0] = PCIe Sec 1 lane errors 0                                                     
    ecap 0025[200] = unknown 1                                                                    
    ecap 0026[210] = unknown 1                                                                    
    ecap 0027[250] = unknown 1

ice5 is the second port:

ice5@pci0:134:0:1:      class=0x020000 card=0x00028086 chip=0x15928086 rev=0x01 hdr=0x00          
    vendor     = 'Intel Corporation'                                                              
    device     = 'Ethernet Controller E810-C for QSFP'                                            
    class      = network                                                                          
    subclass   = ethernet                                                                         
    cap 01[40] = powerspec 3  supports D0 D3  current D0                                          
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks                                     
    cap 11[70] = MSI-X supports 1024 messages, enabled                                            
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x8000]                                  
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO                                  
                 link x16(x16) speed 8.0(16.0)                                                    
    cap 03[e0] = VPD                                                                              
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected                                        
    ecap 000e[148] = ARI 1                                                                        
    ecap 0010[160] = SR-IOV 1 IOV enabled, Memory Space enabled, ARI disabled                     
                     128 VFs configured out of 128 supported                                      
                     First VF RID Offset 0x0087, VF RID Stride 0x0001                             
                     VF Device ID 0x1889                                                          
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304            
    iov bar  [184] = type Prefetchable Memory, range 64, base 0x397ffc040000, size 262144, enabled
    iov bar  [190] = type Prefetchable Memory, range 64, base 0x397ffa200000, size 16384, enabled 
    ecap 0017[1a0] = TPH Requester 1                                                              
    ecap 000d[1b0] = ACS 1                                                                        
    ecap 0025[200] = unknown 1

iavf0 is the first VF:

iavf0@pci0:134:0:8:     class=0x020000 card=0x00008086 chip=0x18898086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'                                                    
    device     = 'Ethernet Adaptive Virtual Function'                                   
    class      = network                                                                
    subclass   = ethernet                                                               
    cap 11[70] = MSI-X supports 5 messages, enabled                                     
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]                        
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR                           
                 link x0(x16) speed 0.0(16.0)                                           
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected                              
    ecap 000e[148] = ARI 1                                                              
    ecap 0017[1a0] = TPH Requester 1                                                    
    ecap 000d[1d0] = ACS 1

With this patch, you can see that iavf248 is the first VF that gets a new bus number:

iavf248@pci0:135:0:0:   class=0x020000 card=0x00008086 chip=0x18898086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'                                                    
    device     = 'Ethernet Adaptive Virtual Function'                                   
    class      = network                                                                
    subclass   = ethernet                                                               
    cap 11[70] = MSI-X supports 5 messages, enabled                                     
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]                        
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR                           
                 link x0(x16) speed 0.0(16.0)                                           
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected                              
    ecap 000e[148] = ARI 1                                                              
    ecap 0017[1a0] = TPH Requester 1                                                    
    ecap 000d[1d0] = ACS 1

That bus number is used for all of the VFs up to iavf255, the last of the 256 VFs spawned from the device:

iavf255@pci0:135:0:7:   class=0x020000 card=0x00008086 chip=0x18898086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'                                                    
    device     = 'Ethernet Adaptive Virtual Function'                                   
    class      = network                                                                
    subclass   = ethernet                                                               
    cap 11[70] = MSI-X supports 5 messages, enabled                                     
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]                        
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR                           
                 link x0(x16) speed 0.0(16.0)                                           
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected                              
    ecap 000e[148] = ARI 1                                                              
    ecap 0017[1a0] = TPH Requester 1                                                    
    ecap 000d[1d0] = ACS 1

As for how all this works, I don't know. I was hoping that you would tell me. :p

rstone added a comment. Tue, Oct 8, 6:49 PM

I suspect that means that your BIOS pre-allocated enough bus numbers to the bridge to account for the VFs. Perhaps we should add a pcib method that recursively checks that a contiguous range of bus numbers is assigned to the bridge?
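To make the suggestion concrete, here is a minimal Python model of what such a recursive check might look like. It is only a sketch of the idea, not FreeBSD's pcib interface: the `Bridge` class, its field names, and the bus numbers in the example are all hypothetical. The assumption is that each bridge forwards configuration cycles for buses between its secondary and subordinate bus numbers, so a range is usable only if every bridge on the path to the root covers it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bridge:
    """Hypothetical stand-in for a PCI-PCI bridge; not a kernel structure."""
    name: str
    secbus: int                      # secondary bus number
    subbus: int                      # subordinate (highest downstream) bus number
    parent: Optional["Bridge"] = None


def bus_range_routed(bridge, first_bus, last_bus):
    """Return True if every bridge from `bridge` up to the root forwards
    the contiguous range first_bus..last_bus."""
    if bridge is None:
        # Walked past the topmost bridge without finding a gap.
        return True
    if not (bridge.secbus <= first_bus <= last_bus <= bridge.subbus):
        return False
    return bus_range_routed(bridge.parent, first_bus, last_bus)


# Made-up topology: a root port above a switch downstream port.
root_port = Bridge("root_port", secbus=10, subbus=13)
downstream = Bridge("downstream", secbus=12, subbus=13, parent=root_port)

if __name__ == "__main__":
    print(bus_range_routed(downstream, 12, 13))   # both bridges cover 12..13
    print(bus_range_routed(downstream, 12, 14))   # bus 14 is outside the range
```

A real implementation would read the ranges from the bridges' configuration state rather than from a hand-built chain like this one.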

jhb added a comment. Tue, Oct 8, 10:53 PM
In D21944#479348, @erj wrote:
In D21944#479344, @jhb wrote:

How did this work? Meaning how did you allocate bus numbers? Did the parent PCI-PCI bridge have enough bus numbers in range already? For this to work properly you'd need to ensure that the parent PCI-PCI bridge at the other end of the link has the requested bus numbers mapped into its range of valid bus numbers.

I can show you the pciconf output, but to summarize: the PFs are located at functions 0-7 (this card only has two functions, so functions 2-7 are reserved and unused). VFs start at 8 with a stride of 1, so creating 256 VFs (the limit of the card) causes the last 8 on the 2nd port to need a new bus number.

I would need the pciconf output of the parent bridge, not the PFs and VFs. Find the pcibX device that is the parent of ice4 (e.g. via devinfo) and then send the 'pciconf -lBb pcibX' output.

jhb requested changes to this revision. Tue, Oct 8, 10:55 PM

Sorry, not sure why it got marked as accepted earlier. As-is this is not safe without verifying that the parent bridge has the bus number allocated.

This revision now requires changes to proceed. Tue, Oct 8, 10:55 PM
erj added a comment. Tue, Oct 8, 11:32 PM

Here's the output I got from the first pcib above ice4/ice5 in devinfo:

pcib17@pci0:133:0:0:    class=0x060400 card=0x72708086 chip=0x20308086 rev=0x04 hdr=0x01        
    bus range  = 134-136                                                                        
    window[1c] = type I/O Port, range 16, addr 0xf000-0xfff, disabled                           
    window[20] = type Memory, range 32, addr 0xfff00000-0xfffff, disabled                       
    window[24] = type Prefetchable Memory, range 64, addr 0x397fe8000000-0x397ffc4fffff, enabled
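As a quick sanity check on this output, the short Python sketch below pulls the bus range out of the pcib17 text and tests whether the buses the VFs ended up on (134 and 135, per the iavf entries earlier in the thread) fall inside it. The helper names are made up for illustration, and the parsing assumes the `bus range  = N-M` line format shown here.

```python
import re

# The first two lines of the pcib17 output quoted above.
PCIB17_OUTPUT = """\
pcib17@pci0:133:0:0:    class=0x060400 card=0x72708086 chip=0x20308086 rev=0x04 hdr=0x01
    bus range  = 134-136
"""


def parse_bus_range(pciconf_text):
    """Extract (first, last) from a 'bus range = N-M' line."""
    match = re.search(r"bus range\s*=\s*(\d+)-(\d+)", pciconf_text)
    if match is None:
        raise ValueError("no 'bus range' line found")
    return int(match.group(1)), int(match.group(2))


def bridge_covers(bus_range, bus):
    """Return True if `bus` lies inside the inclusive (first, last) range."""
    first, last = bus_range
    return first <= bus <= last


if __name__ == "__main__":
    bus_range = parse_bus_range(PCIB17_OUTPUT)
    for bus in (134, 135):           # buses used by the PFs and the 256 VFs
        print(bus, bridge_covers(bus_range, bus))
```

Both buses fall inside 134-136, which is consistent with the bridge having had spare bus numbers assigned already. This only looks at the immediate parent bridge and says nothing about bridges further upstream.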