
Fix ixl driver performance issue
Needs Revision · Public

Authored by alexandre.martins_stormshield.eu on Jan 19 2018, 9:12 AM.

Details

Summary

The receive function ixl_rxeof() doesn't return TRUE when more packets can still be fetched.

I also fixed the flowid, which was incorrect: the MSI-X vector number is one-based while the flowid is zero-based.

Performance improves by about 300% in my case.
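
For context, a minimal sketch of both changes as they might sit in the stable/10 ixl_txrx.c; the descriptor and constant names are taken from the i40e headers and the stock driver, and the exact hunks may differ from the actual diff:

    /* End of ixl_rxeof(): peek at the current descriptor's DD bit so
     * the caller knows whether more packets can still be fetched. */
    qword = le64toh(cur->wb.qword1.status_error_len);
    status = (qword & I40E_RXD_QW1_STATUS_MASK)
        >> I40E_RXD_QW1_STATUS_SHIFT;
    if (status & (1 << I40E_RX_DESC_STATUS_DD_SHIFT))
        return (TRUE);  /* more work pending, reschedule the task */
    return (FALSE);

    /* Flowid fix: MSI-X vectors are one-based (vector 0 serves the
     * admin queue), while flowids are zero-based. */
    sendmp->m_pkthdr.flowid = que->msix - 1;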

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

erj added a reviewer: krzysztof.galazka_intel.com.

These changes look good to me, but I'm going to add a co-worker to get his opinion, too. We may want to delay committing this until we get our internal updated version of ixl(4) committed.

This revision is now accepted and ready to land. Feb 7 2018, 11:01 PM

Hi Alexandre!

Could you tell me which NIC you used and how you measured the performance? I'm a little concerned about the impact on latency when there is constant incoming traffic. I think it might be a good idea to add a sysctl/tunable to limit how many times the task is rescheduled before re-enabling the IRQ. I'll talk with our validation team about running some tests.

Thanks,
Chris
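
Sketching that suggestion: the tunable name and the per-queue counter below are hypothetical (neither exists in the stock driver), and ixl_enable_queue() is the stock helper that re-arms a queue interrupt, so this is only an illustration of the idea:

    /* Hypothetical tunable: how many consecutive reschedules to allow
     * before giving up and re-enabling the queue interrupt. */
    static int ixl_rx_resched_limit = 8;
    TUNABLE_INT("hw.ixl.rx_resched_limit", &ixl_rx_resched_limit);

    /* In ixl_handle_que(), instead of rescheduling unconditionally: */
    if (more && que->resched_count++ < ixl_rx_resched_limit) {
        taskqueue_enqueue(que->tq, &que->task);
        return;
    }
    que->resched_count = 0;             /* hypothetical counter */
    ixl_enable_queue(hw, que->me);      /* fall back to interrupts */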

erj requested changes to this revision. Feb 9 2018, 7:57 PM

Until Chris's questions get answered.

This revision now requires changes to proceed. Feb 9 2018, 7:57 PM

Hi Krzysztof,

I'm mainly using two adapters. The first and principal one is an XL710 AM1/AM2; the second is an X722.

Our network stack is configured to use polling instead of interrupts, which is why the performance gap is so large for us (the polling task was never rescheduled).

I tested throughput, not latency.

If you need to test some patches, my platform is yours.

As I look at this more, I don't think that last section of the patch is right. ixl_rxeof() makes that same DD bit check in the for-loop; it's supposed to have processed as many descriptors as it can by the time it hits the end of the function.

But it looks like the function could still have more descriptors left to process, because the number of for-loop iterations is bounded by the "count" parameter passed into it. Maybe that's what you need to change instead: raise it from the default?
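
For reference, the loop in question has roughly this shape (paraphrased from stable/10 ixl_txrx.c): both the DD bit and the count argument bound the iteration, so packets can be left behind even though every descriptor processed inside the loop passed the DD check:

    /* Paraphrased shape of the ixl_rxeof() loop: "count" arrives as
     * IXL_RX_LIMIT and can stop processing before the ring is drained. */
    for (i = rxr->next_check; count != 0;) {
        cur = &rxr->base[i];
        qword = le64toh(cur->wb.qword1.status_error_len);
        status = (qword & I40E_RXD_QW1_STATUS_MASK)
            >> I40E_RXD_QW1_STATUS_SHIFT;
        if ((status & (1 << I40E_RX_DESC_STATUS_DD_SHIFT)) == 0)
            break;      /* descriptor not done: ring is drained */
        count--;        /* the bound that can leave packets behind */
        /* ... refresh buffers and pass the packet up the stack ... */
    }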

In fact, my patch was inspired by the ixgbe driver, which does the same check:

https://svnweb.freebsd.org/base/stable/10/sys/dev/ixgbe/ix_txrx.c?view=annotate#l1928
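
The tail of ixgbe_rxeof() at that link ends with approximately this check (quoted from memory of stable/10 ix_txrx.c, so treat it as approximate):

    /*
    ** Still have cleaning to do?
    */
    if ((staterr & IXGBE_RXD_STAT_DD) != 0)
        return (TRUE);
    return (FALSE);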

I don't know how the adapter works, so my fix may well be wrong.

You mentioned polling in a previous comment -- did you make changes to the kernel to enable that? I'm assuming you've also made changes to the driver, since we don't have polling support in it.

Are you also using stable/10 for this?

What looks like a bug to me is that the return code of rxeof is checked in order to start the taskqueue.
If skipping the taskqueue is an optimization, wouldn't it be better to do that in the callers?
The other question is: why keep the taskqueue code at all if it is never scheduled (except maybe for more_tx)?

more = ixl_rxeof(que, IXL_RX_LIMIT);

    IXL_TX_LOCK(txr);
    ixl_txeof(que);
    if (!drbr_empty(ifp, txr->br))
        ixl_mq_start_locked(ifp, txr);
    IXL_TX_UNLOCK(txr);
    if (more) {
        taskqueue_enqueue(que->tq, &que->task);
        return;
    }
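
If the decision were moved into the callers as suggested, the per-queue interrupt handler might end up shaped like this (purely illustrative: ixl_msix_que() and ixl_enable_queue() exist in the stock driver, but this body is only a sketch):

    static void
    ixl_msix_que(void *arg)
    {
        struct ixl_queue *que = arg;

        /* Let the caller, not ixl_rxeof(), decide: defer to the
         * taskqueue while work remains, else re-arm the interrupt. */
        if (ixl_rxeof(que, IXL_RX_LIMIT))
            taskqueue_enqueue(que->tq, &que->task);
        else
            ixl_enable_queue(que->vsi->hw, que->me);
    }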