Details
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
- Not Applicable
- Unit Tests
- Not Applicable
Event Timeline
Internally at Netflix, we have a version of this patch with a slightly different name. This is because, in addition to avoiding code duplication, it also allows a bit more room to innovate with the LRO internals.
Is there any chance you could call this tcp_lro_flush_all() rather than tcp_lro_rx_done()? The "flush" name follows the existing convention. (And, selfishly, it will avoid merge conflicts when it trickles into our tree, as we called it tcp_lro_flush_all().)
Yeah, in addition to deduplicating the code, I have been working on improving the internals of the LRO; my first step is changing the SLIST to a LIST, so that medium-sized LRO entry counts (128, as used by hyperv's network driver) will not slow things down when an entry is removed from the end of the list :). Testing of this change will be posted shortly, after I test it in Azure. And beyond that, I will add a hash table, so that tcp_lro_rx() doesn't need to iterate the active list.
> Is there any chance you could call this tcp_lro_flush_all() rather than tcp_lro_rx_done()? The "flush" name follows the existing convention. (And, selfishly, it will avoid merge conflicts when it trickles into our tree, as we called it tcp_lro_flush_all().)
Well, tcp_lro_flush_all() was already taken by the mbuf queuing path; that's why I used tcp_lro_rx_done() here. But we could rename tcp_lro_flush_all() to tcp_lro_flush_mbuf(), and then rename tcp_lro_rx_done() to tcp_lro_flush_all(). Would you and Hans agree to this approach?
I'd forgotten that was what hps had named his flusher. But, the cool thing is that the existing tcp_lro_flush_all() does exactly what you want. All you need to do is get hps' approval to remove the __predict_false() from the check at the top of tcp_lro_flush_all().
As to hash tables and such...
We (Netflix) already have a hash table. Randall (rrs@) has been working on this for the last year or so, on and off. He's trying to determine whether it is faster or slower than hps's RSS-sorted method.
The advantage of the hash is that all drivers would benefit. The disadvantage is that it is arguably less cache-friendly. We're in the middle of a comparison. Let's collaborate before you end up duplicating all our work. I'll send you our current code off-line.
Please also add rrs to this review.
Grrr, I didn't read the code closely. I will remove the __predict_false() and call tcp_lro_flush_all() in the drivers. Thanks.
> As to hash tables and such...
> We (Netflix) already have a hash table. Randall (rrs@) has been working on this for the last year or so, on and off. He's trying to determine whether it is faster or slower than hps's RSS-sorted method.
> The advantage of the hash is that all drivers would benefit. The disadvantage is that it is arguably less cache-friendly. We're in the middle of a comparison. Let's collaborate before you end up duplicating all our work. I'll send you our current code off-line.
Yeah, sure, I could test the patch in Azure.
> Please also add rrs to this review.
Done.
Use tcp_lro_flush_all() and remove the __predict_false() at the beginning of tcp_lro_flush_all(). Drivers now call tcp_lro_flush_all(), and tcp_lro_rx_done() is made static. As suggested by gallatin.
for these two:
M sys/dev/cxgb/cxgb_sge.c (6 lines)
M sys/dev/cxgbe/t4_sge.c (7 lines)
The call to tcp_lro_flush_all is not equivalent to the code it's replacing. The other LRO receive paths in these drivers do not do the qsort before tcp_lro_rx. Neither should the final flush.
By the way, the names of the routines are a bit awkward. tcp_lro_flush_all sounds like something that would be a simple loop around tcp_lro_flush but isn't. That threw me off initially and I was about to accept this revision.
Well, tcp_lro_flush_all == tcp_lro_rx_done == "iterate the lro active list and call tcp_lro_flush", if the mbuf qsort is _not_ used. I thought tcp_lro_flush_all was just for the mbuf qsort, as you did, but as gallatin@ pointed out: if the mbuf qsort is not used, it just flushes the lro active list.
> By the way, the names of the routines are a bit awkward. tcp_lro_flush_all sounds like something that would be a simple loop around tcp_lro_flush but isn't. That threw me off initially and I was about to accept this revision.
See the above comment.
One unrelated comment: why doesn't tcp_lro_queue_mbuf() do a flush_all check right after adding the mbuf to lro_mbuf_data? There doesn't seem to be much point in waiting for the next enqueue to flush a list that's already full. Basically, move the flush_all to the end, like this:
    /* store sequence number */
    TCP_LRO_SEQUENCE(mb) = lc->lro_mbuf_count;
    /* enter mbuf */
    lc->lro_mbuf_data[lc->lro_mbuf_count++] = mb;
    /* check if array is full */
    if (__predict_false(lc->lro_mbuf_count == lc->lro_mbuf_max))
        tcp_lro_flush_all(lc);
This probably shouldn't have been committed without a gentle prod towards the Intel Networking group.
Was this tested on Intel hardware at all? If so, what model cards?
Sean:
While I agree a prod to Networking might have been in order, I think it's a generally
*good* idea here. I don't think having drivers know the internal make-up of the
tcp-lro code is good. I have had something similar running at NF for quite some time
now...
I very much doubt that any driver that was doing the SLIST thing,
and removing that to make it call the function that does the "SLIST" thing,
is going to have a problem...
IMO this really cleans up one of what I am sure are a number of little pieces of driver duplication, repeated over and over
again.. :-)
If you guys popped this onto machines with Intel networking cards, then I'm fine with it.