Page MenuHomeFreeBSD

Use optimised complexity safe sort routine instead of the kernel's qsort
ClosedPublic

Authored by hselasky on May 20 2016, 1:27 PM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 2, 2:25 AM
Unknown Object (File)
Wed, Dec 25, 7:55 PM
Unknown Object (File)
Sat, Dec 14, 6:40 PM
Unknown Object (File)
Nov 17 2024, 7:02 PM
Unknown Object (File)
Oct 19 2024, 7:14 PM
Unknown Object (File)
Oct 4 2024, 8:30 AM
Unknown Object (File)
Sep 30 2024, 3:11 AM
Unknown Object (File)
Sep 30 2024, 2:21 AM
Subscribers

Details

Reviewers
gallatin
rrs
sepherosa_gmail.com
gnn
Group Reviewers
transport
Summary

The kernels qsort() routine can in worst case spend O(N*N) amount of comparisons before the result is sorted.

Because the sorting key is very small, 64-bits, we can use a bit-slice sorter algorithmn instead, which is faster than mergesort() and comparable() to qsort().

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

hselasky retitled this revision from to Use optimised complexity safe sort routine instead of the kernel's qsort.
hselasky updated this object.
hselasky edited the test plan for this revision. (Show Details)
hselasky set the repository for this revision to rS FreeBSD src repository - subversion.
hselasky edited edge metadata.

Implement suggestion from Drew:

  • Keep all data in same array to optimize cache line usage
  • Further optimize sorting function.
kbowling added inline comments.
sys/netinet/tcp_lro.c
384

Simple typo alorithm >> algorithm

hselasky edited edge metadata.

Fix spelling. Found by Kevin.

hselasky edited edge metadata.

Use ffsll() when it is provided by the CPU.

Suggested by Drew.

hselasky edited edge metadata.

Use flsll() instead of ffsll(). Else sorting result will be bit-reversed.

--HPS

hselasky edited edge metadata.

Optimize sorting algorithm.

gallatin edited edge metadata.

Great work -- faster AND safer

BTW, I tested a version of this patch in our Netflix code base. When serving roughly 80Gb/s with 80K TCP connections, the old method (qsort + tcp_lro_mbuf_compare_header) used 1.4% CPU, while the new (tcp_lro_sort) used 1.1% for LRO related sorting as measured by Intel Vtune. This test was done with a sysctl toggle to switch between qsort and the new sort.

That is why I mentioned that in addition to being safer (by limiting recursion), this is also faster.

rrs edited edge metadata.

If Drew says it works I am happy with it as well :-)

This revision is now accepted and ready to land.May 25 2016, 8:11 PM