Page MenuHomeFreeBSD

libc: Implement bsort(3) bitonic sorting algorithm.
Needs ReviewPublic

Authored by hselasky on Sep 8 2022, 10:21 AM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Jan 17, 10:42 AM
Unknown Object (File)
Sat, Jan 7, 11:33 AM
Unknown Object (File)
Sat, Jan 7, 10:25 AM
Unknown Object (File)
Dec 15 2022, 7:49 PM
Unknown Object (File)
Dec 14 2022, 10:35 PM
Unknown Object (File)
Nov 28 2022, 1:48 PM
Unknown Object (File)
Nov 28 2022, 1:47 PM
Unknown Object (File)
Nov 26 2022, 7:49 PM

Details

Reviewers
pauamma
gbe
delphij
Group Reviewers
manpages
Summary

Implement bsort(3) as a drop in replacement for qsort(3).

bsort(3) does not suffer from the same processing time issues that qsort(3) does, and can be CPU accelerated in the future.

Test Plan
# apply patch and
make -C lib/libc
cp -i include/stdlib.h /usr/include/

# mergesort(3) 4096 entries
cc -Wall /usr/obj/usr/img/freebsd.git/amd64.amd64/lib/libc/libc.a -O2 testsort.c && time ./a.out 12 3 2

# bsort(3) 4096 entries
cc -Wall /usr/obj/usr/img/freebsd.git/amd64.amd64/lib/libc/libc.a -O2 testsort.c && time ./a.out 12 2 2

# qsort(3) 4096 entries
cc -Wall /usr/obj/usr/img/freebsd.git/amd64.amd64/lib/libc/libc.a -O2 testsort.c && time ./a.out 12 1 2

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

hselasky edited the test plan for this revision. (Show Details)

Test program v1 testsort.c{F47959045}

# This C-program shows how qsort() affects ls's performance.

rm -rf /tmp/kill_ls
cc -O2 kill_ls.c && ./a.out
time ls -f /tmp/kill_ls/
time ls /tmp/kill_ls/

Fix spelling in manual page (garyj@gmx.de)

s/is/are/ since elements is plural.

Patch for https://github.com/izabera/qsortbench.git

leaderboard
 1. illumos       10.782269     1.000000
 2. mine          11.456753     1.062555
 3. system        17.338236     1.608032
 4. glibc         11.094431     1.028951
 5. freebsd       14.988042     1.390064
 6. plan9         16.398762     1.520901
 7. bsd           16.851684     1.562907
 8. wada          18.539832     1.719474
 9. musl          45.581364     4.227437
10. linux         32.343515     2.999695
11. diet          24.761722     2.296522
12. mini          75.945953     7.043597
13. uclibc        54.332577     5.039067
14. bsort         50.088221     4.645425

Optimisations:

  1. Inline the bsort_index() function and get rid of it.
  2. Make own function to check if the sort is complete. This saves one round of the transform.

New numbers from:
izabera/qsortbench.git

leaderboard
 1. illumos       10.854616     1.000000
 2. mine          11.947272     1.100663
 3. system        18.172552     1.674177
 4. glibc         11.479372     1.057557
 5. freebsd       15.017861     1.383546
 6. plan9         16.430186     1.513659
 7. bsd           16.926317     1.559366
 8. wada          18.590203     1.712654
 9. bsort         41.552180     3.828065
10. musl          47.432818     4.369829
11. linux         33.225245     3.060932
12. diet          25.536346     2.352579
13. mini          76.650238     7.061534
14. uclibc        55.612454     5.123392

Shave off some more microseconds by computing pointers once, instead of twice.

Pushed sorting algorithm here aswell, for easy testing:

https://github.com/hselasky/qsortbench

gmake CC=clang && ./qsort 100000
env CC=clang bash script

--HPS

Could you please adopt a similar change of D17083?

@delphij : Once your review is settled, I can do that.

lib/libc/stdlib/bsort.3
128

typo excepth -> except

Fix spelling error found by @gbe .

bcr added a subscriber: bcr.

Man page looks good now. Thanks, looks very interesting.

pauamma requested changes to this revision.Sep 24 2022, 1:44 PM
pauamma added a subscriber: pauamma.
pauamma added inline comments.
lib/libc/stdlib/bsort.3
26

Do we still use that? I foegot.

78–79
101–102
113–119

Unless they implement different sorting algorithms.

128–137

Check that this is what you meant.

Does this need a .Xr set_constraint_handler_s 3 like there is in memset(3)?

133

Spurious line, or mdoc subtlety I'm missing?

This revision now requires changes to proceed.Sep 24 2022, 1:44 PM
lib/libc/stdlib/bsort.3
137

Based on #bsddocs discussion

I'm currently waiting for D17083 to settle. Then there will be some updates here!

LGTM form my side since D17083 has finally landed.

Will update this patch tomorrow so that the argument order matches the glibc qsort_r.

delphij requested changes to this revision.Oct 26 2022, 9:10 PM

@delphij : Once your review is settled, I can do that.

Hi, I think the bsort_b will not work as-is, please see my comment in bsort_r.c (should it be bsort_b.c or probably removed?) for additional details.

lib/libc/stdlib/bsort_r.c
16

Please note that in this revision, bsort_r is using the GNU qsort_r style signature, and bsort_b will not work as presented. The reason is that the qsort_b implementation was using the historical BSD compar interface, which passed the context pointer as the first parameter, and that matches how block ABI passes the block context. With the API change, the first parameter of compar is no longer the context pointer, which breaks the assumption.

If the intention was to replace the qsort(3) implementation with bitonic sort, it would be easier to implement bsort_b using the historical qsort_r style interface (which needs to be made available anyway, if that's the goal). If that's not a goal, you can use a combination of DECLARE_BLOCK and CALL_BLOCK to implement the block function instead (in which case bsort_b would no longer need to be implemented as a wrapper).

This revision now requires changes to proceed.Oct 26 2022, 9:10 PM
lib/libc/stdlib/bsort.c
178–185

I think this can be done with clz? Perhaps something like:

max = 1 << (8 * sizeof(size_t) - __builtin_clzll(n));
if (!powerof2(n))
    max <<= 1;

(assuming n >= 2, which was tested above in line 174).

Yes, my goal is to replace qsort(). I'll have a look at your comments.

Probably will work on this next week. Was a little short of time this week.

Still a bit busy, but will try to look at this patch soon. Thanks for all the reviews.

hselasky marked 8 inline comments as done.
  • Rebase and update patch.
  • Try to respond to all comments so far.
hselasky added inline comments.
lib/libc/stdlib/bsort_r.c
16

Issues should be fixed now!

I would like bsort() to stay apart from qsort() in the beginning. Then at some later point we can simply alias the two. This will allow proper testing, in the rare case bsort has any issue.

Please review the latest version of this patch.

Fix use of clzll (should subtract one there).

Think about systems where sizeof(size_t) != sizeof(unsigned long) .

Manual page English LGTM. As always, I can't attest to consistency with source code.

I have a concern about this change. Qsort and bsort are both deterministic but neither is stable, and I imagine they won't sort equal keys in the same order. This could be an issue for an application that expects repeatability and should be flagged up very visibly

I have a concern about this change. Qsort and bsort are both deterministic but neither is stable, and I imagine they won't sort equal keys in the same order. This could be an issue for an application that expects repeatability and should be flagged up very visibly

For applications that expect repeatability, stable sort algorithms are required. In that case neither qsort nor bsort should be applied.

In D36493#853292, @zlei wrote:

For applications that expect repeatability, stable sort algorithms are required. In that case neither qsort nor bsort should be applied.

I disagree. It's a POLA violation.

If you want deterministic behaviour, we have mergesort in base.

The only problem with mergesort is that it needs allocate additional memory.

qsort and bsort are currently the only sorting algorithms that only swap two and two elements. The point with bsort is to avoid the excess CPU usage which qsort can lead to.

If you want deterministic behaviour, we have mergesort in base.

The only problem with mergesort is that it needs allocate additional memory.

qsort and bsort are currently the only sorting algorithms that only swap two and two elements. The point with bsort is to avoid the excess CPU usage which qsort can lead to.

@hselasky first thanks for the update and the improvement on such an important part of FreeBSD. I wouldn't think its not a POLA violation and could be integrated in -CURRENT and possibly merge to stable/13.

@delphij : Are you good with the current version of this patch?

@delphij : Are you good with the current version of this patch?

Sorry for the delay, I was sick in the past week.

I think the change is fine overall (with some minor nits, see comments inline; and if you are adding new interface, you would want to add them to Symbol.map, see rG597b02675751e48dd04777f1e91fee382bf3a966 for a recent example. Note that once they are added there we would be committed to the new interfaces and are expected to keep them forever).

I don't particularly like the new bsort_b implementation (mainly, a translator is used to translate the block function call to a regular bsort_r call, so callers of bsort_b are slightly penalized). For a initial implementation I think we can land it as-is, but we are so close and I do feel it's better to just do it right in one goal 😊

include/stdlib.h
110

This should probably live under #if __BSD_VISIBLE as it's not in POSIX.

320

Similarly -- this should be #if __BSD_VISIBLE.

401

Ditto; this is not part of the ISO C standard, so it should be covered by our namespace (I do think it's logically belonging here, nested under __EXT1_VISIBLE)

lib/libc/stdlib/bsort_r.c
17

Style: a blank line should be added between the variable definition and actual code.

@delphij : Thank you for all input. I will start working on it next week hopefully!

@zlei: Your point is valid, and it is not very difficult to add code to make the algorithm a stable sorter, simply by including the position in the array when comparing elements! How important is this?

@zlei: Your point is valid, and it is not very difficult to add code to make the algorithm a stable sorter, simply by including the position in the array when comparing elements!

I've not read though the bitonic sorting algorithm but it is great that making it a stable one is simple (and not performance penalty)!

How important is this?

It depends.

  1. For the case that natural sort, such as arrays of elements like {'a', 'c', 'b', 'b', 'd'} or {1, 3, 2, 2, 4}, both stable sort and unstable ones are competent.
  2. For more complex cases, stable sort might be valuable, think of this case:

The element struct person { char *name, int age, ... } , and the array { { "Charlie", 20 }, { "Smith", 20 }, { "Adam ", 19}, ... }, and we want to sort the array by age.
When the array is dynamic, i.e. elements change after every sort, for unstable sort that may end up of different order of "Charlie" and "Smith" (for every sort).
Still it is possible we add a field original_order ( I think it is same as position in the array of bsort ) to the element and supply a comparator that sort by age and then original_order, but that might be overkilled and cumbersome.

I'd suggested that do benchmarking test before implementing bitonic sorting algorithm a stable one.