This graph compares sorting performance for various int32 array sizes on one Zen 2 core (in an AMD EPYC 7742, boost disabled, gcc 10.2.1, clang 11.0.1-2) using the following libraries (as of 20240116):

There are also many slower sorting libraries. The graph includes one of those, std::sort, for comparison. The most important lesson from the graph is that there's still room for speed improvement in all of these libraries (including djbsort, which wins for some sizes but not others).

This page explains how to rebuild the above graph. It would be interesting to systematically track how the graph changes as CPUs and compilers vary. This page also gives some performance numbers for other libraries.

This page focuses purely on int32 sorting speed using AVX2. Some libraries obtain an extra speed boost from AVX-512. Aside from the speed battle, a reason to use a library other than djbsort is support for more data types. Reasons to use djbsort include security and verification.

The sortbench script

The script generating the above graph is called sortbench. Here's how to run sortbench.

These instructions are for Debian/Ubuntu systems running on CPUs with AVX2. Other modern Linux/BSD/UNIX systems should work with minor adjustments to the instructions. These instructions need the following packages:

In a root terminal, create a sortbench user:

    adduser --disabled-password --gecos sortbench sortbench

Run a shell as that user:

    su - sortbench

As that user, download, unpack, and run the latest version of sortbench:

    wget -m
    wget -m$version.tar.gz
    tar -xzf$version.tar.gz
    cd sortbench-$version

This should finish in well under an hour. The exact time depends on your CPU. The resulting graph will be in sortbench-$version/plot.pdf.

sortbench archives

sortbench-20240116.tar.gz browse

Other libraries

Software predating djbsort (see the separate list of references) takes much more time than djbsort does. The following software is now incorporated into the crypto_sort/int32 directory in the SUPERCOP benchmarking framework:

cycles on kizomba cycles on samba cycles on titan0 constant-time software
4929 5250 6596 yes avx2 from djbsort
9664 fail fail no sid1607 from 2016 Santurkar
13756 13466 15136 no krasnov from 2017 Gueron–Krasnov
15682 15875 18548 yes oldavx2 from 2017 Bernstein–Chuengsatiansup–Lange–van Vredendaal
19517 19832 21844 no herf based on float32 code from 2001 Herf
26998 28664 25608 no stdsort using C++ standard library
83041 86442 95672 yes portable4 from djbsort
90155 97191 100840 yes portable3 from djbsort
fail fail fail no aspas from 2018 Hou–Wang–Feng

These cycle counts are for int32[768] arrays with random contents; variable-time software could be considerably slower for some array contents than for others. The sid1607 failures are because sid1607 needs C++11, which is not supported with the default compiler options for the compilers shipped with Ubuntu before 18.04. The aspas failures have not been diagnosed, but other measurements suggest that aspas is slower than stdsort.

Version: This is version 2024.01.17 of the "Comparison" web page.