djbsort: References

There is a vast literature on sorting, so this page cannot hope to be comprehensive. Reasons to include something here:

This is, or could be, something to credit for ideas used in djbsort.
The results are, or have been claimed to be, competitive with djbsort in speed. Speed results are not excluded on the basis of security or verification. Results are also not excluded on the basis of array-size limitations (only multiples of 64, only powers of 2, etc.) or array-alignment requirements.

Available software

2001 Herf "radix"

2014 Hegeman "floki" (documentation reports 0.92ms on a Core i7-4770S for int32[65536]; this is between 10.9 cycles/byte and 13.7 cycles/byte, since core is 3.10 GHz base and 3.90 GHz boost)

2016 Santurkar "avx2-merge-sort"

2017 Gueron–Krasnov "avx_qsort"

2017 Bernstein–Chuengsatiansup–Lange–van Vredendaal "ntruprime-20170815.tar.gz": constant time; direct ancestor of djbsort; see also paper "NTRU Prime: reducing attack surface at low cost"

2017 Bramas "avx-512-sort": only for AVX-512; see also paper "A novel hybrid quicksort algorithm vectorized using AVX-512 on Intel Skylake"

2018 Hou–Wang–Feng "aspas": see also paper "A framework for the automatic vectorization of parallel sort on x86-based processors"

2020 Blacher–Giesen–Kühne "fast-and-robust": see also paper "Fast and robust vectorized in-place sorting of primitive types"

2020 Shechter "vxsort": see also blog post

2020 Bingmann–Marianczuk–Sanders "SmallSorters": only for small sizes; see also paper "Engineering faster sorters for small sets of items"

2022 Blacher–Giesen–Sanders–Wassenberg "vqsort"/"highway": see also paper "Vectorized and performance-portable Quicksort"

2022 Intel "x86-simd-sort" (development taken over by numpy in 2025)

Harder-to-verify performance claims

1987 Rönsch–Strauss "Timing results of some internal sorting algorithms on vector computers"

2007 Furtak–Amaral–Niewiadomski "Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms"

2008 Chhugani–Macy–Baransi–Nguyen–Hagog–Kumar–Lee–Chen–Dubey "Efficient implementation of sorting on multi-core SIMD CPU architecture"

2010 Anonymous "A question of sorts"

2013 Xiaochen–Rocki–Suda "Register level sort algorithm on multi-core SIMD processors"

2015 Hayes–Palomar–Unsal–Cristal–Valero "VSR sort: A novel vectorised sorting algorithm & architecture extensions for future microprocessors"

2015 Hou–Wang–Feng "ASPaS: a framework for automatic SIMDization of parallel sorting on x86-based many-core processors": reports, e.g., about 220 million cycles (55 cycles/byte) on 1.05GHz Xeon Phi 5110P (Knights Corner) for n=1000000

2016 Xu–Xu–Yin–Zheng "Optimized merge sort on modern commodity multi-core CPUs", Telkomnika 14 (2016), 209–318: reports, e.g., about 110 million cycles (27 cycles/byte) on 2.4GHz Core i7 4700MQ (Haswell) for n=1048576

2016 Codish–Cruz-Filipe–Nebel–Schneider-Kamp "Optimizing sorting algorithms by using sorting networks", Formal Aspects of Computing 29 (2017), 559–579: reports, e.g., about 150 cycles (2.3 cycles/byte) on unspecified "Core i7" for n=16; about 120000 cycles (3 cycles/byte) for n=10000

2016 Gueron–Krasnov "Fast quicksort implementation using AVX instructions", The Computer Journal 59 (2016), 83–90: reports, e.g., about 160000 cycles (40 cycles/byte) for n=1000 int32 on Haswell (Figure 4), and about 130000 cycles (32 cycles/byte) for n=1000 int32 on Haswell using radix sort in Intel's Integrated Performance Primitives library, "the fastest in-memory sort we are aware of"

2019 Yin–Zhang–Müller–Liu–Wei–Schmidt–Liu "Efficient parallel sort on AVX-512-based multi-core and many-core Architectures"

Sorting networks

1968 Batcher "Sorting networks and their applications"

1983 Kumar–Hirschberg "An efficient implementation of Batcher's odd-even merge algorithm and its application in parallel sorting schemes"

1986 Marberg–Gafni "Sorting in constant number of row and column phases on a mesh"

1987 Chung–Ravikumar "Bounds on the size of test sets for sorting and related networks", Discrete Mathematics (1990)

1994 Lee–Batcher "A multiway merging network"

1998 Lang "Bitonic sorting network for n not a power of 2"

2006 Chaudhry–Cormen "Slabpose columnsort: a new oblivious algorithm for out-of-core sorting on distributed-memory clusters"

2007 Even–Levi–Litman "Optimal conclusive sets for comparator networks", Theoretical Computer Science (2009)

2015 Ebbesen "On the practicality of data-oblivious sorting"

Correctness of sorting software

2013 de Gouw–Rot–de Boer–Bubel–Hähnle "OpenJDK’s java.utils.Collection.sort() is broken: The good, the bad and the worst case"

Symbolic execution

2007 Nethercote–Seward "Valgrind: a framework for heavyweight dynamic binary instrumentation", ACM SIGPLAN Conference on Programming Language Design and Implementation

2016 Shoshitaishvili–Wang–Salls–Stephens–Polino–Dutcher–Grosen–Feng–Hauser–Kruegel–Vigna "SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis", IEEE Symposium on Security and Privacy

Version: This is version 2026.02.04 of the "References" web page.