There is a vast literature on sorting, so this page cannot hope to be comprehensive. Reasons to include something here:

This is, or could be, something to credit for ideas used in djbsort.

The results are, or have been claimed to be, competitive with djbsort in speed. Results are not excluded on the basis of security or verification.
The benchmark used here is
sorting an aligned array of n=1024 random int32
items in L1 cache
on a 3.5GHz Intel Xeon E31275 v3 (Haswell) core.
This is a typical size of interest in postquantum cryptography.
For this size, djbsort runs in under 8000 cycles.
Available software
2001 Herf:
about 33000 cycles for n=1024 int32
on Haswell
(original code is for float32
but easily adapted to int32
)
2017 Gueron–Krasnov "avx_qsort":
about 50000 cycles for n=1024 int32
on Haswell
2017 Bernstein–Chuengsatiansup–Lange–van Vredendaal
"NTRU Prime: reducing attack surface at low cost"
(paper):
about 25000 cycles for n=1024 int32
on Haswell;
constant time;
direct predecessor of djbsort
2017 Bramas "A novel hybrid quicksort algorithm vectorized using AVX512 on Intel Skylake" (paper): only for Skylake using AVX512
2018 Hou–Wang–Feng
"A framework for the automatic vectorization of parallel sort on x86based processors":
about 37000 cycles for n=1024 int32
on Haswell
Hardertoverify performance claims
1987 Rönsch–Strauss "Timing results of some internal sorting algorithms on vector computers"
2007 Furtak–Amaral–Niewiadomski "Using SIMD registers and instructions to enable instructionlevel parallelism in sorting algorithms"
2008 Chhugani–Macy–Baransi–Nguyen–Hagog–Kumar–Lee–Chen–Dubey "Efficient implementation of sorting on multicore SIMD CPU architecture"
2013 Xiaochen–Rocki–Suda "Register level sort algorithm on multicore SIMD processors"
2015 Hayes–Palomar–Unsal–Cristal–Valero "VSR sort: A novel vectorised sorting algorithm & architecture extensions for future microprocessors"
2015 Hou–Wang–Feng "ASPaS: a framework for automatic SIMDization of parallel sorting on x86based manycore processors": reports, e.g., about 220 million cycles (55 cycles/byte) on 1.05GHz Xeon Phi 5110P (Knights Corner) for n=1000000
2016 Xu–Xu–Yin–Zheng "Optimized merge sort on modern commodity multicore CPUs", Telkomnika 14 (2016), 209–318: reports, e.g., about 110 million cycles (27 cycles/byte) on 2.4GHz Core i7 4700MQ (Haswell) for n=1048576
2016 Codish–CruzFilipe–Nebel–SchneiderKamp "Optimizing sorting algorithms by using sorting networks", Formal Aspects of Computing 29 (2017), 559–579: reports, e.g., about 150 cycles (2.3 cycles/byte) on unspecified "Core i7" for n=16; about 120000 cycles (3 cycles/byte) for n=10000
2016 Gueron–Krasnov
"Fast quicksort implementation using AVX instructions",
The Computer Journal 59 (2016), 83–90:
reports, e.g.,
about 160000 cycles (40 cycles/byte) for n=1000 int32
on Haswell (Figure 4),
and
about 130000 cycles (32 cycles/byte) for n=1000 int32
on Haswell
using radix sort in Intel's Integrated Performance Primitives library,
"the fastest inmemory sort we are aware of"
Sorting networks
1968 Batcher "Sorting networks and their applications"
1983 Kumar–Hirschberg "An efficient implementation of Batcher's oddeven merge algorithm and its application in parallel sorting schemes"
1986 Marberg–Gafni "Sorting in constant number of row and column phases on a mesh"
1987 Chung–Ravikumar "Bounds on the size of test sets for sorting and related networks", Discrete Mathematics (1990)
1994 Lee–Batcher "A multiway merging network"
2006 Chaudhry–Cormen "Slabpose columnsort: a new oblivious algorithm for outofcore sorting on distributedmemory clusters"
2007 Even–Levi–Litman "Optimal conclusive sets for comparator networks", Theoretical Computer Science (2009)
2015 Ebbesen "On the practicality of dataoblivious sorting"
Symbolic execution
2007 Nethercote–Seward "Valgrind: a framework for heavyweight dynamic binary instrumentation", ACM SIGPLAN Conference on Programming Language Design and Implementation
2016 Shoshitaishvili–Wang–Salls–Stephens–Polino–Dutcher–Grosen–Feng–Hauser–Kruegel–Vigna "SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis", IEEE Symposium on Security and Privacy
Version: This is version 2018.07.30 of the "References" web page.