djbsort

On an Intel Haswell CPU core, djbsort sorts an array of 1024 signed 32-bit integers in L1 cache in 2.5 cycles/byte. The time is independent of the contents of the array. Previous software takes much more time, even without the constant-time requirement. See the separate list of references for various comparisons.

Cycle counts (median and quartiles) from one run of djbsort's int32-speed on titan0, a 3.5GHz Intel Xeon E3-1275 v3 (Haswell):

type size q1 median q3
int32 1 24 24 24
int32 2 24 27 45
int32 4 176 186 280
int32 8 298 308 335
int32 16 509 511 524
int32 32 939 962 1007
int32 64 1503 1546 1614
int32 128 2197 2210 2276
int32 256 2541 2547 2557
int32 512 4690 4699 4711
int32 1024 10244 10260 10290
int32 2048 22756 22765 22778
int32 4096 50394 50448 50522
int32 8192 127453 127653 127844
int32 16384 342860 343206 343491
int32 32768 832565 832926 833390
int32 65536 1980669 1981584 1982668
int32 131072 4838501 4840170 4844346
int32 262144 11311610 11313671 11316128
int32 524288 26186726 26190482 26195490
int32 1048576 61455124 61467822 61485283

Version: This is version 2018.07.15 of the "Speed" web page.