The benchmark used here sorts an aligned array of n=1024 random int32 items, held in L1 cache, on one core of a 3.5GHz Intel Xeon E3-1275 v3 (Haswell). This is a typical size of interest in post-quantum cryptography. For this size, djbsort runs in under 8000 cycles.
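
To put these figures in more familiar units, here is a small back-of-the-envelope sketch in Python. It only rearranges the numbers quoted above and is not part of djbsort or its benchmarking code. Two assumptions are not stated in the text: that the core runs at its nominal 3.5GHz for the whole measurement, and that the Haswell L1 data cache is 32 KiB per core. The function name derived_figures is made up for this illustration. Since the text gives an upper bound ("under 8000 cycles"), the per-item and wall-clock results are upper bounds as well.

```python
# Figures quoted in the text above.
N_ITEMS = 1024        # array length n
ITEM_BYTES = 4        # size of one int32
CLOCK_HZ = 3.5e9      # nominal clock; assumes no frequency scaling
CYCLE_BOUND = 8000    # "under 8000 cycles", so this is an upper bound

# Assumption, not from the text: 32 KiB L1 data cache per Haswell core.
L1D_BYTES = 32 * 1024


def derived_figures():
    """Return quantities derived from the benchmark description (hypothetical helper)."""
    array_bytes = N_ITEMS * ITEM_BYTES
    return {
        "array_bytes": array_bytes,
        "fits_in_l1d": array_bytes <= L1D_BYTES,
        "cycles_per_item_bound": CYCLE_BOUND / N_ITEMS,
        "microseconds_bound": CYCLE_BOUND / CLOCK_HZ * 1e6,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(name, value)
```

Under these assumptions the array occupies 4096 bytes, which fits comfortably in L1 data cache, and the bound works out to fewer than 8 cycles per item, or roughly 2.3 microseconds per sort.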

