Zhangxp1998 ported quadsort to C++, compared it with std::sort, and found it to be slower: https://gist.github.com/zhangxp1998/0e2fa30656c894017d183e0d...
Shameless self promotion, if you want a faster sorting implementation, check out pdqsort (https://github.com/orlp/pdqsort) which is almost twice as fast as libstdc++'s std::sort for randomly shuffled integers, and has all sorts of tricks for input distributions with patterns (like pre-sorted, reverse sorted, single out-of-order element appended, etc).
Someone has to have put together a nice benchmark for comparing many sorting algorithms. I wish that the author had done some benchmarking first, so that the proposed algorithm can properly be positioned w.r.t. state-of-the-art techniques.
Benchmark of quadsort() versus C qsort():
* ~10x faster on forward-order items
* ~2x faster on reverse-order items
* ~equivalent on random-order items
* Ordering: when blocks of items are already in order, or in reverse order, handle them as a special case, which gives quadsort O(n + log n) instead of qsort's O(n * log n); a rough sketch of the idea follows this list.
* Boundaries: compare the data directly rather than doing a traditional merge sort's wasteful boundary checks.
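To picture the "Ordering" trick, here is a rough C sketch of the idea. This is my own illustration, not quadsort's actual code, and the helper names (block_is_sorted, block_is_reversed, reverse_block) are made up for the example.

    #include <stddef.h>
    #include <stdio.h>

    /* Illustration only (not quadsort's code): before merging, check whether
     * a block is already in order or in reverse order, and handle those cases
     * with a straight copy or an in-place reversal instead of a full merge.
     * On fully sorted or fully reversed input this keeps the work close to
     * one pass over the data. */
    static int block_is_sorted(const int *a, size_t n)
    {
        for (size_t i = 1; i < n; i++)
            if (a[i - 1] > a[i])
                return 0;
        return 1;
    }

    static int block_is_reversed(const int *a, size_t n)
    {
        for (size_t i = 1; i < n; i++)
            if (a[i - 1] < a[i])
                return 0;
        return 1;
    }

    static void reverse_block(int *a, size_t n)
    {
        if (n < 2)
            return;
        for (size_t i = 0, j = n - 1; i < j; i++, j--) {
            int t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }

    int main(void)
    {
        int block[4] = { 4, 3, 2, 1 };
        if (!block_is_sorted(block, 4) && block_is_reversed(block, 4))
            reverse_block(block, 4);
        printf("%d %d %d %d\n", block[0], block[1], block[2], block[3]); /* 1 2 3 4 */
        return 0;
    }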
A few languages' standard library sorts implement that already.
For a serious implementation, of course, you care not only about asymptotic performance but also about absolute runtimes.
At a lazy glance, it looks kind of like Quadsort has re-derived a variation of the 4-element sorting network.
Section 2 of that Wikipedia article links to many constructions that do exactly that.
A couple are O(n log(n)), but complicated and impractical. The algorithms people use are O(n log^2(n)) ones.
FWIW, sorting networks are data-independent. I.e. for any input, you always compare-and-swap the same elements.
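To make "data-independent" concrete, here is a small C sketch (my own hypothetical example, not from the article) of the classic 5-comparator network for 4 elements: the same index pairs are compared in the same order for every input, and only whether a swap happens depends on the data.

    #include <stdio.h>

    /* Sketch: the standard 5-comparator sorting network for 4 elements.
     * The compared positions (0,1), (2,3), (0,2), (1,3), (1,2) are fixed
     * for every input; only the swaps depend on the data. */
    #define CSWAP(a, i, j) \
        do { if ((a)[i] > (a)[j]) { int t = (a)[i]; (a)[i] = (a)[j]; (a)[j] = t; } } while (0)

    static void network_sort4(int a[4])
    {
        CSWAP(a, 0, 1);
        CSWAP(a, 2, 3);
        CSWAP(a, 0, 2);
        CSWAP(a, 1, 3);
        CSWAP(a, 1, 2);
    }

    int main(void)
    {
        int a[4] = { 4, 1, 3, 2 };
        network_sort4(a);
        printf("%d %d %d %d\n", a[0], a[1], a[2], a[3]); /* 1 2 3 4 */
        return 0;
    }

Because the comparison pattern is fixed, networks like this are friendly to branch prediction and vectorization, which is part of why they show up as small-array base cases.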
"For one to eleven inputs, minimal (i.e. size-optimal) sorting networks are known, and for higher values, lower bounds on their sizes S(n) can be derived inductively using a lemma due to Van Voorhis: S(n + 1) ≥ S(n) + ⌈log2(n)⌉. The first ten optimal networks have been known since 1969, with the first eight again being known as optimal since the work of Floyd and Knuth, but optimality of the cases n = 9 and n = 10 took until 2014 to be resolved. An optimal network for size 11 was found in December of 2019 by Jannis Harder..."
Clearly I also should've said < ~12.
For some particular N, you might be able to find the smallest sorting network that sorts N elements. And then you might be able to find an even faster algorithm that sorts N elements using some strategy that's not equivalent to a sorting network (e.g., if your strategy does data-dependent memory access).
The search space is infinite, so exhaustively exploring it is impossible.
Procedures that enumerate Turing machines are generally either very easy or nigh impossible.
O(n log n) is only the frontier for comparison based sorts that know nothing about the distribution of inputs.
If your sorting algorithm is allowed to do anything else on your data, like hashing or looking at bits or arithmetic, different lower bounds might apply.
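For a concrete (hypothetical) example of what "looking at bits" buys you: a counting sort over 8-bit keys runs in O(n + 256) time, below the comparison-sort lower bound, because it never compares elements at all.

    #include <stddef.h>
    #include <stdio.h>

    /* Sketch: counting sort for unsigned 8-bit keys. O(n + 256) time and
     * no element-to-element comparisons, so the O(n log n) comparison
     * lower bound simply doesn't apply. */
    static void counting_sort_u8(unsigned char *a, size_t n)
    {
        size_t count[256] = { 0 };

        for (size_t i = 0; i < n; i++)
            count[a[i]]++;

        size_t out = 0;
        for (int v = 0; v < 256; v++)
            for (size_t j = 0; j < count[v]; j++)
                a[out++] = (unsigned char)v;
    }

    int main(void)
    {
        unsigned char a[] = { 9, 3, 200, 3, 7 };
        counting_sort_u8(a, sizeof a);
        for (size_t i = 0; i < sizeof a; i++)
            printf("%d ", a[i]);
        printf("\n"); /* 3 3 7 9 200 */
        return 0;
    }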
In the code review I initially complained, but then I couldn’t think of any other way to interpret the design. So I guess that’s a feature now. ‘Course later we found a performance problem, but, you know…
Summary: slower than std::sort except on the random-tail case.
It seems really obvious in hindsight, so I'm sure there's just prior art I don't know about.
There are, in general, a number of different sorting algorithms which are optimized for a specific number of elements. In this case, four. It uses five binary comparisons to sort four elements.
You can find other algorithms like this, such as an algorithm that uses seven comparisons to sort five items, or one that uses ten comparisons to sort six items.
In theory, you can get O(n log k) performance, where k is the number of distinct elements (so k <= n). And crucially: you don't need to know k up-front.
In practice, all my attempts were absolutely slower than the standard approach based on binary comparisons only. (But that's saying more about me than about the domain.)
The author's explanation isn't entirely clear, but it seems similar to the above construction with a fixed N and then a merge sort afterwards.
For five elements, the lower bound from counting permutations is ceil(log2(5!)) = ceil(log2(120)) = 7, which says you cannot sort 5 elements in fewer than 7 comparisons.
An actual 7-comparison algorithm exists, but it is not easy to write. For larger element counts it gets much trickier; in general the log2(#permutations) lower bound cannot be met.
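For the curious, here is one way the 7-comparison sort of five elements can be written. This is my own hedged sketch of a merge-insertion scheme (in the style of Ford-Johnson), not code from the article: sort two pairs (2 comparisons), order the pair maxima (1), binary-insert the fifth element into the chain of three (2), then binary-insert the remaining element among the values known to sit below the largest (at most 2).

    #include <stdio.h>

    static int ncmp;                      /* comparison counter, for checking the bound */
    static int lt(int x, int y) { ncmp++; return x < y; }

    /* Binary-insert x into sorted s[0..len-1], searching only positions [0, limit].
     * Returns the position where x was placed. */
    static int insert_pos(int *s, int len, int limit, int x)
    {
        int lo = 0, hi = limit;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (lt(x, s[mid])) hi = mid; else lo = mid + 1;
        }
        for (int i = len; i > lo; i--)
            s[i] = s[i - 1];
        s[lo] = x;
        return lo;
    }

    static void sort5(int v[5])
    {
        int a = v[0], b = v[1], c = v[2], d = v[3], e = v[4], t;

        if (lt(b, a)) { t = a; a = b; b = t; }   /* 1: a <= b */
        if (lt(d, c)) { t = c; c = d; d = t; }   /* 2: c <= d */
        if (lt(d, b)) {                          /* 3: ensure b <= d by swapping pairs */
            t = a; a = c; c = t;
            t = b; b = d; d = t;
        }
        /* Now a <= b <= d and c <= d. */

        int s[5] = { a, b, d };
        int epos = insert_pos(s, 3, 3, e);       /* 4,5: place e among {a, b, d} */
        int dpos = (epos <= 2) ? 3 : 2;          /* index where d ended up */
        insert_pos(s, 4, dpos, c);               /* 6,7: place c, which is known to be <= d */

        for (int i = 0; i < 5; i++)
            v[i] = s[i];
    }

    int main(void)
    {
        int v[5] = { 5, 3, 4, 1, 2 };
        sort5(v);
        printf("%d %d %d %d %d (%d comparisons)\n",
               v[0], v[1], v[2], v[3], v[4], ncmp); /* 1 2 3 4 5 (7 comparisons) */
        return 0;
    }

The ncmp counter is only there to make it easy to check the 7-comparison bound on sample inputs; it is not part of the algorithm.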
When I started, I wrote an Excel spreadsheet to generate what I would now describe as a 50-element array.
I also recall seeing a project to programmatically generate mnemonic operators in Haskell, limited only by the ability of the compiler to not run out of memory. Sadly, I can't seem to find it.
Ha, it's nothing more complicated than string concatenation.
Like others have said, it would be cool to see quadsort stacked up against other current state-of-the-art sorting algorithms.
1,000,000 elements is peanuts. We need to see how it behaves at billions of elements and beyond.
Frankly, I am more persuaded by the arguments in favor of an algorithm like "Timsort", which doesn't claim to micro-optimize for the hardware (that's basically what "better swapping" is claiming), but instead claims to be particularly efficient on some commonly seen patterns (like "partially sorted", or "pieces of the list are reverse-sorted"). Of course, any argument about why one sort is better than another is inferior to actual benchmarks.
There are several similar videos on YouTube demonstrating sorts. If that's a bit sterile for you, you can get a more human touch via the playlist https://www.youtube.com/watch?v=EdIKIf9mHk0&list=PLOmdoKois7...
"These operations do require doubling the memory overhead for the swap space."
It seems it is not in-place.
(This sort does not use the term that way, but another sort could.)
Example: int a, b; /* ... */ a ^= b; b ^= a; a ^= b; (the chained one-liner a ^= b ^= a ^= b; is undefined behavior in C, since a is modified twice without an intervening sequence point).