Thanks for the replies. I finished a first pass of the documentation, tests, and other work of including my algorithm in boost and uploaded it here:

https://github.com/skarupke/sort/tree/masterI also re-ran all the single threaded benchmarks and updated the documentation.

For the big O complexity: I am actually still not quite sure what it is. I used similar notation to what spreadsort uses. I think O(n*key_length) is correct for the worst case, but for long keys (like when sorting long strings) the algorithm may fall back to comparison based sorting, so I used the same thing as spreadsort: min(O(n log n), O(n key_length)).

For speed: ska_sort should be the fastest sorting algorithm even for small numbers of items. I included three graphs of benchmarks, for sorting ints, floats and strings:

https://github.com/skarupke/sort/blob/master/doc/images/sorting_ints.pnghttps://github.com/skarupke/sort/blob/master/doc/images/sorting_strings.pnghttps://github.com/skarupke/sort/blob/master/doc/images/sorting_floats.pngFor ints and strings ska_sort is always fastest, for floats pdqsort is briefly faster, but starting at N >= 128 ska_sort is faster. Part of the work of this new version of ska_sort was to make radix sort really fast even for small N. Before I would use std::sort for any small array of less than 128 items. Now that cutoff is at 32 items, so I skip straight to insertion sort and never do std::sort.

The biggest effect of this change is that it removes the "waves" from the benchmark graph, like we still see for spreadsort in the above picture. I used to also have those when I gave a talk about this algorithm. Now the graph is much more smooth. (this also confuses my answer to the question "what is the big O complexity of this algorithm?" Because the O(n*b) answer that I gave in my talk was justified by the presence of the waves)

The biggest questions I have at this point are around the interface. I reworked that completely so that even relatively advanced use cases are now pretty straightforward. I don't think that anyone else will ever need to use those advanced features, so I kept the documentation of them small, choosing instead to focus on the common use cases. And I made sure to give lots of examples of converting existing comparison-based sorting examples to my interface.

Oh and for the big point of a parallel version of this algorithm: At this point I have no idea how to do that. I haven't even begun to look into parallel algorithms. The current version was a lot of work already. I think that'll have to be a project for another time. (it's definitely appealing though since currently the sorted algorithms are only 2.4 times as fast as ska_sort on a machine with 8 cores/16 hyper threads in benchmark_numbers.cpp, so even a little bit of parallelism would beat the existing algorithms...)

Francisco, since you'll be busy in the near term it sounds like I can't expect too much response for another month, but I would really appreciate it if you could look at it a little just to see if anything obvious stands out to you that I can work on while I wait. (otherwise no biggie, I understand how hard it can be to find time for maintaining projects, since that is exactly one of the reasons why I want to get this into boost...)

Thanks,

Malte

--

Malte Skarupke

[hidden email]
> Date: Sun, 20 Oct 2019 13:24:06 +0200

> From: Francisco Jos? Tapia <

[hidden email]>

> To:

[hidden email]
> Subject: Re: [boost] Adding my sorting algorithm, ska_sort to boost

> Message-ID:

> <

[hidden email]>

> Content-Type: text/plain; charset="UTF-8"

>

> Hi Malte,

> I am Francisco Tapia, one of the co-authors of the Sort library.

> We will examine your algorithm. But we must examine the internal parts of

> the algorithm, and evaluate the correctness of the algorithm, how is

> implemented the worst case and a couple of details. After, we check the

> speed with the benchmarks used in the library, with several sets of data.

> Since the near sorted to the fully unsorted, and from the single numbers to

> structs with heavy comparison functions.

>

> Now, I am overburdened by work, and I can't revise until December. But we

> will do and provide to you a reasoned response, and according to it, decide

> the inclusion or not in the library.

> As commented by Phil Endecott, the most interesting now is the parallel

> algorithms. Today, we don't have any parallel algorithm based on radix sort

> or any other hybrid algorithm.

> Because with small sets of data the differences are insignificant. But when

> you have big sets of data, use parallel algorithms, and then, the new

> single thread algorithms are in the middle, in the no man's land.

>

> If you have any doubt or question, contact me, and I will try to answer you.

>

> El mar., 15 oct. 2019 a las 23:06, Kostas Savvidis via Boost (<

>

[hidden email]>) escribi?:

>

> >

> > > On Oct 13, 2019, at 03:06, Malte Skarupke via Boost <

> >

[hidden email]> wrote:

> > >

> > > Hi,

> > > a couple of years ago I wrote a generalized version of radix sort,

> > called ska_sort. I gave a talk about it at CppNow, available here:

> > >

https://www.youtube.com/watch?v=zqs87a_7zxw> > >

> > > I have a new version of that algorithm that is both faster and more

> > general. I can now sort almost anything. For example when I gave the talk I

> > couldn't sort a std::vector<std::list<int>> because std::list doesn't have

> > a random access interface, but now that works. (and it's faster than

> > sorting the same thing with std::sort)

> > >

> >

> > This seems like a product of a large investment of effort.

> > The big advantage over any previously available radix_sort

> > is the ability to sort almost any datatype if a key maps to an integer

> > somehow.

> > Plus, an advantage in measured practical speed, and apparently no claims

> > about Big O improvement.

> >

> > The average performance is O(n*b) where b is unknown and/or dependent on n

> > ?

> >

> > Is the theoretical worst-case O(n^2) ? Why? Is it because of working

> > in-place?

> > (yes, I understand you fall back to std::sort if you hit this case)

> >

> > Is this software/algorithm fully documented? - I mean by another way than

> > reading the source code.

> >

> > Maybe too many questions at once, but it was not clear after watching your

> > talk at cppcon and the blog posts.

> >

> > Cheers,

> > Kostas

> >

> > <

https://www.youtube.com/watch?v=Nz1KZXbghj8>

> >

> > _______________________________________________

> > Unsubscribe & other changes:

> >

http://lists.boost.org/mailman/listinfo.cgi/boost> >

>

>

> ------------------------------

>

> Subject: Digest Footer

>

_______________________________________________

Unsubscribe & other changes:

http://lists.boost.org/mailman/listinfo.cgi/boost