Any interest in Compile Time Hash Containers library?

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Any interest in Compile Time Hash Containers library?

Boost - Dev mailing list
Hi everybody, my first message, I hope I do not mess this up.

I was contemplating writing a Compile Time Hash Containers library and I
wrote a test implementation for sets and benchmarked it against mostly
ska:: sets since they are the best.

Document is here:

https://docs.google.com/document/d/1IsUiWl3K0h2pgttYgMt67CxgJ3zeXBKZEA7hvAQ5JcY/

It is quite long, but it aims to provide a introduction to the problem for
people unfamiliar with details of hashing, and also it goes into a lot of
discussions, if you care just about the numbers you can skip the "boring
parts".

My interpretation of benchmarks is that it performs very well in some
cases, decently in others, but I am biased. :)

Code is not available online since I do not want to dump a test project
online without extensive documentation, but if somebody has a problem that
may be benefited by this test implementation feel free to let me know, I
would be happy to get some real world feedback on prototype.
Limitations is that test implementation only works for sets, so no maps and
if your objects are not integers you will probably not get much speedup.
Also if your set has more than 100ish items GCC will run out of memory and
VC++ does not compile code at all, so clang is the only way for most use
cases today.

One more disclaimer is that I do not want this library to be some beast
like ranges that takes years to implement, so if after reading the document
you consider implementation quite basic I won't be offended :) In other
words I feel that library like this although relatively simple compared to
regular Boost libraries can provide quite a lot of benefit for some
usecases where people really really care about performance of hash
containers where values are known during compile time.

I know you can not predict the future of an unimplemented library, but I
would be happy with some general feedback, and if you think this library
would be nice boost proposal I would be interested in some pointers wrt how
to proceed.

If somebody wonders how user code looks at the moment:

#include "cxhash_set.h"
// also could be a result of constexpr function, no need for initializer
list
static constexpr std::array<int, 20>
ints{0,1,2,3,5,8,21,34,64,128,256,512,1024,1701,1729,2048,4096,12345,123456,1234567};
int main()
{
    cxhash_set<ints> ints_set;
    // contains could(should be?) static, but for now it is "normal" member
fn
    std::cout << std::boolalpha << ints_set.contains(21) << std::endl;
    std::cout << std::boolalpha << ints_set.contains(22) << std::endl;
}

P.S. feel free to comment in the google doc, I actually find mailing lists
quite limited when it comes to prolonged and or detailed discussions.
Maybe we can ping Alexandrescu for an instance of dlang forums, so we get a
better discussion tools, and he gets a way to convert us to D. :)

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Any interest in Compile Time Hash Containers library?

Boost - Dev mailing list
While I don't personally have a use for it, I can see that someone will,
and besides, it's kind of cool that you can do that ;)

Question: would the interface be better specified with a constexpr
initializer-list constructor rather than templating on an array of
data?  That way you would have complete control of memory layout as
well?  Or am I misinterpreting things?

Best, John.

On 21/01/2019 13:39, Ivan Matek via Boost wrote:

> Hi everybody, my first message, I hope I do not mess this up.
>
> I was contemplating writing a Compile Time Hash Containers library and I
> wrote a test implementation for sets and benchmarked it against mostly
> ska:: sets since they are the best.
>
> Document is here:
>
> https://docs.google.com/document/d/1IsUiWl3K0h2pgttYgMt67CxgJ3zeXBKZEA7hvAQ5JcY/
>
> It is quite long, but it aims to provide a introduction to the problem for
> people unfamiliar with details of hashing, and also it goes into a lot of
> discussions, if you care just about the numbers you can skip the "boring
> parts".
>
> My interpretation of benchmarks is that it performs very well in some
> cases, decently in others, but I am biased. :)
>
> Code is not available online since I do not want to dump a test project
> online without extensive documentation, but if somebody has a problem that
> may be benefited by this test implementation feel free to let me know, I
> would be happy to get some real world feedback on prototype.
> Limitations is that test implementation only works for sets, so no maps and
> if your objects are not integers you will probably not get much speedup.
> Also if your set has more than 100ish items GCC will run out of memory and
> VC++ does not compile code at all, so clang is the only way for most use
> cases today.
>
> One more disclaimer is that I do not want this library to be some beast
> like ranges that takes years to implement, so if after reading the document
> you consider implementation quite basic I won't be offended :) In other
> words I feel that library like this although relatively simple compared to
> regular Boost libraries can provide quite a lot of benefit for some
> usecases where people really really care about performance of hash
> containers where values are known during compile time.
>
> I know you can not predict the future of an unimplemented library, but I
> would be happy with some general feedback, and if you think this library
> would be nice boost proposal I would be interested in some pointers wrt how
> to proceed.
>
> If somebody wonders how user code looks at the moment:
>
> #include "cxhash_set.h"
> // also could be a result of constexpr function, no need for initializer
> list
> static constexpr std::array<int, 20>
> ints{0,1,2,3,5,8,21,34,64,128,256,512,1024,1701,1729,2048,4096,12345,123456,1234567};
> int main()
> {
>      cxhash_set<ints> ints_set;
>      // contains could(should be?) static, but for now it is "normal" member
> fn
>      std::cout << std::boolalpha << ints_set.contains(21) << std::endl;
>      std::cout << std::boolalpha << ints_set.contains(22) << std::endl;
> }
>
> P.S. feel free to comment in the google doc, I actually find mailing lists
> quite limited when it comes to prolonged and or detailed discussions.
> Maybe we can ping Alexandrescu for an instance of dlang forums, so we get a
> better discussion tools, and he gets a way to convert us to D. :)
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
>

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Any interest in Compile Time Hash Containers library?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Thu, 24 Jan 2019 at 01:34, Ivan Matek via Boost
<[hidden email]> wrote:
>
> Hi everybody, my first message, I hope I do not mess this up.
>
> I was contemplating writing a Compile Time Hash Containers library and I
> wrote a test implementation for sets and benchmarked it against mostly
> ska:: sets since they are the best.
>

In my experience that would be very useful. This kind of technique is
already used by some people to create hashes as keys to acquired
localized texts.
(This combined with something like the proposal for std::embedd()
would be particularly powerful.)

A. Joël Lamotte

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Any interest in Compile Time Hash Containers library?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Thu, Jan 24, 2019 at 9:49 AM John Maddock via Boost <
[hidden email]> wrote:

> Question: would the interface be better specified with a constexpr
> initializer-list constructor rather than templating on an array of
> data?  That way you would have complete control of memory layout as
> well?  Or am I misinterpreting things?
>
I think that if you want best performance you want to template on values
since if for example you only template on type, then you can not know some
very useful stuff at compile time. We want to pick a constant for universal
hashing that works great for our values. Consider example where we are
lucky and we get no collisions. If we know that at compile time compiler
can remove the loop(probe until some condition) since it knows iteration
count is 1.
If that value is just some member variable that can be 1 or 2 or 5
depending on data compiler can not remove that loop.
In other words I believe that for best performance codegen for different
values needs to be different.
I am happy to be proven wrong. :)

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Any interest in Compile Time Hash Containers library?

Boost - Dev mailing list
On Fri, 25 Jan 2019 at 01:00, Ivan Matek via Boost <[hidden email]>
wrote:

> I think that if you want best performance you want to template on values
> since if for example you only template on type, then you can not know some
> very useful stuff at compile time. We want to pick a constant for universal
> hashing that works great for our values. Consider example where we are
> lucky and we get no collisions.


How is this [whatever it is] different from frozen (
https://github.com/serge-sans-paille/frozen )? From what I read, it
[frozen] always achieves perfect hashing. The good thing about frozen is
that it's not pie-in-the-sky and it actually works.

If we know that at compile time compiler
> can remove the loop(probe until some condition) since it knows iteration
> count is 1.
> If that value is just some member variable that can be 1 or 2 or 5
> depending on data compiler can not remove that loop.
> In other words I believe that for best performance codegen for different
> values needs to be different.
> I am happy to be proven wrong. :)
>

Did you, or did you not write the code [you say it's not written and then
post timing results, it's confusing]? How can we prove you wrong (or
right), if all we see is lots of words and no code? Stop argumenting and
start showing some code.

degski
--
*“If something cannot go on forever, it will stop" - Herbert Stein*

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Any interest in Compile Time Hash Containers library?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Wed, Jan 23, 2019 at 4:34 PM Ivan Matek via Boost
<[hidden email]> wrote:
> I was contemplating writing a Compile Time Hash...library

The very first thing that popped into my head when I saw this is "I
would like a library that emits a perfect hash function at compile
time from a set of inputs."

Motivating use case:

<https://github.com/boostorg/beast/blob/develop/include/boost/beast/http/impl/field.ipp#L78>

Regards

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Any interest in Compile Time Hash Containers library?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
> How is this [whatever it is] different from frozen (
> https://github.com/serge-sans-paille/frozen )?
>
If you look into the document and search for frozen you will find out.

>
> Did you, or did you not write the code [you say it's not written and then
> post timing results, it's confusing]? How can we prove you wrong (or
> right), if all we see is lots of words and no code? Stop argumenting and
> start showing some code.
>
> Like I said in he original message and the document I implemented set that
works good for integers(you can provide custom hasher for other types, but
perf is bad). There is no map implemented, and for complex types like
strings performance sucks(again this is discussed in the doc).
If you want the code feel free to ping me directly on my mail, only things
I require is that:
you do not claim ownership of code
you do not blame me for bugs in case you do use code in real production and
it causes damage, in other words no liability on my part as permissible by
law
you do not expect me to support you in x years(if I can help you today I
will) or implement features on top of this code(I think there is a good
chance this code will be changed significantly including: names and hashers
avaliable)
you do compile with clang(GCC/VC++ are partly/fully broken for this code)

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Any interest in Compile Time Hash Containers library?

Boost - Dev mailing list
On Fri, 25 Jan 2019 at 08:37, Ivan Matek <[hidden email]> wrote:

>
> How is this [whatever it is] different from frozen (
>> https://github.com/serge-sans-paille/frozen )?
>>
>
From your doc:

Frozen

Another example of library doing very similar thing to what we want to do
is Frozen
<https://blog.quarkslab.com/frozen-an-header-only-constexpr-alternative-to-gperf-for-c14-users.html>.
For details you should consult the documentation,

but the general idea is similar: try to get good hashing based on the fact
the data is known at compilation time. Difference to us is that frozen does
perfect hashing.


"For details you should consult the documentation", right, we're getting
somewhere, oh, maybe not ... "Difference to us is that frozen does perfect
hashing": ... and your idea is ... to do imperfect hashing? The document is
far too long and there are far too many diversions/excursions into
la-la-land. Take as example the readme.md from frozen and put something
similar together, please.


> If you want the code feel free to ping me directly on my mail ...
>

Just stick it on github, that's what the rest of the world does ... if you
want to have it included in Boost, that's what you'll need to do in the end
anyway ...


> , only things I require is that:
> you do not claim ownership of code
> you do not blame me for bugs in case you do use code in real production
> and it causes damage, in other words no liability on my part as permissible
> by law
> you do not expect me to support you in x years(if I can help you today I
> will) or implement features on top of this code(I think there is a good
> chance this code will be changed significantly including: names and hashers
> avaliable)
> you do compile with clang(GCC/VC++ are partly/fully broken for this code)
>

... and put an appropriate license (BSL (
https://www.boost.org/users/license.html ) comes to mind, as that's what
you are proposing anyway) to deal with the above...

degski
--
*“If something cannot go on forever, it will stop" - Herbert Stein*

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost