Boost 1.68.0 - boost hashing changed ?


Boost - Users mailing list
Hello there,

I am trying to upgrade from Boost 1.53.0 to Boost 1.68.0, but it looks like the hash computation has changed: the following lines give two different hash codes for the same string input.

boost::hash<std::string> namehash;   // from <boost/functional/hash.hpp>
std::size_t hashCode = namehash(name);

Here name is a std::string, e.g. "abcdefg".

With Boost 1.53.0 the hashCode is 168904,

and with Boost 1.68.0 it is 69530. These are just sample numbers; the point is that the two values are never equal, even though everything else stays the same. The only change is the Boost version, on an EL6 Linux machine with GCC 4.9.

This is blocking me from upgrading to a newer version of Boost.

I would appreciate any information on this.

Thanks  
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users

Re: Boost 1.68.0 - boost hashing changed ?

On Mon, Oct 22, 2018 at 7:57 PM Shailja Prasad via Boost-users
<[hidden email]> wrote:
>
> I was trying to upgrade boost 1.53.0 to boost 1.68.0. But, it looks like hashing code generation has changed, since the following line gives two different hashcode for same string input.

Hm... why would you expect the hash to always be the same between
releases, compilers, etc.? With a quick look at Boost.Hash's docs I
cannot find anything that guarantees it. If it is like std::hash, then
it is only guaranteed to remain equal for the duration of the program.
In other words, you cannot rely on saving it, nor on comparing it with
hashes from other vendors, platforms, architectures, compiler
releases, etc.
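
If a value really has to be stored or compared across builds, one option is to use a hash whose algorithm is fixed by a published definition instead of boost::hash or std::hash. The sketch below uses 64-bit FNV-1a as an example of such a function; the name stable_hash_fnv1a is hypothetical and is not part of Boost or the standard library, and it will not reproduce the values either Boost version returned.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Boost or the standard library):
// 64-bit FNV-1a, a published, fixed algorithm. Its output depends only on
// the input bytes, not on the compiler or on any library version.
inline std::uint64_t stable_hash_fnv1a(const std::string& text) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char byte : text) {
        hash ^= byte;                 // mix in one input byte
        hash *= 1099511628211ULL;     // multiply by the FNV 64-bit prime
    }
    return hash;
}
```

Because the function lives in your own code, a Boost or compiler upgrade cannot change its results.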

Also, taking a quick look at the repository, there were several
changes between 1.53 and 1.68, e.g.:

  https://github.com/boostorg/container_hash/commit/bb2a91bf47354bfce7378394bc0fa84c76ecfe4e
  https://github.com/boostorg/container_hash/commit/309d17f38722b7bd15b804e55d1d8d6c3cd8691a

Cheers,
Miguel

Re: Boost 1.68.0 - boost hashing changed ?

On Tue, 23 Oct 2018 at 08:45, Miguel Ojeda via Boost-users <[hidden email]> wrote:
> On Mon, Oct 22, 2018 at 7:57 PM Shailja Prasad via Boost-users
> <[hidden email]> wrote:
> >
> > I was trying to upgrade boost 1.53.0 to boost 1.68.0. But, it looks like hashing code generation has changed, since the following line gives two different hashcode for same string input.
>
> Hm... why would you expect the hash to be always the same between
> releases, compilers, etc.?

Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.

> I cannot find it with a quick look at
> Boost.Hash's docs anything regarding a guarantee of that. If it is
> like std::hash, then it is only guaranteed to remain equal for the
> duration of the program.

Sort of: "Hash functions are only required to produce the same result for the same input within a single execution of a program". The standard states a minimum requirement (with an intended, narrow use case in mind: std::unordered_map). I'm not sure that is a great one, and by the time we have (or would like to have) a constexpr std::unordered_map, it may not even be tenable.

> In other words, you cannot rely on saving it
> nor comparing them to other hashes from other vendors, platforms,
> architectures, compiler releases, etc.

In my view this is an omission; the option to have exactly that should be (or should have been) available.

degski
--
"If something cannot go on forever, it will stop" - Herbert Stein


Re: Boost 1.68.0 - boost hashing changed ?

On Tue, Oct 23, 2018 at 10:19 AM degski <[hidden email]> wrote:

>
> On Tue, 23 Oct 2018 at 08:45, Miguel Ojeda via Boost-users <[hidden email]> wrote:
>>
>> On Mon, Oct 22, 2018 at 7:57 PM Shailja Prasad via Boost-users
>> <[hidden email]> wrote:
>> >
>> > I was trying to upgrade boost 1.53.0 to boost 1.68.0. But, it looks like hashing code generation has changed, since the following line gives two different hashcode for same string input.
>>
>> Hm... why would you expect the hash to be always the same between
>> releases, compilers, etc.?
>
>
> Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.

No, sorry, that is a completely different use case. Crypto hashes are
used, among other things, in network communications, persistent
storage, etc. They need to be "fixed" functions, and their standards
provide the exact definition. That is not the case at all with
std::hash or Boost.Hash.

>
>> I cannot find it with a quick look at
>> Boost.Hash's docs anything regarding a guarantee of that. If it is
>> like std::hash, then it is only guaranteed to remain equal for the
>> duration of the program.
>
>
> Sort of: "Hash functions are only required to produce the same result for the same input within a single execution of a program". The standard states a minimum requirement [with an intended [narrow] use case in mind, std::unordered_map's].

Not sure what you mean. That is what I said.

>
>>
>> In other words, you cannot rely on saving it
>> nor comparing them to other hashes from other vendors, platforms,
>> architectures, compiler releases, etc.
>
>
> In my view this is an omission, the option to have exactly that should [have been] available.
>

Not really. You could argue, for instance, that precisely because
std::hash (and Boost.Hash) is meant to be used in maps/hash
tables/..., you should not be able to guess the values of the hash in
advance, in order to prevent collision attacks. In other words, the
implementation even has the freedom to provide a different hash
function on every run of your program.
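
To illustrate what "a different hash function every run" could look like from the user's side, here is a minimal sketch of a hasher for std::unordered_map that mixes in a seed drawn once per process. PerRunSeededHash and SeededMap are hypothetical names, and the mixing step is illustrative only: it makes the values differ between runs, but it does not by itself provide resistance to collision attacks, which would need a properly keyed hash function.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <string>
#include <unordered_map>

// Hypothetical hasher: mixes a seed chosen once per process into std::hash.
// Values are stable within one run and are expected to differ between runs
// (assuming std::random_device is non-deterministic on the platform).
struct PerRunSeededHash {
    std::size_t operator()(const std::string& key) const {
        static const std::size_t seed = std::random_device{}();  // drawn once
        const std::size_t h = std::hash<std::string>{}(key);
        // Illustrative mix of seed and hash; not a cryptographic construction.
        return h ^ (seed + 0x9e3779b9 + (h << 6) + (h >> 2));
    }
};

// The container only needs equal keys to hash equally within this run.
using SeededMap = std::unordered_map<std::string, int, PerRunSeededHash>;
```

A SeededMap behaves like any other std::unordered_map; only code that stores or compares the raw hash values across runs would notice the difference.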

Not only that, but stating that the hash should remain constant across
C++/Boost releases is basically stating the hash function should be
fixed forever. That removes all the freedom for improvements when
future hash functions are discovered or implemented, with better
properties (which is what happened in the commits I linked).

In summary: the hashes provided by Boost or the standard are not
intended to be fixed functions; i.e. you shouldn't rely on the actual
values returned, only on the properties of the function. Namely, this
one: "For two different values t1 and t2, the probability that h(t1)
and h(t2) compare equal should be very small, approaching 1.0 /
numeric_limits<size_t>::max()."

Cheers,
Miguel

Re: Boost 1.68.0 - boost hashing changed ?

On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda <[hidden email]> wrote:
> > Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
>
> No, sorry, that is a completely different use case. Crypto hashes are
> used, among other things, in network communications, persistent
> storage, etc. They need to be "fixed" functions, and their standards
> provide the exact definition. That is not the case at all with
> std::hash or Boost.Hash.

For debugging purposes, a fixed function seems quite useful to me.

degski
--
"If something cannot go on forever, it will stop" - Herbert Stein


Re: Boost 1.68.0 - boost hashing changed ?

On Tue, Oct 23, 2018 at 12:36 PM degski <[hidden email]> wrote:

>
> On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda <[hidden email]> wrote:
>>
>> > Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
>>
>> No, sorry, that is a completely different use case. Crypto hashes are
>> used, among other things, in network communications, persistent
>> storage, etc. They need to be "fixed" functions, and their standards
>> provide the exact definition. That is not the case at all with
>> std::hash or Boost.Hash.
>
>
> For debugging purposes, a fixed function seems quite useful to me.

Indeed, that is a good point! An std implementation (and Boost.Hash
too) could provide the means to fix the function for debugging (e.g.
through a #define).
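
Neither Boost.Hash nor the standard library offers such a switch as far as this thread establishes, but the idea can be sketched in user code. In the sketch below, MYAPP_FIXED_HASH and DebuggableHash are hypothetical names; defining the macro (e.g. with -DMYAPP_FIXED_HASH) selects a simple fixed function, otherwise the library's hash is used.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical switch: define MYAPP_FIXED_HASH to get reproducible values
// while debugging. This macro is an assumption of this sketch, not something
// provided by Boost.Hash or the standard library.
struct DebuggableHash {
    std::size_t operator()(const std::string& key) const {
#ifdef MYAPP_FIXED_HASH
        // Fixed function: multiply-by-31 polynomial hash over the bytes.
        std::size_t h = 0;
        for (unsigned char byte : key) {
            h = h * 31 + byte;
        }
        return h;
#else
        // Default: whatever the library provides; the values may change
        // between library versions.
        return std::hash<std::string>{}(key);
#endif
    }
};
```

Either branch satisfies the only property a hash table needs: equal keys hash equally within one run.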

Cheers,
Miguel

Re: Boost 1.68.0 - boost hashing changed ?



> On Oct 23, 2018, at 10:11 AM, Miguel Ojeda via Boost-users <[hidden email]> wrote:
>
>> On Tue, Oct 23, 2018 at 12:36 PM degski <[hidden email]> wrote:
>>
>>> On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda <[hidden email]> wrote:
>>>
>>>> Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
>>>
>>> No, sorry, that is a completely different use case. Crypto hashes are
>>> used, among other things, in network communications, persistent
>>> storage, etc. They need to be "fixed" functions, and their standards
>>> provide the exact definition. That is not the case at all with
>>> std::hash or Boost.Hash.
>>
>>
>> For debugging purposes, a fixed function seems quite useful to me.
>
> Indeed, that is a good point! An std implementation (and Boost.Hash
> too) could provide the means to fix the function for debugging (e.g.
> through a #define).
>
OK, I tried a few approaches to get the same hashCode with Boost 1.68.0 that I was getting with Boost 1.53.0 (same machine and compiler), based on the changes you linked. Thank you, Miguel!

> Cheers,
> Miguel

Re: Boost 1.68.0 - boost hashing changed ?



On Tue, 23 Oct 2018 at 12:36, degski via Boost-users <[hidden email]> wrote:
> On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda <[hidden email]> wrote:
> > > Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
> >
> > No, sorry, that is a completely different use case. Crypto hashes are
> > used, among other things, in network communications, persistent
> > storage, etc. They need to be "fixed" functions, and their standards
> > provide the exact definition. That is not the case at all with
> > std::hash or Boost.Hash.
>
> For debugging purposes, a fixed function seems quite useful to me.

It's already difficult enough to teach new programmers not to serialise the result of std/boost hash.

Providing a means to ensure that it's predictable would strengthen the illusion that it's predictable across compilers and architectures. This would be a grave error.

I would argue the opposite. std::hash should work hard to ensure that for any two runs of the same program, the results of a hash will be wildly different.

This would make it easier to spot incorrect uses of it.

R


Re: Boost 1.68.0 - boost hashing changed ?

On Wed, Oct 24, 2018 at 12:07 PM Shailja Prasad <[hidden email]> wrote:
>
> Ok, tried few approach to get the same hashCode using boost 1.68.0 which is coming from using boost 1.53.0 ( same machine and compiler) looking into your suggested change link. Thank you, Miguel !
>

You're welcome Shailja! I am glad it was useful :-)

Cheers,
Miguel

Re: Boost 1.68.0 - boost hashing changed ?



On Wed, Oct 24, 2018 at 12:14 PM Richard Hodges via Boost-users <[hidden email]> wrote:
> On Tue, 23 Oct 2018 at 12:36, degski via Boost-users <[hidden email]> wrote:
> > On Tue, 23 Oct 2018 at 11:25, Miguel Ojeda <[hidden email]> wrote:
> > > > Well, uhm, because that seems to be quite handy. All NIST implementations do exactly this.
> > >
> > > No, sorry, that is a completely different use case. Crypto hashes are
> > > used, among other things, in network communications, persistent
> > > storage, etc. They need to be "fixed" functions, and their standards
> > > provide the exact definition. That is not the case at all with
> > > std::hash or Boost.Hash.
> >
> > For debugging purposes, a fixed function seems quite useful to me.
>
> It's already difficult enough to teach new programmers not to serialise the result of std/boost hash.
>
> Providing a means to ensure that it's predictable would strengthen the illusion that it's predictable across compilers and architectures. This would be a grave error.
>
> I would argue the opposite. std::hash should work hard to ensure that for any two runs of the same program, the results of a hash will be wildly different.
>
> This would make it easier to spot incorrect uses of it.

What about maps in shared memory? You're suggesting that it's OK that one process built with version X of Boost should have no expectation of being able to operate correctly with a process built with version X+1. Insanity.


Re: Boost 1.68.0 - boost hashing changed ?

On 10/23/18 11:25 AM, Miguel Ojeda via Boost-users wrote:

> Not only that, but stating that the hash should remain constant across
> C++/Boost releases is basically stating the hash function should be
> fixed forever. That removes all the freedom for improvements when
> future hash functions are discovered or implemented, with better
> properties (which is what happened in the commits I linked).

While I do not disagree with your arguments, we have a special situation
because the algorithm for boost::hash_combine was actually documented in
older Boost releases, including 1.53 that the OP is upgrading from, so
it would have been reasonable to assume that it stayed fixed. It is not
documented in newer releases though:

   https://lists.boost.org/Archives/boost/2014/07/215577.php

The best way to ensure that it is unchanged is to copy the old
boost::hash_combine into your own code.
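
A minimal sketch of what such a local copy might look like follows. The combining formula is the one the older Boost documentation gave for boost::hash_combine; the namespace legacy and the function hash_string are hypothetical names. The sketch also assumes that the old boost::hash<std::string> combined the characters in order, hashing each as its integer value. That assumption has not been checked against the 1.53 sources here, so compare its output with values produced by the old build before relying on it.

```cpp
#include <cstddef>
#include <string>

namespace legacy {  // hypothetical namespace for the local copy

// Combining step as documented for boost::hash_combine in older Boost
// releases. `value_hash` is the already-computed hash of the new element.
inline void hash_combine(std::size_t& seed, std::size_t value_hash) {
    seed ^= value_hash + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Assumption of this sketch: a string is hashed by combining its characters
// in order, each character hashed as its integer value.
inline std::size_t hash_string(const std::string& text) {
    std::size_t seed = 0;
    for (char c : text) {
        hash_combine(seed, static_cast<std::size_t>(c));
    }
    return seed;
}

}  // namespace legacy
```

Since this code no longer depends on Boost, its results stay the same across Boost upgrades, although they still depend on the width of std::size_t.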