Add an adapter to cover fast bit functions

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Add an adapter to cover fast bit functions

Boost - Dev mailing list
Hi all, some days ago I opened a thread about DynamicBitset[1] (where I
noted that a small piece of code may double the speed of a function), which
received some interesting answers (you guys rock!)

I looked deeper and realized that Boost doesn't have functionality to do
common bit operations *really* fast.
Let me explain - there are several places in Boost where we needed to solve
typical tasks connected with bits in integers - like "get log2 of an
integer rounded down" or "get the number of trailing zeros" or "count
number of 1-bits", and Boost contributors have been inventing there a wheel
everywhere, with varying degrees of optimality: [2] - [6].

Moreover, if a programmer writes a cross-platform app and has to do smth
with this stuff, it's likely that for bit operations he will write a
"naive" implementations - e.g. getting the log2(x) for O(log N), when it
could be done for O(log log N), but with a more complicated code.

Also, as well as "smart" algos, there are other methods to use - precalced
tables, taking a small additional memory when needed [6], and intrinsics
(that are CPU-specific).
Combining of these methods (intrinsics when conveniently, or tables/algos)
gives a notable shift in performance - here's my benchmark [7] from my tiny
library [8]. Performance here varies from *9.2%* to *36.9%* of speed of
related naive functions.

I can tell you a usercase - as a hobby I'm writing a chess variants engine
(where bitsets used to represent chess board, and its size is *not* fixed
to 64), and bit operations' speed there is a matter of life and death.

So, what do you think of an adapter header in Boost, covering all these
operation of all integer types (and maybe some other one-liners like
`is_bit_set(x, pos)` ), which keeps programmers out of platform specifics?
It may look as smth like this [8] (I wrote only 3 functions, but it's
enough to show the potential).

I guess it can be placed somewhere to *Boost.Algorithm*, and supported with
CI build tests of different platforms.

P.S. The (incomplete) list of bit functions, that will give a *notable*
better performance compared with naive algos and even some "smart" (because
of precalc/intrinsics):
- count of bits
- parity of bit count
- first 1-bit
- last 1-bit (aka integer_log2(x))
- leading zeros
- trailing zeros
- swap(x) - returns x with the order of the bytes reversed; for example,
0xaabb becomes 0xbbaa. (bytes here are 8 bits exactly)
- rotate_left(x, shift)
- rotate_right(x, shift)
Other functions may look like `is_set(x, pos) -> return x & (1 << pos)` and
aren't so interesting.

[1]: https://lists.boost.org/Archives/boost/2018/08/242825.php
[2]:
https://github.com/boostorg/integer/blob/develop/include/boost/integer/integer_log2.hpp#L29-L46
[3]:
https://github.com/boostorg/move/blob/develop/include/boost/move/algo/detail/pdqsort.hpp#L98-L104
[4]:
https://github.com/boostorg/dynamic_bitset/blob/develop/include/boost/pending/lowest_bit.hpp#L23-L33
[5]:
https://github.com/boostorg/integer/blob/develop/include/boost/integer/common_factor_rt.hpp#L170-L207
(really almost the whole file)
[6]:
https://github.com/boostorg/dynamic_bitset/blob/develop/include/boost/detail/dynamic_bitset.hpp#L86-L99
[7]:
https://gist.githubusercontent.com/Izaron/8c94b76818393ffd93613d255033bb48/raw/f2fb820654baca3393e47742a5697a2e33a884b4/bench
[8]: https://github.com/Izaron/bit_algo/blob/master/include/bits.hpp

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Add an adapter to cover fast bit functions

Boost - Dev mailing list
AMDG

On 08/26/2018 05:32 PM, Evgeny Shulgin via Boost wrote:

> Hi all, some days ago I opened a thread about DynamicBitset[1] (where I
> noted that a small piece of code may double the speed of a function), which
> received some interesting answers (you guys rock!)
>
> I looked deeper and realized that Boost doesn't have functionality to do
> common bit operations *really* fast.<snip>
>
> So, what do you think of an adapter header in Boost, covering all these
> operation of all integer types (and maybe some other one-liners like
> `is_bit_set(x, pos)` ), which keeps programmers out of platform specifics?
> It may look as smth like this [8] (I wrote only 3 functions, but it's
> enough to show the potential).
>

- Please don't #include <iostream> in headers, if you can avoid it.
- You're using C++14 constexpr, so BOOST_CXX14_CONSTEXPR
  would be the correct macro.
- You might need to think about whether/how to support
  signed integers.  In particular, I'm thinking about
  swap and rotate.
- For unit tests, it's better to use a prng instead of random_device.
  non-deterministic tests make it more difficult to reproduce
  test failures.
- It's better to separate benchmarks from unit tests.  In particular,
  storing the elements in a vector is necessary for benchmarking,
  since you don't want to benchmark the generation of the
  values, but for tests, it artificially limits the maximum
  number of values you can check.
- For char and short, testing every possible input is easy.
  You can do int, too, if you're willing to wait a bit for
  it to finish.
- s/runned/ran/

> I guess it can be placed somewhere to *Boost.Algorithm*, and supported with
> CI build tests of different platforms.
>

Or Boost.Integer.

> P.S. The (incomplete) list of bit functions, that will give a *notable*
> better performance compared with naive algos and even some "smart" (because
> of precalc/intrinsics):
> - count of bits
> - parity of bit count
> - first 1-bit
> - last 1-bit (aka integer_log2(x))
> - leading zeros
> - trailing zeros
> - swap(x) - returns x with the order of the bytes reversed; for example,
> 0xaabb becomes 0xbbaa. (bytes here are 8 bits exactly)
> - rotate_left(x, shift)
> - rotate_right(x, shift)
> Other functions may look like `is_set(x, pos) -> return x & (1 << pos)` and
> aren't so interesting.
>

In Christ,
Steven Watanabe

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Add an adapter to cover fast bit functions

Boost - Dev mailing list
Thank you for your reply! It really helped me a lot.

> - Please don't #include <iostream> in headers, if you can avoid it.
My bad, I forgot to remove this huge unused header, I should've used a tool
like include-what-you-use.

> - You're using C++14 constexpr, so BOOST_CXX14_CONSTEXPR
>   would be the correct macro.
Thanks, I missed the fact that in C++11 it is correct only when the function
has exactly one return

> - You might need to think about whether/how to support
>  signed integers. In particular, I'm thinking about
>  swap and rotate.
I guess "rotate" should not affect the sign bit. If I remember correctly,
there are commands in Assembler to rotate signed numbers.
"Swap" works with whole bytes and will implicitly take a number as an
unsigned. But it's a theme to discuss, as for other functions.

It's good to check signed types in the benchmark as well.

> - For unit tests, it's better to use a prng instead of random_device.
>   non-deterministic tests make it more difficult to reproduce
>   test failures.
I'll change it, sounds good.

> - It's better to separate benchmarks from unit tests. In particular,
>   storing the elements in a vector is necessary for benchmarking,
>   since you don't want to benchmark the generation of the
>   values, but for tests, it artificially limits the maximum
>   number of values you can check.
Values generation isn't benchmarked, only working time of functions.

I was thinking of separating the whole stuff in this way:
- a single unit test verifies the correctness of the algorithm comparing it
with naive functions on generated values
- the benchmark file isn't a test really, but could be used in CI to produce
information about performance
  if it's possible (though I guess it's be possible)

Also, IMO, there should be a tiny single header for _every_ function, since
a large file with a ton of monotonic specializations
is pretty hard to support.

> - For char and short, testing every possible input is easy.
>   You can do int, too, if you're willing to wait a bit for
>   it to finish.
I think it's too much to check every int32 value, because it takes really a
lot of time for almost nothing. We can use many generated values
with corner cases like the maximal/minimal value of each type. The idea is
fine.

> Or Boost.Integer.
I explored this, it looks more appropriate to insert fresh code. I hope
someone would take a look at PR if I make the code look
really much better and send it.



--
Sent from: http://boost.2283326.n4.nabble.com/Boost-Dev-f2600599.html

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Add an adapter to cover fast bit functions

Boost - Dev mailing list
On Tue, 28 Aug 2018 at 02:23, Evgeny Shulgin [Izaron] via Boost <
[hidden email]> wrote:

> > - For unit tests, it's better to use a prng instead of random_device.
> >   non-deterministic tests make it more difficult to reproduce
> >   test failures.
> I'll change it, sounds good.
>

 Yes, it sound good, but is it? I've been burned more than once by
developing and testing a feature for something, thinking all was good and
then remarkably stuff did not work in the real world. Ah, another seed
would have caught this bug. I think some fuzzy testing is good, and you
said it yourself (and the advice), the test has to be re-producible.
Marrying the two is not impossible, but needs some thought, design and some
more code.

degski
--
*“If something cannot go on forever, it will stop" - Herbert Stein*
*“No, it isn’t truth. Truth isn’t truth" - Rudolph W. L. Giuliani*

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Add an adapter to cover fast bit functions

Boost - Dev mailing list
Hi,

Am 28.08.2018 06:12, schrieb degski via Boost:

> On Tue, 28 Aug 2018 at 02:23, Evgeny Shulgin [Izaron] via Boost <
> [hidden email]> wrote:
>
>> > - For unit tests, it's better to use a prng instead of random_device.
>> >   non-deterministic tests make it more difficult to reproduce
>> >   test failures.
>> I'll change it, sounds good.
>>
>
>  Yes, it sound good, but is it? I've been burned more than once by
> developing and testing a feature for something, thinking all was good
> and
> then remarkably stuff did not work in the real world. Ah, another seed
> would have caught this bug. I think some fuzzy testing is good, and you
> said it yourself (and the advice), the test has to be re-producible.
> Marrying the two is not impossible, but needs some thought, design and
> some
> more code.

Unit Tests are not the best place for fuzzing. You should do both. Unit
test
should be deterministic and always reproduce the same result, because
you'll
use them for debugging. Fuzzing on the other hand is used to find
unknown
bugs. When you found one, create a unit test to reproduce it, fix the
bug,
so that the unit test doesn't fail any more and continue fuzzing.

Christof

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Add an adapter to cover fast bit functions

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Then, if we don't want to fix the values once and for all, we can still use
generating of 3-5mln of truly random values
for each large integral type (still with corner cases), but when comparing
naive functions (that are 100% working) and
fast ones (from the library), we can do something like this:

Instead of:
> BOOST_ASSERT(naive_outcome == fast_outcome);
Use this:
> if (naive_outcome != fast_outcome)
> {
>     // Or std::cerr or printf
>     std::cout << "LOOK AT ME: func_name failed on " << number
>         << " real outcome " << naive_outcome
>         << " library outcome " << fast_outcome << std::endl;
>     BOOST_ASSERT(naive_outcome == fast_outcome);
> }
CI will run this on different machines using randomness, and it is more
likely to catch a bug, looking to its job log.



--
Sent from: http://boost.2283326.n4.nabble.com/Boost-Dev-f2600599.html

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Add an adapter to cover fast bit functions

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Tue, 28 Aug 2018 at 16:02, Christof Donat via Boost <
[hidden email]> wrote:

> Unit Tests are not the best place for fuzzing. You should do both. Unit
> test
> should be deterministic and always reproduce the same result, because
> you'll
> use them for debugging. Fuzzing on the other hand is used to find
> unknown
> bugs. When you found one, create a unit test to reproduce it, fix the
> bug,
> so that the unit test doesn't fail any more and continue fuzzing.
>

Thanks for not discarding it completely. How to do this, I think, I left in
the middle, I just said more thought, more design and more code.

degski
--
*“If something cannot go on forever, it will stop" - Herbert Stein*
*“No, it isn’t truth. Truth isn’t truth" - Rudolph W. L. Giuliani*

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost