Formal Review of Proposed Boost.Histogram Library Starts TODAY

classic Classic list List threaded Threaded
57 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Formal Review of Proposed Boost.Histogram Library Starts TODAY

Boost - Dev mailing list
Dear Boost Community,

The formal review of Hans Dembinski's Histogram library starts TODAY,
the September 17th, and lasts until September 26th,
unless an extension occurs.

===========
What is it?
===========

Histogram is a header-only C++11 library, provides multi-dimensional histogram
template class for counting, statistics and data exploration.

The histogram class interface has been carefully designed to make the library

* offer simple and convenient default behaviour
* easy to learn and safe to use
* convenient to use for one- as well as multi-dimensional analysis
* compatible with STL algorithms
* dependant only on C++11 standard library and optionally on Boost

The author followed the "Don't pay for what you don't use" principle.

The histogram class is customisable through policy classes, but the default
policies were carefully crafted so that most users do not need to customize
anything.

The default policies guarantee the histogram to be safe to use as a black box,
memory efficient, and very fast, where safety is related to overflow
prevention and large values capping. This guarantees is one of unique
features of the library in comparison to other implementations.

The histogram class comes in two variants, which share a common interface:

* static variant is optimised at compile-time to provide maximum performance,
  at the cost of reduced runtime flexibility and
  potentially larger executables;

* dynamic variant is a bit slower, but configurable at run-time.

Both variants of the histogram class provide optional serialization support
implemented with Boost.Serialization.

The library itself is header-only and the tests as well as examples can be
built using both, Boost.Build and CMake.
There are no dependencies required other than C++11 compiler.
Optional dependencies include the GNU Scientific Library (GSL) and the
ROOT framework from CERN which are needed the corresponding speed test
programs used for performance comparison.

A note on history:
The library was first announced to the Boost community in May, 2016 [1]
and the author received extensive feedback. Then, after a year of work
posted second request for comments in April, 2017 [2].

===================
Getting the library
===================

Checkout the `master` branch from the library git repository:

https://github.com/HDembinski/histogram/

NOTE: The author follows the library submission process [3]:
the master branch remains frozen for the period of the formal review,
modifications in response to feedback should go to `develop` branch
and big items may be worked and discussed via opening an issue on GitHub.

Documentation: http://hdembinski.github.io/histogram/doc/html/

Website: http://hdembinski.github.io/histogram/

================
Writing a review
================

Please, follow the official Boost formal review procedure [4].

If you feel this is an interesting library, then please submit your
review to the Boost developer list (preferably),
or the review manager to mateusz (at) loskot (dot) net.

Here are some questions you might want to answer in your review:

- What is your evaluation of the design?
- What is your evaluation of the implementation?
- What is your evaluation of the documentation?
- What is your evaluation of the potential usefulness of the library?
- Did you try to use the library? With what compiler?
  Did you have any problems?
- How much effort did you put into your evaluation?
  A glance? A quick reading? In-depth study?
- Are you knowledgeable about the problem domain?

And finally, every review should answer this question:

- Do you think the library should be accepted as a Boost library?

Be sure to say this explicitly so that your other comments don't
obscure your overall opinion.

If needed, refer to the Boost Library Requirements and Guidelines [5].

Please, give constructive criticism!


[1] https://lists.boost.org/Archives/boost/2016/05/229389.php
[2] https://lists.boost.org/Archives/boost/2017/04/234410.php
[3] https://www.boost.org/development/submissions.html#Review
[4] https://www.boost.org/community/reviews.html
[5] https://www.boost.org/development/requirements.html

Mateusz
Review Manager for the proposed Boost.Histogram

--
Mateusz Loskot, http://mateusz.loskot.net

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [histogram] review part 1 (documentation)

Boost - Dev mailing list
AMDG

abstract.html:

- "the one- and multi-dimensional case"
  "cases" should be plural.

- "..submit this project to Boost, that's why..."
  Comma splice

- Please don't overuse *bold text*

acknowledgments.html

- "converting the documentation, and adding Jamfiles"
  No comma here.

motivation.html

- "The GNU Scientific Library (GSL) and in the ROOT framework"
  Remove "in"

- "The C implementations of the GSL are elegant..."
  Are there multiple implementations?

- "The implementations are not customizable, you have to..."
  Comma splice

build.html

- Is there a reason that CMakeLists.txt is in
  build/ instead of test/?

getting_started.html

- "int main(int, char**) {"
  If you're not using the parameters, `int main()` is a
  valid signature.

- Since you require C++11, you might as well use <random> instead
  of Boost.Random.  The Boost implementation of mt19937 has better
  performance, but that doesn't really matter here.

- "in addition, the factory also accepts iterators over a sequence of
  axis::any, the polymorphic type that can hold concrete axis types"
  The relationship between axis::any and axis::any_std
  (the type actually used) in unclear.

guide.html:

- "A histogram consists a number of..."
  "consists /of/ a number of..."

- "which provides safe counting, is fast and memory efficient."
  There are only two elements, so use "and" instead of a comma
  or say "is fast, and /is/ memory efficient."

- "Axes objects with different label do..." ->
  s/Axes/Axis/, s/label/labels/

- "...sequence of bin indices for each axes in order."
  s/axes/axis/

- I wonder whether you could handle the issue that
  h * 2 != h + h, by using h * weight(2) to make it
  clear that multiplication changes the weights of
  the samples instead of changing the number of samples.

- "...remove some axes and look the equivalent lower..."
  "...look /at/ the equivalent..."

- "so it is not worth keeping that axies around"
  s/axies/axis/

- "Users can create new axis classes ..., or implemented..."
  s/implemented/implement/

- "Here is a contrieved example of a..."
  s/contrieved/contrived/

- "The guarantees ... are only valid, if the default..."
  No comma.

- "... you need know what you are doing"
  "you need /to/ know"

benchmarks.html:

- No GSL?

- You say uniform and normal distributions, but there's
  only a single plot.  Does this mean that you are putting
  both uniform and normal variates into a single histogram?

rationale.html

- "Axis access"
  Not the most fortuitous word combination.

- "The buildin storage types..."
  s/buildin/builtin/

- It seems that using weighted fills breaks the special
  overflow handling properties of adaptive_storage.  I
  think that this deserves a distinct note.

- "Why is Boost.Histogram not build on top..."
  s/build/built/

concepts.html:

- Concept names use CamelCase, by convention.

- It doesn't seem very sane to have the value_type
  be a reference.  Is there any reason for this
  other than the fact that you specify the exact
  signature of axis_type::index?

- I would prefer to set the size of a storage_type
  in the constructor instead of using reset, if
  that's possible.

- I don't understand the difference between the two
  overloads of storage_type::add.

reference.html:

- Unfortunately forward declarations seem to mess
  up the doxygen/boostbook toolchain.  As a result
  a number of components are listed in histogram_fwd.hpp
  I'll look into this.

- I'll deal with the reference content along with the
  code.  I will note that the reference documentation
  seems a bit lacking overall as there are many components
  without any description.

General:

- The passive voice is overused.

In Christ,
Steven Watanabe


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Formal Review of Proposed Boost.Histogram Library Starts TODAY

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 17/09/2018 21:06, Mateusz Loskot wrote:
> The formal review of Hans Dembinski's Histogram library starts TODAY,
[...]
> The author followed the "Don't pay for what you don't use" principle.
[...]
> Documentation: http://hdembinski.github.io/histogram/doc/html/

Perhaps this idea falls into that principle, but there seems to be an
inherent assumption in the axis classes that a transform will always
return the same type as its input.

(For example, the min_ and delta_ fields in the regular axis are stored
as value_type, as are the input arguments lower and upper.)

It seems like it should be trivial to "fix" that, though, by storing
these already-transformed values as type transformed_type instead.

     using transformed_type =
decltype(std::declval<transform_type>().forward(std::declval<value_type>()));

Or similar.  This should in theory allow an arbitrarily complex
value_type to be used, provided that a suitable transform can provide
bidirectional conversion to a simpler numeric type.


On a related note, the method used to store the transform as a base
class of the axis bothers me a bit, in particular that it calls the copy
constructor to init the base rather than the move constructor, and that
the init of min_ and delta_ (for axis::regular) are invoked on the
original copy instead of the internal copy which is subsequently used
for all other operations.

I'm not 100% clear on the standards guarantees for the constructor
initialiser list (in particular if a base initialiser is guaranteed to
be fully constructed before the arguments to other initialisers are
evaluated or not), so it's possible that this is UB if that guarantee is
not met, but I would have thought it'd be more sensible to init as:

     regular(...)
         : transform_type(std::move(trans)
         , min_(transform_type::forward(lower))
         , ...

Such that the same instance is used in all cases.


Also: transforms don't seem to be mentioned much in the documentation.
I couldn't find any documented requirements to build your own, for
example, though I worked it out from looking at the source.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Formal Review of Proposed Boost.Histogram Library Starts TODAY

Boost - Dev mailing list
>
> Perhaps this idea falls into that principle, but there seems to be an
> inherent assumption in the axis classes that a transform will always
> return the same type as its input.
>

True, there is an inherent assumption that the transform does not change
the value_type and only do a monotonic mapping, like log(x) or pow(x,
const). But like you point out, this restriction is not necessary and it is
easy to avoid. But just out of curiosity, can you name a use case where the
transformed type is different from the value type?

I'm not 100% clear on the standards guarantees for the constructor

> initialiser list (in particular if a base initialiser is guaranteed to
> be fully constructed before the arguments to other initialisers are
> evaluated or not), so it's possible that this is UB if that guarantee is
> not met, but I would have thought it'd be more sensible to init as:
>
>      regular(...)
>          : transform_type(std::move(trans)
>          , min_(transform_type::forward(lower))
>          , ...
>
> Such that the same instance is used in all cases.


AFAIK, the order in which these statements are executed is not guaranteed,
that's why the ctor uses trans instead of transform_type. I am happy to be
proven wrong by a language expert. If that's the case, I agree that your
suggestion makes more sense.

Also: transforms don't seem to be mentioned much in the documentation.
> I couldn't find any documented requirements to build your own, for
> example, though I worked it out from looking at the source.
>

True, transforms were added late, I simply forgot to document them
properly. I added an issue to that effect.

 Kind regards,
Hans

On Tue, Sep 18, 2018 at 10:04 AM Gavin Lambert via Boost <
[hidden email]> wrote:

> On 17/09/2018 21:06, Mateusz Loskot wrote:
> > The formal review of Hans Dembinski's Histogram library starts TODAY,
> [...]
> > The author followed the "Don't pay for what you don't use" principle.
> [...]
> > Documentation: http://hdembinski.github.io/histogram/doc/html/
>
> Perhaps this idea falls into that principle, but there seems to be an
> inherent assumption in the axis classes that a transform will always
> return the same type as its input.
>
> (For example, the min_ and delta_ fields in the regular axis are stored
> as value_type, as are the input arguments lower and upper.)
>
> It seems like it should be trivial to "fix" that, though, by storing
> these already-transformed values as type transformed_type instead.
>
>      using transformed_type =
>
> decltype(std::declval<transform_type>().forward(std::declval<value_type>()));
>
> Or similar.  This should in theory allow an arbitrarily complex
> value_type to be used, provided that a suitable transform can provide
> bidirectional conversion to a simpler numeric type.
>
>
> On a related note, the method used to store the transform as a base
> class of the axis bothers me a bit, in particular that it calls the copy
> constructor to init the base rather than the move constructor, and that
> the init of min_ and delta_ (for axis::regular) are invoked on the
> original copy instead of the internal copy which is subsequently used
> for all other operations.
>
> I'm not 100% clear on the standards guarantees for the constructor
> initialiser list (in particular if a base initialiser is guaranteed to
> be fully constructed before the arguments to other initialisers are
> evaluated or not), so it's possible that this is UB if that guarantee is
> not met, but I would have thought it'd be more sensible to init as:
>
>      regular(...)
>          : transform_type(std::move(trans)
>          , min_(transform_type::forward(lower))
>          , ...
>
> Such that the same instance is used in all cases.
>
>
> Also: transforms don't seem to be mentioned much in the documentation.
> I couldn't find any documented requirements to build your own, for
> example, though I worked it out from looking at the source.
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Formal Review of Proposed Boost.Histogram Library Starts TODAY

Boost - Dev mailing list
On Tue, 18 Sep 2018 at 20:56, Hans Dembinski via Boost
<[hidden email]> wrote:
>
> Also: transforms don't seem to be mentioned much in the documentation.
> > I couldn't find any documented requirements to build your own, for
> > example, though I worked it out from looking at the source.
> >
>
> True, transforms were added late, I simply forgot to document them
> properly. I added an issue to that effect.

That is, for the record and easy follow-up
https://github.com/HDembinski/histogram/issues/90

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Formal Review of Proposed Boost.Histogram Library Starts TODAY

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
AMDG

On 09/18/2018 12:55 PM, Hans Dembinski via Boost wrote:

>>
>> <snip>
>> I'm not 100% clear on the standards guarantees for the constructor
>> initialiser list (in particular if a base initialiser is guaranteed to
>> be fully constructed before the arguments to other initialisers are
>> evaluated or not), so it's possible that this is UB if that guarantee is
>> not met, but I would have thought it'd be more sensible to init as:
>>
>>      regular(...)
>>          : transform_type(std::move(trans)
>>          , min_(transform_type::forward(lower))
>>          , ...
>>
>> Such that the same instance is used in all cases.
>
>
> AFAIK, the order in which these statements are executed is not guaranteed,
> that's why the ctor uses trans instead of transform_type. I am happy to be
> proven wrong by a language expert. If that's the case, I agree that your
> suggestion makes more sense.
>

Initializers are never interleaved.

"In a non-delegating constructor, initialization proceeds in the
following order:
- ...virtual base classes ...
- Then, direct base classes ...
- Then, non-static data members ..." [class.base.init]

"The initialization performed by each mem-initializer constitutes
a full-expression" [class.base.init]

"Every value computation and side effect associated
with a full-expression is sequenced before every
value computation and side effect associated with
the next full-expression to be evaluated" [intro.execution]

In Christ,
Steven Watanabe

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [histogram] review part 1 (documentation)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Dear Steven,

thank you very much for the careful reading of the docs and sorry for the
overuse of bold text. Now that people actually notice the project I can
stop shouting. I will implement all these wording mistakes that you point
out. A few clarifcations below:

abstract.html:
>
> - "The C implementations of the GSL are elegant..."
>   Are there multiple implementations?
>

Ok, one implementation per set, but there are two sets of functions, one
set for 1d histograms and one set for 2d histograms.

- Is there a reason that CMakeLists.txt is in
>   build/ instead of test/?
>

Mateusz moved the CMakeLists.txt into the root directory which is
better/more straight forward and follows the example of hana and compute.
We could also move it to test as you suggest, but I prefer it in the root
directory.
I read in some Boost guides (I think it was on the blincubator, but I can't
find the source just now and blincubator.com gives me a 403 error today)
that all build scripts should reside in the build directory or next to the
source. So I put it in "build".

- "in addition, the factory also accepts iterators over a sequence of
>   axis::any, the polymorphic type that can hold concrete axis types"
>   The relationship between axis::any and axis::any_std
>   (the type actually used) in unclear.
>

True, this is not explained well. axis::any is a closed-set polymorphic
type derived from boost::variant and templated on a list of types that the
polymorphic type should be able to represent. axis::any_std is a concrete
type where the type list is filled with standard axis types which users are
expected to use most frequently.


> - I wonder whether you could handle the issue that
>   h * 2 != h + h, by using h * weight(2) to make it
>   clear that multiplication changes the weights of
>   the samples instead of changing the number of samples.
>

If I understand correctly, you would only allow scaling by a weight type,
not by an arbitrary number. That makes sense and is more explicit. Sounds
like a good idea.


> benchmarks.html:
>
> - No GSL?
>

There are GSL benchmarks for 1D and 2D, where our static variants are
faster. The GSL does not provide histograms for more than two dimensions.
https://www.gnu.org/software/gsl/manual/html_node/Histograms.html


> - You say uniform and normal distributions, but there's
>   only a single plot.  Does this mean that you are putting
>   both uniform and normal variates into a single histogram?
>

Good point, this is a mistake in my plotting script. I originally wanted to
plot both results but ended up plotting only one. The benchmarks for
uniform and normal data are slightly different, because they trigger the
branches in the axis code differently. By design of the benchmark, branch
prediction works better for the uniform data.


> - "Axis access"
>   Not the most fortuitous word combination.
>

As long as you don't try to say it out loud... ;)

- It seems that using weighted fills breaks the special
>   overflow handling properties of adaptive_storage.  I
>   think that this deserves a distinct note.
>

Right.


> - It doesn't seem very sane to have the value_type
>   be a reference.  Is there any reason for this
>   other than the fact that you specify the exact
>   signature of axis_type::index?
>

This is a mistake, the method is not required to accept a reference,
pass-by-value is also possible. How should I write it to indicate that
passing by value or reference is possible?


> - I would prefer to set the size of a storage_type
>   in the constructor instead of using reset, if
>   that's possible.
>

I was flip-flopping on that one. A call `value.reset(n)` can be replaced by
`value = storage_type(n)`. I don't see a good reason to prefer one over the
other, do you?


> - I don't understand the difference between the two
>   overloads of storage_type::add.
>

I should explain better, the whole section about concepts is very brief.
The first version of "add" is used when you add two histograms and integral
counts are added to integral counts. The second version of add - which
accepts the weight type - is used when you increment the counter not by 1,
but by a weight. In an earlier stage of this library, the corresponding
method was called `increment_by_weight`. The second overload is required
because it causes an internal transformation from integral counters into
dual counts which track the sum of weights and the sum of weights squared
in case of the adaptive_storage (so that a variance estimate can be
computed). This transformation is not needed when you simply add two
histograms.


> - Unfortunately forward declarations seem to mess
>   up the doxygen/boostbook toolchain.  As a result
>   a number of components are listed in histogram_fwd.hpp
>   I'll look into this.
>

Thanks, I am glad for any help on this issue.


> - I'll deal with the reference content along with the
>   code.  I will note that the reference documentation
>   seems a bit lacking overall as there are many components
>   without any description.
>

True, just like the concepts it got a bit less love.


> - The passive voice is overused.
>

And in the last paper that I submitted to a journal, they complained about
my use of active voice...

On Mon, Sep 17, 2018 at 10:58 PM Steven Watanabe via Boost <
[hidden email]> wrote:

> AMDG
>
> abstract.html:
>
> - "the one- and multi-dimensional case"
>   "cases" should be plural.
>
> - "..submit this project to Boost, that's why..."
>   Comma splice
>
> - Please don't overuse *bold text*
>
> acknowledgments.html
>
> - "converting the documentation, and adding Jamfiles"
>   No comma here.
>
> motivation.html
>
> - "The GNU Scientific Library (GSL) and in the ROOT framework"
>   Remove "in"
>
> - "The C implementations of the GSL are elegant..."
>   Are there multiple implementations?
>
> - "The implementations are not customizable, you have to..."
>   Comma splice
>
> build.html
>
> - Is there a reason that CMakeLists.txt is in
>   build/ instead of test/?
>
> getting_started.html
>
> - "int main(int, char**) {"
>   If you're not using the parameters, `int main()` is a
>   valid signature.
>
> - Since you require C++11, you might as well use <random> instead
>   of Boost.Random.  The Boost implementation of mt19937 has better
>   performance, but that doesn't really matter here.
>
> - "in addition, the factory also accepts iterators over a sequence of
>   axis::any, the polymorphic type that can hold concrete axis types"
>   The relationship between axis::any and axis::any_std
>   (the type actually used) in unclear.
>
> guide.html:
>
> - "A histogram consists a number of..."
>   "consists /of/ a number of..."
>
> - "which provides safe counting, is fast and memory efficient."
>   There are only two elements, so use "and" instead of a comma
>   or say "is fast, and /is/ memory efficient."
>
> - "Axes objects with different label do..." ->
>   s/Axes/Axis/, s/label/labels/
>
> - "...sequence of bin indices for each axes in order."
>   s/axes/axis/
>
> - I wonder whether you could handle the issue that
>   h * 2 != h + h, by using h * weight(2) to make it
>   clear that multiplication changes the weights of
>   the samples instead of changing the number of samples.
>
> - "...remove some axes and look the equivalent lower..."
>   "...look /at/ the equivalent..."
>
> - "so it is not worth keeping that axies around"
>   s/axies/axis/
>
> - "Users can create new axis classes ..., or implemented..."
>   s/implemented/implement/
>
> - "Here is a contrieved example of a..."
>   s/contrieved/contrived/
>
> - "The guarantees ... are only valid, if the default..."
>   No comma.
>
> - "... you need know what you are doing"
>   "you need /to/ know"
>
> benchmarks.html:
>
> - No GSL?
>
> - You say uniform and normal distributions, but there's
>   only a single plot.  Does this mean that you are putting
>   both uniform and normal variates into a single histogram?
>
> rationale.html
>
> - "Axis access"
>   Not the most fortuitous word combination.
>
> - "The buildin storage types..."
>   s/buildin/builtin/
>
> - It seems that using weighted fills breaks the special
>   overflow handling properties of adaptive_storage.  I
>   think that this deserves a distinct note.
>
> - "Why is Boost.Histogram not build on top..."
>   s/build/built/
>
> concepts.html:
>
> - Concept names use CamelCase, by convention.
>
> - It doesn't seem very sane to have the value_type
>   be a reference.  Is there any reason for this
>   other than the fact that you specify the exact
>   signature of axis_type::index?
>
> - I would prefer to set the size of a storage_type
>   in the constructor instead of using reset, if
>   that's possible.
>
> - I don't understand the difference between the two
>   overloads of storage_type::add.
>
> reference.html:
>
> - Unfortunately forward declarations seem to mess
>   up the doxygen/boostbook toolchain.  As a result
>   a number of components are listed in histogram_fwd.hpp
>   I'll look into this.
>
> - I'll deal with the reference content along with the
>   code.  I will note that the reference documentation
>   seems a bit lacking overall as there are many components
>   without any description.
>
> General:
>
> - The passive voice is overused.
>
> In Christ,
> Steven Watanabe
>
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [histogram] review part 1 (documentation)

Boost - Dev mailing list
AMDG

On 09/18/2018 02:04 PM, Hans Dembinski via Boost wrote:

> <snip lots>
>> - Is there a reason that CMakeLists.txt is in
>>   build/ instead of test/?
>>
>
> Mateusz moved the CMakeLists.txt into the root directory which is
> better/more straight forward and follows the example of hana and compute.
> We could also move it to test as you suggest, but I prefer it in the root
> directory.
> I read in some Boost guides (I think it was on the blincubator, but I can't
> find the source just now and blincubator.com gives me a 403 error today)
> that all build scripts should reside in the build directory or next to the
> source. So I put it in "build".
>

  That's not the correct interpretation of the guideline
(perhaps it needs to be stated more clearly as you're
not the first to make that mistake).  build/ is specifically
for the scripts that build a separately compiled library,
which you do not need.  Note that the corresponding Jamfile,
which does essentially the same thing, lives in test/ and
this is required for integration.  I would probably put
CMakeLists.txt in test/ with an (optional) root CMakeLists.txt
that just calls add_subdirectory.

>> - It doesn't seem very sane to have the value_type
>>   be a reference.  Is there any reason for this
>>   other than the fact that you specify the exact
>>   signature of axis_type::index?
>>
>
> This is a mistake, the method is not required to accept a reference,
> pass-by-value is also possible. How should I write it to indicate that
> passing by value or reference is possible?
>

  I would probably handle it with an "as if" clause.
That is to say, concrete implementations need
to handle every usage that is legal for pass-by-value,
but do not necessarily need to actually use pass-by-value.
(To be strictly formally correct, this may require
additional restrictions like forbidding usage via a
pointer-to-member-function, which can distinguish
the exact argument type.)

>
>> - I would prefer to set the size of a storage_type
>>   in the constructor instead of using reset, if
>>   that's possible.
>>
>
> I was flip-flopping on that one. A call `value.reset(n)` can be replaced by
> `value = storage_type(n)`. I don't see a good reason to prefer one over the
> other, do you?
>

  After starting to look at the source, I think
reset, as you have it, is better, because it
allows the storage_type to carry additional
state safely (e.g. an allocator).

>
>> - The passive voice is overused.
>>
>
> And in the last paper that I submitted to a journal, they complained about
> my use of active voice...
>

  I did also notice that you used the second person
quite a bit, which gives a somewhat informal,
conversational feel.  This is fine for tutorials
but may not be appropriate in all circumstances.

In Christ,
Steven Watanabe

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Formal Review of Proposed Boost.Histogram Library Starts TODAY

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 19/09/2018 06:55, Hans Dembinski wrote:
> True, there is an inherent assumption that the transform does not change
> the value_type and only do a monotonic mapping, like log(x) or pow(x,
> const). But like you point out, this restriction is not necessary and it is
> easy to avoid. But just out of curiosity, can you name a use case where the
> transformed type is different from the value type?

One trivial case is where the input is integers (or an
arbitrary-precision bigint type of some sort) but for whatever reason
you want to get a histogram of the log or square root or something else
floating-point.

Another case might be where you want to use a larger structure as your
input and use the transform to extract one particular field.  Although
this is probably less useful since it's a lossy transform and the
inverse wouldn't be able to produce the full original object -- unless
perhaps the transform was stateful and preserved its input (which
doesn't make much sense as presumably it's non-unique, otherwise you
wouldn't be using a histogram), or you don't care about the other fields
in the structure when producing the inverse, or you don't need to make
any calls which require calculating the inverse.

> AFAIK, the order in which these statements are executed is not guaranteed,
> that's why the ctor uses trans instead of transform_type. I am happy to be
> proven wrong by a language expert. If that's the case, I agree that your
> suggestion makes more sense.

It would require a special kind of crazy for a compiler to emit calls to
an instance method of a base class before calling the base class
constructor.  So I'm reasonably positive that it's safe.

Steven's already addressed this as well, which seems to agree.  Although
I must admit a little surprise; I thought I remembered that even though
the actual initialisation of members occurs in the defined order,
evaluation of arguments to the initialisers might not be.  Perhaps this
was something tightened in C++11?

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Formal Review of Proposed Boost.Histogram Library Starts TODAY

Boost - Dev mailing list
AMDG

On 09/18/2018 05:17 PM, Gavin Lambert via Boost wrote:
> <snip>
> Steven's already addressed this as well, which seems to agree.  Although
> I must admit a little surprise; I thought I remembered that even though
> the actual initialisation of members occurs in the defined order,
> evaluation of arguments to the initialisers might not be.  Perhaps this
> was something tightened in C++11?
>

It was always that way:

"There is a sequence point (1.9) after the initialization
of each base and member. The expression-list of a
mem-initializer is evaluated as part of the initialization
of the corresponding base or member." [C++03]

In Christ,
Steven Watanabe

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Formal Review of Proposed Boost.Histogram Library Starts TODAY

Boost - Dev mailing list
On 19/09/2018 11:26, Steven Watanabe wrote:

> On 09/18/2018 05:17 PM, Gavin Lambert wrote:
>> Steven's already addressed this as well, which seems to agree.  Although
>> I must admit a little surprise; I thought I remembered that even though
>> the actual initialisation of members occurs in the defined order,
>> evaluation of arguments to the initialisers might not be.  Perhaps this
>> was something tightened in C++11?
>
> It was always that way:
>
> "There is a sequence point (1.9) after the initialization
> of each base and member. The expression-list of a
> mem-initializer is evaluated as part of the initialization
> of the corresponding base or member." [C++03]

Correct me if I'm wrong, but that text does not appear in C++98.  So
it's not quite "always". :)

Although C++98 does still require that base constructors are called
first [class.base.init#5] and that it is legal to call instance methods
in the initialisers for members (but not legal in the base constructor
call itself) [class.base.init#8].  So that part has probably always been
safe.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [histogram] review part 1 (documentation)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list

> On 18. Sep 2018, at 22:54, Steven Watanabe via Boost <[hidden email]> wrote:
>
>  That's not the correct interpretation of the guideline
> (perhaps it needs to be stated more clearly as you're
> not the first to make that mistake).  build/ is specifically
> for the scripts that build a separately compiled library,
> which you do not need.  Note that the corresponding Jamfile,
> which does essentially the same thing, lives in test/ and
> this is required for integration.  I would probably put
> CMakeLists.txt in test/ with an (optional) root CMakeLists.txt
> that just calls add_subdirectory.

Ok, Mateusz actually suggested the same thing.

It was not a mistake originally. Until Saturday, a separately compiled library was part of histogram, the Python module which has been removed, because we concluded that the Python frontend is better handled as a separate project.


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Formal Review of Proposed Boost.Histogram Library Starts TODAY

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list

> On 19. Sep 2018, at 01:17, Gavin Lambert via Boost <[hidden email]> wrote:
>
> One trivial case is where the input is integers (or an arbitrary-precision bigint type of some sort) but for whatever reason you want to get a histogram of the log or square root or something else floating-point.

Ok, good point. I have another practical use case. I think axis::regular should work with Boost.Units, so you can make an axis that accepts only energies. The transform in this case would strip off the unit to produce a dimensionless scale which is then binned.

https://github.com/HDembinski/histogram/issues/92

> Another case might be where you want to use a larger structure as your input and use the transform to extract one particular field.  Although this is probably less useful since it's a lossy transform and the inverse wouldn't be able to produce the full original object -- unless perhaps the transform was stateful and preserved its input (which doesn't make much sense as presumably it's non-unique, otherwise you wouldn't be using a histogram), or you don't care about the other fields in the structure when producing the inverse, or you don't need to make any calls which require calculating the inverse.
>
>> AFAIK, the order in which these statements are executed is not guaranteed,
>> that's why the ctor uses trans instead of transform_type. I am happy to be
>> proven wrong by a language expert. If that's the case, I agree that your
>> suggestion makes more sense.
>
> It would require a special kind of crazy for a compiler to emit calls to an instance method of a base class before calling the base class constructor.  So I'm reasonably positive that it's safe.
>
> Steven's already addressed this as well, which seems to agree.  Although I must admit a little surprise; I thought I remembered that even though the actual initialisation of members occurs in the defined order, evaluation of arguments to the initialisers might not be.  Perhaps this was something tightened in C++11?

I think my mistake was the following. When you call a function or a constructor, the order in which the arguments are evaluated is not guaranteed by the standard. But this here is a different case.


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

build directory guidelines (was: [histogram] review part 1 (documentation))

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Tue, 18 Sep 2018 at 22:54, Steven Watanabe via Boost
<[hidden email]> wrote:

> On 09/18/2018 02:04 PM, Hans Dembinski via Boost wrote:
> > <snip lots>
> >> - Is there a reason that CMakeLists.txt is in
> >>   build/ instead of test/?
> >>
> >
> > Mateusz moved the CMakeLists.txt into the root directory which is
> > better/more straight forward and follows the example of hana and compute.
> > We could also move it to test as you suggest, but I prefer it in the root
> > directory.
> > I read in some Boost guides (I think it was on the blincubator, but I can't
> > find the source just now and blincubator.com gives me a 403 error today)
> > that all build scripts should reside in the build directory or next to the
> > source. So I put it in "build".
> >
>
>   That's not the correct interpretation of the guideline
> (perhaps it needs to be stated more clearly as you're
> not the first to make that mistake).

I believe it deserves clarification

https://www.boost.org/development/requirements.html says

- build
- Library build files such as a Jamfile, IDE projects, Makefiles,
Cmake files, etc.
- Required if the library has sources to build.

It is unclear if the "sources to build" are meant strictly as the library
compilation input sources only, or also auxiliary sources of the library
as a project (ie. tests, examples).

> I would probably put
> CMakeLists.txt in test/ with an (optional) root CMakeLists.txt
> that just calls add_subdirectory.

I tend to put the root CMakeLists.txt
- it is convenient to drive build of everything in the library
- it indicates if the library is CMake-enabled, without browsing deeper

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [histogram] review part 1 (documentation)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 17-09-18 22:58, Steven Watanabe via Boost wrote:
> - The passive voice is overused.

I spit my coffee there. Thanks


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [histogram] review part 1 (documentation)

Boost - Dev mailing list
On Wed, 19 Sep 2018 at 13:22, Seth via Boost <[hidden email]> wrote:
> On 17-09-18 22:58, Steven Watanabe via Boost wrote:
> > - The passive voice is overused.
>
> I spit my coffee there. Thanks

Please, keep sich off-topic messages out of the review threads.

Thank you!

--
Mateusz Loskot, http://mateusz.loskot.net

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

[histogram] review part 1 (documentation)

Boost - Dev mailing list
On 19-09-18 13:34, Mateusz Loskot via Boost wrote:
> Please, keep sich off-topic messages out of the review threads.
>
> Thank you!

I'm sorry. I should have been more to the point. My actual gripe with
the message would have been that it presented immaterial feedback in an
almost unusable form (a PR would be some much more effective for textual
changes) and also seemed to be missing any thoughts on the merit of the
library. But, I admit I burnt all my time reading through that, and
perhaps the material feedback appeared elsewhere in the thread. For lack
of time, I'll just bow out


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

(off-list) Re: [histogram] review part 1 (documentation)

Boost - Dev mailing list
Hi Seth,

I appreciate your understanding.
I agree with your comment though.
Hopefully, Hans will request some more details.

Regards,
Mateusz


On Wed, 19 Sep 2018 at 14:12, Seth via Boost <[hidden email]> wrote:

>
> On 19-09-18 13:34, Mateusz Loskot via Boost wrote:
> > Please, keep sich off-topic messages out of the review threads.
> >
> > Thank you!
>
> I'm sorry. I should have been more to the point. My actual gripe with
> the message would have been that it presented immaterial feedback in an
> almost unusable form (a PR would be some much more effective for textual
> changes) and also seemed to be missing any thoughts on the merit of the
> library. But, I admit I burnt all my time reading through that, and
> perhaps the material feedback appeared elsewhere in the thread. For lack
> of time, I'll just bow out
>
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost



--
Mateusz Loskot, http://mateusz.loskot.net

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Formal Review of Proposed Boost.Histogram Library Starts TODAY

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Here is my input. I hope it is helpful.

Kind regards, Alex
--


I like the histogram library because it offers a straightforward solution to a common problem. In my experience it has never been a complex problem (because of the size and complexity of my data), but still it is nice to have a clean solution, avoid duplication, and avoid pitfalls related to under/overflows, and open/closed intervals.

From that perspective, I would really like the library to be as intuitive and robust as possible.

Based on that I have a couple of questions / remarks

*Axis_type concept (should probably be AxisType)*

Why must axis_type derive from labelled_base and iterator_mixin?

My understanding is that concepts are ideally specified in terms of usage and behavior, rather than implementation. So for iterator_mixin, would it not be better to just require that the following works (with the usual semantics)

auto i = axis.begin();
auto i = axis.end();
auto i = axis.rbegin();
auto i = axis.rend();

You need to document what iterator concept you adhere to, e.g. I would promise/require OutputIterator with additional requirement of multipass.

Likewise for labelled base can you just require:

boost::string_view str = axis.label();
axis.label(str);
auto a = axis.get_allocator(); // I don't really see the point of this to be honest

and for base:

int a = size();
int b = shape();
int c = uoflow();

shape() and uoflow() are not very intuitive names/functions to me, how about bool has_underflow() and bool has_overflow() to get the same information.

As a user implementing my own axis type, I would prefer just implementing those ten member functions over deriving from the two base classes. It would allow me to make an axis-type that is independent of boost / boost::histogram, and avoid complex inheritance. And, if I wanted, I could still derive from the base classes.

Further to those, you would just have

auto b = axis[i];// where b is of type axis::bin_type
auto i = axis.index(v); // where v is type axis::value_type

For these you need to document how the under- and overflow bins are numbered

For the AxisType concept definition to be complete, you must also define a BinType concept for the bin_type and a SortableValue concept (I suppose) for the value_type.

Why did you chose to make the label a part of the  axis? Of course there are many cases where we like the axis to have a label, but is that not true for many other classes, for instance std::vector, std::map, etc. Making this a default seems to be against the stated idea of "you don't pay for what you don't need". It seems to me that you require me to use a pair<axis, string> where I would be happy to use an axis.

Why is it necessary for axes to be comparable? And is is not asking for trouble to also compare the axis title (considering lower/upper case typos, etc.)

*Not documenting the histogram class*

I could not find the documentation/reference for the histogram class. I suppose this is on purpose because there are two implementations that have a common interface? In that case I would document the common interface as a concept too.

The user should then in my opinion also not use the histogram class directly, but strictly rely on your factory functions that create Histogram (the concept, not the class) objects.

In the tutorial there is the following part to create histogram by copying or moving axes from a std::vector of axes.

auto h1 = make_dynamic_histogram(v.begin(), v.end()); // copying axes
auto h2 = bh::histogram<decltype(v)>(std::move<v>); // moving axes

Now, h1 seems very reasonable to me, but for h2 you now need to use the undocumented histogram class. It is not clear to me what the two template arguments are for in histogram<A,S>. It strikes me as strange that A should be a vector<axis::any_std>, because having the axes stored in a vector seems to be an implementation detail irrelevant to the user.

Perhaps this could be avoided by providing factory functions, that take a Range of axes instead of pairs of iterators. e.g.

template<typename AxisRange> make_dynamic_histogram_from_range(AxisRange&& axes);
template<typename Storage, typename AxisRange> make_dynamic_histogram_from_range(Storage&& storage, AxisRange&& axes);

as this is Perfect Forwarding, you can automatically decide whether to move or copy. Furthermore, the user is not limited to providing axes in a std::vector

*Not having an overflow bin for Circular axes*

How can this type of axes deal with NaN input?

*Fill Histogram*

The functional style example seems overly complicated and apparently (according to your comments) depends on smartness of the compiler, would the following not be preferred?

auto h2 = make_static_histogram(
  bh::axis::regular<>(8, 0, 4),
  bh::axis::regular<>(10, 0, 5));
for(auto&& v : input_data) h2(v);

I know this is not strictly functional, but you could easily provide another factory function to do this (i.e. something like "make_and_fill_histogram").

*Access bin counts*

It seems unfortunate that the methods to access the indices of a bin are methods of the iterator rather than the value. This would preclude most uses of range-based for-loops with histograms. I see that to change this could be tricky, depending on the  iterator concept you wish to support. Perhaps it is an option to provide separate Ranges (Views) for histograms that iterate only over values; over values and variances, over indices and values and over indices, values and variances.

You could have something like this:

for(auto&& v : h)     // alternatively: for(auto&& v :  indices_and_values_view(h))
{
  std::cout << v.idx(0) << ' ' << v.idx(1) << ' ' << v.value() << std::endl;
}

Instead of:

for (auto it = h.begin(); it != h.end(); ++it)
{
  std::cout << it.idx(0) << ' ' << it.idx(1) << ' ' << it->value() << std::endl;
}

*Functionality*

The documentation says that automated bin boundaries are not an option, because there is no consensus on how to do this. While I understand you would not want to take this on, I would think a library that includes a number of methods to do this with the possibility to add your own would be very valuable and lack of consensus is no real argument. For instance, GIS packages do this to create legends for geographical maps, and I would like to have this capability at my fingertips.

It would be useful to be able to aggregate bins with an intuitive interface, e.g. if I have the population age distribution in 1-year intervals, aggregate it to 5-year intervals. This would be a generalization of the reduce_to function.

I like that you included the option to combine storages. Have you also considered the option to subtract storages / samples? This would make histogram more suitable for moving window type analysis.

*Conclusion*

I vote for inclusion in Boost because it offers fundamental and very useful functionality.

- What is your evaluation of the design?
-- More can be done to simplify concepts
-- More can be done to simplify the interface, especially to support use with range-based for-loops.
-- I think the AxisType concept would be better without the label

- What is your evaluation of the implementation?
-- Didn't look in detail. To me it seemed more complex than necessary, but perhaps that is where the good performance comes from.

- What is your evaluation of the documentation?
-- Good, could always be better. It is enough to get started. Some important parts are still missing

- What is your evaluation of the potential usefulness of the library?
-- I would use it

- Did you try to use the library? With what compiler?
-- No

-  Did you have any problems?
-- No

- How much effort did you put into your evaluation?
  A glance? A quick reading? In-depth study?
-- A careful reading, and checking in code where I did not understand docs
 
- Are you knowledgeable about the problem domain?
-- Not really

 *Spitting of coffee*

 The documentation is the first impression people will get from a library. It is incredibly generous of reviewers to pick through the language and give detailed general and specific advice. Avoiding the passive tense where possible is good and useful advice.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Formal Review of Proposed Boost.Histogram Library Starts TODAY

Boost - Dev mailing list
*** Please ignore the previous email, I started writing an answer, but then had to stop. Somehow that email was send accidentally.***

Dear Alex,

thank you very much for the review! Let me answer some comments.

> On 20. Sep 2018, at 16:51, a.hagen-zanker--- via Boost <[hidden email]> wrote:
> Based on that I have a couple of questions / remarks
>
> *Axis_type concept (should probably be AxisType)*

Ok (Steven also commented on this).

> Why must axis_type derive from labelled_base and iterator_mixin?

Good point, the label and the functionality provided by the iterator_mixin can be made optional.

https://github.com/HDembinski/histogram/issues/93

> My understanding is that concepts are ideally specified in terms of usage and behavior, rather than implementation. So for iterator_mixin, would it not be better to just require that the following works (with the usual semantics)

Ideally yes, "Although practicality beats purity." I will come back to this below.

> auto i = axis.begin();
> auto i = axis.end();
> auto i = axis.rbegin();
> auto i = axis.rend();
>
> You need to document what iterator concept you adhere to, e.g. I would promise/require OutputIterator with additional requirement of multipass.

I need to check, but I think these iterators can be made optional.

> Likewise for labelled base can you just require:
>
> boost::string_view str = axis.label();
> axis.label(str);
> auto a = axis.get_allocator(); // I don't really see the point of this to be honest

axis.get_allocator() can be optional. Some axis types need an allocator to get memory from the heap, other don't. I didn't want to store the allocator twice, so an axis that needs an allocator uses the label allocator.

> and for base:
>
> int a = size();
> int b = shape();
> int c = uoflow();
>
> shape() and uoflow() are not very intuitive names/functions to me, how about bool has_underflow() and bool has_overflow() to get the same information.

Do you have a suggestion for how to name "shape()" instead? I am not 100 % happy myself. extend() ? total_size() ?

I see the point with uoflow(). It should return the enum instead of an int. An implementation detail is leaking here.

> As a user implementing my own axis type, I would prefer just implementing those ten member functions over deriving from the two base classes. It would allow me to make an axis-type that is independent of boost / boost::histogram, and avoid complex inheritance. And, if I wanted, I could still derive from the base classes.

The only requirement I want to keep is that users have to derive from axis::base. I don't see how this limits anyone in their freedom to implement custom axis types, and it ensures that there is a standard layout in memory for the basic numbers that an axis has to provide to the histogram host. This enables optimisations that matter for the dynamic version of the histogram, where the axis types are stored in a variant-like type. If all axis types derive from axis::base, then I can write a visitor that static_casts to axis::base to query these numbers. We checked that gcc then removes the visitation code, because it repeatedly does the same thing. As compilers become better and better, this optimisation will also be applied by other compilers.

In short, for the sake of performance, it is mandatory that all axis types derive from axis::base, while deriving from iterator_mixin and axis::labeled_base should be optional.

> Further to those, you would just have
>
> auto b = axis[i];// where b is of type axis::bin_type
> auto i = axis.index(v); // where v is type axis::value_type
>
> For these you need to document how the under- and overflow bins are numbered

Agreed, this will become part of the specs.

> For the AxisType concept definition to be complete, you must also define a BinType concept for the bin_type and a SortableValue concept (I suppose) for the value_type.

Yes to BinType concept, while the SortableValue concept is only required for some axis types. The axis::category only requires the value_type to be equal-comparable and not sortable.

> Why did you chose to make the label a part of the  axis? Of course there are many cases where we like the axis to have a label, but is that not true for many other classes, for instance std::vector, std::map, etc.

I don't know what you mean by "std::vector, std::map". I agree with you that you need to be able to make a custom axis without a label if you wish.

> Making this a default seems to be against the stated idea of "you don't pay for what you don't need". It seems to me that you require me to use a pair<axis, string> where I would be happy to use an axis.

All the internal axis types have a builtin label, which is a design decision based on the experience that labels are generally useful to a large audience. Once you can write your own axis types that do not use labels, I think this point is settled, right?

> Why is it necessary for axes to be comparable? And is is not asking for trouble to also compare the axis title (considering lower/upper case typos, etc.)

Histograms should be addable, but you should only be able to add histograms with the same logical layout. It makes no sense to add two histograms which have the same number of bins, but different axis types. Or further, which have the same axis types, but different axis labels. The label is a way to make a logical distinction between two otherwise identical axis types, therefore it is included in the comparison. How exactly do you foresee trouble?

> *Not documenting the histogram class*
>
> I could not find the documentation/reference for the histogram class. I suppose this is on purpose because there are two implementations that have a common interface? In that case I would document the common interface as a concept too.

The histogram template is not included in the reference, but that is a mistake. It should be there, it is not an implementation detail (reference generator puts *unspecified*, but that's wrong). Perhaps someone with more experience with the doxygen scripts in Boost can help?

I merged the two implementations into one templated class a while ago, which is much prettier, and updated the docs. Since you found an obsolete reference to two implementations, could you please point me to where you read that?

> In the tutorial there is the following part to create histogram by copying or moving axes from a std::vector of axes.
>
> auto h1 = make_dynamic_histogram(v.begin(), v.end()); // copying axes
> auto h2 = bh::histogram<decltype(v)>(std::move<v>); // moving axes
>
> Now, h1 seems very reasonable to me, but for h2 you now need to use the undocumented histogram class. It is not clear to me what the two template arguments are for in histogram<A,S>. It strikes me as strange that A should be a vector<axis::any_std>, because having the axes stored in a vector seems to be an implementation detail irrelevant to the user.

It is not an implementation detail. Histogram supports two ways of storing axis types, via std::tuple or via std::vector<axis::any<…>>. These two options are part of the contract. Anyway, these line of thought is based on the assumption that the histogram itself is unspecified, which is not the case. Sorry about the error in the reference.

> Perhaps this could be avoided by providing factory functions, that take a Range of axes instead of pairs of iterators. e.g.
>
> template<typename AxisRange> make_dynamic_histogram_from_range(AxisRange&& axes);
> template<typename Storage, typename AxisRange> make_dynamic_histogram_from_range(Storage&& storage, AxisRange&& axes);

Optional support for ranges or std::span can be added at a later point, this is just syntactic sugar.
https://github.com/HDembinski/histogram/issues/94

> as this is Perfect Forwarding, you can automatically decide whether to move or copy. Furthermore, the user is not limited to providing axes in a std::vector

I see no need for the user to use anything other std::vector or std::tuple to store axis types. The implementation could be made more flexible in that regard, but there is simply nothing to gain by doing so, only an increase in code complexity.

> *Not having an overflow bin for Circular axes*
>
> How can this type of axes deal with NaN input?

Good point! Passing NaN to a circular axis is currently UB. There are two solutions.

1) Allow an optional overflow bin to count NaNs (off by default)
2) Fail an assert when NaN is passed

I prefer option 1.
https://github.com/HDembinski/histogram/issues/95

> *Fill Histogram*
>
> The functional style example seems overly complicated and apparently (according to your comments) depends on smartness of the compiler, would the following not be preferred?
>
> auto h2 = make_static_histogram(
>  bh::axis::regular<>(8, 0, 4),
>  bh::axis::regular<>(10, 0, 5));
> for(auto&& v : input_data) h2(v);
>
> I know this is not strictly functional, but you could easily provide another factory function to do this (i.e. something like "make_and_fill_histogram").

I am not sure what you mean, but I think you want me to change the example? I would not use functional style myself for the same reasons you mention, but some people requested this and so I put an example in the guide to show it is possible and discuss the caveats. I think this is a good solution to the documentation problem.

> *Access bin counts*
>
> It seems unfortunate that the methods to access the indices of a bin are methods of the iterator rather than the value. This would preclude most uses of range-based for-loops with histograms. I see that to change this could be tricky, depending on the  iterator concept you wish to support. Perhaps it is an option to provide separate Ranges (Views) for histograms that iterate only over values; over values and variances, over indices and values and over indices, values and variances.
>
> You could have something like this:
>
> for(auto&& v : h)     // alternatively: for(auto&& v :  indices_and_values_view(h))
> {
>  std::cout << v.idx(0) << ' ' << v.idx(1) << ' ' << v.value() << std::endl;
> }
>
> Instead of:
>
> for (auto it = h.begin(); it != h.end(); ++it)
> {
>  std::cout << it.idx(0) << ' ' << it.idx(1) << ' ' << it->value() << std::endl;
> }

I also like range-based for-loops, and I considered this solution, but then decided against it. Providing these views adds complexity to the interface and the implementation just to enable a bit of syntactic sugar. The current iterator provides all extra information you may want to query when you iterate over a multi-dimensional histogram, but also behaves like a primitive iterator over the counter values, so that simple algorithms like std::accumulate work. STL algorithms do not work anyway with this data structure when you have to do something more complex with the content, because they were not designed to handle multi-dimensional data.

But somehow I guess users will not be happy unless they can use range-based for-loops, so I will reconsider.
https://github.com/HDembinski/histogram/issues/96

> *Functionality*
>
> The documentation says that automated bin boundaries are not an option, because there is no consensus on how to do this. While I understand you would not want to take this on, I would think a library that includes a number of methods to do this with the possibility to add your own would be very valuable and lack of consensus is no real argument. For instance, GIS packages do this to create legends for geographical maps, and I would like to have this capability at my fingertips.

Any form of automated bin boundary computation that requires multi-pass on the data will not be handled by the library.

I am currently considering simple forms of automated bin boundary computation that can be implemented in single-pass, like automatically adding bins to a regular axis when a value falls outside the current range.

> It would be useful to be able to aggregate bins with an intuitive interface, e.g. if I have the population age distribution in 1-year intervals, aggregate it to 5-year intervals. This would be a generalization of the reduce_to function.

I plan to add a support library with common algorithms. "reduce_to" will become a free function and move there, and an option to aggregate bins like you suggest will also be added, called "rebin". Perhaps these two can/should be combined.

https://github.com/HDembinski/histogram/issues/98

> I like that you included the option to combine storages. Have you also considered the option to subtract storages / samples? This would make histogram more suitable for moving window type analysis.

Subtracting histograms is currently not implemented, because I originally wanted to maintain that bin entries are always non-negative. But it should be optionally available if the storage supports negative entries.

https://github.com/HDembinski/histogram/issues/99

> *Conclusion*
>
> I vote for inclusion in Boost because it offers fundamental and very useful functionality.

Thanks!

> - What is your evaluation of the implementation?
> -- Didn't look in detail. To me it seemed more complex than necessary, but perhaps that is where the good performance comes from.

If you have specific suggestions on how to simplify things, let me know. In general, the more use and corner cases the library has to support, the more complex its implementation gets.

Best regards,
Hans

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
123