[stl_ext_adv] ] Interest in advanced data structures

classic Classic list List threaded Threaded
30 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[stl_ext_adv] ] Interest in advanced data structures

Vadim Stadnik
Hi all,

Advanced data structures (ADS) offer useful STL extensions that are not
possible or not efficient with basic data structures, such as arrays,
linked lists and red-black trees.

The project

https://github.com/vstadnik/stl_ext_adv

http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/index.html

focuses mainly on a generalized and unified computer representation of
sequences using the augmented B+ trees. It provides the following STL
extensions:


- The sequence container supports random access iterators and a union of
interfaces of STL containers: vector, list and deque. This sequence offers
the advantage of efficient update operations with logarithmic running time
compared with linear running time of the same operations of std::vector.

- The associative containers (set, multiset, map and multimap) with random
access iterators.

- The enhanced random access iterators effectively address the fundamental
problem of the iterator invalidation.

- The function accumulate() with logarithmic computational complexity can
significantly improve performance of many practically important algorithms
and applications, such as the numerical integration, the statistics, the
data analysis etc.


The examples demonstrate how to improve the computational complexity from
O(N) to O(log N) in the following applications and problems:

- numerical integration.
- calculation of mean value and standard deviation;
- calculation of weighted mean value;

- testing if any number of consecutive elements are equal;

- counting intersections of one interval with sequence intervals;


Is there an interest in this project and implemented STL extensions?


Interest in potentially useful, but not yet implemented algorithms and
specialized containers:


1. The function accumulate() with constant computational cost. With this
function some user algorithms, such as the numerical integration, can come
close to the absolute theoretical limit of the computational cost. The
function can be implemented through iterators enhanced with functionality
of accumulators. For more details, see the section “Random access
iterators” in the documentation.


2. The functions sequence::splice() with logarithmic running time in the
worst case. Note that functions std::list::splice() provide in many cases
constant computational cost, however, in the worst case the cost is linear.


3. Any other potentially useful specialized containers, iterators or
algorithms?

Augmented B+ trees are versatile and powerful data structures. They can be
designed to maximize efficiency of particular algorithms and programming
solutions. The limitation of specialized containers is that the
computational cost of some other operations may increase.

About augmented red black trees.

The augmented red black trees can also support various STL extensions. The
red black trees are yet out of the scope of this project for the reasons
explained in the documentation. However, historically augmented red-black
trees are probably the first ADS that could extend STL, since the method of
augmenting red black trees was described as early as in 1990 [T.H. Cormen,
C.E. Leiserson and R.L. Rivest.” Introduction to Algorithms” (1st ed.). MIT
Press and McGraw-Hill, 1990]. This means, in particular, that the very
first version of STL could provide associative containers with random
access iterators. Also, as I understand the previously submitted project
https://github.com/boost-vault/Containers/blob/master/countertree.zipimplements
containers based on the augmented red black trees.



It is hardly possible to make an exact prediction, but without a doubt
future versions of STL will benefit from advanced data structures.

Regards,
Vadim Stadnik

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Thijs (M.A.) van den Berg
> Hi all,
>
> Advanced data structures (ADS) offer useful STL extensions that are not
> possible or not efficient with basic data structures, such as arrays,
> linked lists and red-black trees.
>
> The project
>
> https://github.com/vstadnik/stl_ext_adv
>
> http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/index.html
>
> focuses mainly on a generalized and unified computer representation of
> sequences using the augmented B+ trees. It provides the following STL
> extensions
..
>
> The examples demonstrate how to improve the computational complexity from
> O(N) to O(log N) in the following applications and problems:
>
> - numerical integration.
> - calculation of mean value and standard deviation;
> - calculation of weighted mean value;

I think O(log N) is misleading.
Problem:  I  want to estimate the mean of N numbers in a file on my disk.
Clearly I should al least present all of the number in that file al least once  (O(N)) to a black box algorithm that computes the mean?

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Vadim Stadnik
On Wed, Nov 30, 2011 at 9:06 PM, Thijs (M.A.) van den Berg
<[hidden email]>wrote:

> > The examples demonstrate how to improve the computational complexity from
> > O(N) to O(log N) in the following applications and problems:
> >
> > - numerical integration.
> > - calculation of mean value and standard deviation;
> > - calculation of weighted mean value;
>
> I think O(log N) is misleading.
> Problem:  I  want to estimate the mean of N numbers in a file on my disk.
> Clearly I should al least present all of the number in that file al least
> once  (O(N)) to a black box algorithm that computes the mean?
>
> Thank you for this comment.
You are right, if you mean the cost of construction of an advanced data
structure, it is O(N log N). As soon as the required data have been written
the sum and mean value of any consecutive elements can be found in O(log N)
time. Please, have a look at performance tests of the fast algorithm
accumulate() in the section:


http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/doc/performance.html


Also, you can use example or test code to notice the difference in the
performance of standard and fast algorithms accumulate().


Regards,

Vadim Stadnik

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Thijs (M.A.) van den Berg
> On Wed, Nov 30, 2011 at 9:06 PM, Thijs (M.A.) van den Berg
> <[hidden email]>wrote:
>
>>> The examples demonstrate how to improve the computational complexity from
>>> O(N) to O(log N) in the following applications and problems:
>>>
>>> - numerical integration.
>>> - calculation of mean value and standard deviation;
>>> - calculation of weighted mean value;
>>
>> I think O(log N) is misleading.
>> Problem:  I  want to estimate the mean of N numbers in a file on my disk.
>> Clearly I should al least present all of the number in that file al least
>> once  (O(N)) to a black box algorithm that computes the mean?
>>
>> Thank you for this comment.
> You are right, if you mean the cost of construction of an advanced data
> structure, it is O(N log N). As soon as the required data have been written
> the sum and mean value of any consecutive elements can be found in O(log N)
> time. Please, have a look at performance tests of the fast algorithm
> accumulate() in the section:
>
>
> http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/doc/performance.html
>
>
> Also, you can use example or test code to notice the difference in the
> performance of standard and fast algorithms accumulate().
>
Thank Vadim. For certain applications your container clearly outperforms other. The hierarchical accumulation is particularly interesting if you need to calculate many more accumulations than you need to do inserts.
 To get a better view of both pros and potential cons, can you also add a comparison for *push_back inserts* (like you did in the multiset) in a std::vector besides the worst case 'before the middle'?

An interesting project!


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Vicente Botet
In reply to this post by Vadim Stadnik
Le 30/11/11 10:58, Vadim Stadnik a écrit :

> Hi all,
>
> Advanced data structures (ADS) offer useful STL extensions that are not
> possible or not efficient with basic data structures, such as arrays,
> linked lists and red-black trees.
>
> The project
>
> https://github.com/vstadnik/stl_ext_adv
>
> http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/index.html
>
> focuses mainly on a generalized and unified computer representation of
> sequences using the augmented B+ trees. It provides the following STL
> extensions:
>
>
> - The sequence container supports random access iterators and a union of
> interfaces of STL containers: vector, list and deque. This sequence offers
> the advantage of efficient update operations with logarithmic running time
> compared with linear running time of the same operations of std::vector.
>
> - The associative containers (set, multiset, map and multimap) with random
> access iterators.
>
> - The enhanced random access iterators effectively address the fundamental
> problem of the iterator invalidation.
>
> - The function accumulate() with logarithmic computational complexity can
> significantly improve performance of many practically important algorithms
> and applications, such as the numerical integration, the statistics, the
> data analysis etc.
>
>
Hi,

how is your library related/compared to Boost.MultiIndex?

Best,
Vicente


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Vadim Stadnik
In reply to this post by Thijs (M.A.) van den Berg
On Wed, Nov 30, 2011 at 10:17 PM, Thijs (M.A.) van den Berg <[hidden email]
> wrote:

> > On Wed, Nov 30, 2011 at 9:06 PM, Thijs (M.A.) van den Berg
> > <[hidden email]>wrote:
>


> Thank Vadim. For certain applications your container clearly outperforms
> other. The hierarchical accumulation is particularly interesting if you
> need to calculate many more accumulations than you need to do inserts.
>  To get a better view of both pros and potential cons, can you also add a
> comparison for *push_back inserts* (like you did in the multiset) in a
> std::vector besides the worst case 'before the middle'?
>
> An interesting project!
>
> Thank you again for useful comments.
I will add the comparison of push_back() performance in standard sequences
(std::list and std::vector) and new sequences that are supported by two
types of augmented B+ trees in this project.
Regards,
Vadim Stadnik

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Vadim Stadnik
In reply to this post by Vicente Botet
On Wed, Nov 30, 2011 at 11:29 PM, Vicente J. Botet Escriba <
[hidden email]> wrote:

>
>> how is your library related/compared to Boost.MultiIndex?
>
>
As far as I understand library multi-index provides multiple views of a
collection of data. The submitted containers and data structures can be
quite useful at implementation level for this type of applications, since
one advanced data structure efficiently supports many types of STL
sequences and associative containers.

As for applications to other Boost libraries, I think that in the current
shape the new data structures and containers can increase the performance
of Accumulators and Interval. The project includes examples demonstrating
how these containers can solve typical problems in these areas. With some
modifications of the augmented data structures may be used in other Boost
libraries.

Regards,
Vadim

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Vicente Botet
Vadim Stadnik wrote
On Wed, Nov 30, 2011 at 11:29 PM, Vicente J. Botet Escriba <
[hidden email]> wrote:

>
>> how is your library related/compared to Boost.MultiIndex?
>
>
As far as I understand library multi-index provides multiple views of a
collection of data. The submitted containers and data structures can be
quite useful at implementation level for this type of applications, since
one advanced data structure efficiently supports many types of STL
sequences and associative containers.
I don't see how your library could improve the efficiency of Boost.MultiIndex. Maybe you could show some figures comparing both libraries.

As for applications to other Boost libraries, I think that in the current
shape the new data structures and containers can increase the performance
of Accumulators and Interval.
How your library can improve the performance of these libraries?

Note that I'm not against your library, that I don't know well, but just I want to see what is the added value.

Best,
Vicente
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Vadim Stadnik
On Thu, Dec 1, 2011 at 3:01 AM, Vicente Botet <[hidden email]>wrote:

>
> >>> how is your library related/compared to Boost.MultiIndex?
> >>
> > As far as I understand library multi-index provides multiple views of a
> > collection of data. The submitted containers and data structures can be
> > quite useful at implementation level for this type of applications, since
> > one advanced data structure efficiently supports many types of STL
> > sequences and associative containers.
> >
>
> I don't see how your library could improve the efficiency of
> Boost.MultiIndex. Maybe you could show some figures comparing both
> libraries.
>
> > As for applications to other Boost libraries, I think that in the current
> > shape the new data structures and containers can increase the performance
> > of Accumulators and Interval.
> >
>
> How your library can improve the performance of these libraries?
> Note that I'm not against your library, that I don't know well, but just I
> want to see what is the added value.
>
>
I am not familiar with implementation details of multi-index library. My
reply only concerns the general use of new containers in multi-view
applications. The demonstration  is actually available in the example of
fast counting of intersections of one interval with sequence of intervals.
This solution uses one container, which stores all objects of intervals,
plus two containers, which represent ordered views of lower and upper
endpoints of these intervals. The ordered views support efficient search
operations and provide random access iterators. For this reason the
algorithm of intersection counting has logarithmic running time against
linear time if the solution was based on standard containers. In principle,
the approach can be extended and applied to range queries using multiple
views.

New library can support efficient accumulators through the fast member
function accumulate() with logarithmic running time in the worst case.
Efficient accumulators are not yet implemented in this project. The section
"Iterators and accumulators" of this document

http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/doc/iterators.html

discusses pros and cons of this option. I am not sure whether to add this
functional or not, this is why I ask Boost community if there is an
interest in this functionality.

Regards,
Vadim Stadnik

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Vicente Botet
Le 01/12/11 00:41, Vadim Stadnik a écrit :

> On Thu, Dec 1, 2011 at 3:01 AM, Vicente Botet<[hidden email]>wrote:
>
>>>>> how is your library related/compared to Boost.MultiIndex?
>>> As far as I understand library multi-index provides multiple views of a
>>> collection of data. The submitted containers and data structures can be
>>> quite useful at implementation level for this type of applications, since
>>> one advanced data structure efficiently supports many types of STL
>>> sequences and associative containers.
>>>
>> I don't see how your library could improve the efficiency of
>> Boost.MultiIndex. Maybe you could show some figures comparing both
>> libraries.
>>
>>> As for applications to other Boost libraries, I think that in the current
>>> shape the new data structures and containers can increase the performance
>>> of Accumulators and Interval.
>>>
>> How your library can improve the performance of these libraries?
>> Note that I'm not against your library, that I don't know well, but just I
>> want to see what is the added value.
>>
>>
> I am not familiar with implementation details of multi-index library. My
> reply only concerns the general use of new containers in multi-view
> applications.

What do you mean on the last sentence?

> The demonstration  is actually available in the example of
> fast counting of intersections of one interval with sequence of intervals.
> This solution uses one container, which stores all objects of intervals,
> plus two containers, which represent ordered views of lower and upper
> endpoints of these intervals. The ordered views support efficient search
> operations and provide random access iterators. For this reason the
> algorithm of intersection counting has logarithmic running time against
> linear time if the solution was based on standard containers. In principle,
> the approach can be extended and applied to range queries using multiple
> views.
What I mean is that it seems to me that Boost.MultiIndex provides
already several index for a container, including random access index.
So, what is exactly the added value.
If your use case can be implemented already using Boost.MultiIndex I
will consider to propose a specific abstraction on top of
Boost.Container as it is done in Boost.Bimap. If it is not possible,
because a specific kind of container must be added to Boost.MultiIndex,
I will prefer Boost.Multiindex integrates this new container if the use
case is  of general purpose.

I guess that you could study Boost.MultiIndex to see if your library
provides something that is not doable with multi-index, or can be done
better or that improves the performances. I ensure you will not lost
your time.
> New library can support efficient accumulators through the fast member
> function accumulate() with logarithmic running time in the worst case.
Doesn't accumulate has O(log N) complexity for containers that have
random access? If not, maybe Boost could include a boost::accumulate
function that provides this complexity (if not already exists) in a
non-intrusive way.
> Efficient accumulators are not yet implemented in this project. The section
> "Iterators and accumulators" of this document
>
> http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/doc/iterators.html
>
> discusses pros and cons of this option. I am not sure whether to add this
> functional or not, this is why I ask Boost community if there is an
> interest in this functionality.
IIUC you are proposing to store the accumulation of all the lower nodes
in each node of the container, and maintain this information up to date
while inserting and removing elements, isn't it?

I have not found a use case for that, but several times I have needed to
have each node store the accumulation of each one of its children.

I will see this as a possible addition to  Boost.MultiIndex as a special
container/index that stores the accumulation of the preceding members,
with the pros and cons. Whether it is worth including this in
Boost.MultiIndex is an open question, that Joaguin could answer if needed.

Best,
Vicente

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Vadim Stadnik
On Thu, Dec 1, 2011 at 11:26 AM, Vicente J. Botet Escriba <
[hidden email]> wrote:

>
>>  how is your library related/compared to Boost.MultiIndex?
>>>>>>
>>>>> As far as I understand library multi-index provides multiple views of a
>>>> collection of data. The submitted containers and data structures can be
>>>> quite useful at implementation level for this type of applications,
>>>> since
>>>> one advanced data structure efficiently supports many types of STL
>>>> sequences and associative containers.
>>>>
>>>>  I don't see how your library could improve the efficiency of
>>> Boost.MultiIndex. Maybe you could show some figures comparing both
>>> libraries.
>>>
>>>  As for applications to other Boost libraries, I think that in the
>>>> current
>>>> shape the new data structures and containers can increase the
>>>> performance
>>>> of Accumulators and Interval.
>>>>
>>>>  How your library can improve the performance of these libraries?
>>> Note that I'm not against your library, that I don't know well, but just
>>> I
>>> want to see what is the added value.
>>>
>>>  I am not familiar with implementation details of multi-index library. My
>> reply only concerns the general use of new containers in multi-view
>> applications.
>>
>
> What do you mean on the last sentence?
>
>  The demonstration  is actually available in the example of
>> fast counting of intersections of one interval with sequence of intervals.
>> This solution uses one container, which stores all objects of intervals,
>> plus two containers, which represent ordered views of lower and upper
>> endpoints of these intervals. The ordered views support efficient search
>> operations and provide random access iterators. For this reason the
>> algorithm of intersection counting has logarithmic running time against
>> linear time if the solution was based on standard containers. In
>> principle,
>> the approach can be extended and applied to range queries using multiple
>> views.
>>
> What I mean is that it seems to me that Boost.MultiIndex provides already
> several index for a container, including random access index. So, what is
> exactly the added value.
> If your use case can be implemented already using Boost.MultiIndex I will
> consider to propose a specific abstraction on top of Boost.Container as it
> is done in Boost.Bimap. If it is not possible, because a specific kind of
> container must be added to Boost.MultiIndex, I will prefer Boost.Multiindex
> integrates this new container if the use case is  of general purpose.
>

The last sentence is very close to what I mean. The proposed library is a
generalized and quite efficient extension of standard containers as
described in the documentation. The containers and algorithms of this
library are designed to become modules and provides support for other more
advanced libraries and solutions.


> I guess that you could study Boost.MultiIndex to see if your library
> provides something that is not doable with multi-index, or can be done
> better or that improves the performances. I ensure you will not lost your
> time.



I will have a deeper look at multi-index library. Also, I would like to ask
the following question, which can clarify the usefulness of new containers
for multi-index:

Can multi-index library solve the problem of counting of intersections of
one interval with an arbitrary sequence of intervals in O(log N) time and
provide the same running time for update operations?
The proposed library can support this efficiency, since its containers have
logarithmic cost of find, insert and erase operations and random access
iterator for both ordered and un-ordered data.


>
>  New library can support efficient accumulators through the fast member
>> function accumulate() with logarithmic running time in the worst case.
>>
> Doesn't accumulate has O(log N) complexity for containers that have random
> access? If not, maybe Boost could include a boost::accumulate function that
> provides this complexity (if not already exists) in a non-intrusive way.


Standard containers, such as std::vector, std::list, std::set etc., provide
linear computational cost of std::accumulate() through iterators. If Boost
library has already implemented algorithm accumulate() with logarithmic
complexity, please send me a reference or a link, this is very interesting.


>
>  Efficient accumulators are not yet implemented in this project. The
>> section
>> "Iterators and accumulators" of this document
>>
>> http://dl.dropbox.com/u/**51041088/stl_ext_adv_doc/doc/**iterators.html<http://dl.dropbox.com/u/51041088/stl_ext_adv_doc/doc/iterators.html>
>>
>> discusses pros and cons of this option. I am not sure whether to add this
>> functional or not, this is why I ask Boost community if there is an
>> interest in this functionality.
>>
> IIUC you are proposing to store the accumulation of all the lower nodes in
> each node of the container, and maintain this information up to date while
> inserting and removing elements, isn't it?
>
Just a minor clarification: the accumulated value of all elements in the
range [0 , position) to be stored as a data member of an iterator.


> I have not found a use case for that, but several times I have needed to
> have each node store the accumulation of each one of its children.
>
The augmented B+ tree (class bp_tree_idx_acc) constructs these data for
every internal node or in other words for every level of the tree except
the lowest, which stores container elements.


> I will see this as a possible addition to  Boost.MultiIndex as a special
> container/index that stores the accumulation of the preceding members, with
> the pros and cons. Whether it is worth including this in Boost.MultiIndex
> is an open question, that Joaguin could answer if needed.
>


Thank you, that would be very helpful.


Regards,
Vadim Stadnik

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

thorsten.ottosen
In reply to this post by Vadim Stadnik
Den 30-11-2011 10:58, Vadim Stadnik skrev:
> Hi all,
>
> Advanced data structures (ADS) offer useful STL extensions that are not
> possible or not efficient with basic data structures, such as arrays,
> linked lists and red-black trees.

I'm interested. In particular, if the library can be integrated
seamlessly with Boost.Accumulators and other libs, it seems very
interesting.

Are there variations of the trees that allow for other types of fast
queries than just accumulate? If so, then there might be a generic way
to incorporate them.

-Thorsten

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Andrii Sydorchuk
>
> You are right, if you mean the cost of construction of an advanced data
> structure, it is O(N log N). As soon as the required data have been written
> the sum and mean value of any consecutive elements can be found in O(log N)
> time. Please, have a look at performance tests of the fast algorithm
> accumulate() in the section:

The next solution works in O(1) and requires only O(N) for construction
step:
template <typename T>
class accumulator {
public:
  void init(std::vector<T> &elems) {
     sum.reserve(elems.size() + 1);
     sum.push_back(0);
     for (std::vector<T>::iterator it = elems.begin(); it != elems.end();
++it) {
       sum.push_back(sum.back() + *it);
     }
  }

  T operator()(T first, T last) const {
    return sum[last+1] - sum[first];
  }

private:
  std::vector<T> sum;
};
I guess this solution is slightly better for those math targets you
mentioned above. It also works with any associative operation.

Andrii

On Thu, Dec 1, 2011 at 11:42 AM, Thorsten Ottosen <
[hidden email]> wrote:

> Den 30-11-2011 10:58, Vadim Stadnik skrev:
>
>  Hi all,
>>
>> Advanced data structures (ADS) offer useful STL extensions that are not
>> possible or not efficient with basic data structures, such as arrays,
>> linked lists and red-black trees.
>>
>
> I'm interested. In particular, if the library can be integrated seamlessly
> with Boost.Accumulators and other libs, it seems very interesting.
>
> Are there variations of the trees that allow for other types of fast
> queries than just accumulate? If so, then there might be a generic way to
> incorporate them.
>
> -Thorsten
>
>
> ______________________________**_________________
> Unsubscribe & other changes: http://lists.boost.org/**
> mailman/listinfo.cgi/boost<http://lists.boost.org/mailman/listinfo.cgi/boost>
>

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Vadim Stadnik
In reply to this post by thorsten.ottosen
On Thu, Dec 1, 2011 at 9:42 PM, Thorsten Ottosen <
[hidden email]> wrote:

> Den 30-11-2011 10:58, Vadim Stadnik skrev:
>
>  Hi all,
>>
>> Advanced data structures (ADS) offer useful STL extensions that are not
>> possible or not efficient with basic data structures, such as arrays,
>> linked lists and red-black trees.
>>
>
> I'm interested. In particular, if the library can be integrated seamlessly
> with Boost.Accumulators and other libs, it seems very interesting.
>
Thank you for the interest.

At the moment I think that the best option is to provide access to the
proposed data structures through STL interfaces, since they maximizes
usefulness of these data structures. The other options are the development
of specialized data structures or adding specialized interface functions to
the existing classes. Let me know what kind of functionality is not
supported yet to have a better understanding of the specific problems.



Are there variations of the trees that allow for other types of fast
> queries than just accumulate? If so, then there might be a generic way to
> incorporate them.

Yes, you are right.
The augmented B+ trees are widely used in solutions to various problems of
computational geometry. However, I think that this area is not general
enough for Boost libraries, this is why it is out of scope of this project.

In principle there are many other solutions which can benefit from
augmented data structures. I have a number of ideas as for future
development, but I am not sure if they are really useful for Boost users
and developers. This is why I asked Boost community about most interesting
and useful applications for the augmented data structures.


A generalized implementation of augmented data structures is possible. It
is also possible that such implementation can be based on a simpler
topology and data scheme than those used in the proposed library. However,
as usual a generalization involves some risk of not optimized support for
specific algorithms. For example, the proposed data structures support not
only the high efficiency of algorithm accumulate(), but also a relatively
high accuracy of this algorithm.



Regards,

Vadim Stadnik

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Ion Gaztañaga
El 01/12/2011 13:30, Vadim Stadnik escribió:

> In principle there are many other solutions which can benefit from
> augmented data structures. I have a number of ideas as for future
> development, but I am not sure if they are really useful for Boost users
> and developers. This is why I asked Boost community about most interesting
> and useful applications for the augmented data structures.
>
>
> A generalized implementation of augmented data structures is possible. It
> is also possible that such implementation can be based on a simpler
> topology and data scheme than those used in the proposed library. However,
> as usual a generalization involves some risk of not optimized support for
> specific algorithms. For example, the proposed data structures support not
> only the high efficiency of algorithm accumulate(), but also a relatively
> high accuracy of this algorithm.

It would be very interesting if augmented data structures could be added
to Boost.Intrusive algorithms. This way could build intrusive structures
built on augmented data structures, and on top of that, non-intrusive,
STL-like containers.

If you are interested, have a look at Boost.Intrusive implementation to
see if we could introduce augmented data structures there and benefit
more applications.

Ion

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Joaquin M LópezMuñoz
In reply to this post by Vicente Botet
El 01/12/2011, a las 01:28, "Vicente J. Botet Escriba" <[hidden email]> escribió:

>
> I will see this as a possible addition to  Boost.MultiIndex as a special
> container/index that stores the accumulation of the preceding members,
> with the pros and cons. Whether it is worth including this in
> Boost.MultiIndex is an open question, that Joaguin could answer if needed.

Hi,

If time permits, I'll take a close look at the lib this weekend and offer my views on how it potentially interacts with Boost.Multiindex.

Joaquín M López Muñoz
Telefónica Digital

Este mensaje se dirige exclusivamente a su destinatario. Puede consultar nuestra política de envío y recepción de correo electrónico en el enlace situado más abajo.
This message is intended exclusively for its addressee. We only send and receive email on the basis of the terms set out at.
http://www.tid.es/ES/PAGINAS/disclaimer.aspx

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Vadim Stadnik
In reply to this post by Andrii Sydorchuk
On Thu, Dec 1, 2011 at 10:45 PM, Andrii Sydorchuk <
[hidden email]> wrote:

> >
> > You are right, if you mean the cost of construction of an advanced data
> > structure, it is O(N log N). As soon as the required data have been
> written
> > the sum and mean value of any consecutive elements can be found in O(log
> N)
> > time. Please, have a look at performance tests of the fast algorithm
> > accumulate() in the section:
>
> The next solution works in O(1) and requires only O(N) for construction
> step:
> template <typename T>
> class accumulator {
> public:
>  void init(std::vector<T> &elems) {
>     sum.reserve(elems.size() + 1);
>     sum.push_back(0);
>     for (std::vector<T>::iterator it = elems.begin(); it != elems.end();
> ++it) {
>       sum.push_back(sum.back() + *it);
>     }
>  }
>
>  T operator()(T first, T last) const {
>    return sum[last+1] - sum[first];
>  }
>
> private:
>  std::vector<T> sum;
> };
> I guess this solution is slightly better for those math targets you
> mentioned above. It also works with any associative operation.
>
> Andrii


This is the best possible approach provided that data are not modified or
updated. Unfortunately, in practical data analysis, data fitting and
approximation this is a relatively rare case. For large and huge data sets
the cost of vector update operations will be quite significant and often
prohibitive.

It is also interesting to note that if the same solution was based on
container sequence<bp_tree_idx> from the proposed library then it would
efficiently support update operations for container elements. However, it
would have linear cost of updating the sum data. The best possible
performance for a user algorithm which requires a wide set of operations
can be achieved with sequence<bp_tree_idx_acc>. The constant cost of
summation similar to this example is possible, but not implemented yet.
This is discussed in section "Iterators and accumulators".

Thank you,
Vadim

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Vadim Stadnik
In reply to this post by Ion Gaztañaga
2011/12/1 Ion Gaztañaga <[hidden email]>

>
> It would be very interesting if augmented data structures could be added
> to Boost.Intrusive algorithms. This way could build intrusive structures
> built on augmented data structures, and on top of that, non-intrusive,
> STL-like containers.
>
> If you are interested, have a look at Boost.Intrusive implementation to
> see if we could introduce augmented data structures there and benefit more
> applications.
>
> Thank you for the interest and suggestions.
I am planning to analyze some Boost libraries from the user point of view
and compare with functionality of the proposed library. If Boost developers
find this new library interesting and useful I will be happy to discuss
implementation details and problems to be solved.

Regards,
Vadim Stadnik

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Jeff Flinn-2
In reply to this post by Vadim Stadnik
Vadim Stadnik wrote:
> On Thu, Dec 1, 2011 at 11:26 AM, Vicente J. Botet Escriba <
> [hidden email]> wrote:
...
>>>>  As for applications to other Boost libraries, I think that in the
>>>>> current
>>>>> shape the new data structures and containers can increase the
>>>>> performance
>>>>> of Accumulators and Interval.
...

>>  The demonstration  is actually available in the example of
>>> fast counting of intersections of one interval with sequence of intervals.
>>> This solution uses one container, which stores all objects of intervals,
>>> plus two containers, which represent ordered views of lower and upper
>>> endpoints of these intervals. The ordered views support efficient search
>>> operations and provide random access iterators. For this reason the
>>> algorithm of intersection counting has logarithmic running time against
>>> linear time if the solution was based on standard containers. In
>>> principle,
>>> the approach can be extended and applied to range queries using multiple
>>> views.
...
> Can multi-index library solve the problem of counting of intersections of
> one interval with an arbitrary sequence of intervals in O(log N) time and
> provide the same running time for update operations?
> The proposed library can support this efficiency, since its containers have
> logarithmic cost of find, insert and erase operations and random access
> iterator for both ordered and un-ordered data.

Your example at least looks like some overlap with Boost ICL, the
Interval Container Library. Or is that what you were referring to above
when you mentioned the Boost Interval library?

Jeff



_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [stl_ext_adv] ] Interest in advanced data structures

Vadim Stadnik
In reply to this post by Joaquin M LópezMuñoz
On Thu, Dec 1, 2011 at 11:49 PM, JOAQUIN M. LOPEZ MUÑOZ <[hidden email]>wrote:

> El 01/12/2011, a las 01:28, "Vicente J. Botet Escriba" <
> [hidden email]> escribió:
>
> >
> > I will see this as a possible addition to  Boost.MultiIndex as a special
> > container/index that stores the accumulation of the preceding members,
> > with the pros and cons. Whether it is worth including this in
> > Boost.MultiIndex is an open question, that Joaguin could answer if
> needed.
>
> Hi,
>
> If time permits, I'll take a close look at the lib this weekend and offer
> my views on how it potentially interacts with Boost.Multiindex.
>

Thank you, this should be very helpful.

Also, I would like to ask you to have a look at the example of fast
counting of intersections of one interval with sequence of intervals. It
has been already discussed in one of my previous replies to Vincente. I
guess multi-index can provide a solution to the problem, but what would be
the best efficiency of such solution with current multi-index? and what
would be the efficiency of update operations?

Regards,
Vadim Stadnik

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
12