[Boost Graph Library] Possible succint data structure addition

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[Boost Graph Library] Possible succint data structure addition

Boost - Dev mailing list
Dear Boost community,

I am a Master's student and currently working on my Master thesis.
My Master Thesis consists in implementing the k2-tree data structure in
C++ and one of the additional proposals for my thesis would be to
integrate my code with the boost graph library.I believe this
integration
would bring great value to the boost library and also it would provide a
better validation for my work.
So my question is if this would be something of interest for the library
to integrate with boost graph library.

INFO ABOUT K2-TREE:
The k2-tree have been proved successful to represent in a very compact
way different kinds of binary relations, such as web graphs, RDFs or
raster data.
Binary relations can be an abstraction to represent the relation between
the objects of two collections of different nature: graphs, trees,
strings, among others. They can be used in several low-level structures
within a more complex information retrieval system, or even replace one
of the most used one: an inverted index can be regarded as a binary
relation
between the vocabulary of terms and the documents where they appear.
Moreover,
it can represent the relation between terms and web pages, or even
social
networks or the connection between the web pages in the Web, which is
normally
represented as a web graph. k2-trees have been also used in other
scenarios,
such as geographical and RDF data, or images.

Brisaboa, Nieves R., et al. "Compressed representation of dynamic binary
relations with applications." Information Systems 69 (2017): 106-123.
Coimbra, Miguel E., et al. "On dynamic succinct graph representations."
arXiv preprint arXiv:1911.03195 (2019).



Thank you in advance for your attention
Best regards,
Joana Hrotkó

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost Graph Library] Possible succint data structure addition

Boost - Dev mailing list
On Sat, 8 Feb 2020 at 11:43, Joana Modesto Hrotko via Boost <
[hidden email]> wrote:

> Dear Boost community,
>
> I am a Master's student and currently working on my Master thesis.
> My Master Thesis consists in implementing the k2-tree data structure in
> C++ and one of the additional proposals for my thesis would be to
> integrate my code with the boost graph library.I believe this
> integration
> would bring great value to the boost library and also it would provide a
> better validation for my work.
> So my question is if this would be something of interest for the library
> to integrate with boost graph library.
>
> INFO ABOUT K2-TREE:
> The k2-tree have been proved successful to represent in a very compact
> way different kinds of binary relations, such as web graphs, RDFs or
> raster data.
> Binary relations can be an abstraction to represent the relation between
> the objects of two collections of different nature: graphs, trees,
> strings, among others. They can be used in several low-level structures
> within a more complex information retrieval system, or even replace one
> of the most used one: an inverted index can be regarded as a binary
> relation
> between the vocabulary of terms and the documents where they appear.
> Moreover,
> it can represent the relation between terms and web pages, or even
> social
> networks or the connection between the web pages in the Web, which is
> normally
> represented as a web graph. k2-trees have been also used in other
> scenarios,
> such as geographical and RDF data, or images.
>
> Brisaboa, Nieves R., et al. "Compressed representation of dynamic binary
> relations with applications." Information Systems 69 (2017): 106-123.
> Coimbra, Miguel E., et al. "On dynamic succinct graph representations."
> arXiv preprint arXiv:1911.03195 (2019).
>

Great subject for a thesis. Munro, the man of the beap
<https://github.com/pfalcon/beap> (a dynamic implicit data structure
<https://github.com/pfalcon/awesome-implicit-data-structures>). I don't
really understand why you would like a dynamic k2-tree, you can just as
well have a static one
<https://github.com/degski/KD-Tree/blob/master/KD-Tree/ikdtree.hpp>
[Eytzinger-tree, the levels are implicit and the sorting is recursively
done, partially ( with std::nth_element, (in-place) sorting around the
median (the nth-element) ) along the way, getting more sorted on lower
levels, on my laptop I can easily run the recursion, without exhausting
stack-space, till my heap space starts to run out (with default stack size
settings)]  and copy the data on re-balancing (or re-balance the tree on
copying), as, in general, one insertion will completely change the tree,
due to the alternating dimensions, and the mechanics of updating is
out-performed very quickly by the recursive (in place) construction. Having
said that, I think there is still some (maybe a lot of) mileage into
exploring how these things can/should be well implemented in C++17.

Best of luck with your thesis.

degski
--
@realdegski
https://brave.com/google-gdpr-workaround/
"We value your privacy, click here!" Sod off! - degski
"Anyone who believes that exponential growth can go on forever in a finite
world is either a madman or an economist" - Kenneth E. Boulding
"Growth for the sake of growth is the ideology of the cancer cell" - Edward
P. Abbey

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost Graph Library] Possible succint data structure addition

Boost - Dev mailing list
On Sun, 9 Feb 2020 at 07:18, degski <[hidden email]> wrote:

> On Sat, 8 Feb 2020 at 11:43, Joana Modesto Hrotko via Boost <
> [hidden email]> wrote:
>
>> Dear Boost community,
>>
>> I am a Master's student and currently working on my Master thesis.
>> My Master Thesis consists in implementing the k2-tree data structure in
>> C++ and one of the additional proposals for my thesis would be to
>> integrate my code with the boost graph library.I believe this
>> integration
>> would bring great value to the boost library and also it would provide a
>> better validation for my work.
>> So my question is if this would be something of interest for the library
>> to integrate with boost graph library.
>>
>> INFO ABOUT K2-TREE:
>> The k2-tree have been proved successful to represent in a very compact
>> way different kinds of binary relations, such as web graphs, RDFs or
>> raster data.
>> Binary relations can be an abstraction to represent the relation between
>> the objects of two collections of different nature: graphs, trees,
>> strings, among others. They can be used in several low-level structures
>> within a more complex information retrieval system, or even replace one
>> of the most used one: an inverted index can be regarded as a binary
>> relation
>> between the vocabulary of terms and the documents where they appear.
>> Moreover,
>> it can represent the relation between terms and web pages, or even
>> social
>> networks or the connection between the web pages in the Web, which is
>> normally
>> represented as a web graph. k2-trees have been also used in other
>> scenarios,
>> such as geographical and RDF data, or images.
>>
>> Brisaboa, Nieves R., et al. "Compressed representation of dynamic binary
>> relations with applications." Information Systems 69 (2017): 106-123.
>> Coimbra, Miguel E., et al. "On dynamic succinct graph representations."
>> arXiv preprint arXiv:1911.03195 (2019).
>>
>
> Great subject for a thesis. Munro, the man of the beap
> <https://github.com/pfalcon/beap> (a dynamic implicit data structure
> <https://github.com/pfalcon/awesome-implicit-data-structures>). I don't
> really understand why you would like a dynamic k2-tree, you can just as
> well have a static one
> <https://github.com/degski/KD-Tree/blob/master/KD-Tree/ikdtree.hpp>
> [Eytzinger-tree, the levels are implicit and the sorting is recursively
> done, partially ( with std::nth_element, (in-place) sorting around the
> median (the nth-element) ) along the way, getting more sorted on lower
> levels, on my laptop I can easily run the recursion, without exhausting
> stack-space, till my heap space starts to run out (with default stack size
> settings)]  and copy the data on re-balancing (or re-balance the tree on
> copying), as, in general, one insertion will completely change the tree,
> due to the alternating dimensions, and the mechanics of updating is
> out-performed very quickly by the recursive (in place) construction. Having
> said that, I think there is still some (maybe a lot of) mileage into
> exploring how these things can/should be well implemented in C++17.
>
> Best of luck with your thesis.
>
> degski
> --
> @realdegski
> https://brave.com/google-gdpr-workaround/
> "We value your privacy, click here!" Sod off! - degski
> "Anyone who believes that exponential growth can go on forever in a finite
> world is either a madman or an economist" - Kenneth E. Boulding
> "Growth for the sake of growth is the ideology of the cancer cell" -
> Edward P. Abbey
>

I just realized, one can completely (and simply) hide the
copying/re-construction behind a dynamic (vector-like) interface, and the
dynamic implicit kd-tree is a fact. This is now on my TODO-list. KD-trees
of dimensions 1, 2 and 3 seem to be the norm (at 4 dimensions one runs
already in the curse of dimensionality, and linear search does better).

degski
--
@realdegski
https://brave.com/google-gdpr-workaround/
"We value your privacy, click here!" Sod off! - degski
"Anyone who believes that exponential growth can go on forever in a finite
world is either a madman or an economist" - Kenneth E. Boulding
"Growth for the sake of growth is the ideology of the cancer cell" - Edward
P. Abbey

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost Graph Library] Possible succint data structure addition

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Sun, 9 Feb 2020 at 14:18, degski via Boost <[hidden email]> wrote:

> On Sat, 8 Feb 2020 at 11:43, Joana Modesto Hrotko via Boost <[hidden email]> wrote:
>
> > Dear Boost community,
> >
> > I am a Master's student and currently working on my Master thesis.
> > My Master Thesis consists in implementing the k2-tree data structure in
> > C++ and one of the additional proposals for my thesis would be to
> > integrate my code with the boost graph library.I believe this
> > integration
> > would bring great value to the boost library and also it would provide a
> > better validation for my work.
> > So my question is if this would be something of interest for the library
> > to integrate with boost graph library.
> >
> > INFO ABOUT K2-TREE:
> > The k2-tree have been proved successful to represent in a very compact
> > way different kinds of binary relations, such as web graphs, RDFs or
> > raster data.
> > Binary relations can be an abstraction to represent the relation between
> > the objects of two collections of different nature: graphs, trees,
> > strings, among others. They can be used in several low-level structures
> > within a more complex information retrieval system, or even replace one
> > of the most used one: an inverted index can be regarded as a binary
> > relation
> > between the vocabulary of terms and the documents where they appear.
> > Moreover,
> > it can represent the relation between terms and web pages, or even
> > social
> > networks or the connection between the web pages in the Web, which is
> > normally
> > represented as a web graph. k2-trees have been also used in other
> > scenarios,
> > such as geographical and RDF data, or images.
> >
> > Brisaboa, Nieves R., et al. "Compressed representation of dynamic binary
> > relations with applications." Information Systems 69 (2017): 106-123.
> > Coimbra, Miguel E., et al. "On dynamic succinct graph representations."
> > arXiv preprint arXiv:1911.03195 (2019).
> >
>
> Great subject for a thesis. Munro, the man of the beap
> <https://github.com/pfalcon/beap> (a dynamic implicit data structure
> <https://github.com/pfalcon/awesome-implicit-data-structures>). I don't
> really understand why you would like a dynamic k2-tree, you can just as
> well have a static one
> <https://github.com/degski/KD-Tree/blob/master/KD-Tree/ikdtree.hpp>

By the way, I think k2-tree which is of Joana interest may be one of
the special cases
of the k-ary https://github.com/boostorg/graph/pull/139

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost Graph Library] Possible succint data structure addition

Boost - Dev mailing list
A 2020-02-09 20:09, Mateusz Loskot via Boost escreveu:

> On Sun, 9 Feb 2020 at 14:18, degski via Boost <[hidden email]>
> wrote:
>> On Sat, 8 Feb 2020 at 11:43, Joana Modesto Hrotko via Boost
>> <[hidden email]> wrote:
>>
>> > Dear Boost community,
>> >
>> > I am a Master's student and currently working on my Master thesis.
>> > My Master Thesis consists in implementing the k2-tree data structure in
>> > C++ and one of the additional proposals for my thesis would be to
>> > integrate my code with the boost graph library.I believe this
>> > integration
>> > would bring great value to the boost library and also it would provide a
>> > better validation for my work.
>> > So my question is if this would be something of interest for the library
>> > to integrate with boost graph library.
>> >
>> > INFO ABOUT K2-TREE:
>> > The k2-tree have been proved successful to represent in a very compact
>> > way different kinds of binary relations, such as web graphs, RDFs or
>> > raster data.
>> > Binary relations can be an abstraction to represent the relation between
>> > the objects of two collections of different nature: graphs, trees,
>> > strings, among others. They can be used in several low-level structures
>> > within a more complex information retrieval system, or even replace one
>> > of the most used one: an inverted index can be regarded as a binary
>> > relation
>> > between the vocabulary of terms and the documents where they appear.
>> > Moreover,
>> > it can represent the relation between terms and web pages, or even
>> > social
>> > networks or the connection between the web pages in the Web, which is
>> > normally
>> > represented as a web graph. k2-trees have been also used in other
>> > scenarios,
>> > such as geographical and RDF data, or images.
>> >
>> > Brisaboa, Nieves R., et al. "Compressed representation of dynamic binary
>> > relations with applications." Information Systems 69 (2017): 106-123.
>> > Coimbra, Miguel E., et al. "On dynamic succinct graph representations."
>> > arXiv preprint arXiv:1911.03195 (2019).
>> >
>>
>> Great subject for a thesis. Munro, the man of the beap
>> <https://github.com/pfalcon/beap> (a dynamic implicit data structure
>> <https://github.com/pfalcon/awesome-implicit-data-structures>). I
>> don't
>> really understand why you would like a dynamic k2-tree, you can just
>> as
>> well have a static one
>> <https://github.com/degski/KD-Tree/blob/master/KD-Tree/ikdtree.hpp>
>
> By the way, I think k2-tree which is of Joana interest may be one of
> the special cases
> of the k-ary https://github.com/boostorg/graph/pull/139
>
> Best regards,

Hi

first of all thank you so much for the reply of all of you.

So the k2-tree is a succint data structure and it is equivalent to a web
graph,
so conceptually is somehow similar to the k-ary tree however,
implementation wise it is not
the same.

In the k2-tree it is meant to represent the relation between nodes in a
graph, based
on the matrix representation of the graph. In order to compress the
information in
the matrix (hopefully sparse), the method consists in subdividing the
binary matrix
into k rows and k columns, thus producing k2 submatrices of size n/k by
n/k. Each of
these submatriceswill be a child of the root node and its value will be
1 iff there
is at least one 1 in the cells of the submatrix. A 0 child means that
the submatrix
has all 0s and hence the tree decomposition ends there. Once the level
1, with the
children of the root, has been built, the method proceeds recursively
for each child
with value 1, until we reach submatrices full of 0s, or we reach the
cells of the
original matrix.
The k2-ary tree described is represented in a compact way with just two
bit
arrays: T (tree) and L(leaves). T stores all the bits of the k2-tree
except
those in the last level following a level-wise traversal. Bitmap L
stores the last
level of the tree.

 From the pull requests it seems there is no similar data structure to
the k2-tree
being implemented right now. However how hard is it to be able to
complete the pull request?
And how is the decision taken?

Best regards,
Joana

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost