[btree] Status report

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

[btree] Status report

Beman Dawes
The slides for my BoostCon "Proposed Boost B-tree Library"
presentation are available:

http://github.com/boostcon/2011_presentations/raw/master/tue/proposed_b_tree_library.pdf

Please be aware that while the library is progressing nicely, it isn't
ready for production use:

* There are outstanding bugs, particularly with variable length data.

* Names and signatures may still change.

* Documentation is barely started.

* Build is failing on Mac OS X. That probably won't get fixed until
after BoostCon. (Reported by Marshall Clow)

That said, the response to the presentation was very positive and I'm
actively continuing development.

Anyone interested in the library is encouraged to take a look and give it a try.

See http://github.com/Beman/Boost-Btree/blob/master/README for install
instructions.

Feedback welcome!

Thanks,

--Beman
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [btree] Status report

Rutger ter Borg-2
On 05/18/2011 08:54 PM, Beman Dawes wrote:
>
> Feedback welcome!
>
> Thanks,
>
> --Beman

Hello Beman,

looks interesting. I'm not able to find the bootstrap.sh file mentioned
in the README, it seems to be missing in the repository.

I have four remarks on the current state

1) I think having the in-memory representation to be identical to the
on-disk representation is a very serious limitation. Wrapping around it
by overloading the output stream operators doesn't seem efficient. I'm
not advocating Boost.Serialization; rather something like Asio's
buffer-handling mechanism (which, in turn, can handle Boost.Serialization).

2) using a "path" argument to having a file open presumes a real
underlying file. Although this is fine, maybe I would like to use an
fstream?

3) It would be nice if it would work with an asynchronous file-like
services, like, e.g., Asio's RandomAccessHandleService, see
http://www.boost.org/doc/libs/1_46_0/doc/html/boost_asio/reference/RandomAccessHandleService.html

4) have you looked at http://goo.gl/r8Z2m and/or http://goo.gl/lvc9A ?

Thanks,
Cheers,

Rutger


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [btree] Status report

thorsten.ottosen
In reply to this post by Beman Dawes
Den 18-05-2011 20:54, Beman Dawes skrev:
> The slides for my BoostCon "Proposed Boost B-tree Library"
> presentation are available:
>
> http://github.com/boostcon/2011_presentations/raw/master/tue/proposed_b_tree_library.pdf

// boost::btree::
template <class Key, class T,
class Traits = default_endian_traits,
class Compare = btree::less<Key> >
class btree_map;

Would it makes sense to swap the Traits and Compare argument? Isn't it
more common to change the predicate?

-Thorsten
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [btree] Status report

thorsten.ottosen
In reply to this post by Beman Dawes
Den 18-05-2011 20:54, Beman Dawes skrev:
> The slides for my BoostCon "Proposed Boost B-tree Library"
> presentation are available:
>
> http://github.com/boostcon/2011_presentations/raw/master/tue/proposed_b_tree_library.pdf

A comment more. In

for (map_type::iterator it = bt_map.begin();
it != bt_map.end(); ++it)
{
cout << " " << it->key() << " --> "
<< it->mapped_value() << '\n';
}

I got lot of negative feedback when I was not interface compatible with
existing containers in Boost.PtrContainer. Thus, we made the syntax
i->first and i->second availble without using std::pair so at least
generic algorithms would work.

Boost.Bimap does something similar. Would it not be possible to do
something similar, even though the value is fetched lazily?


-Thorsten
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [btree] Status report

Beman Dawes
In reply to this post by Rutger ter Borg-2
On Thu, May 19, 2011 at 3:24 AM, Rutger ter Borg <[hidden email]> wrote:

> On 05/18/2011 08:54 PM, Beman Dawes wrote:
>>
>> Feedback welcome!
>>
>> Thanks,
>>
>> --Beman
>
> Hello Beman,
>
> looks interesting. I'm not able to find the bootstrap.sh file mentioned in
> the README, it seems to be missing in the repository.

It comes from the Boost repo when the install instructions are applied.

> I have four remarks on the current state
>
> 1) I think having the in-memory representation to be identical to the
> on-disk representation is a very serious limitation.

Yes, agreed. But the assumption is that it is very, very, fast
compared to an implementation that must translate between different
external and internal representations.

The limitations can be overcome by using the B-tree within the
implementation of some higher level abstraction.

> Wrapping around it by
> overloading the output stream operators doesn't seem efficient.

That's also my assumption.

> I'm not
> advocating Boost.Serialization; rather something like Asio's buffer-handling
> mechanism (which, in turn, can handle Boost.Serialization).

I'm not familiar with that mechanism. Chis is here at BoostCon so I'll ask him.

> 2) using a "path" argument to having a file open presumes a real underlying
> file. Although this is fine, maybe I would like to use an fstream?

What would a real-world use case be?

> 3) It would be nice if it would work with an asynchronous file-like
> services, like, e.g., Asio's RandomAccessHandleService, see
> http://www.boost.org/doc/libs/1_46_0/doc/html/boost_asio/reference/RandomAccessHandleService.html

I'd need stronger motivation that "it would be nice"! What would the benefit be?

> 4) have you looked at http://goo.gl/r8Z2m

Yes, quickly. It appears the proposed Boost B-tree's caching scheme
and pack optimization provide similar benefits.

 and/or http://goo.gl/lvc9A ?

I find that one more interesting and will study it more carefully. But
the proposed containers have highly tunable performance, and were
designed with many of the same concerns of that paper in mind WRT disk
access versus cache lines.

Thanks,

--Beman
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [btree] Status report

Beman Dawes
In reply to this post by thorsten.ottosen
On Thu, May 19, 2011 at 5:40 AM, Thorsten Ottosen
<[hidden email]> wrote:

> Den 18-05-2011 20:54, Beman Dawes skrev:
>>
>> The slides for my BoostCon "Proposed Boost B-tree Library"
>> presentation are available:
>>
>>
>> http://github.com/boostcon/2011_presentations/raw/master/tue/proposed_b_tree_library.pdf
>
> // boost::btree::
> template <class Key, class T,
> class Traits = default_endian_traits,
> class Compare = btree::less<Key> >
> class btree_map;
>
> Would it makes sense to swap the Traits and Compare argument?

I've gone back and forth on that. The argument that may sway me is
trying to stay as close as possible to standard library containers.

> Isn't it more
> common to change the predicate?

I'm not sure. Some folks want the highest possible performance, don't
care about portability, and are willing to accept alignment
limitations. Many others want some combination of those. That implies
a lot of use of the Traits parameter.

But the current ordering does seem to be a bit of a lightning rod!

Thanks,

--Beman
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [btree] Status report

Beman Dawes
In reply to this post by thorsten.ottosen
On Thu, May 19, 2011 at 5:49 AM, Thorsten Ottosen
<[hidden email]> wrote:

> Den 18-05-2011 20:54, Beman Dawes skrev:
>>
>> The slides for my BoostCon "Proposed Boost B-tree Library"
>> presentation are available:
>>
>>
>> http://github.com/boostcon/2011_presentations/raw/master/tue/proposed_b_tree_library.pdf
>
> A comment more. In
>
> for (map_type::iterator it = bt_map.begin();
> it != bt_map.end(); ++it)
> {
> cout << " " << it->key() << " --> "
> << it->mapped_value() << '\n';
> }
>
> I got lot of negative feedback when I was not interface compatible with
> existing containers in Boost.PtrContainer. Thus, we made the syntax
> i->first and i->second availble without using std::pair so at least generic
> algorithms would work.

Interesting! This is the interface design choice I'm most worried
about, and I'm willing to make even painful changes to the current
approach if convinced a different approach is better.

The feedback I got from the BoostCon audience was that they really
like the explicit names, although several people thought that
"mapped_value" should be renamed to "value".

Iterators current contain two pointers. I've worried about any
approach that would increase the size.

But I had not cosidered the impact on algorithms that expect the
"first" and "second" names, that further complicates the decision.

>
> Boost.Bimap does something similar. Would it not be possible to do something
> similar, even though the value is fetched lazily?

I'll look at Bimap. Thanks for the pointer.

--Beman
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [btree] Status report

Beman Dawes
In reply to this post by Beman Dawes
> * Build is failing on Mac OS X. That probably won't get fixed until
> after BoostCon. (Reported by Marshall Clow)

The fix, a Boost.Iostream problem, has been fixed in trunk.

Again, thanks to Marshall for verifying the fix solved the problem.

--Beman
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [btree] Status report

Rutger ter Borg-2
In reply to this post by Beman Dawes

On 2011-05-19 20:07, Beman Dawes wrote:
>
> Yes, agreed. But the assumption is that it is very, very, fast
> compared to an implementation that must translate between different
> external and internal representations.

I always assumed that file-access is orders of magnitude slower than
DMA. Especially when the file isn't local. Another case might be
convincing here: Keys that don't require translation, Mapped that
requires it.

>> 2) using a "path" argument to having a file open presumes a real underlying
>> file. Although this is fine, maybe I would like to use an fstream?
>
> What would a real-world use case be?

See below.

>> 3) It would be nice if it would work with an asynchronous file-like
>> services, like, e.g., Asio's RandomAccessHandleService, see
>> http://www.boost.org/doc/libs/1_46_0/doc/html/boost_asio/reference/RandomAccessHandleService.html
>
> I'd need stronger motivation that "it would be nice"! What would the benefit be?

There's an increased availability of network-based storage mechanisms,
like, e.g., Amazon's S3. The combination of (such a) storage backend
with a B-tree container delivers a NoSQL-stack, which I think is a
considerable benefit.

To name a concrete example, using a distributed object store like Ceph's
underlying RADOS, see http://ceph.newdream.net/wiki/Librados , as a
storage-backend. Operations at this level don't require  "expensive"
file-system semantics.

Cheers,

Rutger


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [btree] Status report

Beman Dawes
On Fri, May 20, 2011 at 4:46 AM, Rutger ter Borg <[hidden email]> wrote:

>
> On 2011-05-19 20:07, Beman Dawes wrote:
>>
>> Yes, agreed. But the assumption is that it is very, very, fast
>> compared to an implementation that must translate between different
>> external and internal representations.
>
> I always assumed that file-access is orders of magnitude slower than DMA.
> Especially when the file isn't local. Another case might be convincing here:
> Keys that don't require translation, Mapped that requires it.
>
>>> 2) using a "path" argument to having a file open presumes a real
>>> underlying
>>> file. Although this is fine, maybe I would like to use an fstream?
>>
>> What would a real-world use case be?
>
> See below.
>
>>> 3) It would be nice if it would work with an asynchronous file-like
>>> services, like, e.g., Asio's RandomAccessHandleService, see
>>>
>>> http://www.boost.org/doc/libs/1_46_0/doc/html/boost_asio/reference/RandomAccessHandleService.html
>>
>> I'd need stronger motivation that "it would be nice"! What would the
>> benefit be?
>
> There's an increased availability of network-based storage mechanisms, like,
> e.g., Amazon's S3. The combination of (such a) storage backend with a B-tree
> container delivers a NoSQL-stack, which I think is a considerable benefit.

Yes, of course. And such a stack might want to provide async
operational functions. But they would be provided by a level above the
basic B-tree package.

Chris Kohlhoff and I discussed this yesterday. Neither of us can see
any way for Boost B-tree to benefit from async approaches.

A UB-tree (i.e. a B-tree that does multi-dimentional searches) might
well benefit. But that isn't what is on the table at the moment.
>
> To name a concrete example, using a distributed object store like Ceph's
> underlying RADOS, see http://ceph.newdream.net/wiki/Librados , as a
> storage-backend. Operations at this level don't require  "expensive"
> file-system semantics.

I'll take a look.

--Beman
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [btree] Status report

Matias Capeletto
In reply to this post by Beman Dawes
Hi Beman,

On Thu, May 19, 2011 at 7:25 PM, Beman Dawes <[hidden email]> wrote:
> But I had not cosidered the impact on algorithms that expect the
> "first" and "second" names, that further complicates the decision.

If it is possible, I think it is a good idea to at least offer the
standard "first" and "second" names as an alternative option.

> On Thu, May 19, 2011 at 5:49 AM, Thorsten Ottosen
> <[hidden email]> wrote:
>> Boost.Bimap does something similar. Would it not be possible to do something
>> similar, even though the value is fetched lazily?
>
> I'll look at Bimap. Thanks for the pointer.

I leave you here a direct pointer to the rationale for Boost.Bimap
design decisions:

  http://tinyurl.com/bimap-rationale

Best regards,
Matias
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [btree] Status report

Cory Nelson
In reply to this post by Beman Dawes
On Thu, May 19, 2011 at 11:07 AM, Beman Dawes <[hidden email]> wrote:
> On Thu, May 19, 2011 at 3:24 AM, Rutger ter Borg <[hidden email]> wrote:
>> 3) It would be nice if it would work with an asynchronous file-like
>> services, like, e.g., Asio's RandomAccessHandleService, see
>> http://www.boost.org/doc/libs/1_46_0/doc/html/boost_asio/reference/RandomAccessHandleService.html
>
> I'd need stronger motivation that "it would be nice"! What would the benefit be?
>

Real-world situation:

I've got a fuzzy full-text search running on top of my own async
b-tree code right now.

Searching needs to hit the b-tree several times depending on the
length of the query.  Being able to launch all these b-tree queries in
parallel is very important for performance, but launching one thread
for each query is tremendously sub-optimal.  In a server environment
that processes hundreds of these full-text searches at once, it might
mean asking for thousands of threads!  With async I can do all of this
very efficiently using only a single thread (or any number of threads
of my choosing).

This said, async will significantly increase the amount of code needed
to implement this library.  I wouldn't want it to be delayed for it,
especially when most people won't need async.

If it could be designed in a way that decouples the core bits from I/O
as much as possible so that it would be easy to move over later, that
would be cool.  ie. implement operations with simple I/O-agnostic
state machines -- instead of doing any potentially blocking operation
like filling buffers directly, it requests the user of the state
machine to fill the buffer for it and resume processing afterward.  A
blocking lower_bound() might be implemented something like:

lower_bound_op sm(key)

result r;

while((r = sm.run()) != done)
{
if(r == ...) ...
if(r == need_block) fill_buffer(sm.buf, sm.buf_length);
}

--
Cory Nelson
http://int64.org
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [btree] Status report

thorsten.ottosen
In reply to this post by Matias Capeletto
Den 21-05-2011 12:28, Matias Capeletto skrev:
> Hi Beman,
>
>> On Thu, May 19, 2011 at 5:49 AM, Thorsten Ottosen
>> <[hidden email]>  wrote:
>>> Boost.Bimap does something similar. Would it not be possible to do something
>>> similar, even though the value is fetched lazily?
>>
>> I'll look at Bimap. Thanks for the pointer.

In PtrContainer we cache the underlying pair to
adjust the interface:

         template< class F, class S >
         struct ref_pair
         {
             typedef F first_type;
             typedef S second_type;

             const F& first;
             S        second;

             template< class F2, class S2 >
             ref_pair( const std::pair<F2,S2>& p )
             : first(p.first), second(static_cast<S>(p.second))
             { }

             template< class RP >
             ref_pair( const RP* rp )
             : first(rp->first), second(rp->second)
             { }

             const ref_pair* const operator->() const
             {
                 return this;
             }

             ...
         };

Alternatively, it might be slightly more efficient to
provide cnoversion operations for dummy objects
called first/second. But I doubt it will have any significant
overhead in the btree case as I suspect that iteration
is already slower than normal in-memory datas tructures.

-Thorsten
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost