[flat_set] When to sort?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
38 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[flat_set] When to sort?

Boost - Dev mailing list
I've been using boost::container::flat_set with quite a success until
recently when I profiled my app to see that 30-50% of the run-time was
spent in insert() for a container of around 50,000 in size.

So, I had to replace boost::container::flat_set with a vector variation
which inserts as a vector (no sorting). However, when it is the time to
call begin(), end(), find(), it sorts once. That shaved the mentioned
overhead off. Say, one run is down from 17s to 10s and the likes.

I wonder if that should be the boost::container::flat_set default
behavior rather than sorting the underlying container with every
insert(). After all, as a user I do not care if the underlying container
is sorted... until I start using/traversing it, right? Or maybe I am
missing something in boost::container::flat_set which already allows me
to do something of that sort?

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
On 03/26/17 06:02, Vladimir Batov via Boost wrote:
> I've been using boost::container::flat_set with quite a success until
> recently when I profiled my app to see that 30-50% of the run-time was
> spent in insert() for a container of around 50,000 in size.
>
> So, I had to replace boost::container::flat_set with a vector variation
> which inserts as a vector (no sorting). However, when it is the time to
> call begin(), end(), find(), it sorts once. That shaved the mentioned
> overhead off. Say, one run is down from 17s to 10s and the likes.

AFAIU, flat_set should not sort on insert. It should do lower_bound to
find the insert position, which is O(log(N)). Sorting instead would give
O(N*log(N)).

What you're probably seeing is the overhead that comes from inserting in
the middle of the vector that is used by flat_set internally. This
especially matters if elements have inefficient move/copy. Flat
containers are not well suited for frequent modifications, so that is
kind of expected.

> I wonder if that should be the boost::container::flat_set default
> behavior rather than sorting the underlying container with every
> insert(). After all, as a user I do not care if the underlying container
> is sorted... until I start using/traversing it, right?

No, this won't work in general because these methods are not supposed to
modify the container. In particular, they don't invalidate iterators or
references, and calling these methods concurrently is thread-safe.

> Or maybe I am
> missing something in boost::container::flat_set which already allows me
> to do something of that sort?

You can try using the methods and constructors that accept the
ordered_unique_range tag. If you can prepare a sequence of ordered and
unique elements, you can call insert(ordered_unique_range, s.begin(),
s.end()) or similarly construct the flat_set to avoid ordering the
sorted elements.


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
On 2017-03-26 21:28, Andrey Semashev via Boost wrote:

> On 03/26/17 06:02, Vladimir Batov via Boost wrote:
>> I've been using boost::container::flat_set with quite a success until
>> recently when I profiled my app to see that 30-50% of the run-time was
>> spent in insert() for a container of around 50,000 in size.
>>
>> So, I had to replace boost::container::flat_set with a vector
>> variation
>> which inserts as a vector (no sorting). However, when it is the time
>> to
>> call begin(), end(), find(), it sorts once. That shaved the mentioned
>> overhead off. Say, one run is down from 17s to 10s and the likes.
>
> AFAIU, flat_set should not sort on insert. It should do lower_bound to
> find the insert position, which is O(log(N)). Sorting instead would
> give O(N*log(N)).

Yes, indeed. I expressed myself incorrectly. I did not mean sorting as
such. My apologies.

> What you're probably seeing is the overhead that comes from inserting
> in the middle of the vector that is used by flat_set internally. This
> especially matters if elements have inefficient move/copy.

Yes, I initially thought so too... but the value_type is merely a pair
of pointers. The comparison operation is expensive. So, it appears it
plays quite a role during initial construction. When I eliminated that
as described and sorted once... huge boost.

> Flat
> containers are not well suited for frequent modifications, so that is
> kind of expected.

Understood. The described use-case constructs the flat_set only once
(but a big one and one insert at a time). So, on the surface all
appeared by the book -- constructed once, traversed/searched often.
Still, the observed construction overhead really stunned me.

>> I wonder if that should be the boost::container::flat_set default
>> behavior rather than sorting the underlying container with every
>> insert(). After all, as a user I do not care if the underlying
>> container
>> is sorted... until I start using/traversing it, right?
>
> No, this won't work in general because these methods are not supposed
> to modify the container. In particular, they don't invalidate
> iterators or references, and calling these methods concurrently is
> thread-safe.

I do not feel those begin() et al will invalidate anything because the
first call to, say, begin() will sort the underlying container once.
After that it all works as usual, right?

>> Or maybe I am
>> missing something in boost::container::flat_set which already allows
>> me
>> to do something of that sort?
>
> You can try using the methods and constructors that accept the
> ordered_unique_range tag. If you can prepare a sequence of ordered and
> unique elements, you can call insert(ordered_unique_range, s.begin(),
> s.end()) or similarly construct the flat_set to avoid ordering the
> sorted elements.

Yes, I am aware of ordered_unique_range tag. Unfortunately, (as you
described) it is getting messier and messier -- populate another sorted
container, then transfer to flat_set. My suggestion seems like an
improvement.






_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
On 03/26/17 14:00, Vladimir Batov via Boost wrote:

> On 2017-03-26 21:28, Andrey Semashev via Boost wrote:
>
>> What you're probably seeing is the overhead that comes from inserting
>> in the middle of the vector that is used by flat_set internally. This
>> especially matters if elements have inefficient move/copy.
>
> Yes, I initially thought so too... but the value_type is merely a pair
> of pointers. The comparison operation is expensive. So, it appears it
> plays quite a role during initial construction. When I eliminated that
> as described and sorted once... huge boost.

Well, if the ordering predicate plays the defining role, I don't quite
see where your gain is. Inserting N elements in an empty flat_set or
inserting N elements in a vector and then sorting would give you the
same number of calls to the predicate, unless I'm missing something.

I still think that moving around lots of pointers on every insertion
(you said the end size was ~50000 elements and I'm assuming normal
distribution of insert positions) is the culprit. E.g. towards the end
of construction inserting an element in the middle would have to move
50000 * 8 = ~390 KiB of data, which is quite a lot. Did you try
profiling the code?

>> No, this won't work in general because these methods are not supposed
>> to modify the container. In particular, they don't invalidate
>> iterators or references, and calling these methods concurrently is
>> thread-safe.
>
> I do not feel those begin() et al will invalidate anything because the
> first call to, say, begin() will sort the underlying container once.
> After that it all works as usual, right?

No, not quite.

   flat_set< int > fs;
   fs.reserve(5);
   auto it1 = fs.begin();
   auto it2 = fs.insert(42).first;
   auto it3 = f2.begin(); // invalidates it1 and it2

Also, as I mentioned, multiple threads may me calling begin() or find()
concurrently. These methods must also have const-qualified versions to
be able to be used on non-mutable containers.


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Hi,


>> Flat
>> containers are not well suited for frequent modifications, so that is
>> kind of expected.
>
> Understood. The described use-case constructs the flat_set only once
> (but a big one and one insert at a time). So, on the surface all
> appeared by the book -- constructed once, traversed/searched often.
> Still, the observed construction overhead really stunned me.

I think the issue is that you use the wrong insert method. a single
insert is O(N)in the size of the container (worst case: insert sorted
from largest to smallest element), so inserting K element into an empty
flat set has runtime O(K^2). You are essentially doing insertion sort on
your data.

flat_set has range-insert:

template<typename InputIterator> void insert(InputIterator,
InputIterator);
template<typename InputIterator>
void insert(ordered_unique_range_t, InputIterator, InputIterator);

inserting a range of size K in a flat_set of size N should have
complexity O(K log(K) + K+N) in the first case (implementable as: copy
all new elements to the end, sort the new elements and perform an
inplace-merge). in the second call, only a simple merge is needed,
runtime O(K+N).

Best,
Oswin

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
Oswin Krause wrote:
...
> I think the issue is that you use the wrong insert method. a single insert
> is O(N)...

> flat_set has range-insert:
...
> inserting a range of size K in a flat_set of size N should have complexity
> O(K log(K) + K+N) in the first case (implementable as: copy all new
> elements to the end, sort the new elements and perform an inplace-merge).

In the use case the elements may come one by one. To use range insert,
you'll have to buffer them until lookup is needed, then insert them at once.

This is exactly what Vladimir has been suggesting that flat_set ought to do
itself. Insert elements at the end, when lookup is needed, merge them into
place.

There is another variation of this approach, one I'm sure we've discussed
before: the flat_set keeps the unsorted range at the end and on lookup tries
both, first the unsorted part, then the sorted one. The unsorted part is
merged into place when it exceeds some size. This doesn't do non-const
operations on logically const lookups, but iteration becomes problematic if
iterators have to give you a sorted view. (Although you could keep the
suffix part sorted as well and merge on demand in iter::op++ if so
inclined.)


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
Hi,


> There is another variation of this approach, one I'm sure we've
> discussed before: the flat_set keeps the unsorted range at the end and
> on lookup tries both, first the unsorted part, then the sorted one.
> The unsorted part is merged into place when it exceeds some size. This
> doesn't do non-const operations on logically const lookups, but
> iteration becomes problematic if iterators have to give you a sorted
> view. (Although you could keep the suffix part sorted as well and
> merge on demand in iter::op++ if so inclined.)

This would have consequences both in asymptotic run time (lookups are
O(log(N)+K) where K is the maximum allowed size of the unsorted region)
as well as constants (more complex code, more ifs etc). If K is
independent of N, we would still have overall quadratic time of
inserting a lot of elements using insert(single_element). So this does
not help.

What I meant with my previous comment: it is probably the easiest way
just to buffer all elements to be inserted in an intermediate storage
area and then insert them at once. Of course we might not want to use
the current insert for this as this would require storing all new
elements. Here is my approach to this:

flat_set maintains an array and three iterators(set_begin, set_end and
vector_end). the range [set_begin,set_end) contains all inserted
elements and size() begin(), end() will only return this range. the
range [set_end, vector_end) contains all newly inserted elements,
however they´are still "staging" and will not show up unless the
flat_set is told to.

so we would add two new methods

staging_insert(single_element)//pushes element to the back of the
staging area
sort_staging()//calls sort() on the old range. afterwards
set_end=staging_end.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 03/26/2017 10:35 PM, Andrey Semashev via Boost wrote:

> On 03/26/17 14:00, Vladimir Batov via Boost wrote:
>> On 2017-03-26 21:28, Andrey Semashev via Boost wrote:
>>
>>> What you're probably seeing is the overhead that comes from inserting
>>> in the middle of the vector that is used by flat_set internally. This
>>> especially matters if elements have inefficient move/copy.
>>
>> Yes, I initially thought so too... but the value_type is merely a pair
>> of pointers. The comparison operation is expensive. So, it appears it
>> plays quite a role during initial construction. When I eliminated that
>> as described and sorted once... huge boost.
>
> Well, if the ordering predicate plays the defining role, I don't quite
> see where your gain is. Inserting N elements in an empty flat_set or
> inserting N elements in a vector and then sorting would give you the
> same number of calls to the predicate, unless I'm missing something.

I must have failed to describe my use-case as it is not "inserting N
elements in an empty flat_set". In fact, it is rather inserting quite a
number (about 50,000 to 200,000 so far) of value_type instances on
one-by-one basis.

> I still think that moving around lots of pointers on every insertion
> (you said the end size was ~50000 elements and I'm assuming normal
> distribution of insert positions) is the culprit. E.g. towards the end
> of construction inserting an element in the middle would have to move
> 50000 * 8 = ~390 KiB of data, which is quite a lot.

It might well be. I did not profile what exactly contributes so much
during flat_set construction. I simply chopped the whole construction
off (replaced with straight emplace_back) to see huge 30-50% improvement.

> Did you try profiling the code?

:-) You are just teasing me, right? :-)


>
>>> No, this won't work in general because these methods are not supposed
>>> to modify the container. In particular, they don't invalidate
>>> iterators or references, and calling these methods concurrently is
>>> thread-safe.
>>
>> I do not feel those begin() et al will invalidate anything because the
>> first call to, say, begin() will sort the underlying container once.
>> After that it all works as usual, right?
>
> No, not quite.
>
>   flat_set< int > fs;
>   fs.reserve(5);
>   auto it1 = fs.begin();
>   auto it2 = fs.insert(42).first;
>   auto it3 = f2.begin(); // invalidates it1 and it2

Well, I do understand that insertion into a std::vector invalidates the
iterators. :-) I thought you argued (when you said "No, this won't work
in general because...") that "my" approach (of only sorting on
when-needed basis) had different to the flat_set behavior. I argued that
it was the same. That is, no changes in behavior but a huge construction
performance boost. That fact should not probably be simply ignored.

> Also, as I mentioned, multiple threads may me calling begin() or
> find() concurrently. These methods must also have const-qualified
> versions to be able to be used on non-mutable containers.

I can't see any of it as a fundamental problem. They all addressed in
implementation:


template<typename T, typename Sort>
struct aux::sorted_vector
{ ...
     iterator       begin ()       { sort_(); return all_.begin(); }
     iterator         end ()       { sort_(); return all_.end(); }
     const_iterator begin () const { sort_(); return all_.begin(); }
     const_iterator   end () const { sort_(); return all_.end(); }
     ...

the sort_() takes care of sorting (if/when needed) and MT safety.






_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list

On 03/27/2017 06:03 AM, Oswin Krause via Boost wrote:

>
>> There is another variation of this approach, one I'm sure we've
>> discussed before: the flat_set keeps the unsorted range at the end and
>> on lookup tries both, first the unsorted part, then the sorted one.
>> The unsorted part is merged into place when it exceeds some size. This
>> doesn't do non-const operations on logically const lookups, but
>> iteration becomes problematic if iterators have to give you a sorted
>> view. (Although you could keep the suffix part sorted as well and
>> merge on demand in iter::op++ if so inclined.)
>
> This would have consequences both in asymptotic run time (lookups are
> O(log(N)+K) where K is the maximum allowed size of the unsorted
> region) as well as constants (more complex code, more ifs etc). If K
> is independent of N, we would still have overall quadratic time of
> inserting a lot of elements using insert(single_element). So this does
> not help.
>
> What I meant with my previous comment: it is probably the easiest way
> just to buffer all elements to be inserted in an intermediate storage
> area and then insert them at once.

I am sorry but I have to disagree on this one. I can't possibly see it
as the "easiest way". My use-case (which I do not believe is exotic by
any stretch of imagination) goes around and cherry-picks the elements
essentially one-by-one. So, having another container to populate feels
like a huge hassle. I thought the whole idea of flat_set was to get the
benefits without the hard work. :-) If I have to have such a container
to begin with, I might not need flat_set at all as I'd be then
transferring straight into std::vector, right?

> Of course we might not want to use the current insert for this as this
> would require storing all new elements. Here is my approach to this:
>
> flat_set maintains an array and three iterators(set_begin, set_end and
> vector_end). the range [set_begin,set_end) contains all inserted
> elements and size() begin(), end() will only return this range. the
> range [set_end, vector_end) contains all newly inserted elements,
> however they´are still "staging" and will not show up unless the
> flat_set is told to.
>
> so we would add two new methods
>
> staging_insert(single_element)//pushes element to the back of the
> staging area
> sort_staging()//calls sort() on the old range. afterwards
> set_end=staging_end.

That feels quite complex... what is wrong with my (seemingly) simple
suggestion of sorting on when-needed basis?


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Mon, Mar 27, 2017 at 12:01 AM, Vladimir Batov via Boost
<[hidden email]> wrote:

> On 03/26/2017 10:35 PM, Andrey Semashev via Boost wrote:
>>
>> Well, if the ordering predicate plays the defining role, I don't quite see
>> where your gain is. Inserting N elements in an empty flat_set or inserting N
>> elements in a vector and then sorting would give you the same number of
>> calls to the predicate, unless I'm missing something.
>
> I must have failed to describe my use-case as it is not "inserting N
> elements in an empty flat_set". In fact, it is rather inserting quite a
> number (about 50,000 to 200,000 so far) of value_type instances on
> one-by-one basis.

I don't see how that changes anything.

>>>> No, this won't work in general because these methods are not supposed
>>>> to modify the container. In particular, they don't invalidate
>>>> iterators or references, and calling these methods concurrently is
>>>> thread-safe.
>>>
>>> I do not feel those begin() et al will invalidate anything because the
>>> first call to, say, begin() will sort the underlying container once.
>>> After that it all works as usual, right?
>>
>> No, not quite.
>>
>>   flat_set< int > fs;
>>   fs.reserve(5);

I've made a mistake here in my example, the following line is missing:

  fs.insert(10);

>>   auto it1 = fs.begin();
>>   auto it2 = fs.insert(42).first;
>>   auto it3 = f2.begin(); // invalidates it1 and it2
>
> Well, I do understand that insertion into a std::vector invalidates the
> iterators. :-) I thought you argued (when you said "No, this won't work in
> general because...") that "my" approach (of only sorting on when-needed
> basis) had different to the flat_set behavior.

I did, and the example above shows where your proposed change breaks
the previously valid code.

> I argued that it was the
> same. That is, no changes in behavior but a huge construction performance
> boost. That fact should not probably be simply ignored.

I'm not opposed to the idea of optimizing your use case. I just don't
think flat_set is the right tool for it.

The concept of self-organizing containers (including a
staging/scratchbook area) is not new, there is a precedent in
Boost.Intrusive at least. But those tricks typically don't work well
with certain common properties of traditional containers, such as the
ones I mentioned. For this reason it is best to provide these new
features under new names. IOW, propose a new container instead of
subverting behavior of an existing one.

>> Also, as I mentioned, multiple threads may me calling begin() or find()
>> concurrently. These methods must also have const-qualified versions to be
>> able to be used on non-mutable containers.
>
> I can't see any of it as a fundamental problem. They all addressed in
> implementation:
>
> template<typename T, typename Sort>
> struct aux::sorted_vector
> { ...
>     iterator       begin ()       { sort_(); return all_.begin(); }
>     iterator         end ()       { sort_(); return all_.end(); }
>     const_iterator begin () const { sort_(); return all_.begin(); }
>     const_iterator   end () const { sort_(); return all_.end(); }
>     ...
>
> the sort_() takes care of sorting (if/when needed) and MT safety.

How does sort_() handle thread safety? Are you proposing to add a
mutex to the container?

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list

On 03/27/2017 09:20 AM, Andrey Semashev wrote:

> On Mon, Mar 27, 2017 at 12:01 AM, Vladimir Batov via Boost
> <[hidden email]> wrote:
>
>>>>> No, this won't work in general because these methods are not supposed
>>>>> to modify the container. In particular, they don't invalidate
>>>>> iterators or references, and calling these methods concurrently is
>>>>> thread-safe.
>>>> I do not feel those begin() et al will invalidate anything because the
>>>> first call to, say, begin() will sort the underlying container once.
>>>> After that it all works as usual, right?
>>> No, not quite.
>>>
>>>    flat_set< int > fs;
>>>    fs.reserve(5);
>>>    fs.insert(10);
>>>    auto it1 = fs.begin();
>>>    auto it2 = fs.insert(42).first;
>>>    auto it3 = f2.begin(); // invalidates it1 and it2
>> Well, I do understand that insertion into a std::vector invalidates the
>> iterators. I thought you argued (when you said "No, this won't work in
>> general because...") that "my" approach (of only sorting on when-needed
>> basis) had different to the flat_set behavior.
> I did, and the example above shows where your proposed change breaks
> the previously valid code.

I have to disagree with your "proposed change breaks the previously
valid code" statement. Indeed, given we know the implementation of
flat_set, we can say that the above code does work with the current
flat_set implementation. However, the code is not guaranteed to work
with the current flat_set.  I just re-read flat_set documentation and it
does not provide such guarantees for emplace() and insert(). IMO there
is a difference between happens-to-work and guaranteed/valid code. Don't
you agree?

> ...
>>> Also, as I mentioned, multiple threads may me calling begin() or find()
>>> concurrently. These methods must also have const-qualified versions
>>> to be
>>> able to be used on non-mutable containers.
>> I can't see any of it as a fundamental problem. They all addressed in
>> implementation:
>>
>> template<typename T, typename Sort>
>> struct aux::sorted_vector
>> { ...
>>      iterator       begin ()       { sort_(); return all_.begin(); }
>>      iterator         end ()       { sort_(); return all_.end(); }
>>      const_iterator begin () const { sort_(); return all_.begin(); }
>>      const_iterator   end () const { sort_(); return all_.end(); }
>>      ...
>>
>> the sort_() takes care of sorting (if/when needed) and MT safety.
> How does sort_() handle thread safety? Are you proposing to add a
> mutex to the container?

Yes, I am/was... Is it a problem? I can't see it being a bottle neck as
it all will be optimized inside sort_() (no unnecessary locks, etc.).

In the end I am not trying to say flat_set is broken or anything. My
point is really simple. IMO I was using flat_set in a fairly
conventional way and incurred an unexpected and considerable
construction overhead. So much so that I had to replace flat_set. I
personally find it unfortunate. I have quite a few more places where I
use flat_set in a similar setting -- a lengthy elaborate construction
followed by much lengthier deployment. My use-case seems fairly
straightforward... but I had to replace flat_set. If so, then I am
questioning now where I might need flat_sets? That feels hugely
unfortunate that a whole chunk of Boost functionality/code (seemingly
designed for this very purpose) can't be used. Isn't it a concern? Don't
you agree?


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
On Mon, Mar 27, 2017 at 2:03 AM, Vladimir Batov via Boost
<[hidden email]> wrote:

>
> On 03/27/2017 09:20 AM, Andrey Semashev wrote:
>>
>> On Mon, Mar 27, 2017 at 12:01 AM, Vladimir Batov via Boost
>> <[hidden email]> wrote:
>>
>>>>> I do not feel those begin() et al will invalidate anything because the
>>>>> first call to, say, begin() will sort the underlying container once.
>>>>> After that it all works as usual, right?
>>>>
>>>> No, not quite.
>>>>
>>>>    flat_set< int > fs;
>>>>    fs.reserve(5);
>>>>    fs.insert(10);
>>>>    auto it1 = fs.begin();
>>>>    auto it2 = fs.insert(42).first;
>>>>    auto it3 = f2.begin(); // invalidates it1 and it2
>>>
>>> Well, I do understand that insertion into a std::vector invalidates the
>>> iterators. I thought you argued (when you said "No, this won't work in
>>> general because...") that "my" approach (of only sorting on when-needed
>>> basis) had different to the flat_set behavior.
>>
>> I did, and the example above shows where your proposed change breaks
>> the previously valid code.
>
>
> I have to disagree with your "proposed change breaks the previously valid
> code" statement. Indeed, given we know the implementation of flat_set, we
> can say that the above code does work with the current flat_set
> implementation. However, the code is not guaranteed to work with the current
> flat_set.  I just re-read flat_set documentation and it does not provide
> such guarantees for emplace() and insert(). IMO there is a difference
> between happens-to-work and guaranteed/valid code. Don't you agree?

Hmm, I've just checked the flat_set reference [1], and this is what it says:

<quote>
flat_set is similar to std::set but it's implemented like an ordered
vector. This means that inserting a new element into a flat_set
invalidates previous iterators and references
</quote>

It does say it invalidates iterators (BTW, the word "previous" here is
misleading; at first I interpreted it as "iterators pointing to
previous elements") but it also says the container works like a
vector. For vector the standard guarantees that if no reallocation
happens on insert then iterators pointing to the elements before the
insertion position are not invalidated ([vector.modifiers]/1). I think
the flat_set documentation is inaccurate and needs to be corrected.

>> How does sort_() handle thread safety? Are you proposing to add a
>> mutex to the container?
>
> Yes, I am/was... Is it a problem? I can't see it being a bottle neck as it
> all will be optimized inside sort_() (no unnecessary locks, etc.).

I'm sorry, but that is a no-go. Too often containers are used with
external locking, and internal locking would just waste performance,
especially in such a hot spot as begin(). BTW, I don't believe any
compiler is able to optimize away unnecessary locking (thank god!)

> In the end I am not trying to say flat_set is broken or anything. My point
> is really simple. IMO I was using flat_set in a fairly conventional way and
> incurred an unexpected and considerable construction overhead. So much so
> that I had to replace flat_set. I personally find it unfortunate. I have
> quite a few more places where I use flat_set in a similar setting -- a
> lengthy elaborate construction followed by much lengthier deployment. My
> use-case seems fairly straightforward... but I had to replace flat_set. If
> so, then I am questioning now where I might need flat_sets? That feels
> hugely unfortunate that a whole chunk of Boost functionality/code (seemingly
> designed for this very purpose) can't be used. Isn't it a concern? Don't you
> agree?

I think flat_set and similar containers work quite well in cases they
are targeted for. The fact that you started to care about the
container initialization simply means that your case is not the one
flat_set is tailored for. That's fine. Just propose a better
alternative for your case.

[1]: http://www.boost.org/doc/libs/1_63_0/doc/html/boost/container/flat_set.html

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 26 March 2017 at 15:49, Vladimir Batov via Boost <[hidden email]>
wrote:

> That feels quite complex... what is wrong with my (seemingly) simple
> suggestion of sorting on when-needed basis?


Use one <https://github.com/yruslan/flat_map> that does instead.

degski

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list

On 03/27/2017 11:27 AM, degski via Boost wrote:
> On 26 March 2017 at 15:49, Vladimir Batov wrote:
>> That feels quite complex... what is wrong with my (seemingly) simple
>> suggestion of sorting on when-needed basis?
> Use one <https://github.com/yruslan/flat_map> that does instead.

Yes. That's the idea. Thank you for the pointer... At my place the
deployment of Boost or http://github.com/whatever makes quite a
difference. That (still for me) begs the question if the approach taken
in boost::container::flat_set is the most practical one.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 2017-03-27 10:54, Andrey Semashev wrote:

>
>>> How does sort_() handle thread safety? Are you proposing to add a
>>> mutex to the container?
>>
>> Yes, I am/was... Is it a problem? I can't see it being a bottle neck
>> as it
>> all will be optimized inside sort_() (no unnecessary locks, etc.).
>
> I'm sorry, but that is a no-go. Too often containers are used with
> external locking, and internal locking would just waste performance,
> especially in such a hot spot as begin(). BTW, I don't believe any
> compiler is able to optimize away unnecessary locking (thank god!)

1. There does not have to be "internal locking would just waste
performance":

void sort_()
{
     if (sorted) return;
     lock
     if (sorted) return;
     ...do actual sorting
}

2. Still, I am not sure how we managed to this point -- std::vector or
other containers do not provide any MT guarantees. I would not expect
sorted_vector to be that different.

> I think flat_set and similar containers work quite well in cases they
> are targeted for.

I am not sure I see my use-case as so special. Quite the opposite.
That's why from the set-go I was wondering if flat_set behavior was
correct -- it was not performing in my (seemingly standard) use-case.

> The fact that you started to care about the
> container initialization simply means that your case is not the one
> flat_set is tailored for. That's fine.

Again, I honestly do not understand what so special about my use-case.
My container's main usage is traversal and search. That's why I deployed
flat_set in the first place. I only "started to care about the container
initialization" after it popped up in the profiler list... Very much
unexpectedly so. :-)

> Just propose a better
> alternative for your case.

Indeed, the more I think of it the less I am comfortable with
"flat_set". The intention was probably good to mimic a set. However,
vector's ears still keep sticking out. So, it seems that sorted_vector
might be a more honest and not misleading name... And given the benefits
it offers over a set, it seems quite needed. However, from the
experience "proposing" is such a daunting one-sided battle. Everyone
knows how it should be done... but it's you implementing it... and your
implementation always seems done incorrectly. :-) I am not sure I am up
for such a battle any more.



_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
On 2017-03-27 07:40, Vladimir Batov via Boost wrote:

> On 2017-03-27 10:54, Andrey Semashev wrote:
>>
>>>> How does sort_() handle thread safety? Are you proposing to add a
>>>> mutex to the container?
>>>
>>> Yes, I am/was... Is it a problem? I can't see it being a bottle neck
>>> as it
>>> all will be optimized inside sort_() (no unnecessary locks, etc.).
>>
>> I'm sorry, but that is a no-go. Too often containers are used with
>> external locking, and internal locking would just waste performance,
>> especially in such a hot spot as begin(). BTW, I don't believe any
>> compiler is able to optimize away unnecessary locking (thank god!)
>
> 1. There does not have to be "internal locking would just waste
> performance":
>
> void sort_()
> {
>     if (sorted) return;
>     lock
>     if (sorted) return;
>     ...do actual sorting
> }

Double Checked Locking is notorious for failing randomly. sorted must at
least be atomic.

One of many articles about it:
http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list

>> Of course we might not want to use the current insert for this as this
>> would require storing all new elements. Here is my approach to this:
>>
>> flat_set maintains an array and three iterators(set_begin, set_end and
>> vector_end). the range [set_begin,set_end) contains all inserted
>> elements and size() begin(), end() will only return this range. the
>> range [set_end, vector_end) contains all newly inserted elements,
>> however they´are still "staging" and will not show up unless the
>> flat_set is told to.
>>
>> so we would add two new methods
>>
>> staging_insert(single_element)//pushes element to the back of the
>> staging area
>> sort_staging()//calls sort() on the old range. afterwards
>> set_end=staging_end.
>
> That feels quite complex... what is wrong with my (seemingly) simple
> suggestion of sorting on when-needed basis?

I think the solution is much less complex than a solution that requires
locks and atomics. It looks more complex from the outside though, I
agree.
Moreover, we do not come in the situation that we have a change of the
data structure in a const-environment.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
AMDG

On 03/26/2017 05:03 PM, Vladimir Batov via Boost wrote:

>
> On 03/27/2017 09:20 AM, Andrey Semashev wrote:
>> On Mon, Mar 27, 2017 at 12:01 AM, Vladimir Batov via Boost
>>
>>>>    flat_set< int > fs;
>>>>    fs.reserve(5);
>>>>    fs.insert(10);
>>>>    auto it1 = fs.begin();
>>>>    auto it2 = fs.insert(42).first;
>>>>    auto it3 = f2.begin(); // invalidates it1 and it2
>>> Well, I do understand that insertion into a std::vector invalidates the
>>> iterators. I thought you argued (when you said "No, this won't work in
>>> general because...") that "my" approach (of only sorting on when-needed
>>> basis) had different to the flat_set behavior.
>> I did, and the example above shows where your proposed change breaks
>> the previously valid code.
>
> I have to disagree with your "proposed change breaks the previously
> valid code" statement. Indeed, given we know the implementation of
> flat_set, we can say that the above code does work with the current
> flat_set implementation. However, the code is not guaranteed to work
> with the current flat_set.

Invalidating it1 may be fine.  Invalidating
it2 is a definitely not okay.

In Christ,
Steven Watanabe


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list

On 03/28/2017 03:15 AM, Steven Watanabe via Boost wrote:

> AMDG
>
> On 03/26/2017 05:03 PM, Vladimir Batov via Boost wrote:
>> On 03/27/2017 09:20 AM, Andrey Semashev wrote:
>>> On Mon, Mar 27, 2017 at 12:01 AM, Vladimir Batov via Boost
>>>
>>>>>     flat_set< int > fs;
>>>>>     fs.reserve(5);
>>>>>     fs.insert(10);
>>>>>     auto it1 = fs.begin();
>>>>>     auto it2 = fs.insert(42).first;
>>>>>     auto it3 = f2.begin(); // invalidates it1 and it2
>>>> ...
> Invalidating it1 may be fine.  Invalidating
> it2 is a definitely not okay.

Indeed, nits like the above convince me that taking an ordered vector
and calling it a set was probably a good-intentions-driven mistake. That
set-oriented mindset then influenced the set-like api and behavior too
much (IMO) that in turn affected the performance... the original goal of
the flat_set.








_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [flat_set] When to sort?

Boost - Dev mailing list
On 2017-03-27 22:27, Vladimir Batov via Boost wrote:

> On 03/28/2017 03:15 AM, Steven Watanabe via Boost wrote:
>> AMDG
>>
>> On 03/26/2017 05:03 PM, Vladimir Batov via Boost wrote:
>>> On 03/27/2017 09:20 AM, Andrey Semashev wrote:
>>>> On Mon, Mar 27, 2017 at 12:01 AM, Vladimir Batov via Boost
>>>>
>>>>>>     flat_set< int > fs;
>>>>>>     fs.reserve(5);
>>>>>>     fs.insert(10);
>>>>>>     auto it1 = fs.begin();
>>>>>>     auto it2 = fs.insert(42).first;
>>>>>>     auto it3 = f2.begin(); // invalidates it1 and it2
>>>>> ...
>> Invalidating it1 may be fine.  Invalidating
>> it2 is a definitely not okay.
>
> Indeed, nits like the above convince me that taking an ordered vector
> and calling it a set was probably a good-intentions-driven mistake.
> That set-oriented mindset then influenced the set-like api and
> behavior too much (IMO) that in turn affected the performance... the
> original goal of the flat_set.

Could you elaborate? From my point of view the semantics and runtime
characteristics are exactly what I would expect from a flattened binary
tree(okay, personally i could do with a different, more cache friendly
ordering of elements, but my use cases are special). I would find any
other behaviour surprising.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
12
Loading...