[Performance of circular_buffer vs Queue][Multithreading may be broken]

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[Performance of circular_buffer vs Queue][Multithreading may be broken]

Boost - Dev mailing list
Dear Boost developer,
I found the circular buffer to be slower than std::queue for
multithreading except in the case of a single thread (where it is
notably faster).
I am using resize() to have the expected size filled as in vector.

Is this behavior expected or documented anywhere?
What kind of speedup is to be expected against the std::queue
implementation?
If there is none, then what is the purpose of circular buffer?

I do also have broken code on a multithreaded application that worked
fine before. I do pass the circular buffer as reference via functions
from file scope, which in combination with the threads (initially
starting elsewhere) somehow broke the code.

Do you do regular benchmarks against other implementations? I dont see
linked graphs on your homepage.

Regards,
Jan

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Performance of circular_buffer vs Queue][Multithreading may be broken]

Boost - Dev mailing list
On Wed, 1 Apr 2020 at 05:27, Jan Hafer via Boost <[hidden email]>
wrote:

> Dear Boost developer,
> I found the circular buffer to be slower than std::queue for
>

These are different containers.

multithreading except in the case of a single thread (where it is
> notably faster).
>

std::queue is not synchronized.

I am using resize() to have the expected size filled as in vector.
>
> Is this behavior expected or documented anywhere?
> What kind of speedup is to be expected against the std::queue
> implementation?
> If there is none, then what is the purpose of circular buffer?
>

You could read the relevant docs. I strongly doubt that this boost
container is slower than anything in the STL when compared like for like.

I do also have broken code on a multithreaded application that worked

> fine before. I do pass the circular buffer as reference via functions
> from file scope, which in combination with the threads (initially
> starting elsewhere) somehow broke the code.
>
> Do you do regular benchmarks against other implementations? I dont see
> linked graphs on your homepage.
>
> Regards,
> Jan
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>

degski
--
@systemdeg
"We value your privacy, click here!" Sod off! - degski
"Anyone who believes that exponential growth can go on forever in a finite
world is either a madman or an economist" - Kenneth E. Boulding
"Growth for the sake of growth is the ideology of the cancer cell" - Edward
P. Abbey

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Performance of circular_buffer vs Queue][Multithreading may be broken]

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Jan Hafer wrote:
> I do also have broken code on a multithreaded application that worked fine
> before. I do pass the circular buffer as reference via functions from file
> scope, which in combination with the threads (initially starting
> elsewhere) somehow broke the code.

Does this mean that you're using the same circular_buffer from more than one
thread without synchronization?


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Performance of circular_buffer vs Queue][Multithreading may be broken]

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Thanks for the fast reply.

On 01.04.20 16:08, degski wrote:
> On Wed, 1 Apr 2020 at 05:27, Jan Hafer via Boost <[hidden email]>
> wrote:
>
>> Dear Boost developer,
>> I found the circular buffer to be slower than std::queue for
>>
>
> These are different containers.

Yes, however I needed a queue and was interested in the performance
characteristics.
Apparently std::queue is slower on single-thread applications,
but faster for multiple threads. I could reproduce the results on my own
circular buffer implementation.

>
> multithreading except in the case of a single thread (where it is
>> notably faster).
>>
>
> std::queue is not synchronized.

Yes, I do use 1 buffer/queue per thread.

>
> I am using resize() to have the expected size filled as in vector.
>>
>> Is this behavior expected or documented anywhere?
>> What kind of speedup is to be expected against the std::queue
>> implementation?
>> If there is none, then what is the purpose of circular buffer?
>>
>
> You could read the relevant docs. I strongly doubt that this boost
> container is slower than anything in the STL when compared like for like.

True, could reproduce.

>
> I do also have broken code on a multithreaded application that worked
>> fine before. I do pass the circular buffer as reference via functions
>> from file scope, which in combination with the threads (initially
>> starting elsewhere) somehow broke the code.
>>
>> Do you do regular benchmarks against other implementations? I dont see
>> linked graphs on your homepage.

I would like to see the benchmark results against std-implementations
linked on the main page.
If possible, also as simple comparison per thread count.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Performance of circular_buffer vs Queue][Multithreading may be broken]

Boost - Dev mailing list
> Gesendet: Donnerstag, 02. April 2020 um 18:43 Uhr
> Von: "Jan Hafer via Boost" <[hidden email]>
> Thanks for the fast reply.
>
> >>
> >> Is this behavior expected or documented anywhere?
> >> What kind of speedup is to be expected against the std::queue
> >> implementation?
> >> If there is none, then what is the purpose of circular buffer?
> >>
> >
> > You could read the relevant docs. I strongly doubt that this boost
> > container is slower than anything in the STL when compared like for like.
>
> True, could reproduce.
>
> >
> > I do also have broken code on a multithreaded application that worked
> >> fine before. I do pass the circular buffer as reference via functions
> >> from file scope, which in combination with the threads (initially
> >> starting elsewhere) somehow broke the code.
> >>
> >> Do you do regular benchmarks against other implementations? I dont see
> >> linked graphs on your homepage.
>
> I would like to see the benchmark results against std-implementations
> linked on the main page.
> If possible, also as simple comparison per thread count.

Could you give a full example /repro of what exactly you are measuring?
Saying "container X is faster than Y" is not very helpful without showing what
operations you are measuring and what the workload is.

You e.g. could create a small github repository, or a gist,
or put the your benchmark here: http://quick-bench.com/

Best

Mike

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Performance of circular_buffer vs Queue][Multithreading may be broken]

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Thu, 2 Apr 2020 at 17:43, Jan Hafer via Boost <[hidden email]> wrote:


> Yes, I do use 1 buffer/queue per thread.

So you're saying that circular_buffer is slower on a given thread when
other threads are accessing their own circular_buffer in parallel?
That sounds unlikely to be circular buffer's fault.

The first thing to check is that the memory placement of your circular
buffer does not result in false sharing with other threads.
More generally you could be stalling due to any inter-connection, such
as crossing NUMA domains or talking to hardware.
Otherwise you could be saturating the execution ports of your core and
thus not get the expected speed-up from hyper-threading, but in any
case you'd be the best throughput overall.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Performance of circular_buffer vs Queue][Multithreading may be broken]

Boost - Dev mailing list


On 02.04.20 20:59, Mathias Gaunard wrote:
> On Thu, 2 Apr 2020 at 17:43, Jan Hafer via Boost <[hidden email]> wrote:
>
>
>> Yes, I do use 1 buffer/queue per thread.
>
> So you're saying that circular_buffer is slower on a given thread when
> other threads are accessing their own circular_buffer in parallel?
> That sounds unlikely to be circular buffer's fault.

Yes and I dont know quite the reason for it.
My Threads know their id to access a file-global data structure
containing their queue/circular buffer. They start another after in a
thread-safe way and exit on emptying the queue/circular buffer.

>
> The first thing to check is that the memory placement of your circular
> buffer does not result in false sharing with other threads.
> More generally you could be stalling due to any inter-connection, such
> as crossing NUMA domains or talking to hardware.
> Otherwise you could be saturating the execution ports of your core and
> thus not get the expected speed-up from hyper-threading, but in any
> case you'd be the best throughput overall.
>

I know how to obtain the source code of boost, but the compiler vendors
sadly provide no direct internet-searchable links to the std source code.
Thanks for your advice. I may dig further with perf to obtain cache-miss
statistics.

My goal was just to inform you of this characteristics, since this is
not documented.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Performance of circular_buffer vs Queue][Multithreading may be broken]

Boost - Dev mailing list
Jan Hafer wrote:

> I know how to obtain the source code of boost, but the compiler vendors
> sadly provide no direct internet-searchable links to the std source code.

https://github.com/gcc-mirror/gcc/tree/master/libstdc%2B%2B-v3
https://github.com/llvm-mirror/libcxx/
https://github.com/microsoft/stl


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Performance of circular_buffer vs Queue][Multithreading may be broken]

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 3/04/2020 08:41, Jan Hafer wrote:
> On 02.04.20 20:59, Mathias Gaunard wrote:
>> So you're saying that circular_buffer is slower on a given thread when
>> other threads are accessing their own circular_buffer in parallel?
>> That sounds unlikely to be circular buffer's fault.
>
> Yes and I dont know quite the reason for it.
> My Threads know their id to access a file-global data structure
> containing their queue/circular buffer. They start another after in a
> thread-safe way and exit on emptying the queue/circular buffer.

That sounds like you're allocating a single array of circular_buffers
and then accessing them from different threads.

That's basically the worst possible thing to do; as Mathias was saying,
that will end up sharing cache lines between different cores and your
performance will tank.

At minimum, you should embed the circular_buffer into another struct
that has sizeof() >= std::hardware_destructive_interference_size, and
make an array of that.

But better still, embed the circular_buffer into your processing classes
and don't have arrays of them at all.

(If you're pre-C++17 and don't have
std::hardware_destructive_interference_size, then using 64 works for
most modern platforms.)


Ideally, the circular_buffer implementation itself should also separate
all internal producer-thread members and consumer-thread members by
std::hardware_destructive_interference_size and try very hard to not
cross over.  (Here the main thing that matters is write accesses.)

If you want to try using a more modern circular buffer that gets this
correct, have a look at Boost.Lockfree's spsc_queue.

https://www.boost.org/doc/libs/1_72_0/doc/html/boost/lockfree/spsc_queue.html

(While you're there, there's also an MPMC queue as well.  Note that a
lockfree queue will tend to be slower than std::queue in uncontended
benchmarks, but the avoidance of locks can be superior in highly
contended or other specialised scenarios.)

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Performance of circular_buffer vs Queue][Multithreading may be broken]

Boost - Dev mailing list
I am following this discussion thread with no particular interest, but I can
imagine that there are others who will find the discussion, theorizing and
advice rather valuable.  Can anyone add a link to it to the circular buffer
documentation?

Paul A. Bristow


> -----Original Message-----
> From: Boost <[hidden email]> On Behalf Of Gavin Lambert via
> Boost
> Sent: 3 April 2020 00:16
> To: [hidden email]
> Cc: Gavin Lambert <[hidden email]>
> Subject: Re: [boost] [Performance of circular_buffer vs Queue][Multithreading
may

> be broken]
>
> On 3/04/2020 08:41, Jan Hafer wrote:
> > On 02.04.20 20:59, Mathias Gaunard wrote:
> >> So you're saying that circular_buffer is slower on a given thread
> >> when other threads are accessing their own circular_buffer in parallel?
> >> That sounds unlikely to be circular buffer's fault.
> >
> > Yes and I dont know quite the reason for it.
> > My Threads know their id to access a file-global data structure
> > containing their queue/circular buffer. They start another after in a
> > thread-safe way and exit on emptying the queue/circular buffer.
>
> That sounds like you're allocating a single array of circular_buffers and then
> accessing them from different threads.
>
> That's basically the worst possible thing to do; as Mathias was saying, that
will end
> up sharing cache lines between different cores and your performance will tank.
>
> At minimum, you should embed the circular_buffer into another struct that has
> sizeof() >= std::hardware_destructive_interference_size, and make an array of
that.
>
> But better still, embed the circular_buffer into your processing classes and
don't
> have arrays of them at all.
>
> (If you're pre-C++17 and don't have
> std::hardware_destructive_interference_size, then using 64 works for most
> modern platforms.)
>
>
> Ideally, the circular_buffer implementation itself should also separate all
internal
> producer-thread members and consumer-thread members by
> std::hardware_destructive_interference_size and try very hard to not cross
over.
> (Here the main thing that matters is write accesses.)
>
> If you want to try using a more modern circular buffer that gets this correct,
have a
> look at Boost.Lockfree's spsc_queue.
>
> https://www.boost.org/doc/libs/1_72_0/doc/html/boost/lockfree/spsc_queue.ht
> ml
>
> (While you're there, there's also an MPMC queue as well.  Note that a lockfree
> queue will tend to be slower than std::queue in uncontended benchmarks, but
the
> avoidance of locks can be superior in highly contended or other specialised
> scenarios.)
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Performance of circular_buffer vs Queue][Multithreading may be broken]

Boost - Dev mailing list
On Fri, 3 Apr 2020 at 02:35, Paul A Bristow via Boost <[hidden email]>
wrote:

> I am following this discussion thread with no particular interest, but I
> can
> imagine that there are others who will find the discussion, theorizing and
> advice rather valuable.  Can anyone add a link to it to the circular buffer
> documentation?
>

I assume it's this cb:
https://www.boost.org/doc/libs/1_72_0/doc/html/circular_buffer.html

degski
--
@systemdeg
"We value your privacy, click here!" Sod off! - degski
"Anyone who believes that exponential growth can go on forever in a finite
world is either a madman or an economist" - Kenneth E. Boulding
"Growth for the sake of growth is the ideology of the cancer cell" - Edward
P. Abbey

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost