[Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Marcello Pietrobon
Hi Boost team,
I'm seeing weird behaviour on Windows XP SP3 with Boost 1.54.0 (with STLport).

Interprocess library example with:
comp_doc_anonymous_conditionA.cpp
comp_doc_anonymous_conditionB.cpp

When running some speed tests, the round-trip time between two messages is 15 ms (quite slow for what I need, even though this library is amazing). But if I open a multiprocess browser (Chrome or Firefox 4.0; it does not happen with IE8), the round-trip time instantly drops to 1.5 ms, which starts to be acceptable to me. If I minimize the browser, the program slows down again.

Should I try the workaround proposed in http://boost.2283326.n4.nabble.com/Boost-Interprocess-Chronic-performance-on-Win7-td3755619.html by Gav Wood? Not sure if that would change anything.

Any idea why this is happening? Is there a fix or workaround for it?

I opened a ticket (#9008) for this.

Thank you,

Marcello

Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Marcello Pietrobon
I've noticed the same tenfold acceleration even while making a Skype call...

How can this have anything to do with interprocess programming?


Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Marcello Pietrobon
I've applied the workaround suggested by Gav Wood and it seems to fix the problem completely.

I still don't get why we see such a huge speed-up when other applications that connect to the internet are open.

Besides, even though the cause seems to lie entirely in the inner workings of Windows (sigh), I'm left wondering why Gav's workaround can't be *optionally* implemented in the library when BOOST_INTERPROCESS_WINDOWS is defined, considering that the increase in performance can be up to almost 1000 times.

Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Ion Gaztañaga
On 18/08/2013 1:11, Marcello Pietrobon wrote:

> I've applied the workaround suggested by Gav Wood and that seem to be able to
> completely fix the problem.
>
> Still I don't get why we should see such huge speed up when some other
> applications connecting to internet are open.
>
> Besides, despite the fact the cause of this seems entirely due to the inner
> working of Windows OS (sigh) I'm still left wondering why it is not possible
> to *optionally* implement Gav's workaround in the library when
> BOOST_INTERPROCESS_WINDOWS is defined, considering the increase in
> performance can be up to almost 1000 times.

It's a very strange issue, but we definitely need to fix it by
applying a patch similar to Gav's, maybe using Peter Dimov's "yield_k"
(http://www.boost.org/doc/libs/1_54_0/boost/smart_ptr/detail/yield_k.hpp) function.
I've been too busy lately to work on Boost, and the little time I had
has been spent on Intrusive and Container.

I'll try to fix the issue in the coming weeks. Thanks for the ticket
and for testing Gav's patch.

Best,

Ion


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Marcello Pietrobon
Great. I will test your fix right away if possible.
My code is in the attachment, just to give an idea; I'm not suggesting it for Boost.

Regards,

Marcello

interprocess.zip

Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Marcello Pietrobon
In reply to this post by Ion Gaztañaga
Thank you for the last fix Ion.

I've run some tests on it and it has improved the performance, but not completely.

Clearly this problem is not limited to your Interprocess library, so I opened a separate discussion thread for it:
http://boost.2283326.n4.nabble.com/Problems-with-yield-k-workaround-Still-too-slow-td4650929.html

I've done some profiling plus some tests, and it's clear to me that the test program is still slowed down around the Sleep(1) call.

I am personally content with replacing the value 32 with a value above 1000, so resolving this is not urgent for me (just to take some pressure off you ;)).
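
For readers following along, the threshold under discussion can be pictured with a sketch like the one below. It is only loosely modelled on the staged backoff in yield_k and is not the Boost code: backoff_stage, stage_for, backoff and spin_limit are hypothetical names, and std::this_thread::yield() and sleep_for(1ms) are portable stand-ins for Sleep(0) and Sleep(1).

```cpp
#include <chrono>
#include <thread>

// Stages of a yield_k-style backoff (illustrative names, not Boost's).
enum class backoff_stage { spin, yield, sleep };

// Decide what to do on the k-th failed attempt.
// spin_limit and sleep_threshold are tunable assumptions; the thread
// discusses raising the sleep threshold from 32 to a value above 1000.
inline backoff_stage stage_for(unsigned k, unsigned sleep_threshold,
                               unsigned spin_limit = 16)
{
    if (k < spin_limit)      return backoff_stage::spin;
    if (k < sleep_threshold) return backoff_stage::yield;
    return backoff_stage::sleep;
}

// Perform one backoff step. yield() and sleep_for(1ms) are portable
// stand-ins for Sleep(0) and Sleep(1) on Windows.
inline void backoff(unsigned k, unsigned sleep_threshold)
{
    switch (stage_for(k, sleep_threshold)) {
    case backoff_stage::spin:
        break; // busy-wait: return at once so the caller retries
    case backoff_stage::yield:
        std::this_thread::yield();
        break;
    case backoff_stage::sleep:
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        break;
    }
}
```

With a threshold of 32, the 33rd failed attempt already pays for a full sleep; with a threshold of 1000, attempt 500 still only yields.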

Best regards,

Marcello

Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Ion Gaztañaga
On 21/08/2013 7:39, Marcello Pietrobon wrote:

> Thank you for the last fix Ion.
>
> I've run some tests on it and it has improved the performance, but not
> completely.
>
> Clearly this problem is not limited to your interprocess library so I
> thought to open a different thread discussion for it:
> http://boost.2283326.n4.nabble.com/Problems-with-yield-k-workaround-Still-too-slow-td4650929.html
>
> I've done some profiling plus some tests and so it's clear to me that the
> test program is still slowed down around the ::sleep(1) instruction.
>
> I am personally content with replacing the value 32 with a value above 1000,
> so the resolution to this is not urgent for me (just to take some pressure
> of you ;)).

Thanks for the test. It's definitely hard to tell if 1000 will be OK for
everyone, as it might depend on the CPU speed or waiter count (in your
example there is a lot of waiting between the same two processes, which
is not the same use case as hundreds of threads waiting for a single resource).

There is *very experimental* support for native synchronization
primitives on Windows if you comment out the line:

#define BOOST_INTERPROCESS_FORCE_GENERIC_EMULATION

in boost/interprocess/detail/workaround.hpp

It tries to create Windows native named semaphores on the fly with a
unique name and implements Alexander Terekhov's 8a algorithm to
implement a condition variable. I don't know if it could be faster on
your application, but it should use less CPU as it does not use busy
waiting.

Best,

Ion


Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Gav Wood
The default 'blind' prior is a simple exponential expectation whereby
we assume for any given duration of waiting 't', that the expected
completion time is 2t; i.e. we expect to wait as long as we have
already been waiting. As such, the optimum time to start with the
'sleep(1)' strategy (which from my tests sleeps for a full 20ms
timeslice) is after 20ms, (only after which point the prior leads us
to assert that the completion will probably take at least another
20ms).

In my patch, I found the 20ms 'optimum' value to be considerably
higher than 1000, and that was on hardware circa 2010.

Gav.


Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Peter Dimov-2
Gav Wood wrote:
> The default 'blind' prior is a simple exponential expectation whereby
> we assume for any given duration of waiting 't', that the expected
> completion time is 2t; i.e. we expect to wait as long as we have
> already been waiting. As such, the optimum time to start with the
> 'sleep(1)' strategy (which from my tests sleeps for a full 20ms
> timeslice) is after 20ms, (only after which point the prior leads us
> to assert that the completion will probably take at least another
> 20ms).

For a 20ms timeslice, this means that one should Sleep(1) after 10ms of
Sleep(0).

But the timeslice is not necessarily 20ms. A call to timeBeginPeriod(1) may
shorten it.



Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Ion Gaztañaga
On 21/08/2013 17:43, Peter Dimov wrote:

> But the timeslice is not necessarily 20ms. A call to timeBeginPeriod(1)
> may shorten it.

timeBeginPeriod seems a bit scary as it affects the general Windows
scheduler. One option could be to call GetTickCount until it changes its
value twice (as the resolution of this timer is the resolution of the
system timer). Once it changes (after 20-40ms of looping, which on average
would be 30ms), Sleep(1) is called. And for uniprocessor systems we
should avoid doing any loop and call only Sleep(0). But that would be a
really hard-to-implement spinlock ;-)

Another option is to obtain a high-resolution timestamp on each loop
and just loop, say, for 100ms before going to Sleep(1).
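
The first option above (poll a timer until its value changes twice) might be sketched as follows. This is a portable illustration under assumptions, not code from Interprocess: measure_tick is a hypothetical helper, and a std::chrono clock stands in for GetTickCount.

```cpp
#include <chrono>

// Spin until Clock::now() has changed twice and return the time between
// the two changes, which approximates the clock's update period.
template <class Clock>
typename Clock::duration measure_tick()
{
    const typename Clock::time_point start = Clock::now();

    // First change: aligns us with the edge of a timer update.
    typename Clock::time_point first = Clock::now();
    while (first == start) first = Clock::now();

    // Second change: one full update period after the first.
    typename Clock::time_point second = Clock::now();
    while (second == first) second = Clock::now();

    return second - first;
}
```

On a coarse timer the result approximates the system tick. On a fine-grained clock it mostly reflects the cost of calling now(), so such a measurement is only meaningful for a timer that advances in tick-sized steps.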

Best,

Ion


Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Ion Gaztañaga
In reply to this post by Marcello Pietrobon
On 17/08/2013 8:24, Marcello Pietrobon wrote:
> I've noticed the same 10 times acceleration even while I make a skyPE call...
>
> How can this have anything to do with interprocess programming?

After yield_k didn't offer good enough results, I decided to wrap the
wait logic in a class (called spin_wait) instead of a function. This
class would contain the "k_" integer of yield_k and would lazily
obtain the value of the system tick, spinning and yielding until that
period has elapsed (using a high-resolution counter or similar). I'm
still finishing this class for Windows and then I need to write it for
POSIX systems (and since MacOS does not support nanosleep, I may
need to do something special for this platform).

However, in my first tests, I found that several applications change the
default Windows tick period from 15.6 ms to 1 ms (for example, just after
launching Google Chrome). That's the reason why current Interprocess
spinlocks run better when you start those applications: Sleep(1) was
really sleeping for 1 ms instead of 15 ms (these values might change
between different computers, I guess).
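
One way to observe this effect is to time a 1 ms sleep request directly. The sketch below is an assumption-laden illustration: measured_sleep_1ms_us is a hypothetical helper, and std::this_thread::sleep_for stands in for Sleep(1).

```cpp
#include <chrono>
#include <thread>

// Return how long a requested 1 ms sleep really took, in microseconds.
inline long long measured_sleep_1ms_us()
{
    typedef std::chrono::steady_clock clock;
    const clock::time_point before = clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    const clock::time_point after = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(
               after - before).count();
}
```

Running it with and without a tick-changing application open should show whether the measured value tracks the system tick rather than the requested 1 ms.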

In my first tests on my system (2.8 GHz Core i7), when the system tick is
1 ms, an interprocess mutex needs 2700 iterations (32 nops/pauses +
Sleep(0)) to wait for a tick. When the system tick is 15.6 ms, it needs
41860 iterations (32 nops/pauses + Sleep(0)).

This means that no fixed value should be used to mark the yield/sleep
limit, as it highly depends on the processor core and the system tick
(that can be changed at any moment). I think N x (system tick time)
limit could be a good guess.
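
A time-based limit of this kind could look roughly like the sketch below. It is not the spin_wait class described in this message: timed_spin_wait is a hypothetical name, the caller supplies both the tick length and N, and std::this_thread::yield() and sleep_for(1ms) stand in for Sleep(0) and Sleep(1).

```cpp
#include <chrono>
#include <thread>

// Hypothetical time-based backoff: yield until a budget of n * tick
// has elapsed, then fall back to sleeping 1 ms per step.
class timed_spin_wait
{
public:
    typedef std::chrono::steady_clock clock;

    timed_spin_wait(clock::duration tick, unsigned n)
        : budget_(tick * n), started_(false), sleeping_(false) {}

    // Call once per failed attempt to acquire the resource.
    void yield()
    {
        if (!started_) {            // lazily take the start timestamp
            start_ = clock::now();
            started_ = true;
        }
        if (!sleeping_ && clock::now() - start_ >= budget_)
            sleeping_ = true;       // budget spent: stop burning CPU
        if (sleeping_)
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        else
            std::this_thread::yield();
    }

    bool sleeping() const { return sleeping_; }

private:
    clock::duration   budget_;
    clock::time_point start_;
    bool              started_;
    bool              sleeping_;
};
```

Because the limit is expressed in elapsed time rather than in iterations, the same code behaves consistently across processor speeds, which is the point made above about fixed iteration counts.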

I don't know which N value is optimal to minimize both CPU usage and
context switch overhead. We'd need to do some tests for that. In any
case, I think this new approach will greatly improve the current horrible
Interprocess latencies. I'll ping the list when I commit portable spin
wait logic in a few days.

Best,

Ion


Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Marcello Pietrobon
Great job again Ion.

I've checked out your last revision, as I was curious to try it in the little time I have available.
Unfortunately, it links only if a single object file includes wait.hpp, as in the test examples, not if several do.
So I decided to roll back the changes on my hard drive and yield() and wait() for a fix :)

Here is the error message:
MyTest2.obj : error LNK2005: "private: static unsigned int boost::interprocess::ipcdetail::num_core_holder<0>::num_cores" (?num_cores@?$num_core_holder@$0A@@ipcdetail@interprocess@boost@@0IA) already defined in MyTest1.obj
..\..\..\..\bin\mylibrary_vc10_d.dll : fatal error LNK1169: one or more multiply defined symbols found

The problem is obviously at line 71 of wait.hpp:
unsigned int num_core_holder<0>::num_cores = ipcdetail::get_num_cores();
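
A common way to avoid this kind of multiple definition in a header-only library is to define the static member for the template as a whole instead of for the <0> specialization only. The sketch below illustrates that technique under assumptions and is not necessarily the fix committed to Boost: get_num_cores here is a hypothetical stand-in for ipcdetail::get_num_cores().

```cpp
#include <thread>

// Hypothetical stand-in for ipcdetail::get_num_cores().
inline unsigned int get_num_cores()
{
    const unsigned int n = std::thread::hardware_concurrency();
    return n ? n : 1u; // hardware_concurrency() may return 0 if unknown
}

template <int Dummy>
struct num_core_holder
{
    // Falls back to a direct query if num_cores is not initialized yet.
    static unsigned int get()
    {
        return num_cores ? num_cores : get_num_cores();
    }

private:
    static unsigned int num_cores;
};

// Definition written for the template as a whole, not for <0> only.
// A static data member of a class template may be defined in every
// translation unit that includes the header; the copies are merged
// at link time instead of clashing.
template <int Dummy>
unsigned int num_core_holder<Dummy>::num_cores = get_num_cores();
```

Callers would still use num_core_holder<0>::get(); the Dummy parameter exists only so that the definition can live in a header.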


Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Marcello Pietrobon
Very good.

I did some testing using the latest version (in the repository), with the same testing code as reported in http://boost.2283326.n4.nabble.com/Problems-with-yield-k-workaround-Still-too-slow-td4650929.html and the performance seems excellent.
Again, these times are representative of a few trials.

jMax = 0 : time = 00:00:00.281250
jMax = 10 : time = 00:00:00.426250
jMax = 100 : time = 00:00:01.631875
 
Thanks!!

Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Ion Gaztañaga
On 07/09/2013 19:52, Marcello Pietrobon wrote:

> Very good.
>
> I've did some testing using the latest version (in the repository), with the
> same testing code as reported in
> http://boost.2283326.n4.nabble.com/Problems-with-yield-k-workaround-Still-too-slow-td4650929.html
> and the performance seems excellent.
> Again these times are representative of few trials.
>
> jMax = 0 : time = 00:00:00.281250
> jMax = 10 : time = 00:00:00.426250
> jMax = 100 : time = 00:00:01.631875

Nice to hear it. This will also help Mac OS users, as this platform lacks
process-shared mutexes/conditions and spinlocks are used.

Thanks for the report and testing. Thanks also to Gav Wood for the
original report and tests.

Best,

Ion


Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Gottlob Frege
In reply to this post by Ion Gaztañaga
I haven't looked at the code, but I see mention of Sleep(0) here. Does
everyone realize the special behavior of Sleep(0)?

It only gives up its timeslice to threads of equal or higher priority, so
lower-priority threads would starve (if you were to spin with only that).

Not sure it is a problem in this case, but I wanted to mention it, as it is
often overlooked.

Tony



Re: [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser

Ion Gaztañaga
On 08/09/2013 7:39, Gottlob Frege wrote:
> I haven't looked at the code, but I see mention of Sleep(0) here.  Does
> everyone realize the special behavior of Sleep(0) ?
>
> It only gives up a timeslice to threads of >= priority. Starving lower
> priority threads (if you were to spin with only that).

Yes, we take this Sleep(0) behaviour into account. In fact, Sleep was changed
starting with Windows Server 2003, and now it can relinquish the
remainder of its time slice to any other thread that is ready to run.
The code uses a combination of SwitchToThread, Sleep(0) and Sleep(1)
to avoid starvation.

Best,

Ion
