Hello Boost team,
As reported by Gav Wood in
http://boost.2283326.n4.nabble.com/Boost-Interprocess-Chronic-performance-on-Win7-td3755619.html , Windows doesn't handle context switching between threads well.
Basically, if a thread asks to sleep for 1 ms, the OS puts it to sleep for its entire timeslice (~20 ms), no matter whether the concurrent threads need much less time to complete their work.
The effect of this choice is that if you need two threads talking to each other through an emulated synchronization object, they can easily take 2 seconds to do just 100 round trips.
At least it seems to me that this problem arises only when
emulating a synchronization object, which I understand is necessary, for example, in the interprocess library.
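To see the timeslice rounding directly, here is a minimal sketch (not part of my tests, and the exact numbers depend on the system timer resolution) that measures how long Sleep(1) actually takes:

#include <windows.h>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

int main()
{
   using namespace boost::posix_time;
   for ( int i = 0; i < 10; ++i )
   {
      ptime start = microsec_clock::universal_time();
      Sleep(1);                                     // ask to sleep for 1 ms
      ptime stop  = microsec_clock::universal_time();
      // On a default timer this typically prints a full timeslice (~15000-20000 us)
      std::cout << (stop - start).total_microseconds() << " us" << std::endl;
   }
   return 0;
}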
It's not an easy problem to solve.
There is an ongoing discussion in the interprocess library on how to fix this (see for example
http://boost.2283326.n4.nabble.com/Boost-Interprocess-conditions-variables-get-10-times-faster-when-opening-a-multiprocess-browser-td4650763.html ), but clearly the problem is not limited to that library.
The workaround in
http://www.boost.org/doc/libs/1_54_0/boost/smart_ptr/detail/yield_k.hpp improves the situation, but only a bit.
In order to isolate the problem I've run some tests using the interprocess library, with the example comp_doc_anonymous_conditionA.cpp (and B).
This test is slowed down by this issue by a factor ranging from 2 to 10, 100, or even 1000 or more, depending on how soon that infamous Sleep(1) is triggered.
The bottom line is that in yield_k.hpp we need to change the constant in the code
else if( k < 32 ){
from 32 to at least 1000. I call this constant kMax:
else if( k < kMax ){
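From memory, the Win32 path of yield() in boost/smart_ptr/detail/yield_k.hpp looks roughly like the sketch below (simplified; kMax is my name for the constant, not Boost's). The proposed change is only the spin threshold:

const unsigned kMax = 1000;   // the current code has 32 here; 1000 or even 65535 works much better (see results below)

inline void yield( unsigned k )
{
   if( k < 4 )
   {
      // spin without yielding for the first few iterations
   }
   else if( k < kMax )        // currently: k < 32
   {
      Sleep( 0 );             // give up the rest of the timeslice to another ready thread
   }
   else
   {
      Sleep( 1 );             // this is the call that costs a whole timeslice (~20 ms)
   }
}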
Here are some test results, running XP SP3 on a dual-socket, quad-core Xeon E5450 (32-bit, 2 GHz core speed).
For the test I changed comp_doc_anonymous_conditionA.cpp:
const int NumMsg = 100000;
that is, 100000 iterations,
while in comp_doc_anonymous_conditionB.cpp I've added the line
for ( int j=0; j < jMax; ++j ) s += "1";
just to emulate a job done by the thread while owning the mutex (jMax is a constant I vary in the results below).
That is:
boost::posix_time::ptime now = microsec_clock::universal_time();
std::string s;
//Print messages until the other process marks the end
bool end_loop = false;
do{
   scoped_lock<interprocess_mutex> lock(data->mutex);
   if(!data->message_in){
      data->cond_empty.wait(lock);
   }
   //Just some code doing a job while owning the mutex
   for ( int j=0; j < jMax; ++j ) s += "1";
   if(std::strcmp(data->items, "last message") == 0){
      end_loop = true;
   }
   else{
      //Print the message
      // std::cout << data->items << std::endl;
      //Notify the other process that the buffer is empty
      data->message_in = false;
      data->cond_full.notify_one();
   }
} while(!end_loop);
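For completeness, the "time = ..." figures below were obtained roughly like this (the printing code is not in the snippet above):

boost::posix_time::time_duration elapsed =
   microsec_clock::universal_time() - now;
std::cout << "time = " << elapsed << std::endl;   // printed as e.g. 00:00:01.125000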
The results are as follows (these numbers are representative of many tests I've run):
kMax = 32, jMax = 0 : time = 00:00:01.125000
kMax = 50, jMax = 0 : time = 00:00:00.403750
kMax = 100, jMax = 0 : time = 00:00:00.280375
kMax = 65535, jMax = 0 : time = 00:00:00.203125
kMax ~= 64 is good
kMax = 32, jMax = 10 : time = 00:00:32.843750
kMax = 100, jMax = 10 : time = 00:00:02.750000
kMax = 1000, jMax = 10 : time = 00:00:00.859375
kMax = 65535, jMax = 10 : time = 00:00:00.328125
kMax ~= 1000 is good
kMax = 32, jMax = 100 : time = 00:04:01.953125
kMax = 50, jMax = 100 : time = 00:00:04.296875
kMax = 1000, jMax = 100 : time = 00:00:02.062500
kMax = 65535, jMax = 100 : time = 00:00:01.593750
kMax ~= 1000 is good
If the job takes longer we need to increase kMax even more,
but it seems that a value of 65535 eliminates the problem and ensures that a thread spinning on a lock doesn't consume most of the CPU time for more than one timeslice.