Performance optimization in Boost using std::vector<>

Performance optimization in Boost using std::vector<>

saloo
Hello everybody,

I have a question about performance optimization with Boost.MPI. I found this link http://www.boost.org/doc/libs/1_41_0/doc/html/mpi/performance.html and am trying to figure out which curve on the graph there represents the communication of std::vector<int> and std::vector<double>. Is communication using std::vector<int> and std::vector<double> optimized (is_mpi_datatype) or not?

I use the "boost_mpi" and "boost_serialization" libraries and include the header <boost/serialization/vector.hpp> in my code. I then send std::vector<int> and std::vector<double> directly using "world.send(...)" and "world.recv(...)" calls. I fill the vector with some values (for example, ten values) and get the same ten values on the other side of the processor boundary. This works, but I want to improve communication performance.
In the tutorial at http://www.boost.org/doc/libs/1_57_0/doc/html/mpi/tutorial.html, under the section "User-defined data types", I found that "Fixed data types can be optimized for transmission using the is_mpi_datatype type trait." I also studied the information at http://www.boost.org/doc/libs/1_57_0/doc/html/mpi/tutorial.html#mpi.performance_optimizations and this link http://www.boost.org/doc/libs/1_46_1/libs/serialization/doc/wrappers.html#arrays which shows that std::vector<> is optimized for serialization.
I am now confused about whether sending std::vector<> like this is good for performance or not. What better methods are available? Is something like this http://www.boost.org/doc/libs/1_57_0/doc/html/mpi/tutorial.html#mpi.skeleton_and_content a good option?
Best Regards,
Salman Arshad

Re: Performance optimization in Boost using std::vector<>

saloo
My code looks similar to the example below, but I send really big vectors.

#include <boost/mpi.hpp>
#include <boost/serialization/vector.hpp>
#include <iostream>
#include <vector>
namespace mpi = boost::mpi;

int main()
{
  mpi::environment env;
  mpi::communicator world;
  std::vector<int> my_vector;

  if (world.rank() == 0) {
    // Rank 0 fills the vector and sends it to rank 1 with tag 0.
    my_vector.push_back(17);
    my_vector.push_back(38);
    world.send(1, 0, my_vector);
  } else if (world.rank() == 1) {
    // Only rank 1 receives, so extra ranks do not block forever.
    world.recv(0, 0, my_vector);
  }
  return 0;
}

Re: Performance optimization in Boost using std::vector<>

Gonzalo BG
In reply to this post by saloo
There is a known performance problem with serializing a std::vector  over MPI. 
Basically, this prevents you from ever reaching the performance of C.

The problem is on the receive side. When you receive a vector, if you don't know the size, 
the receive side has to:
- get the number of elements of the vector
- resize the vector (which initializes elements)
- receive the elements in the vector data (reinitialize the elements)

The C version of the idiom:
- gets the number of elements
- reserves (as opposed to resize) the memory for the elements
- receive the element in the vector (initialize elements once).

This might make a small or a large performance difference, so profile! However, if you
decide to use std::vector in your API, you basically cannot change this later, since
even if you were to use the C idiom, at some point you would have to copy
into a std::vector.

A more C++ "alternative" to the C idiom that offers the same performance would be
to use a std::unique_ptr<T[]> + a size.

If you can have a custom vector type, consider adding an
"unsafe_change_size(std::size_t new_size)" member function that checks
"assert(new_size < capacity)", together with a custom allocator that doesn't
default construct elements. Rust's Vec<T> type has this (the unsafe set_len method), and it
proves useful for providing a zero-cost abstraction around a C array that is also
dynamically resizable.

Would I do it if I need a std::vector as abstraction? 
No, I would live with the choice and never try to get as fast as C. Reserve memory 
in your receive buffers at the beginning of the program and keep them around (reuse 
them) to prevent memory allocation during send/receive operations. 
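
For reference, the "allocator that doesn't default construct elements" idea can be sketched as an allocator adaptor. This is a commonly used pattern, not something from Boost or the standard library; default_init_allocator and uninit_vector are names chosen here for illustration. With it, resize(n) on a vector of int or double leaves the new elements uninitialized, so they must be written (for example by a receive into v.data()) before they are read.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Allocator adaptor (illustrative, not a standard or Boost type):
// elements constructed without arguments are default-initialized
// instead of value-initialized, so trivial types are not zeroed.
template <typename T, typename A = std::allocator<T> >
class default_init_allocator : public A {
    typedef std::allocator_traits<A> traits;

public:
    template <typename U>
    struct rebind {
        typedef default_init_allocator<
            U, typename traits::template rebind_alloc<U> > other;
    };

    using A::A;  // inherit the base allocator's constructors

    // Used by resize(n) and vector(n): default-initialize in place.
    template <typename U>
    void construct(U* p)
    {
        ::new (static_cast<void*>(p)) U;
    }

    // Every other construction is forwarded to the base allocator unchanged.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args)
    {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};

// Convenience alias: a std::vector whose resize() does not zero trivial types.
template <typename T>
using uninit_vector = std::vector<T, default_init_allocator<T> >;
```

The catch is the one mentioned above: uninit_vector<double> is a different type from std::vector<double>, so it only helps if the rest of the API can accept it.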


_______________________________________________
Boost-users mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Re: Performance optimization in Boost using std::vector<>

Adam Romanek
Hi,

not sure if the OP needs std::vector but... I'd recommend
boost::container::vector which has a dedicated constructor [1] and
resize() [2] method tagged with default_init_t argument, both of which
default initialize the values in the vector. For primitives it basically
means it leaves them uninitialized, hence there's no overhead when the
vector is to be filled with real data soon.

WBR,
Adam Romanek

[1]
http://www.boost.org/doc/libs/1_57_0/doc/html/boost/container/vector.html#idp42432560-bb
[2]
http://www.boost.org/doc/libs/1_57_0/doc/html/boost/container/vector.html#idp42268352-bb


Re: Performance optimization in Boost using std::vector<>

saloo
In reply to this post by Gonzalo BG
Thanks, Gonzalo, for the detailed explanation.

So, as I understand it, I should change my code to the following:

#include <boost/mpi.hpp>
#include <boost/serialization/vector.hpp>
#include <iostream>
#include <vector>
namespace mpi = boost::mpi;

int main()
{
  mpi::environment env;
  mpi::communicator world;
  std::vector<int> my_vector;

  if (world.rank() == 0) {
    // Rank 0 fills the vector and sends it to rank 1 with tag 0.
    my_vector.push_back(17);
    my_vector.push_back(38);
    world.send(1, 0, my_vector);
  } else if (world.rank() == 1) {
    // Rank 1 reserves capacity for the two expected elements before receiving.
    std::vector<int> my_vector2;
    my_vector2.reserve(2);
    world.recv(0, 0, my_vector2);
  }
  return 0;
}

What is the best option in Boost to achieve good performance? I saw in the code of boost/serialization/vector.hpp that there is an optimized version which keeps track of the size and uses the make_array serialization wrapper. How can I force Boost to use the optimized version for serializing? Below is the code from boost/serialization/vector.hpp:

// the optimized versions

template<class Archive, class U, class Allocator>
inline void save(
    Archive & ar,
    const std::vector<U, Allocator> &t,
    const unsigned int /* file_version */,
    mpl::true_
){
    const collection_size_type count(t.size());
    ar << BOOST_SERIALIZATION_NVP(count);
    if (!t.empty())
        ar << make_array(detail::get_data(t),t.size());
}

template<class Archive, class U, class Allocator>
inline void load(
    Archive & ar,
    std::vector<U, Allocator> &t,
    const unsigned int /* file_version */,
    mpl::true_
){
    collection_size_type count(t.size());
    ar >> BOOST_SERIALIZATION_NVP(count);
    t.resize(count);
    unsigned int item_version=0;
    if(BOOST_SERIALIZATION_VECTOR_VERSIONED(ar.get_library_version())) {
        ar >> BOOST_SERIALIZATION_NVP(item_version);
    }
    if (!t.empty())
        ar >> make_array(detail::get_data(t),t.size());
  }

Or should I skip Boost completely and use basic MPI commands to send the vector as an MPI derived datatype? In that case I should keep in mind what you said about std::unique_ptr and Vec<T>, and also reserve memory in the receive buffer at the beginning of the program and reuse it to prevent memory allocation during send/receive. How can I reach a good-performance solution using Boost?
Thanks

Re: Performance optimization in Boost using std::vector<>

Ilja Honkonen-2
In reply to this post by Gonzalo BG
Hello

> There is a known performance problem with serializing a std::vector
>   over MPI.
> Basically, this prevents you from ever reaching the performance of C.
> The problem is on the receive side. When you receive a vector, if you
> don't know the size,
> the receive side has to:
> - get the number of elements of the vector
> - resize the vector (which initializes elements)
> - receive the elements in the vector data (reinitialize the elements)
> The C version of the idiom:
> - gets the number of elements
> - reserves (as opposed to resize) the memory for the elements
> - receive the element in the vector (initialize elements once).
> This might make a small or a large performance difference, profile!
According to the attached program there seems to be a much larger
performance problem than initializing vector elements. The program first
sends a vector of doubles using MPI, then sends another identical vector
with boost::mpi and prints how long these took in seconds. Note that
boost::mpi also sends two messages for run-time sized containers. For
vectors of 1e6 items the program prints (mpi rank is the first number):

mpi
0 resize: 0.0126891, send: 0.00988925, recv: 0
1 resize: 0.0131643, send: 0, recv: 0.00955247
boost::mpi
0 resize: 0.0096425, send: 0.279135, recv: 0
1 resize: 0, send: 0, recv: 0.295702

For vectors of 1e7 items:

mpi
0 resize: 0.0974027, send: 0.0538886, recv: 0
1 resize: 0.105708, send: 0, recv: 0.0456324
boost::mpi
0 resize: 0.0517177, send: 2.70333, recv: 0
1 resize: 0, send: 0, recv: 2.82339

And vectors of 5e7 items:

mpi
0 resize: 0.590099, send: 0.226269, recv: 0
1 resize: 0.440719, send: 0, recv: 0.375706
boost::mpi
0 resize: 0.198448, send: 13.5335, recv: 0
1 resize: 0, send: 0, recv: 14.0518

The boost::mpi version is always at least 10 times slower. It also seems to
run out of memory with a smaller number of items, implying that unnecessary
copies of the data are created somewhere. Based on experience with more
complex programs (e.g. http://dx.doi.org/10.1016/j.jastp.2014.08.012) I
wouldn't recommend boost::mpi for high performance computing. Or, if this is
down to user error on my part, high performance is at least easier to get with pure MPI...

I used boost-1.57.0, g++ (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) and
mpirun (Open MPI) 1.6.5.

Ilja

test.cpp (2K) Download Attachment