boost asio synchronous vs asynchronous operations performance


boost asio synchronous vs asynchronous operations performance

Donald Alan
Hi,

I'm trying to compare the performance of boost::asio asynchronous vs synchronous IO operations for a single client.

Below are sample synchronous and asynchronous server applications, which send a 25-byte message to the client continuously in a loop. On the client side, I check the rate at which it receives the messages. The setup is pretty simple. The synchronous server spawns a new thread per client connection, and that thread keeps sending the 25-byte message in a loop. The asynchronous server likewise spawns a new thread per client connection, and that thread keeps sending the 25-byte message in a loop using asynchronous writes (the main thread is the one that calls io_service.run()). For the performance test I'm using only one client.

Synchronous server code

#include <iostream>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/asio.hpp>
#include <boost/thread.hpp>

using boost::asio::ip::tcp;

class tcp_connection : public boost::enable_shared_from_this<tcp_connection>
{
public:
    typedef boost::shared_ptr<tcp_connection> pointer;

    static pointer create(boost::asio::io_service& io_service)
    {
       return pointer(new tcp_connection(io_service));
    }

    tcp::socket& socket()
    {
        return socket_;
    }

    void start()
    {
        for (;;) {
            try {
                // boost::asio::write() either writes the whole buffer or
                // throws boost::system::system_error; it never returns -1,
                // so a closed connection surfaces as an exception below.
                std::size_t len = boost::asio::write(socket_, boost::asio::buffer(message_));
                if (len != message_.length()) {
                    std::cerr<<"Unable to write all the bytes"<<std::endl;
                    break;
                }
            }
            catch (std::exception& e) {
                std::cerr<<"Error while sending data: "<<e.what()<<std::endl;
                break;
            }
        }
    }

private:
    tcp_connection(boost::asio::io_service& io_service)
        : socket_(io_service),
          message_(25, 'A')
    {
    }

    tcp::socket socket_;
    std::string message_;
};

class tcp_server
{
public:
    tcp_server(boost::asio::io_service& io_service)
        : acceptor_(io_service, tcp::endpoint(tcp::v4(), 1234))
    {
        start_accept();
    }

private:
    void start_accept()
    {
        for (;;) {
            tcp_connection::pointer new_connection =
                tcp_connection::create(acceptor_.get_io_service());
            acceptor_.accept(new_connection->socket());
            boost::thread(boost::bind(&tcp_connection::start, new_connection));
        }
    }
    tcp::acceptor acceptor_;
};

int main()
{
    try {
        boost::asio::io_service io_service;
        tcp_server server(io_service);
    }
    catch (std::exception& e) {
        std::cerr << e.what() << std::endl;
    }
    return 0;
}

Asynchronous server code:
#include <iostream>
#include <string>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/asio.hpp>

#include <boost/thread.hpp>

using boost::asio::ip::tcp;

class tcp_connection
        : public boost::enable_shared_from_this<tcp_connection>
{
public:
    typedef boost::shared_ptr<tcp_connection> pointer;

    static pointer create(boost::asio::io_service& io_service)
    {
        return pointer(new tcp_connection(io_service));
    }

    tcp::socket& socket()
    {
        return socket_;
    }

    void start()
    {
        while (socket_.is_open()) {
            boost::asio::async_write(socket_, boost::asio::buffer(message_),
                boost::bind(&tcp_connection::handle_write, shared_from_this(),
                            boost::asio::placeholders::error,
                            boost::asio::placeholders::bytes_transferred));
        }
    }

private:
    tcp_connection(boost::asio::io_service& io_service)
        : socket_(io_service),
          message_(25, 'A')
    {
    }

    void handle_write(const boost::system::error_code& error,
                      size_t bytes_transferred)
    {
        if (error) {
            if (socket_.is_open()) {
                std::cout<<"Error while sending data asynchronously"<<std::endl;
                socket_.close();
            }
        }
    }

    tcp::socket socket_;
    std::string message_;
};

class tcp_server
{
public:
    tcp_server(boost::asio::io_service& io_service)
        : acceptor_(io_service, tcp::endpoint(tcp::v4(), 1234))
    {
        start_accept();
    }

private:
    void start_accept()
    {
        tcp_connection::pointer new_connection =
                tcp_connection::create(acceptor_.get_io_service());
        acceptor_.async_accept(new_connection->socket(),
                boost::bind(&tcp_server::handle_accept, this, new_connection,
                        boost::asio::placeholders::error));
    }

    void handle_accept(tcp_connection::pointer new_connection,
                       const boost::system::error_code& error)
    {
        if (!error) {
            boost::thread(boost::bind(&tcp_connection::start, new_connection));
        }

        start_accept();
    }

    tcp::acceptor acceptor_;
};

int main()
{
    try {
        boost::asio::io_service io_service;
        tcp_server server(io_service);
        io_service.run();
    }
    catch (std::exception& e) {
        std::cerr << e.what() << std::endl;
    }

    return 0;
}

Client code
#include <iostream>

#include <boost/asio.hpp>
#include <boost/array.hpp>

int main(int argc, char* argv[])
{
    if (argc != 3) {
        std::cerr<<"Usage: client <server-host> <server-port>"<<std::endl;
        return 1;
    }

    boost::asio::io_service io_service;
    boost::asio::ip::tcp::resolver resolver(io_service);
    boost::asio::ip::tcp::resolver::query query(argv[1], argv[2]);
    boost::asio::ip::tcp::resolver::iterator it = resolver.resolve(query);
    boost::asio::ip::tcp::resolver::iterator end;
    boost::asio::ip::tcp::socket socket(io_service);
    boost::asio::connect(socket, it);

//    Statscollector to periodically print received messages stats
//    sample::myboost::StatsCollector stats_collector(5);
//    sample::myboost::StatsCollectorScheduler statsScheduler(stats_collector);
//    statsScheduler.start();

    for (;;) {
        boost::array<char, 25> buf;
        boost::system::error_code error;
        // NB: read_some() may legitimately return fewer than 25 bytes; the
        // commented-out boost::asio::read() would wait for a full buffer.
        size_t len = socket.read_some(boost::asio::buffer(buf), error);
//        size_t len = boost::asio::read(socket, boost::asio::buffer(buf));
        if (error) {
            std::cerr<<"Read failed: "<<error.message()<<std::endl;
            break;
        }
        if (len != buf.size()) {
            std::cerr<<"Length is not "<< buf.size() << " but "<<len<<std::endl;
        }
//        stats_collector.incr_msgs_received();
    }
}

Question:
When the client runs against the synchronous server, it can receive around 700K msgs/sec, but against the asynchronous server the rate drops to around 100K-120K msgs/sec. I know that one should use asynchronous IO for scalability when there are many clients, and since I'm using only a single client here, the obvious advantage of asynchronous IO is not evident. But is asynchronous IO expected to hurt performance this badly in the single-client case, or am I missing some obvious best practices for asynchronous IO? Is the significant drop in performance due to the thread switch between the io_service thread (the main thread in this case) and the connection thread?

Setup:
I'm using Boost 1.47 on a Linux machine.


Re: boost asio synchronous vs asynchronous operations performance

Igor R.
>     void start()
>     {
>         while (socket_.is_open()) {
>             boost::asio::async_write(socket_, boost::asio::buffer(message_),
>                 boost::bind(&tcp_connection::handle_write,
> shared_from_this(),
>                             boost::asio::placeholders::error,
>                             boost::asio::placeholders::bytes_transferred));
>         }
>     }
>
> private:
>     tcp_connection(boost::asio::io_service& io_service)
>         : socket_(io_service),
>           message_(25, 'A')
>     {
>     }
>
>     void handle_write(const boost::system::error_code& error,
>                       size_t bytes_transferred)
>     {
>         if (error) {
>             if (socket_.is_open()) {
>                 std::cout<<"Error while sending data
> asynchronously"<<std::endl;
>                 socket_.close();
>             }
>         }
>     }

I guess in the above handle_write() you intended to call start() again.
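A minimal sketch of that correction (with a hypothetical do_write() helper; untested): keep one async_write in flight, and issue the next write from the completion handler rather than from a loop:

```cpp
// Sketch: one async_write in flight at a time, with the next write
// chained from the completion handler instead of a while loop.
void start()
{
    do_write();
}

void do_write()
{
    boost::asio::async_write(socket_, boost::asio::buffer(message_),
        boost::bind(&tcp_connection::handle_write, shared_from_this(),
                    boost::asio::placeholders::error,
                    boost::asio::placeholders::bytes_transferred));
}

void handle_write(const boost::system::error_code& error,
                  size_t /*bytes_transferred*/)
{
    if (!error) {
        do_write();  // chain the next write
    } else if (socket_.is_open()) {
        std::cout << "Error while sending data asynchronously" << std::endl;
        socket_.close();
    }
}
```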

<...>

> When the client is running against synchronous server it is able to receive
> around 700K msgs/sec but when it is running against asynchronous server the
> performance is dropped to around 100K-120K msgs/sec.

Since you use a very small message, the overhead related to the
completion handlers may be significant. In general, it's worth using a
performance profiler to see what's going on, but in any case you could
try a trivial "static" handler allocator and see if it helps:
http://www.boost.org/doc/libs/1_55_0/doc/html/boost_asio/example/cpp03/allocation/server.cpp

Of course, ensure you compile with optimizations.
_______________________________________________
Boost-users mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Re: boost asio synchronous vs asynchronous operations performance

Nathaniel Fries
In reply to this post by Donald Alan

On 3/21/2014 12:54, Donald Alan wrote:

> <...>
http://www.boost.org/doc/libs/1_42_0/doc/html/boost_asio/reference/io_service/io_service.html
Try using concurrency_hint = number of threads you'll create.
At least with I/O Completion ports (Windows NT), only concurrency_hint
threads can perform an asynchronous operation simultaneously. All other
threads have to wait. The default constructor probably uses
concurrency_hint = #processors but tbh I'm not sure.
disclaimer: I'm also not sure if I/O Completion ports actually supports
more threads than #processors. MSDN doesn't suggest that there's any limit.
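For reference, a sketch of what that might look like (the thread count here is purely illustrative):

```cpp
// Sketch: pass a concurrency hint matching the number of threads that
// will call run() on this io_service, then run it from a small pool.
const std::size_t num_threads = 2;  // illustrative value
boost::asio::io_service io_service(num_threads);

boost::thread_group pool;
for (std::size_t i = 0; i < num_threads; ++i)
    pool.create_thread(boost::bind(&boost::asio::io_service::run, &io_service));
pool.join_all();
```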

Also, the main point of asynchronous I/O is that you don't need a
thread-per-file/connection to achieve necessary performance. A typical
asynchronous I/O server will have either 1 or #processors dedicated I/O
threads and use a fixed number of worker threads for non-IO tasks, not a
thread-per-connection. I would suggest that this renders your benchmark
questionable even after you make my suggested change.

Re: boost asio synchronous vs asynchronous operations performance

Niall Douglas
In reply to this post by Donald Alan
On 21 Mar 2014 at 9:54, Donald Alan wrote:

> When the client is running against synchronous server it is able to receive
> around 700K msgs/sec but when it is running against asynchronous server the
> performance is dropped to around 100K-120K msgs/sec. I know that one should
> use asynchronous IO for scalability when we have more number of clients and
> in the above case as I'm using only a single client, the obvious advantage
> of asynchronous IO is not evident. But the question is, is asynchronous IO
> expected to effect the performance so badly for a single client case or am I
> missing some obvious best practices to follow with asynchronous IO? Is the
> significant drop in the performance is because of the thread switch between
> ioservice thread (which is main thread in the above case) and connection
> thread?
Linux isn't capable of asynchronous i/o [1], so of course directly
calling synchronous kernel APIs will be faster than using threads to
multiplex kernel APIs. I think most of your disparity though is that
you are doing at least two (and probably more) syscalls per message
for the async case, as ASIO must do a poll/select per message. I'd be
very interested to see your results on an OS which does implement
async i/o - Windows is the easiest.

Anyway, I really wouldn't worry about ASIO performance. ASIO can
exceed 3m threaded dispatches per second on a quad core Intel. Even
AFIO, which extends ASIO and uses lots of "slow" futures, breaks past
1.5m dispatches/sec.

[1]: Linux can do non-blocking socket i/o, but non-blocking is *not*
asynchronous i/o. Linux can do a limited amount of async file i/o
using a special syscall not used by any of the libc implementations
of POSIX routines. FreeBSD can do async i/o, but ASIO isn't wired up
for it.

Niall

--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/





Re: boost asio synchronous vs asynchronous operations performance

Gavin Lambert
In reply to this post by Nathaniel Fries
On 22/03/2014 15:05, Quoth Nate:
> Try using concurrency_hint = number of threads you'll create.
> At least with I/O Completion ports (Windows NT), only concurrency_hint
> threads can perform an asynchronous operation simultaneously. All other
> threads have to wait. The default constructor probably uses
> concurrency_hint = #processors but tbh I'm not sure.
> disclaimer: I'm also not sure if I/O Completion ports actually supports
> more threads than #processors. MSDN doesn't suggest that there's any limit.

It does.  It's fairly common to allocate 1.5x or 2x #processors threads
to the pool.  What happens then is that Windows will keep up to
#processors threads (or whatever other concurrency value you specify)
processing from the completion port at all times -- if one of the worker
threads goes to sleep on some resource other than the completion port
itself (during the course of whatever processing it's doing) then it
will allow one of the "extra" threads to be woken if needed.



Re: boost asio synchronous vs asynchronous operations performance

Donald Alan
In reply to this post by Nathaniel Fries
Nathaniel J Fries wrote
Also, the main point of asynchronous I/O is that you don't need a
thread-per-file/connection to achieve necessary performance. A typical
asynchronous I/O server will have either 1 or #processors dedicated I/O
threads and use a fixed number of worker threads for non-IO tasks, not a
thread-per-connection. I would suggest that this renders your benchmark
questionable even after you make my suggested change.

Yes, I don't intend to create one thread per connection in the asynchronous case; just for this sample use case I created a thread on the connection request and used it to generate messages. My main concern is that if we try to send messages asynchronously (from a non-io_service thread), performance is significantly worse compared to sending the messages synchronously from the connection thread (of course, as more and more clients are added, thread-per-connection in the synchronous case won't scale well).

Re: boost asio synchronous vs asynchronous operations performance

Donald Alan
In reply to this post by Gavin Lambert
Gavin Lambert wrote
It does.  It's fairly common to allocate 1.5x or 2x #processors threads
to the pool.  What happens then is that Windows will keep up to
#processors threads (or whatever other concurrency value you specify)
processing from the completion port at all times -- if one of the worker
threads goes to sleep on some resource other than the completion port
itself (during the course of whatever processing it's doing) then it
will allow one of the "extra" threads to be woken if needed.
Given that in my sample use case there are only two threads working on the io_service, I think it doesn't matter what value is set as concurrency_hint. Let me know if I'm missing something.

Re: boost asio synchronous vs asynchronous operations performance

Bjorn Reese
In reply to this post by Donald Alan
On 03/24/2014 03:28 PM, Donald Alan wrote:

> and used it to generate messages. My main concern is, if we are trying to
> send messages asynchronously (in non-ioservice thread) then the performance
> is significantly bad compare to sending the messages synchronously in the
> connection thread (of course if more and more number of clients get added

I ran your code through a profiler, and it shows that the slowdown comes
from the boost::bind and boost::shared_ptr objects that are needed to set
up the async operations.

I also tried to change your async_server so that it does not write all
buffers in a loop, but instead writes the next buffer from the handler.
This yielded almost the same performance results.

I also tried to omit the connection thread, so that all work is done in
the io_service thread. Same performance results.


Re: boost asio synchronous vs asynchronous operations performance

Donald Alan
Thanks Bjorn. So this definitely seems to suggest that Boost async operations are much slower than their synchronous counterparts. Is there any way I can avoid creating boost::bind and boost::shared_ptr objects for each async operation?
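One option is the custom handler-allocation approach from the Asio "allocation" example that Igor linked earlier in this thread; a condensed sketch modeled on that example (untested here):

```cpp
// A small per-connection allocator that reuses one fixed block for the
// handler of the single in-flight operation, avoiding a heap
// allocation/deallocation per async call.
class handler_allocator : private boost::noncopyable
{
public:
    handler_allocator() : in_use_(false) {}

    void* allocate(std::size_t size)
    {
        if (!in_use_ && size < sizeof(storage_)) {
            in_use_ = true;
            return storage_.address();
        }
        // Fall back to the heap for oversized or concurrent allocations.
        return ::operator new(size);
    }

    void deallocate(void* pointer)
    {
        if (pointer == storage_.address())
            in_use_ = false;
        else
            ::operator delete(pointer);
    }

private:
    boost::aligned_storage<1024> storage_;  // reused for each handler
    bool in_use_;
};
```

The linked example then wraps each completion handler with a make_custom_alloc_handler() adapter, whose asio_handler_allocate/asio_handler_deallocate hooks route handler allocations to this allocator. It does not remove the boost::bind/shared_ptr construction itself, but it removes the per-operation heap traffic.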

Re: boost asio synchronous vs asynchronous operations performance

Niall Douglas
In reply to this post by Bjorn Reese
On 25 Mar 2014 at 14:06, Bjorn Reese wrote:

> > and used it to generate messages. My main concern is, if we are trying to
> > send messages asynchronously (in non-ioservice thread) then the performance
> > is significantly bad compare to sending the messages synchronously in the
> > connection thread (of course if more and more number of clients get added
>
> I ran your code through a profiler and it shows that the slowdown comes
> from boost::bind and boost::shared_ptr that are needed to setup the
> async operations.

If this was run under a debugger, where Visual Studio swaps in its
pathologically slow debug memory allocator, I can believe it.

Otherwise I struggle to see how these could cause the kind of figures
the OP was seeing. If AFIO can push 400k ops/sec per core, and it's
doing seven memory allocations and frees per op which include two
std::binds and two std::shared_ptr constructions and deletions, plus
a boost::future creation and deletion which is at least another
boost::shared_ptr, the maths doesn't add up that the OP is so slow.

> I also tried to change your async_server so that it does not write all
> buffers in a loop, but instead writes the next buffer from the handler.
> This yielded almost the same performance results.
>
> I also tried to omit the connection thread, so that all work is done in
> the io_service thread. Same performance results.

Very odd. How much time is spent in the kernel?

Niall


Re: boost asio synchronous vs asynchronous operations performance

Bjorn Reese
On 03/26/2014 01:51 AM, Niall Douglas wrote:

> Otherwise I struggle to see how these could cause the kind of figures
> the OP was seeing. If AFIO can push 400k ops/sec per core, and it's

Your scepticism is warranted. I accidentally misspelled the compiler
optimization option, so all my performance measurements were done
on debug code.

Looking at the new numbers for the optimized build, the main difference
between the synchronous and asynchronous case is due to the internals
of the asio::io_service queue (primarily locking).

>> I also tried to change your async_server so that it does not write all
>> buffers in a loop, but instead writes the next buffer from the handler.
>> This yielded almost the same performance results.
>>
>> I also tried to omit the connection thread, so that all work is done in
>> the io_service thread. Same performance results.

With optimization on, these changes improve performance by approx 20%.

> Very odd. How much time is spent in the kernel?

Around 5% in debugging code, and 40% in optimized code.

Re: boost asio synchronous vs asynchronous operations performance

Gavin Lambert
On 28/03/2014 00:09, quoth Bjorn Reese:
> Looking at the new numbers for the optimized build, the main difference
> between the synchronous and asynchronous case is due to the internals
> of the asio::io_service queue (primarily locking.)

FWIW, that matches my own testing results.

I had a case (with serial ports) where this locking latency was
sufficiently high to be bothersome.  I wrote an experimental lock-free
reactor engine which appears to outperform Asio (at least on Windows) --
but it's also much more limited and doesn't provide some of the same
guarantees.

For normal (particularly socket) usage it's probably not worth the hassle.



Re: boost asio synchronous vs asynchronous operations performance

Niall Douglas
In reply to this post by Bjorn Reese
On 27 Mar 2014 at 12:09, Bjorn Reese wrote:

> > Otherwise I struggle to see how these could cause the kind of figures
> > the OP was seeing. If AFIO can push 400k ops/sec per core, and it's
> >
> > Very odd. How much time is spent in the kernel?
>
> Around 5% in debugging code, and 40% in optimized code.

AFIO spends about 45% of its time in locks as well when fully loaded
on non-TSX hardware. I am looking forward to getting my hands on some
TSX hardware though, as I believe AFIO ought to become only a little
slower than ASIO, i.e. ASIO will be the overwhelming limiting factor
on throughput.

Out of curiosity, how many CPU cycles per op in your ASIO test case?
AFIO seems to need ~9,000 CPU cycles per op processed, half of which
is spent spinning on CAS locks - I would assume that ASIO can knock
that down by two thirds?

Niall

--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/





Re: boost asio synchronous vs asynchronous operations performance

Niall Douglas
In reply to this post by Gavin Lambert
On 28 Mar 2014 at 18:17, Gavin Lambert wrote:

> I had a case (with serial ports) where this locking latency was
> sufficiently high to be bothersome.  I wrote an experimental lock-free
> reactor engine which appears to outperform Asio (at least on Windows) --
> but it's also much more limited and doesn't provide some of the same
> guarantees.
>
> For normal (particularly socket) usage it's probably not worth the hassle.

Windows has quite chunky thread switch times anyway, so as soon as
you need to wait on another thread via the kernel rather than a CAS
lock you can forget about performance. Windows completion ports ought
to operate completely in user space when thread A posts work to thread
B, but any additional locking, e.g. by ASIO, can create enough stalls
to send completion ports to sleep in the kernel.

That said, it would be interesting to patch in TSX support to ASIO
instead of its mutex and see what happens.

Niall

--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/





Re: boost asio synchronous vs asynchronous operations performance

Bjorn Reese
In reply to this post by Niall Douglas
On 03/28/2014 11:21 AM, Niall Douglas wrote:

> Out of curiosity, how many CPU cycles per op in your ASIO test case?
> AFIO seems to need ~9,000 CPU cycles per op processed, half of which
> is spent spinning on CAS locks - I would assume that ASIO can knock
> that down by two thirds?

1000 cycles/op -- measured via io_service::do_run_once().

It locks/unlocks four times per operation, which accounts for a total
of 20% of the CPU time.


Re: boost asio synchronous vs asynchronous operations performance

Niall Douglas
On 30 Mar 2014 at 14:00, Bjorn Reese wrote:

> > Out of curiosity, how many CPU cycles per op in your ASIO test case?
> > AFIO seems to need ~9,000 CPU cycles per op processed, half of which
> > is spent spinning on CAS locks - I would assume that ASIO can knock
> > that down by two thirds?
>
> 1000 cycles/op -- measured via io_service::do_run_once().

Just to clarify, my ~9000 cycles/op is for maximum contention i.e.
fully loaded with eight threads all fighting it out. Is your 1000
cycles/op for two threads only?

Methinks AFIO could do with some minimum latency benchmarks actually
... might as well, I already have build time benchmarks.

> It locks/unlocks four times per operation, which accounts for a total
> of 20% of the CPU time.

I'm actually surprised it's as much as that. I would have thought
twice per operation is the minimum possible, but you have to make
some hard design choices to get it that low. AFIO "looks funny"
partially because it locks exactly twice per op as of the v1.2
engine, unless you have TSX, in which case it never locks at all
except when more memory from the kernel is needed.

Niall

--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/





Re: boost asio synchronous vs asynchronous operations performance

Bjorn Reese
On 03/30/2014 04:13 PM, Niall Douglas wrote:

> Just to clarify, my ~9000 cycles/op is for maximum contention i.e.
> fully loaded with eight threads all fighting it out. Is your 1000
> cycles/op for two threads only?

Yes.

>> It locks/unlocks four times per operation, which accounts for a
>> total of 20% of the CPU time.
>
> I'm actually surprised it's as much as that. I would have thought
> twice per operation is the minimum possible, but you have to make
> some hard design choices to get it that low. AFIO "looks funny"

It uses one lock to protect its epoll data, and three to protect the
io_service members. Don't ask me why it needs three in the latter case.

Talking of surprises, I was surprised that system::system_category()
accounted for 10% of the CPU time, but it looks like Asio wraps the
result of every system call in a system::error_code before using it
(e.g. when checking for would_block in non-blocking I/O).


Re: boost asio synchronous vs asynchronous operations performance

Niall Douglas
In reply to this post by Niall Douglas
On 30 Mar 2014 at 15:13, Niall Douglas wrote:

> > 1000 cycles/op -- measured via io_service::do_run_once().
>
> Methinks AFIO could do with some minimum latency benchmarks actually
> ... might as well, I already have build time benchmarks.

I have some results: on a 3.5 GHz quad-core CPU with hyperthreading,
latencies are as follows:

1-4 concurrency: constant ~9 microseconds between op issue and
operation start, ~7 microseconds between operation end and the op's
future signalling. Total latency for the main thread: ~16 microseconds.

4-8 concurrency: linear rise with concurrency. I assume this is the
hyperthreading.

8-32 concurrency: fairly constant ~12 microseconds between op issue
and operation start, ~9 microseconds between operation end and the
op's future signalling. Total latency for the main thread: ~21
microseconds.

The latency curve beyond 8 concurrency is fairly flat, but that is
probably because the tasks are executed as fast as they can be
dispatched, so you don't really see the true scaling under load.

For reference, a thread context switch was measured at 0.2
microseconds; obviously there will be quite a few of those during a
typical AFIO op dispatch.

Niall

--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/



