[beast] Formal review

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
47 messages Options
123
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[beast] Formal review

Boost - Dev mailing list
I am going to focus exclusively on the HTTP API in the review below
because that needs most attention.

Furthermore, to keep this review within reasonable bounds, I am omitting
issues already reported by others.


I. DESIGN
---------

Beast offers a well-designed WebSocket API along with a minimal and
deficient HTTP API.

Beast claims to offer low-level building-blocks that can be used to
produce higher-level HTTP server and client libraries.

The low-level approach makes Beast minimal. I do not have a problem with
a minimal API, other than it diminishes the usefulness of the library.
Although this can be solved by adding more functionality over time, if
I was to evaluate Beast with its current feature-level, I would only
choose it if I needed WebSocket. This, however, is not a precondition
for library acceptance.

But the low-level approach also means the abstractions must be right.
The abstractions that Beast currently offers are deficient, which is
best shown by the poor support for HTTP chunking. Chunking can and
should be represented more naturally by a HTTP API.

HTTP chunking splits a message into several parts, where each part can
contain metadata (chunk extensions) and data (chunk.) The metadata is a
set of key-value pairs, just like the HTTP headers.

During the review period I have described several use cases for HTTP
chunking that does require metadata (e.g. radio metadata,) as well as
the preservation of chunk boundaries (e.g. event-based publish-subscribe
channels.)

Conceptually, a normal HTTP message has the following structure:

   message = start-line metadata* body?

A chunked HTTP message can be described in a similar manner as:

   message = start-line metadata* ( metadata* chunk? )*

The Beast message container is designed for the structure of the normal
HTTP message, and assumes that chunks are appended the body. The chunk
metadata does not fit into this model, and therefore Beast discards them
by default. The repetitive structure of chunked messages is not
accommodated by the Beast message container. This means that we have to
make our own chunk decorator to generate chunk metadata, and our own
custom parser to receive chunk metadata.

The chunk decorator inserts metadata via a callback mechanism. Its
design is poor for two reasons. First, the user must keep track of
context to ensure that the correct metadata is applied to the correct
chunks. Second, the user must write the metadata in correct HTTP syntax.
The latter may sound trivial, but does anybody here know if

   " ;key=value\r\n"

is correctly formatted? (notice the initial whitespace.) The answer is
that it is correct according to RFC 2616, incorrect according to RFC
7230, and correct according to RFC 7230 errata. The best approach for
handling this situation is to never insert whitespaces, but to always
accept whitespaces from others. Users should not have to be bothered
with such detail.

Receiving chunk metadata rather than having them discarded by default
requires a custom parser. The Beast message parser is a push parser
(SAX) where callbacks are invoked for each parsed message part. This
means that the user must keep track of which metadata belongs to which
chunk. The user must also parse the metadata manually. Beast does not
even check if the syntax is correct.

The custom parser is also necessary if the user needs to receive
individual chunks rather than having them appended to the body. I would
have expected that a low-level API returns the next message part as soon
as it is identified. The current collect-entire-message design is a
higher-level design.

In summary,

   1. Beast should not discard anything by default, but should faithfully
      forward all information received.

   2. Beast should handle all low-level syntax and not leave parts of it
      to the user.

   3. Beast should not break the end-to-end principle unless allowed by
      the user.

As a way forward I am going to sketch another simple message model that
does encompass HTTP chunking. Conceptually it looks like this (I am
going to ignore that, say, metadata may require different syntax
depending on context)

   message = start-line section*
   section = metadata* body-part?

Building an API around this model should offer read/write functions that
handles partial bodies (whether chunked or progressively downloaded.)
The current collect-entire-message API could be build on top of that.
With this approach there is no need to mess around with custom parsers.

This design does require state information to be able to select the
proper syntax for both metadata and data.

On another topic, the HTTP status codes can be roughly categorized as
instructions (the 1xx and 2xx series) and errors (the 4xx and 5xx
series.) I recommend that you create two error categories and pass
status codes as error_code. Notice that I have no comment about whether
or not they should be stored in the message container. What I really
want to achieve is to enable the user to easily propagate HTTP errors
in the same way that non-HTTP errors (including beast::http::error) are
propagated.


II. IMPLEMENTATION
------------------

I read the implementation to understand the design, so I did not look
for QA issues. My overall impression is that much of the code is
well-written, but there are parts that are convoluted and difficult
to verify.

There are too many goto statements for my taste; I do not mind the goto
pattern used in the parser, but the state-machines in read_some_op and
serializer looks more like micro-optimization. Furthermore, "case
do_body + 1:" looks like an awful hack.

The dynamic buffers are useful even by themselves.

The code contains several foreign files with different licensing
conditions:

   * include/beast/core/detail/base64.hpp
   * include/beast/zlib/zlib.hpp
   * include/beast/websocket/detail/utf8_checker.hpp
   * include/beast/http/detail/basic_parser.hpp

All these include both the BSL and the original license. This should be
checked by the Boost legal council.


III. DOCUMENTATION
------------------

The documentation is generally well-written (and it looks like the
DynamicBuffer section got a little help from Networking TS.)

The Reference is missing some concepts, e.g. SyncReadStream.


IV. NAMING
----------

The library name must change. "Boost.Beast" sounds like a bad joke. I do
not have strong opinions about the new name.


V. VERDICT
----------

I recommend to REJECT the library because the HTTP API provides the
wrong low-level abstractions as explained in the design section above.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
On Sun, Jul 9, 2017 at 1:57 PM, Bjorn Reese via Boost
<[hidden email]> wrote:
> ...

I'll state for the record that I reached out to you and the mailing
list a few times since Beast was announced in June 2016, offering any
interested parties the opportunity to participate in the design of the
library. No one showed up.

You are correct that Beast could offer more control over the chunking
and better facilities for interacting with extensions. This was by
choice; I did not invest heavily into that part of the library because
it is a feature unlikely to see significant use in the near term.

That is not to say that I won't address it in the future. As others
can attest, I am very responsive to issues and timely with providing
updates and improvements to the library. So it is not a stretch of the
imagination to think that all of your points will be resolved.

So that no one reading Bjorn's review has any misunderstandings: Beast
DOES decode chunked message bodies correctly and provides the decoded
payload to the caller, and can do so incrementally.

What I find interesting is the decision to focus the review on just
one aspect of the library, the handling of chunks for niche use cases.

Bjorn, you are obviously accomplished and knowledgeable with respect
to HTTP and implementations of said protocol, and now you know some
Beast so I pose the question: can you offer any concrete solutions on
how Beast's interfaces may be adjusted (with a minimum of disturbance
to existing use) in order to achieve the results you desire?

Thanks

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Sun, Jul 9, 2017 at 3:13 PM, Vinnie Falco <[hidden email]> wrote:
> On Sun, Jul 9, 2017 at 1:57 PM, Bjorn Reese via Boost
> <[hidden email]> wrote:
>> ...
>
> I'll state for the record that I reached out to you and the mailing
> list a few times since Beast was announced in June 2016, offering any
> interested parties the opportunity to participate in the design of the
> library. No one showed up.

For example, I'm doing a mock-up of what a solution to the chunk
encoding problem you described might look like, and I have questions
which would be simple for an expert like yourself to answer and save
me a lot of time (and help create a better product). Here's am
example:

* Should the new chunk interface require that the chunk exist in
memory as a single ConstBufferSequence?

* Or is it necessary to support a system where a chunk can be sent
using a smaller buffer, periodically refilled?

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
> I'll state for the record that I reached out to you and the mailing
> list a few times since Beast was announced in June 2016, offering any
> interested parties the opportunity to participate in the design of the
> library. No one showed up.

It is quite usual for really useful feedback to not arrive until a
library is prepared for formal review. Your users before a review are by
definition already fans, and won't give you really useful feedback. It
takes a lot of effort to give high quality feedback, and many of us
can't spare the time until we know the thing we are reviewing is
supposed to be finished. I know that is frustrating, but it is what it is.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
> The code contains several foreign files with different licensing
> conditions:
>
>
>   * include/beast/websocket/detail/utf8_checker.hpp

Note: there are public header only UTF-8 API in Boost.Locale - I
suggest to use one instead of reimplementing/borrowing one.

Artyom

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 10-07-17 00:13, Vinnie Falco via Boost wrote:
> it is a feature unlikely to see significant use in the near term.

I don't think this is how Boost design review can be driven.

You get 1 chance to do the right thing, or you cripple the library
either due to legacy stuff or due to continually breaking changes.

> That is not to say that I won't address it in the future.
The library is being reviewed as-is. That is the intent, and has been
your request.

Seth


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 10-07-17 02:55, Vinnie Falco via Boost wrote:
> * Should the new chunk interface require that the chunk exist in
> memory as a single ConstBufferSequence?
>
> * Or is it necessary to support a system where a chunk can be sent
> using a smaller buffer, periodically refilled?

Those seem like good questions. +1 for focusing on the ball and keeping
things moving. Cheers for all the good work


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Mon, Jul 10, 2017 at 1:52 AM, Artyom Beilis via Boost
<[hidden email]> wrote:
> Note: there are public header only UTF-8 API in Boost.Locale - I
> suggest to use one instead of reimplementing/borrowing one.

Which Boost.Locale function validates utf8?

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
On Mon, Jul 10, 2017 at 5:09 PM, Vinnie Falco via Boost
<[hidden email]> wrote:
> On Mon, Jul 10, 2017 at 1:52 AM, Artyom Beilis via Boost
> <[hidden email]> wrote:
>> Note: there are public header only UTF-8 API in Boost.Locale - I
>> suggest to use one instead of reimplementing/borrowing one.
>
> Which Boost.Locale function validates utf8?
>

You can use utf_traits

http://www.boost.org/doc/libs/1_64_0/libs/locale/doc/html/structboost_1_1locale_1_1utf_1_1utf__traits.html

decode does the job.

Artyom

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
On Mon, Jul 10, 2017 at 7:47 AM, Artyom Beilis via Boost
<[hidden email]> wrote:
> You can use utf_traits
> ...
http://www.boost.org/doc/libs/1_64_0/libs/locale/doc/html/structboost_1_1locale_1_1utf_1_1utf__traits.html
>
> decode does the job.

The utf-8 validation of text websocket message payloads is a critical
bottleneck. The best websocket implementations apply significant
optimizations to this operation, recognizing the vast majority of
inputs are low-ascii (no code point larger than one byte). For example
JSON data. With collaboration from Ripple employees, the code in Beast
was developed as a high performance utf8-validation function.

Preliminary tests indicate that switching to Boost.Locale away from
Beast's well optimized and well tested utf8 validation function would
incur a 67,400% performance penalty. I hope you understand that I
might be reluctant to switch.

Benchmark results:

beast.benchmarks.utf8_checker
beast:  2,016,637,738 char/s
beast:  1,921,062,599 char/s
beast:  1,939,159,018 char/s
locale: 3,053,539 char/s
locale: 2,989,265 char/s
locale: 3,060,962 char/s
Longest suite times:
   17.8s beast.benchmarks.utf8_checker
17.8s, 1 suite, 1 case, 1 test total, 0 failures
The program '[75300] benchmarks.exe' has exited with code 0 (0x0).

Code:
<https://github.com/vinniefalco/Beast/commit/3df7de8ce2e8f797722118b9d751266241a8266e>

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
On Mon, Jul 10, 2017 at 8:15 AM, Vinnie Falco <[hidden email]> wrote:
> With collaboration from Ripple employees, the code in Beast
> was developed as a high performance utf8-validation function.

Apologies, I forgot to credit Peter Dimov who improved it further just
before the Beast formal review.

Thanks

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Vinnie Falco wrote:
> The utf-8 validation of text websocket message payloads is a critical
> bottleneck.

I thought I'd have a look at that.  Your code has a fast-path that
attempts to check 8 bytes at a time by masking with 0x8080808080808080.
I agree that that is probably a useful optimisation which a compiler is
unlikely to apply by itself.  Your implementation
(
https://github.com/vinniefalco/Beast/blob/6f88f0182d461e9cabe55032f966c9ca4f77bf46/include/beast/websocket/detail/utf8_checker.hpp 
) does this:

   if((*reinterpret_cast<std::size_t const*>(in) & mask) != 0)

So you're reinterpret_casting a char* to a size_t* and then dereferencing
it.  Isn't that undefined behaviour?


Regards, Phil.





_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
On 07/10/17 19:57, Phil Endecott via Boost wrote:
> Vinnie Falco wrote:
>> The utf-8 validation of text websocket message payloads is a critical
>> bottleneck.
>
> I thought I'd have a look at that.  Your code has a fast-path that
> attempts to check 8 bytes at a time by masking with 0x8080808080808080.
> I agree that that is probably a useful optimisation which a compiler is
> unlikely to apply by itself.

I wouldn't be so quick to agree. Modern compilers tend to be able to
vectorize simple loops quite efficiently.

>  Your implementation
> (
> https://github.com/vinniefalco/Beast/blob/6f88f0182d461e9cabe55032f966c9ca4f77bf46/include/beast/websocket/detail/utf8_checker.hpp 
> ) does this:
>
>    if((*reinterpret_cast<std::size_t const*>(in) & mask) != 0)
>
> So you're reinterpret_casting a char* to a size_t* and then dereferencing
> it.  Isn't that undefined behaviour?

C++ allows chars and unsigned chars (which std::uint8_t presumably is)
to alias other types, so unless the compiler is able to prove that the
pointers refer to an array of something else than std::size_t, it should
not apply optimizations based on strict aliasing rules.

The code above performs pointer alignment, which strictly speaking has
unspecified behavior because it assumes pointer representation. It would
be cleaner to use std::align for that.

BTW, the fast loop ends on the first byte >=0x80 and then the validation
continues in the slow loop until the end of input. It might be better to
resume the fast loop if validation of the "vector" passes. Otherwise,
for example, and early comment in the input could ruin the performance.
(I don't know how probable such case would be.)

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
Andrey Semashev wrote:
> On 07/10/17 19:57, Phil Endecott via Boost wrote:
> > I thought I'd have a look at that.  Your code has a fast-path that
> > attempts to check 8 bytes at a time by masking with 0x8080808080808080.
> > I agree that that is probably a useful optimisation which a compiler is
> > unlikely to apply by itself.
>
> I wouldn't be so quick to agree. Modern compilers tend to be able to
> vectorize simple loops quite efficiently.

Try it.


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Mon, Jul 10, 2017 at 9:57 AM, Phil Endecott via Boost
<[hidden email]> wrote:
> So you're reinterpret_casting a char* to a size_t* and then dereferencing
> it.  Isn't that undefined behaviour?

Which platform do you think this doesn't work on?

I have to apologize again for the earlier misleading comments. The
benchmarks were measuring the wrong thing. The actual figure is more
in line with 20,000%.

The reinterpret_cast<> can be trivially changed to std::memcpy:

    std::size_t temp;
    std::memcpy(&temp, in, sizeof(temp));
    if((temp & mask) != 0)

This hurts MSVC a bit, gcc/clang/MIPS not at all, but ARM takes quite a hit:

<https://godbolt.org/g/HFxxub>

(Thanks to Peter for working up the CE ARM example)

Updated benchmarks (run on MSVC with optimizations)

With reinterpret_cast
beast.benchmarks.utf8_checker
beast:  1,977,629,044 char/s
beast:  1,474,907,121 char/s
beast:  1,930,979,173 char/s
beast:  1,899,078,149 char/s
beast:  1,893,740,588 char/s
locale: 81,966,125 char/s
locale: 82,200,658 char/s
locale: 81,802,070 char/s
locale: 82,103,369 char/s
locale: 81,802,149 char/s
Longest suite times:
    2.7s beast.benchmarks.utf8_checker
2.7s, 1 suite, 1 case, 1 test total, 0 failures
The program '[79364] benchmarks.exe' has exited with code 0 (0x0).

With std::memcpy
beast.benchmarks.utf8_checker
beast:  1,124,515,969 char/s
beast:  1,336,074,093 char/s
beast:  1,494,183,562 char/s
beast:  1,506,365,044 char/s
beast:  1,533,419,187 char/s
locale: 75,457,683 char/s
locale: 81,358,140 char/s
locale: 80,413,657 char/s
locale: 81,635,114 char/s
locale: 67,234,619 char/s
Longest suite times:
    3.0s beast.benchmarks.utf8_checker
3.0s, 1 suite, 1 case, 1 test total, 0 failures
The program '[82220] benchmarks.exe' has exited with code 0 (0x0).

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Interesting...

How did you get these numbers...
My small benchmark (see below)

Gives

For: https://en.wikipedia.org/wiki/War_and_Peace
Beast 0.0555949ms
Locale 0.110325ms

For: https://he.wikipedia.org/wiki/%D7%9E%D7%9C%D7%97%D7%9E%D7%94_%D7%95%D7%A9%D7%9C%D7%95%D7%9D
Beast 0.0542079ms
Locale 0.0772768ms

For: https://zh.wikipedia.org/wiki/%E6%88%B0%E7%88%AD%E8%88%87%E5%92%8C%E5%B9%B3
Beast 0.0849873ms
Locale 0.0930336ms

using gcc 5.4.0 Ubuntu 64 bit.

I mean Beast is somewhat faster but by no means by the magnitude you show
Have you run your benchmark using release mode?

It looks for me you are running into premature optimization.

Artyom

----------------------------------
#include <beast/websocket/detail/utf8_checker.hpp>
#include <boost/locale.hpp>
#include <iostream>
#include <sstream>
#include <fstream>
#include <chrono>

bool locale_test(char const *msg,size_t len)
{
       char const *e = msg + len;
       while(msg!=e) {
               using namespace boost::locale;
               auto cp = utf::utf_traits<char>::decode(msg,e);
               if(cp == utf::illegal || cp==utf::incomplete)
                       return false;
       }
       return true;
}



int main()
{


       std::ifstream tmp("/tmp/index.html");
       std::stringstream ss;
       ss << tmp.rdbuf();
       std::string sample = ss.str();

       using beast::websocket::detail::check_utf8;

       int factor = 10;
       {
               auto start = std::chrono::high_resolution_clock::now();
               for(int i=0;i<factor*1000;i++) {
                       bool v1 = check_utf8(sample.c_str(),sample.size());
               }
               auto end = std::chrono::high_resolution_clock::now();
               double time =
std::chrono::duration_cast<std::chrono::duration<double, std::ratio<1>
> >(end-start).count() / factor;
               std::cout << "Beast " << time << "ms" << std::endl;
       }
       {
               auto start = std::chrono::high_resolution_clock::now();
               for(int i=0;i<factor*1000;i++) {
                       bool v2 = locale_test(sample.c_str(),sample.size());
               }
               auto end = std::chrono::high_resolution_clock::now();
               double time =
std::chrono::duration_cast<std::chrono::duration<double, std::ratio<1>
> >(end-start).count() / factor;
               std::cout << "Locale " << time << "ms" << std::endl;
       }

}


>
> Benchmark results:
>
> beast.benchmarks.utf8_checker
> beast:  2,016,637,738 char/s
> beast:  1,921,062,599 char/s
> beast:  1,939,159,018 char/s
> locale: 3,053,539 char/s
> locale: 2,989,265 char/s
> locale: 3,060,962 char/s
> Longest suite times:
>    17.8s beast.benchmarks.utf8_checker
> 17.8s, 1 suite, 1 case, 1 test total, 0 failures
> The program '[75300] benchmarks.exe' has exited with code 0 (0x0).
>
> Code:
> <https://github.com/vinniefalco/Beast/commit/3df7de8ce2e8f797722118b9d751266241a8266e>
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
On Mon, Jul 10, 2017 at 2:06 PM, Artyom Beilis via Boost
<[hidden email]> wrote:
> It looks for me you are running into premature optimization.

Hmm, no, I don't think so. Correct utf8 validation is a bottleneck for
every websocket program, I have used a profiler so this comes from
measurement not opinion.

It looks like you are using source inputs that contain high-ascii
characters. In this case, Beast switches to the "slow" algorithm which
is similar to what Locale does. Try using an input file that consists
only of low-ASCII characters.

Your results are quite different from mine, even with std::memcpy:

beast:  1,124,515,969 char/s
beast:  1,336,074,093 char/s
beast:  1,494,183,562 char/s
beast:  1,506,365,044 char/s
beast:  1,533,419,187 char/s
locale: 75,457,683 char/s
locale: 81,358,140 char/s
locale: 80,413,657 char/s
locale: 81,635,114 char/s
locale: 67,234,619 char/s


Ubuntu VM:
beast.benchmarks.utf8_checker
beast:  2894806032 char/s
beast:  2874126708 char/s
beast:  2890616214 char/s
beast:  2017890885 char/s
beast:  2785087614 char/s
locale: 574731777 char/s
locale: 571439694 char/s
locale: 242245477 char/s
locale: 511534158 char/s
locale: 574121386 char/s

Travis
<https://travis-ci.org/vinniefalco/Beast/jobs/252155928#L1334>
beast:  1155900653 char/s
beast:  1146058480 char/s
beast:  1162309551 char/s
beast:  1151093660 char/s
beast:  1159334387 char/s
locale: 218684840 char/s
locale: 220357048 char/s
locale: 208476005 char/s
locale: 224853783 char/s
locale: 209990002 char/s

On every machine I try, locale performs more poorly on all-low-ascii
inputs by at least a factor of 5.

Code:
<https://github.com/vinniefalco/Beast/blob/da7946b6e5f8bda225ff122984e945b9e088a196/test/benchmarks/utf8_checker.cpp#L78>

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Andrey Semashev wrote:

> On 07/10/17 19:57, Phil Endecott via Boost wrote:
>> Vinnie Falco wrote:
>>> The utf-8 validation of text websocket message payloads is a critical
>>> bottleneck.
>>
>> I thought I'd have a look at that.  Your code has a fast-path that
>> attempts to check 8 bytes at a time by masking with 0x8080808080808080.
>> I agree that that is probably a useful optimisation which a compiler is
>> unlikely to apply by itself.
>
> I wouldn't be so quick to agree. Modern compilers tend to be able to
> vectorize simple loops quite efficiently.

Not this one (I've tried it).

>>  Your implementation
>> (
>> https://github.com/vinniefalco/Beast/blob/6f88f0182d461e9cabe55032f966c9ca4f77bf46/include/beast/websocket/detail/utf8_checker.hpp 
>> ) does this:
>>
>>    if((*reinterpret_cast<std::size_t const*>(in) & mask) != 0)
>>
>> So you're reinterpret_casting a char* to a size_t* and then dereferencing
>> it.  Isn't that undefined behaviour?
>
> C++ allows chars and unsigned chars (which std::uint8_t presumably is)
> to alias other types

My understanding is that you can reinterpret_cast FROM other types TO
chars, but not FROM chars TO other types.  But there are surely people
here who understand this stuff better than me.

> BTW, the fast loop ends on the first byte >=0x80 and then the validation
> continues in the slow loop until the end of input. It might be better to
> resume the fast loop if validation of the "vector" passes. Otherwise,
> for example, and early comment in the input could ruin the performance.

Yes, I also noticed that; quite a common case in en_GB would be to have
mostly-ASCII with occasional currency symbols.


Vinnie Falco wrote:
>> So you're reinterpret_casting a char* to a size_t* and then dereferencing
>> it.  Isn't that undefined behaviour?
>
> Which platform do you think this doesn't work on?

Wrong question.  It's likely that it works on current compilers on current
platforms.  But if I'm right that it is undefined behaviour then it might
stop working with some future compiler update somewhere.

> The reinterpret_cast<> can be trivially changed to std::memcpy:
>
>     std::size_t temp;
>     std::memcpy(&temp, in, sizeof(temp
>     if((temp & mask) != 0)

Yes, I believe that's the right thing to do.


Regarding the relative performance of different UTF-8 checking code:

I've done a quick test by hacking some of my own UTF-8 decoding code to
just validate, and I see a factor of about 4 improvement between the
byte-at-a-time and an optimised version that checks the top bits of 8 bytes
at a time.  The code is below.  I measure a throughput of about 1.6 GB/second
for ASCII-only input, on my ARMv8 test system.  I'd be interested to hear how
that compares to the other implementations; a very quick test suggests that
it might be a bit quicker than the Beast code but I'd like someone else to
confirm or deny.  (It might also be wrong - I've not tested it with non-ASCII
input.)

Note that this is only about 50 lines of code; Beast's utf8_checker.hpp is
maybe 5 times as long.  Code follows.

Regards, Phil.


template <typename ITER>
bool is_valid_utf8(ITER i, ITER end)
{
  // Check if range is valid and complete UTF-8.
  // (Maybe incomplete input, i.e. ending in the middle of a multi-byte character, needs
  // to be reported differently?)

  while (i != end) {

    // If i is suitably aligned, do a fast word-at-a-time check for ASCII characters.
    // FIXME this only works if ITER is a contiguous iterator; it needs a "static if".
    const char* p = &(*i);
    const char* e = p + (end-i);  // I don't think &(*end) is allowed because it appears to dereference end.
    unsigned long int w;          // Should be 32 bits on 32-bit processor and 64 bits on 64-bit processor.
    if (reinterpret_cast<uintptr_t>(p) % sizeof(w) == 0) {
      while (p+sizeof(w) <= e) {
        memcpy(&w,p,sizeof(w));
        if (w & 0x8080808080808080) break;  // If any of the top bits are set, fall back to the
                                            // byte-at-a-time code below.
                                            // (Is there a better way to write the mask value that would work
                                            // for e.g. 128-bit ints?  Is that expression OK for 32-bit ints?)
        p += sizeof(w);
        i += sizeof(w);
      }
      if (p == e) break;
    }

    uint8_t b0 = *i++;
    if ((b0 & 0x80) == 0) continue;         // Single byte chars are 0xxxxxxx
    if (i == end) return false;             // Incomplete
    uint8_t b1 = *i++;
    if ((b1 & 0xc0) != 0x80) return false;  // Following bytes are all 10xxxxxx
    if ((b0 & 0xe0) == 0xc0) continue;      // Two-byte chars start 110xxxxx
    if (i == end) return false;             // Incomplete
    uint8_t b2 = *i++;
    if ((b2 & 0xc0) != 0x80) return false;  // Following bytes are all 10xxxxxx
    if ((b0 & 0xf0) == 0xe0) continue;      // Three-byte chars start 1110xxxx
    if (i == end) return false;             // Incomplete
    uint8_t b3 = *i++;
    if ((b3 & 0xc0) != 0x80) return false;  // Following bytes are all 10xxxxxx
    if ((b0 & 0xf8) == 0xf0) continue;      // Four-byte chars start 11110xxx
    return false;                           // First byte is invalid, i.e. 10xxxxxx or 11111xxx

  }
  return true;
}





_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
On Mon, Jul 10, 2017 at 5:34 PM, Phil Endecott via Boost
<[hidden email]> wrote:
> ...if I'm right that it is undefined behaviour then it might
> stop working with some future compiler update somewhere.
>
>> The reinterpret_cast<> can be trivially changed to std::memcpy:
>> ...
> Yes, I believe that's the right thing to do.

That hurts 32-bit ARM. So I am faced with a choice, penalize an
existing platform today to benefit some possible future platform?
Hmmmm......let me think about that....

...probably not such a good idea! And the 32-bit ARM users might
revolt with such a change. I've heard from a couple of users running
Beast on constrained hardware.

> Note that this is only about 50 lines of code; Beast's utf8_checker.hpp is
> maybe 5 times as long.  Code follows.

Nice, this is great! I like where you are headed with your function
and thanks for investing the time to write it. Perhaps the Beast utf8
validator could be improved, there's nothing more satisfying than
removing lines of code!

There's just an eensy teensy problem, the Beast validator is an
"online" algorithm. It works with chunks of the entire input sequence
at a time, sequentially, so there could be a code point that is split
across the buffer boundary. All of that extra code you see in Beast is
designed to handle that case, by saving the bytes at the end for when
it gets called again (after the validator returns it will never see
that buffer again).

I admit that there is surprisingly large amount of code required just
to handle this case. The good news is that those extra lines only
execute in the special case where the code point is split. The bulk of
the loop works on the parts of the buffer where code points can't
possibly be split. And the unit test is exhaustive, it tries all
possible code points and split positions.

But who knows? I have never claimed to be a great coder, I consider
myself average at best so its entirely possible that this could all be
done in far fewer lines. Maybe you can update your function to handle
this case? I am always happy to accept improvements into Beast. You
might need to turn the function into a class to save those bytes at
the end.

Thanks

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [beast] Formal review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 07/11/17 03:34, Phil Endecott via Boost wrote:
> Andrey Semashev wrote:
>>
>> C++ allows chars and unsigned chars (which std::uint8_t presumably is)
>> to alias other types
>
> My understanding is that you can reinterpret_cast FROM other types TO
> chars, but not FROM chars TO other types.

You can cast both ways. The casted-from-char pointer can be used as long
as the underlying object, in the C++ object model, matches the pointed
type. In other words, this is fine:

   std::size_t* p1 = new std::size_t[10];
   char* p2 = reinterpret_cast< char* >(p1);
   std::size_t* p3 = reinterpret_cast< std::size_t* >(p2);
   assert(p1 == p3);
   p3[0] = 10; // ok

In Beast's case we (and compiler) cannot tell whether the pointers refer
to std::size_t objects. For compiler that means it has to assume the
objects exist and thus prevent any optimizations that would contradict it.

The cases where the compiler cannot tell whether the object exists or
not are kind of a grey area in the standard because it doesn't clarify
that, but in practice it must work. Otherwise it would be the end of the
world because we use C APIs everywhere and C cannot create a C++ object
currently. There were multiple discussions about this on the std mailing
lists.

Note that when you _can_ tell there is an object, and the type of the
object is different from the pointee then this is still UB:

   double* p1 = new double[10];
   char* p2 = reinterpret_cast< char* >(p1);
   std::size_t* p3 = reinterpret_cast< std::size_t* >(p2);
   p3[0] = 10; // UB

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
123
Loading...