Boost.Beast includes CRLF in HTTP fields

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Boost.Beast includes CRLF in HTTP fields

Boost - Users mailing list
Hello,

It looks like HTTP response headers are saved with trailing CRLF in the
message upon receiption after (async_)read.

Since it usually does not add additional valuable information, would it
be possible to remove them in next version of Boost.Beast? It will be
more convenient as you don't need to trim them when you want to capture
the field value.

Regards,

--
David
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: Boost.Beast includes CRLF in HTTP fields

Boost - Users mailing list
On Fri, Jul 6, 2018 at 3:45 AM, David Demelier via Boost-users
<[hidden email]> wrote:
> It looks like HTTP response headers are saved with trailing CRLF in the
> message upon receiption after (async_)read.

Yes, beast::http::basic_fields stores each field/value pair in its
serialized format, which includes the trailing CRLF and also the colon
and space. It is done this way so that beast::http::serializer
requires no memory allocations to perform its function.

> Since it usually does not add additional valuable information, would it
> be possible to remove them in next version of Boost.Beast?

No

> It will be more convenient as you don't need to trim them when you want
> to capture the field value.

You shouldn't have to do any trimming. basic_fields member functions
and iterator value_type which return field values do so using a
string_view, whose length excludes the trailing CRLF and also any
colon and space. Callers should not assume the string is
null-terminated, and work with the interface provided by string_view.

See:

<https://www.boost.org/doc/libs/1_67_0/libs/beast/doc/html/beast/ref/boost__beast__http__basic_fields__value_type/name_string.html>
<https://www.boost.org/doc/libs/1_67_0/libs/beast/doc/html/beast/ref/boost__beast__http__basic_fields__value_type/value.html>
<https://www.boost.org/doc/libs/1_67_0/libs/beast/doc/html/beast/ref/boost__beast__http__basic_fields/operator_lb__rb_/overload1.html>

Thanks
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: Boost.Beast includes CRLF in HTTP fields

Boost - Users mailing list
On Fri, 2018-07-06 at 08:31 -0700, Vinnie Falco via Boost-users wrote:
> You shouldn't have to do any trimming. basic_fields member functions
> and iterator value_type which return field values do so using a
> string_view, whose length excludes the trailing CRLF and also any
> colon and space. Callers should not assume the string is
> null-terminated, and work with the interface provided by string_view.
>

Ah yes, I didn't see the basic_fields' value_type has dedicated member
functions, I was blindly doing:

    cout << foo[field::location];

Thanks!

--
David
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: Boost.Beast includes CRLF in HTTP fields

Boost - Users mailing list
In reply to this post by Boost - Users mailing list


On Fri, 6 Jul 2018 at 17:32, Vinnie Falco via Boost-users <[hidden email]> wrote:
On Fri, Jul 6, 2018 at 3:45 AM, David Demelier via Boost-users
<[hidden email]> wrote:
> It looks like HTTP response headers are saved with trailing CRLF in the
> message upon receiption after (async_)read.

Yes, beast::http::basic_fields stores each field/value pair in its
serialized format, which includes the trailing CRLF and also the colon
and space. It is done this way so that beast::http::serializer
requires no memory allocations to perform its function.

> Since it usually does not add additional valuable information, would it
> be possible to remove them in next version of Boost.Beast?

No

> It will be more convenient as you don't need to trim them when you want
> to capture the field value.

You shouldn't have to do any trimming. basic_fields member functions
and iterator value_type which return field values do so using a
string_view, whose length excludes the trailing CRLF and also any
colon and space. Callers should not assume the string is
null-terminated, and work with the interface provided by string_view.

See:

<https://www.boost.org/doc/libs/1_67_0/libs/beast/doc/html/beast/ref/boost__beast__http__basic_fields__value_type/name_string.html>
<https://www.boost.org/doc/libs/1_67_0/libs/beast/doc/html/beast/ref/boost__beast__http__basic_fields__value_type/value.html>
<https://www.boost.org/doc/libs/1_67_0/libs/beast/doc/html/beast/ref/boost__beast__http__basic_fields/operator_lb__rb_/overload1.html>


Out of interest, why do these functions return a const string_view?

 
Thanks
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users

_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: Boost.Beast includes CRLF in HTTP fields

Boost - Users mailing list
In reply to this post by Boost - Users mailing list
On Sat, Jul 7, 2018 at 1:19 AM, David Demelier via Boost-users
<[hidden email]> wrote:
> I was blindly doing:
>
>     cout << foo[field::location];

If `foo` is an instance of http::basic_fields, then the expression
above should not produce the CRLF sequence. It should output only the
value of the Location field.

On Sat, Jul 7, 2018 at 1:43 AM, Richard Hodges via Boost-users
<[hidden email]> wrote:
> Out of interest, why do these functions return a const string_view?

To prevent assignments such as:

    foo[field::location] = "localhost:8080/index.html";

The statement above has no effect, the `const` annotation prevents it
from compiling.
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: Boost.Beast includes CRLF in HTTP fields

Boost - Users mailing list
In reply to this post by Boost - Users mailing list
On Fri, 2018-07-06 at 08:31 -0700, Vinnie Falco via Boost-users wrote:
> > It will be more convenient as you don't need to trim them when you
> > want
> > to capture the field value.
>
> You shouldn't have to do any trimming. basic_fields member functions
> and iterator value_type which return field values do so using a
> string_view, whose length excludes the trailing CRLF and also any
> colon and space. Callers should not assume the string is
> null-terminated, and work with the interface provided by string_view.

Okay, maybe I'm stupid or I don't understand something.

I've done a simple request to google.fr and the
http::response[field::location] still has CRLF even with the following
code using value_type's value() function:

    if (res_.result() == status::moved_permanently) {
        const auto it = res_.find(field::location);

        if (it != res_.end())
            cout << "new location: [" << it->value << "]" << std::endl;
    }

The res_ object is a http::response<http::string_body>, And once
printed to the console I have:

    new location: [http://www.google.fr/
    ]

May I miss something from your original answer?

Regards,

--
David
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: Boost.Beast includes CRLF in HTTP fields

Boost - Users mailing list
On Sat, Jul 7, 2018 at 8:11 AM, David Demelier via Boost-users
<[hidden email]> wrote:
> The res_ object is a http::response<http::string_body>, And once
> printed to the console I have:
>
>     new location: [http://www.google.fr/
>     ]

Hmm...that's not what I'm seeing. I added this function to
test/beast/http/fields.cpp:

    void
    testValue()
    {
        request<empty_body> req;
        req.insert(field::location, "http://www.google.fr/");
        auto it = req.find(field::location);
        if(it != req.end())
            log << "[" << it->value() << "]";
    }

The output is:

    beast.http.fields
    [http://www.google.fr/]25.6s, 1 suite, 1 case, 334 tests total, 0 failures
    The program '[0x6BF4] tests-beast-http.exe' has exited with code 0 (0x0).

If the CRLF was being included in the value, then none of these tests
would pass, as the equalities would evaluate to false:

<https://github.com/boostorg/beast/blob/1a008faf0ab99b90ac64595164fd34f11af72cce/test/beast/http/fields.cpp#L965>

Thanks
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: Boost.Beast includes CRLF in HTTP fields

Boost - Users mailing list
On Sat, 2018-07-07 at 08:20 -0700, Vinnie Falco via Boost-users wrote:

> On Sat, Jul 7, 2018 at 8:11 AM, David Demelier via Boost-users
> <[hidden email]> wrote:
> > The res_ object is a http::response<http::string_body>, And once
> > printed to the console I have:
> >
> >     new location: [http://www.google.fr/
> >     ]
>
> Hmm...that's not what I'm seeing. I added this function to
> test/beast/http/fields.cpp:
>
>     void
>     testValue()
>     {
>         request<empty_body> req;
>         req.insert(field::location, "http://www.google.fr/");
>         auto it = req.find(field::location);
>         if(it != req.end())
>             log << "[" << it->value() << "]";
>     }
>
> The output is:
>
>     beast.http.fields
>     [http://www.google.fr/]25.6s, 1 suite, 1 case, 334 tests total, 0
> failures
>     The program '[0x6BF4] tests-beast-http.exe' has exited with code
> 0 (0x0).
>
> If the CRLF was being included in the value, then none of these tests
> would pass, as the equalities would evaluate to false:

My bad, I apologize I wrongly converted the boost::string_view to
std::string using the .data() member function and I just realized it's
not null terminated... That's why my string was filled with random
characters.

Sorry for the noise!

I should search how people convert boost::string_view to std::string in
a convenient manner instead ;)

Regards,

--
David
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: Boost.Beast includes CRLF in HTTP fields

Boost - Users mailing list
On Sat, Jul 7, 2018 at 8:34 AM, David Demelier via Boost-users
<[hidden email]> wrote:
> I wrongly converted the boost::string_view to std::string
> using the .data() member function and I just realized it's
> not null terminated...

Ah! Glad to see the mystery is solved.

However, you would be better off working directly with the string_view
instead of constructing a string, to avoid an unnecessary dynamic
memory allocation.

Regards
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: Boost.Beast includes CRLF in HTTP fields

Boost - Users mailing list
In reply to this post by Boost - Users mailing list


Le sam. 7 juil. 2018 à 11:34, David Demelier via Boost-users <[hidden email]> a écrit :
On Sat, 2018-07-07 at 08:20 -0700, Vinnie Falco via Boost-users wrote:
> On Sat, Jul 7, 2018 at 8:11 AM, David Demelier via Boost-users
> <[hidden email]> wrote:
> > The res_ object is a http::response<http::string_body>, And once
> > printed to the console I have:
> >
> >     new location: [http://www.google.fr/
> >     ]
>
> Hmm...that's not what I'm seeing. I added this function to
> test/beast/http/fields.cpp:
>
>     void
>     testValue()
>     {
>         request<empty_body> req;
>         req.insert(field::location, "http://www.google.fr/");
>         auto it = req.find(field::location);
>         if(it != req.end())
>             log << "[" << it->value() << "]";
>     }
>
> The output is:
>
>     beast.http.fields
>     [http://www.google.fr/]25.6s, 1 suite, 1 case, 334 tests total, 0
> failures
>     The program '[0x6BF4] tests-beast-http.exe' has exited with code
> 0 (0x0).
>
> If the CRLF was being included in the value, then none of these tests
> would pass, as the equalities would evaluate to false:

My bad, I apologize I wrongly converted the boost::string_view to
std::string using the .data() member function and I just realized it's
not null terminated... That's why my string was filled with random
characters.

Why  do you use data()  without using size()/length() ?

with string_view this is a recipe for hard to find bugs.

At my job I've seen same behavior, and I am trying to explain to people that data() in std::string & std::string_view is mostly for C interfaces.




Sorry for the noise!

I should search how people convert boost::string_view to std::string in
a convenient manner instead ;)

Regards,

--
David
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users


--
Daniel
«Il faut imaginer Sisyphe heureux»
 Albert Camus

_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: Boost.Beast includes CRLF in HTTP fields

Boost - Users mailing list
On Sat, 2018-07-07 at 13:48 -0400, Daniel Anderson wrote:
> Why  do you use data()  without using size()/length() ?
>
> with string_view this is a recipe for hard to find bugs.
>

Because at that time I didn't know that it was not null terminated and
I needed a way to convert the boost::string_view to std::string because
I need to make a copy of the Beast's result.

Regards

--
David
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: Boost.Beast includes CRLF in HTTP fields

Boost - Users mailing list
On Sun, Jul 8, 2018 at 12:47 AM, David Demelier via Boost-users
<[hidden email]> wrote:
> I need to make a copy of the Beast's result.

What if there was a way to "detach" the field/value element from the
container so that it could outlive the message? Would that be
sufficient to allow you to avoid making a copy?

Thanks
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users