URL library?

Boost - Dev mailing list
Is there any interest in a URL library for Boost? This is something
that has been requested for a while now, and I've finally gotten
around to it.

Key features:

* Construct a read-only url::view from a string_view
* Construct a modifiable url::value from a string_view
  - Mutate the parts (e.g. set_scheme)

  - Set encoded or decoded strings:
    url::value u;
    u.set_username("Fr ed");
    u.set_encoded_password("pass%20word");

  - Retrieve encoded or decoded strings:
    u.username(); // returns decoded std::string
    u.encoded_password(); // returns encoded string_view
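
To make the encoded/decoded distinction above concrete, here is a minimal standalone sketch of percent-encoding and percent-decoding. It is not taken from the library: `pct_encode`, `pct_decode` and `hex_value` are hypothetical names, and for simplicity the encoder escapes everything outside the RFC 3986 "unreserved" set, which is stricter than what the userinfo rules actually require.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Value of one hex digit, or -1 if `c` is not a hex digit.
inline int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode %XX escapes. Returns std::nullopt on a truncated or non-hex escape.
inline std::optional<std::string> pct_decode(std::string_view s)
{
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (s[i] != '%')
        {
            out.push_back(s[i]);
            continue;
        }
        if (s.size() - i < 3)
            return std::nullopt; // fewer than two characters after '%'
        const int hi = hex_value(s[i + 1]);
        const int lo = hex_value(s[i + 2]);
        if (hi < 0 || lo < 0)
            return std::nullopt;
        out.push_back(static_cast<char>(hi * 16 + lo));
        i += 2; // skip the two hex digits just consumed
    }
    return out;
}

// Escape every byte outside ALPHA / DIGIT / "-" / "." / "_" / "~".
// (Simplifying assumption: real URL components allow more characters.)
inline std::string pct_encode(std::string_view s)
{
    static constexpr char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s)
    {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved)
        {
            out.push_back(static_cast<char>(c));
        }
        else
        {
            out.push_back('%');
            out.push_back(digits[c >> 4]);
            out.push_back(digits[c & 15]);
        }
    }
    return out;
}
```

With these helpers, the decoded string "Fr ed" corresponds to the encoded form "Fr%20ed", and the encoded "pass%20word" decodes to "pass word".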

For servers, execution paths are provided to avoid all dynamic
allocation. For example to retrieve the decoded username:
    url::static_pool<4000> sp;
    std::cout << u.username( sp.allocator() );

The std::basic_string returned by username() uses the specified
allocator. A server can handle URLs without allocating any memory.
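
The same idea can be tried out with standard C++17 facilities. The sketch below does not use url::static_pool; it swaps in `std::pmr::monotonic_buffer_resource` over a fixed buffer, with `std::pmr::null_memory_resource()` as the upstream so that exhausting the buffer throws instead of silently going to the heap. The names `fixed_pool` and `make_pooled_string` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <string_view>

// A fixed buffer plus a memory resource that hands out pieces of it.
// The null upstream means running out of space throws std::bad_alloc
// rather than falling back to operator new.
template <std::size_t N>
struct fixed_pool
{
    std::array<std::byte, N> buffer;
    std::pmr::monotonic_buffer_resource resource{
        buffer.data(), buffer.size(), std::pmr::null_memory_resource()};
};

// Copy `text` into a string whose storage comes from `pool`.
inline std::pmr::string make_pooled_string(
    std::string_view text, std::pmr::memory_resource& pool)
{
    return std::pmr::string(text.data(), text.size(), &pool);
}
```

A string longer than the small-string buffer ends up inside `fixed_pool::buffer`, so a request handler that creates one pool per request never touches the global heap for such strings. This requires a standard library that ships `<memory_resource>`.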

There are some punycode conversion routines, but I haven't figured out if
they should be part of the library, or how they would manifest as APIs
(for international domain names).

You can perform calculations with URLs using an Allocator (defaults to
std::allocator<char>), or you can use a container with "static
storage" (e.g. fixed_string):

    url::static_value<4000> u; // 4000 char capacity

The library is here:

<https://github.com/vinniefalco/url>

This is still a work in progress, and I'm open to feedback that might
help me make better choices on the remaining design questions.

Thanks

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: URL library?

On Tue, 21 Jan 2020 at 04:44, Vinnie Falco via Boost
<[hidden email]> wrote:
>
> Is there any interest in a URL library for Boost?

Yes, I'm very much interested.

> There's some punycode conversion routines but I haven't figured out if
> they should be part of the library, or how they would manifest as APIs
> (for international domain names).
> [...]
> The library is here:
>
> <https://github.com/vinniefalco/url>
>
> This is still a work in progress, and I'm open to feedback that might
> help me make better remaining design choices.

There is no reference to any of the URI/URL RFCs in the code
or any (documentation) files. Is this deliberate?
What's the status of conformance?

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net

Re: URL library?

Yes, interested

As a matter of fact, I have a URL class specifically for http/https
that could act as a starting point.

ty, best
Greg

Re: URL library?YUP

I have a URL class, which could act like a starter, ...it is someplace...
cheers
Greg


Re: URL library?

On Tue, Jan 21, 2020 at 4:44 AM Vinnie Falco via Boost
<[hidden email]> wrote:
> Is there any interest in a URL library for Boost?

Yes, interested as well. I typically rely on QUrl (which brings in
QtCore) or WebSocketPP's url, but I'd prefer a nice one from you and
Boost, Vinnie.

I had a quick look, and the first thing that jumps to my mind is the
sheer number of files in the repo, and even just in the source code,
for what is a small library. Do scheme and host_type need their own
headers, and sometimes impl/.hpp and .ipp files too?

I've known people/orgs with rules like 1-class-1-file, which I find
overly granular.

I'm a big fan of "amalgamated" libraries, especially those which are
header-only, where you can drop just 1 or 2 or 3 files into your
project and build them as source with your own code. That lowers the
barrier to trying something tremendously.

With Boost, the hurdles are high enough that I don't even try before
my org updates the full 3rd party, every 2 or 3 years...

I'm probably extreme in doing the opposite of 1-class-1-file, with a
pair of .h/.cpp files that are more equivalent to an entire library
(worst offender is 2K .h, and 14K .cpp), but it seems to me that the
proposed Boost.URL has an awful lot of source files "just" for URL
parsing.

I'd have a .h/.hpp/.ipp only myself :).
.h for decls and inlines only, with minimum header deps,
.hpp for template stuff with additional includes,
.ipp/.cpp for non-template, non-inline impls.

But I know I'm far from mainstream here :). ---DD

PS: Also saw some references to Boost.Beast in passing.
PPS: Is the allocator support similar to your proposed Boost.JSON?
Could that be an independent component?

Re: URL library?

On 2020-01-21 06:43, Vinnie Falco via Boost wrote:
> Is there any interest in a URL library for Boost? This is something
> that has been requested for a while now, and I've finally gotten
> around to it.

I'd be more interested in a more generic URI library. Along with a few
associated algorithms, e.g. those described in:

https://tools.ietf.org/html/rfc3986

> Key features:
>
> * Construct a read-only url::view from a string_view
> * Construct a modifiable url::value from a string_view

Why not uri and uri_view?

Re: URL library?

On Mon, Jan 20, 2020 at 11:13 PM Mateusz Loskot via Boost
<[hidden email]> wrote:
> There is no reference to any of the URI/URL RFCs in the code
> or any (documentation) files. Is this deliberate?

Thanks for the feedback! Yes there is a link here:

<https://github.com/vinniefalco/url/blob/develop/include/boost/url/detail/parse.hpp#L21>

> What's the status of conformance?

The target is RFC 3986 compliance. I believe it is there (modulo bugs).
These tests all pass:

<https://github.com/vinniefalco/url/blob/cfd09ee8925d596b201fc0502d6bf6a407fb3b27/test/value.cpp>

Thanks

Re: URL library?

On Tue, Jan 21, 2020 at 2:03 AM Dominique Devienne via Boost
<[hidden email]> wrote:
> I had a quick look, and the first thing that jumps to my mind though
> is the sheer number of files, in the repo, and even just the source code,
> for what is a small library. Do schema and host_type
> need their own headers, and sometimes impl/.hpp, .ipp ???

Yes, everything is organized that way for specific reasons, although
the final version of the library may have a slightly different set of
files. For example, I might just get rid of scheme.hpp and everything
in it.

> it seems to me that the proposed Boost.URL has an awful
> lot of source files, "just" for URL parsing.

It isn't "just" URL parsing, it is also encoding and decoding
algorithms, custom storage and allocation, and modification of the
URL.

> PS: Also saw some references to Boost.Beast in passing.
> PPS: Is the allocator support similar to your proposed Boost.JSON?
> Could that be an independent component.

Boost.JSON has its own special allocator model because of the
hierarchical nature of the JSON container. Since a boost::url::value
is effectively just a string, the allocator model in this new library
is much simpler. A derived class uses the already familiar Allocator
parameter.

Thanks

Re: URL library?

On Tue, Jan 21, 2020 at 2:13 AM Andrey Semashev via Boost
<[hidden email]> wrote:
> I'd be more interested in a more generic URI library.
> Along with a few associated algorithms, e.g. those described in:
> https://tools.ietf.org/html/rfc3986

Yes, this library does that. I do not use the term "URI" because it is
confusing and pointless. They are all URLs now. My library follows the
RFC, except that I have renamed the top level production rules to
reflect this preference:

   URL           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
   URL-reference = URL / relative-ref
   absolute-URL  = scheme ":" hier-part [ "?" query ]

I didn't invent this idea; deprecating the word "URI" and using "URL"
consistently in its place is recommended by WhatWG.
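
For readers who want to see the top-level rule in action, here is a small sketch that splits a URL reference into its five components using the regular expression given in RFC 3986, Appendix B. It is not how the library parses (a regex only splits, it does not validate each production), and `url_parts` and `split_url_reference` are hypothetical names.

```cpp
#include <optional>
#include <regex>
#include <string>

// The five generic components. Empty means "absent or empty" in this sketch.
struct url_parts
{
    std::string scheme;
    std::string authority;
    std::string path;
    std::string query;
    std::string fragment;
};

// Split `s` with the RFC 3986 Appendix B expression.
// Returns std::nullopt only if the input contains characters the
// expression cannot span (for example a newline).
inline std::optional<url_parts> split_url_reference(const std::string& s)
{
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    std::smatch m;
    if (!std::regex_match(s, m, re))
        return std::nullopt;
    // Capture groups 2, 4, 5, 7 and 9 hold the component values.
    return url_parts{m[2].str(), m[4].str(), m[5].str(), m[7].str(), m[9].str()};
}
```

Splitting "http://user@example.com:8080/a/b?x=1#top" this way gives scheme "http", authority "user@example.com:8080", path "/a/b", query "x=1" and fragment "top".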

> Why not uri and uri_view.

First, I don't use the term "uri" ever. But I think you're asking, why
not "url" and "url_view"? Because `url::url` and `url::url_view` look
bad; they repeat a word. Thus we have `url::view` and `url::value`,
which are sensible.

Thanks

Re: URL library?

On 2020-01-21 18:51, Vinnie Falco wrote:

> On Tue, Jan 21, 2020 at 2:13 AM Andrey Semashev via Boost
> <[hidden email]> wrote:
>> I'd be more interested in a more generic URI library.
>> Along with a few associated algorithms, e.g. those described in:
>> https://tools.ietf.org/html/rfc3986
>
> Yes, this library does that. I do not use the term "URI" because it is
> confusing and pointless. They are all URLs now. My library follows the
> RFC, except that I have renamed the top level production rules to
> reflect this preference:
>
>     URL           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
>     URL-reference = URL / relative-ref
>     absolute-URL  = scheme ":" hier-part [ "?" query ]
>
> I didn't invent this idea, deprecating the word "URI" and using "URL"
> consistently in its place is recommended by WhatWG.

There is a semantic difference between URI and URL - the former is an
identifier and the latter is a locator (i.e. a path to a resource
location). You can treat locator as an identifier but not the other way
around. Using the term URL to refer to an URI is confusing.

The reason I'm interested particularly in URIs is because I have to deal
with them, not so much with URLs.

>> Why not uri and uri_view.
>
> First, I don't use the term "uri" ever. But i think you're asking, why
> not "url" and "url_view?" Because `url::url` and `url::url_view` look
> bad, they repeat a word. Thus we have `url::view` and `url::value`,
> which are sensible.

Well, no, not really. I know 'using namespace abc;' is not something
universally welcome, but it is a valid use case nonetheless. After that,
having `view` and `value` is no longer sensible.

I would still prefer `boost::uris::uri` and `boost::uris::uri_view`.
Note that the namespace is plural.

Re: URL library?

On 2020-01-21 21:39, Andrey Semashev wrote:

> On 2020-01-21 18:51, Vinnie Falco wrote:
>> On Tue, Jan 21, 2020 at 2:13 AM Andrey Semashev via Boost
>> <[hidden email]> wrote:
>>> I'd be more interested in a more generic URI library.
>>> Along with a few associated algorithms, e.g. those described in:
>>> https://tools.ietf.org/html/rfc3986
>>
>> Yes, this library does that. I do not use the term "URI" because it is
>> confusing and pointless. They are all URLs now. My library follows the
>> RFC, except that I have renamed the top level production rules to
>> reflect this preference:
>>
>>     URL           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
>>     URL-reference = URL / relative-ref
>>     absolute-URL  = scheme ":" hier-part [ "?" query ]
>>
>> I didn't invent this idea, deprecating the word "URI" and using "URL"
>> consistently in its place is recommended by WhatWG.
>
> There is a semantic difference between URI and URL - the former is an
> identifier and the latter is a locator (i.e. a path to a resource
> location). You can treat locator as an identifier but not the other way
> around. Using the term URL to refer to an URI is confusing.
>
> The reason I'm interested particularly in URIs is because I have to deal
> with them, not so much with URLs.

Also, I'll add that WhatWG is a web-related working group, and URIs are
used in many other areas. In my case it's telephony and media processing.

>>> Why not uri and uri_view.
>>
>> First, I don't use the term "uri" ever. But i think you're asking, why
>> not "url" and "url_view?" Because `url::url` and `url::url_view` look
>> bad, they repeat a word. Thus we have `url::view` and `url::value`,
>> which are sensible.
>
> Well, no, not really. I know 'using namespace abc;' is not something
> universally welcome, but it is a valid use case nonetheless. After that
> having `view` and `value` is no longer sensible.
>
> I would still prefer `boost::uris::uri` and `boost::uris::uri_view`.
> Note that the namespace is plural.


Re: URL library?

On Tue, Jan 21, 2020 at 10:41 AM Andrey Semashev via Boost
<[hidden email]> wrote:
> There is a semantic difference between URI and URL - the former is an
> identifier and the latter is a locator (i.e. a path to a resource
> location). You can treat locator as an identifier but not the other way
> around. Using the term URL to refer to an URI is confusing.

Having both terms is confusing, and WhatWG got this right. The vast
majority of users just want to "parse a URL", for example one that
comes in from an HTTP request, or one that is specified on the command
line. When they go into Google, they type "URL"; they don't type "URI."
Hardly anyone knows what a URI is. But even my mother who is 90 knows
what a URL is.

I want my libraries to be popular and have mass appeal, not just
satisfy a niche audience of super-experts. When I type "URI" into
Google I get:

    About 287,000,000 results (0.87 seconds)
    www.uri.edu
    The University of Rhode Island (top result)

    People Also Ask:
    What is difference URL and URI?
    While they are used interchangeably, there are some subtle differences...

Now if I type "URL" into Google, I get:

    About 12,620,000,000 results (0.50 seconds)
    en.wikipedia.org › wiki › URL
    URL - Wikipedia (top result)

    People Also Ask:
    What is the URL?
    What is an example of a URL address?
    How do I find URL?
    What is the path in the URL?
    What is URL on my phone?
    What does WWW stand for?

Yes, not only is "URL" 44 times more popular than "URI" in terms of
search results, but the top question about "URI" is "What is
difference URL and URI?", while for "URL" no one is asking about the
difference.

Another way to think of it, in terms of name recognition "URI" is to
.org what "URL" is to .com. People assume that a domain name is in
.com because that's the most popular TLD. That's why .com domains go
for so much more money.

It is true that URL is not an exact fit if you adhere to the technical
documentation 100%, but I think the overall benefit of just
standardizing on the name "URL" outweighs the downsides. It is easier
for users, better for Boost, and gives the library more appeal to
average folk.

> The reason I'm interested particularly in URIs is because I have to deal
> with them, not so much with URLs.

This library should do everything you want with URIs since I take care
of parsing all the top-level rules. The library does not make
assumptions about the data. For example if you want to treat the path
as just one string and ignore the segments, you can do that. If you
want to ignore the distinction between username and password in the
userinfo, you can do that too. You can treat the query params as an
associative array of key/value pairs if you want, or you can ignore
that and just work with the query directly.
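
As an illustration of the "key/value pairs" view of a query, here is a minimal sketch that splits a raw query string on '&' and '='. It is not the library's API; `split_query` is a hypothetical name, and keys and values are left percent-encoded exactly as they appear.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Split "a=1&b=two&flag" into (key, value) pairs, in order of appearance.
// A parameter without '=' gets an empty value; empty items are skipped.
inline std::vector<std::pair<std::string, std::string>>
split_query(std::string_view query)
{
    std::vector<std::pair<std::string, std::string>> out;
    while (!query.empty())
    {
        const std::size_t amp = query.find('&');
        const std::string_view item = query.substr(0, amp);
        query = (amp == std::string_view::npos)
                    ? std::string_view{}
                    : query.substr(amp + 1);
        if (item.empty())
            continue; // tolerate "a=1&&b=2"
        const std::size_t eq = item.find('=');
        if (eq == std::string_view::npos)
            out.emplace_back(std::string(item), std::string());
        else
            out.emplace_back(std::string(item.substr(0, eq)),
                             std::string(item.substr(eq + 1)));
    }
    return out;
}
```

A vector of pairs is used rather than a map because query strings may repeat a key and the order can matter; a caller who wants the query as one opaque string simply never calls this.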

If you have specific use-cases feel free to open an issue or cite them
here and I will make sure they are attended to (assuming it is
in-scope).

Thanks

P.S. "Only snobs call it a URI" :)

Re: URL library?

On 2020-01-21 22:59, Vinnie Falco wrote:

> On Tue, Jan 21, 2020 at 10:41 AM Andrey Semashev via Boost
> <[hidden email]> wrote:
>> There is a semantic difference between URI and URL - the former is an
>> identifier and the latter is a locator (i.e. a path to a resource
>> location). You can treat locator as an identifier but not the other way
>> around. Using the term URL to refer to an URI is confusing.
>
> Having both terms is confusing, and WhatWG got this right. The vast
> majority of users just want to "parse a URL", for example one that
> comes in from an HTTP request, or one that is specified on the command
> line. When they go into Google, they type "URL" they don't type "URI."
> Hardly anyone knows what a URI is. But even my mother who is 90 knows
> what a URL is.
>
> I want my libraries to be popular and have mass appeal, not just
> satisfy a niche audience of super-experts. When I type "URI" into
> Google I get:
>
>      About 287,000,000 results (0.87 seconds)
>      www.uri.edu
>      The University of Rhode Island (top result)
>
>      People Also Ask:
>      What is difference URL and URI?
>      While they are used interchangeably, there are some subtle differences...
>
> Now if I type "URL" into Google, I get:
>
>      About 12,620,000,000 results (0.50 seconds)
>      en.wikipedia.org › wiki › URL
>      URL - Wikipedia (top result)
>
>      People Also Ask:
>       What is the URL?
>      What is an example of a URL address?
>      How do I find URL?
>      What is the path in the URL?
>      What is URL on my phone?
>      What does WWW stand for?
>
> Yes, not only is "URL" 44 times more popular than "URI" in terms of
> search results, but the top question about "URI" is "What is
> difference URL and URI?". While for "URL" no one is asking about the
> difference.

You get more exposure of the URL term because there are many more people
using the web for various reasons than e.g. SIP or email or SDP. For the web,
sure, there's the URL bar in your browser and HTTP headers and that's
pretty much it. Given this, I can understand WhatWG's decision to
standardize URLs *in their specific domain*. That doesn't make that
choice valid in other domains. Search through the SIP RFC and you will find
that the correct term there is URI. If your library targets those other
domains, you should speak their language, too.

Sorry, but I can't call e.g. an email address a URL, and I don't agree
with the proliferation of such confusion. It's MB vs. MiB all over again.

> Another way to think of it, in terms of name recognition "URI" is to
> .org what "URL" is to .com. People assume that a domain name is in
> .com because that's the most popular TLD. That's why .com domains go
> for so much more money.
>
> It is true that URL is not an exact fit if you adhere to the technical
> documentation 100%, but I think the overall benefit of just
> standardizing on the name "URL" outweighs the downsides. It is easier
> for users, better for Boost, and gives the library more appeal to
> average folk.

Well, let's agree to disagree then.

Re: URL library?

On 22/01/2020 07:39, Andrey Semashev wrote:

> On 2020-01-21 18:51, Vinnie Falco wrote:
>> On Tue, Jan 21, 2020 at 2:13 AM Andrey Semashev wrote:
>>> I'd be more interested in a more generic URI library.
>>> Along with a few associated algorithms, e.g. those described in:
>>> https://tools.ietf.org/html/rfc3986
>>
>> Yes, this library does that. I do not use the term "URI" because it is
>> confusing and pointless. They are all URLs now. My library follows the
>> RFC, except that I have renamed the top level production rules to
>> reflect this preference:
>>
>>     URL           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
>>     URL-reference = URL / relative-ref
>>     absolute-URL  = scheme ":" hier-part [ "?" query ]
>>
>> I didn't invent this idea, deprecating the word "URI" and using "URL"
>> consistently in its place is recommended by WhatWG.
>
> There is a semantic difference between URI and URL - the former is an
> identifier and the latter is a locator (i.e. a path to a resource
> location). You can treat locator as an identifier but not the other way
> around. Using the term URL to refer to an URI is confusing.

Notably, all URLs are URIs, but not all URIs are URLs.  Some are URNs,
for example, which are structured a bit differently (eg.
"urn:oasis:names:specification:docbook:dtd:xml:4.1.2").

A program only dealing with "locations to download from" generally only
needs to worry about URLs, but there are other places where all URIs
(including URNs) may be encountered (even by such a program) -- for
example, as XML namespace identifiers.  (Usually these can be treated as
opaque, though.)

Still, given that the same parsing rules can apply to both (URNs usually
just have a long opaque path after the "urn" scheme), it doesn't seem
unreasonable to call it an "URL library" anyway (despite the
recommendation in RFC3986).  Some people would be confused by calling
them "URIs" and those who know better will know that as well.  Having
said that, the docs should call out RFC support and URI compatibility
explicitly, so that people aren't left wondering.

Re: URL library?

On Tue, Jan 21, 2020 at 5:10 PM Gavin Lambert via Boost
<[hidden email]> wrote:
> ...Some are URNs,...

LOL!! I was hoping to reduce it to one term, but instead now we have three...

Fortunately a URN has the same syntax; it is just a custom scheme. The
way I deal with that is that the user can parse the URN as a URL,
check the scheme, and then apply the scheme-specific syntax rules for
subdelimiters to the individual parts.
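
A rough sketch of that "check the scheme, then apply scheme-specific rules" step, written without the library: take everything before the first ':' as the scheme, compare it case-insensitively with "urn", and split the rest into a namespace identifier (NID) and namespace-specific string (NSS). The names `urn_parts` and `parse_urn` are hypothetical, and the character restrictions and optional components of the URN specification (RFC 8141) are deliberately ignored.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

struct urn_parts
{
    std::string nid; // namespace identifier, e.g. "oasis"
    std::string nss; // namespace-specific string (kept opaque here)
};

// Returns the NID/NSS split if `s` has the "urn" scheme, else std::nullopt.
inline std::optional<urn_parts> parse_urn(std::string_view s)
{
    const std::size_t colon = s.find(':');
    if (colon == std::string_view::npos)
        return std::nullopt;

    // Schemes are case-insensitive, so normalise before comparing.
    std::string scheme(s.substr(0, colon));
    std::transform(scheme.begin(), scheme.end(), scheme.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (scheme != "urn")
        return std::nullopt;

    // Scheme-specific part: NID ":" NSS, both required to be non-empty.
    const std::string_view rest = s.substr(colon + 1);
    const std::size_t sep = rest.find(':');
    if (sep == std::string_view::npos || sep == 0 || sep + 1 == rest.size())
        return std::nullopt;

    return urn_parts{std::string(rest.substr(0, sep)),
                     std::string(rest.substr(sep + 1))};
}
```

For "urn:oasis:names:specification:docbook:dtd:xml:4.1.2" this yields NID "oasis" and NSS "names:specification:docbook:dtd:xml:4.1.2", while an http URL is rejected at the scheme check.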

> Some people would be confused by calling
> them "URIs" and those who know better will know that as well.  Having
> said that, the docs should call out RFC support and URI compatibility
> explicitly, so that people aren't left wondering.

Yes I agree, I added that to the list of tasks.

Thanks

Re: URL library?

It's also worth mentioning, there are alternative URL parser
implementations available.

For example, here's Furi:
https://github.com/LeonineKing1199/furi

It's in essence a port of the URI ABNF written in Boost.Spirit, more
specifically X3. There are also routines for percent encoding and decoding.

Unlike the proposed Boost.URL, this lib aims to be low-level but
composable. Because the entire ABNF set is exported, one could
theoretically re-compose a parser that'd handle any scenario.

The emphasis is on immutability and functional style of programming. The
main structure that users will interact with is really just a POD of
`std::string_view`s and parser combinators themselves are also very
FP-oriented.

A less desirable aspect of the lib is that it only does Unicode in
UTF-32, though it does give you easy methods of converting to it. This was
done for the sake of simplicity and also because that's how X3 does it.
Fortunately, most URLs are relatively small in practice so the storage
overhead is affordable in most scenarios.

The best way of verifying the parser is the various uri and uri_parts
tests. If you can think of a URL that'd break it, I'd love to try it! If
it's ABNF-correct, Furi will recognize it too!

- Chris

Re: URL library?

On 21/01/2020 16:43, Vinnie Falco wrote:
> Is there any interest in a URL library for Boost? This is something
> that has been requested for a while now, and I've finally gotten
> around to it.

I'm quite interested.

Though some docs would be nice. ;)

> For servers, execution paths are provided to avoid all dynamic
> allocation. For example to retrieve the decoded username:
>      url::static_pool<4000> sp;
>      std::cout << u.username( sp.allocator() );

Repeated reinventing of static allocators gives me some pause.  Maybe
that should be broken out into a separate library first?

And maybe recently-accepted FixedString could use it too (or you could
use theirs)?

> The library is here:
>
> <https://github.com/vinniefalco/url>

Glancing at
https://github.com/vinniefalco/url/blob/develop/include/boost/url/impl/basic_value.ipp,
it looks like there's quite a bit of duplicate code (eg. between
set_password and set_encoded_password).

I assume this is related to the desire to avoid allocation, but perhaps
you could make use of your own static_pool when delegating common
subtasks, rather than duplicating the logic?

(Side note: I find the "wrap at <40 columns" style harder to read.  Who
has screens that narrow these days?)

Re: URL library?

On Tue, Jan 21, 2020 at 7:37 PM Gavin Lambert via Boost
<[hidden email]> wrote:
> Though some docs would be nice. ;)

Heh... working on that. And per the Ramey Rule, the doc work has
surfaced defects in the API which I am fixing.  This page has the most
work:

<http://vinniefalco.github.io/doc/url/url/ref/boost__url__basic_value.html>

Still being worked on of course.

> Repeated reinventing of static allocators gives me some pause.  Maybe
> that should be broken out into a separate library first?

Well this is not such an easy thing. One of the goals for all my
libraries is that they can work outside of boost (just define
BOOST_URL_STANDALONE). I could break out this little allocator into
another library, but I doubt it is enough to justify a whole entire
lib. Is there another already existing allocator that does the same
thing? I'm not sure there is.

But even so, users who just need to parse, modify, and compose URLs in
their server, and wish to avoid memory allocations will be glad that
they have a 170-line solution in a single header available to them
without the need to look elsewhere.

> And maybe recently-accepted FixedString could use it too (or you could use theirs)?

FixedString doesn't use any allocator. The reason I use the Allocator
model here (versus my home-brewed "storage_ptr" in Boost.JSON) is
because I want to return std::basic_string from the relevant
functions.

> Glancing at
> https://github.com/vinniefalco/url/blob/develop/include/boost/url/impl/basic_value.ipp,
> it looks like there's quite a bit of duplicate code (eg. between
> set_password and set_encoded_password).
>
> I assume this is related to the desire to avoid allocation, but perhaps
> you could make use of your own static_pool when delegating common
> subtasks, rather than duplicating the logic?

I think what you're proposing is that set_password() can first
percent-encode the string using a local pool, and then pass that to
set_encoded_password(). This will certainly eliminate the duplicated
code. But then we are either placing a limit on the size of the string
that may be passed, or we have the possibility of going to the heap
one extra time (to handle the case where the resulting string is
larger than the static_pool's capacity).

I think I would just rather live with the duplicated code. Although,
if you look closely it isn't _really_ duplicated, there are subtle
variations in it which admittedly are rather resistant to refactoring
although I haven't tried very hard. Open to ideas how it can be
reduced, without the need to allocate.
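
One possible shape for "no temporary buffer" encoding, offered only as a sketch and not as what basic_value.ipp does: measure the encoded size in a first pass, reserve once, then write the escapes straight into the destination. `is_unreserved`, `encoded_size` and `append_encoded` are hypothetical names, and the "unreserved only" character set is a simplification.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// RFC 3986 "unreserved": ALPHA / DIGIT / "-" / "." / "_" / "~".
inline bool is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

// Exact number of bytes `s` occupies once percent-encoded:
// 1 for an unreserved byte, 3 ("%XX") for anything else.
inline std::size_t encoded_size(std::string_view s)
{
    std::size_t n = 0;
    for (unsigned char c : s)
        n += is_unreserved(c) ? 1 : 3;
    return n;
}

// Append the encoding of `s` to `dest`. The single reserve() up front means
// the destination grows at most once and no intermediate string is built.
inline void append_encoded(std::string& dest, std::string_view s)
{
    static constexpr char digits[] = "0123456789ABCDEF";
    dest.reserve(dest.size() + encoded_size(s));
    for (unsigned char c : s)
    {
        if (is_unreserved(c))
        {
            dest.push_back(static_cast<char>(c));
        }
        else
        {
            dest.push_back('%');
            dest.push_back(digits[c >> 4]);
            dest.push_back(digits[c & 15]);
        }
    }
}
```

The first pass also shows why a fixed scratch pool is awkward here: in the worst case the encoded form is three times the input length, so any fixed capacity either caps the input size or needs a fallback.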

> (Side note: I find the "wrap at <40 columns" style harder to read.  Who
> has screens that narrow these days?)

No idea
what
you're
going on
about
here.

Thanks

Re: URL library?

On Tue, 21 Jan 2020, 16:37 Vinnie Falco, <[hidden email]> wrote:

> On Mon, Jan 20, 2020 at 11:13 PM Mateusz Loskot via Boost
> <[hidden email]> wrote:
> > There is no reference to any of the URI/URL RFCs in the code
> > or any (documentation) files. Is this deliberate?
>
> Thanks for the feedback! Yes there is a link here:
>
> <https://github.com/vinniefalco/url/blob/develop/include/boost/url/detail/parse.hpp#L21>
>


Thanks!
GitHub seemed to fail to find this for me.


> > What's the status of conformance?
>
> The target is rfc3986 compliance.


Sweet!

Mateusz Loskot, [hidden email]
(Sent from mobile, may suffer from top-posting)

Re: URL library?

On Tue, Jan 21, 2020 at 8:26 PM Mateusz Loskot via Boost
<[hidden email]> wrote:
> GitHub seemed to fail to find this for me.
> ...
> > What's the status of conformance?

Yes, to clarify, I have floated this library a little bit earlier in
its development cycle than my other libraries. This is because I have
some open design questions such as how to handle punycode, and what to
do with percent-encoding with respect to Unicode.

Thus the library and documentation are not quite as well-developed as
my other offerings. Although since it is a much smaller library, I'll
have it whipped into shape in short order (working on the docs now).

Thanks
