Potential Boost SAX library

Boost - Dev mailing list
Hi all,

I was wondering if a library I'm developing would be of value to the
Boost community. It is basically an event-driven parsing/serialization
library for common formats using a standard internal representation or
simple pass-through conversions. Would anyone be interested in something
like this being added to Boost?

Thanks.


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: Potential Boost SAX library

Boost - Dev mailing list
On 1/9/2018 12:36 PM, Oliver Adams via Boost wrote:
> Hi all,
>
> I was wondering if a library I'm developing would be of value to the
> Boost community. It is basically an event-driven parsing/serialization
> library for common formats using a standard internal representation or
> simple pass-through conversions. Would anyone be interested in something
> like this being added to Boost?
>
> Thanks.

"Common formats" needs to be specified. Boost already has a
serialization library so you might also want to explain how your library
is different from that. It is always a good idea to be as specific as
possible when querying about the interest in a new library.




Re: Potential Boost SAX library

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 09.01.2018 12:36, Oliver Adams via Boost wrote:
> Hi all,
>
> I was wondering if a library I'm developing would be of value to the
> Boost community. It is basically an event-driven parsing/serialization
> library for common formats using a standard internal representation or
> simple pass-through conversions. Would anyone be interested in
> something like this being added to Boost?

Since you mention SAX, do you have XML in mind? I once developed a
C++ wrapper around existing XML C APIs (both DOM and SAX), with the
intent of eventually submitting it to Boost, but never found the energy
to finish it. (I started a couple of discussions on this list, and
have some working code here: https://github.com/stefanseefeld/boost.xml)
How does your library compare to that? I'm not sure there would still
be any interest in XML APIs in Boost at this point in time. But if so, I
believe it's best to start with something like that rather than invent
yet another "XML-like" tool.

Best,

Stefan

--

      ...ich hab' noch einen Koffer in Berlin...
   



Re: Potential Boost SAX library

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
The "common formats" include simple representations like JSON, XML, and
CSV, as well as formats that write to and read from databases. For example,
one "format" interfaces with MySQL databases. The premise behind the
library is not just data conversion: it is basically an ETL
implementation that also provides analysis filters on the data.

If the internal representation is not used, there is practically no
limit to the size of data being converted or analyzed. This is a major
difference from Boost.Serialization.
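
To illustrate the size argument, here is a minimal sketch of a pass-through conversion that never builds an internal representation. It is not cppdatalib's API: `split_fields` and `csv_to_json_lines` are hypothetical names, and the CSV handling assumes no quoted fields and no escaping. Only the header row and the current record are held in memory, so the input can be arbitrarily large.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Split one CSV line on commas.
// Assumption: no quoted fields and no embedded commas.
std::vector<std::string> split_fields(const std::string& line) {
    std::vector<std::string> fields(1);
    for (char c : line) {
        if (c == ',') fields.emplace_back();
        else fields.back() += c;
    }
    return fields;
}

// Stream CSV from `in` to `out` as one JSON object per line.
// Values are written as unescaped strings (sketch only).
// Returns the number of records converted.
std::size_t csv_to_json_lines(std::istream& in, std::ostream& out) {
    std::string line;
    if (!std::getline(in, line)) return 0;          // no header, nothing to do
    const std::vector<std::string> header = split_fields(line);
    assert(!header.empty());                        // split_fields yields >= 1 field

    std::size_t records = 0;
    while (std::getline(in, line)) {                // one record in memory at a time
        if (line.empty()) continue;
        const std::vector<std::string> row = split_fields(line);
        out << '{';
        for (std::size_t i = 0; i < header.size() && i < row.size(); ++i) {
            if (i != 0) out << ',';
            out << '"' << header[i] << "\":\"" << row[i] << '"';
        }
        out << "}\n";
        ++records;
    }
    return records;
}

// Convenience wrapper for small in-memory inputs.
std::string csv_to_json_lines(const std::string& csv) {
    std::istringstream in(csv);
    std::ostringstream out;
    csv_to_json_lines(in, out);
    return out.str();
}
```

The stream overload is the one that matters for large data; the string overload exists only to make the sketch easy to try out.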

On 01/09/2018 12:45 PM, Edward Diener via Boost wrote:

> On 1/9/2018 12:36 PM, Oliver Adams via Boost wrote:
>> Hi all,
>>
>> I was wondering if a library I'm developing would be of value to the
>> Boost community. It is basically an event-driven
>> parsing/serialization library for common formats using a standard
>> internal representation or simple pass-through conversions. Would
>> anyone be interested in something like this being added to Boost?
>>
>> Thanks.
>
> "Common formats" needs to be specified. Boost already has a
> serialization library so you might also want to explain how your
> library is different from that. It is always a good idea to be as
> specific as possible when querying about the interest in a new library.

Re: Potential Boost SAX library

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Stefan,

 > As you are mentioning SAX, do you have XML in mind ?

The library supports a version of XML (at least for writing, right now),
but is not limited to that.

 > (I started a couple of discussions on this list, and do
 > have some working code here: https://github.com/stefanseefeld/boost.xml)
 > How does your library compare to that ?

The goals of the library are to be fast, generic, and easy to use.
The focus of the library is on data, not on arbitrary structure, so
it just supports a hierarchy or list of values, no graphs right now.
You can check it out here: https://github.com/owacoder/cppdatalib.
The readme referenced in the link provides a bit more of an overview
of what the library is like.

On 01/09/2018 01:18 PM, Stefan Seefeld via Boost wrote:

> On 09.01.2018 12:36, Oliver Adams via Boost wrote:
>> Hi all,
>>
>> I was wondering if a library I'm developing would be of value to the
>> Boost community. It is basically an event-driven parsing/serialization
>> library for common formats using a standard internal representation or
>> simple pass-through conversions. Would anyone be interested in
>> something like this being added to Boost?
> As you are mentioning SAX, do you have XML in mind ? I once developed a
> C++ wrapper around existing XML C APIs (both, DOM and SAX), with the
> intent of eventually submitting it to Boost, but never found the energy
> to finish this. (I started a couple of discussions on this list, and do
> have some working code here: https://github.com/stefanseefeld/boost.xml)
> How does your library compare to that ? I'm not sure there would still
> be any interest in XML APIs in Boost at this point in time. But if so, I
> believe it's best to start with something like that rather than invent
> yet another "XML-like" tool.
>
> Best,
>
> Stefan
>



Re: Potential Boost SAX library

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
2018-01-09 15:18 GMT-03:00 Stefan Seefeld via Boost <[hidden email]>:

> I'm not sure there would still
> be any interest in XML APIs in Boost at this point in time.


I'd like to see an XML library in Boost. I'd love to see anyone pursuing
this effort.


--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/


Re: Potential Boost SAX library

Boost - Dev mailing list
On 09.01.2018 20:14, Vinícius dos Santos Oliveira wrote:
> 2018-01-09 15:18 GMT-03:00 Stefan Seefeld via Boost <[hidden email]>:
>
>     I'm not sure there would still
>     be any interest in XML APIs in Boost at this point in time.
>
>
> I'd like to see a XML library in Boost. I'd love to see anyone
> pursuing this effort.
Then have a look at https://github.com/stefanseefeld/boost.xml. It's an
API with existing bindings to libxml2; the original intent was to add
support for other backends (e.g. Xerces).
If there is enough interest, I'd be happy to revive and continue the
project. (Please submit issues and feature requests to that project, so
we can discuss more technical details there.)

Stefan


Re: Potential Boost SAX library

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 01/09/18 18:36, Oliver Adams via Boost wrote:

> I was wondering if a library I'm developing would be of value to the
> Boost community. It is basically an event-driven parsing/serialization
> library for common formats using a standard internal representation or
> simple pass-through conversions. Would anyone be interested in something
> like this being added to Boost?

There are two kinds of incremental parsers: push parsers (SAX) and pull
parsers (approximately StAX). Briefly put, a push parser traverses the
input automatically and generates an event for each token it finds,
whereas a pull parser is advanced manually by the caller, like an
iterator, and the current token can be queried.

Pull parsers have some significant advantages over push parsers:

   * It is straight-forward to implement a push parser on top of a pull
     parser. This involves a loop and a switch statement (see [1] for a
     complete example.) Going in the other direction involves the use of
     coroutines; most likely stateful coroutines.

   * Contextual parsing can be done directly, unlike push parsers where
     you have to maintain contextual state in the event handler.

   * Pull parsers can be used directly in Boost.Serialization archives.

   * Pull parsers are composable. For instance, you could insert a URL
     pull parser directly into an HTTP pull parser.
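
The first point can be sketched as follows. This is not the trial.protocol API; `token`, `pull_parser`, `handler` and `push_parse` are hypothetical names for a toy grammar of nested integer lists. The pull parser is advanced by the caller, and the push parser on top of it is just a loop and a switch.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

enum class token { begin_list, end_list, integer, end, error };

// Pull parser: the caller calls next() and then queries the current token.
class pull_parser {
public:
    explicit pull_parser(std::string input) : input_(std::move(input)) {}

    token next() {
        while (pos_ < input_.size() && (is_space(input_[pos_]) || input_[pos_] == ','))
            ++pos_;                                   // skip separators
        if (pos_ == input_.size()) return current_ = token::end;
        const char c = input_[pos_];
        if (c == '[') { ++pos_; return current_ = token::begin_list; }
        if (c == ']') { ++pos_; return current_ = token::end_list; }
        if (is_digit(c)) {
            value_ = 0;
            while (pos_ < input_.size() && is_digit(input_[pos_]))
                value_ = value_ * 10 + (input_[pos_++] - '0');
            return current_ = token::integer;
        }
        return current_ = token::error;               // unknown character
    }

    token current() const { return current_; }

    long value() const {
        assert(current_ == token::integer);           // only meaningful for integers
        return value_;
    }

private:
    static bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    std::string input_;
    std::size_t pos_ = 0;
    token current_ = token::end;
    long value_ = 0;
};

// Push-style (SAX-like) event handler interface.
struct handler {
    virtual ~handler() = default;
    virtual void on_begin_list() = 0;
    virtual void on_end_list() = 0;
    virtual void on_integer(long value) = 0;
};

// Push parser on top of the pull parser: a loop and a switch.
bool push_parse(const std::string& input, handler& h) {
    pull_parser p(input);
    for (;;) {
        switch (p.next()) {
        case token::begin_list: h.on_begin_list();       break;
        case token::end_list:   h.on_end_list();         break;
        case token::integer:    h.on_integer(p.value()); break;
        case token::end:        return true;
        case token::error:      return false;
        }
    }
}

// Example handler: sums all integers and records the deepest nesting level.
struct summing_handler : handler {
    long sum = 0;
    int depth = 0;
    int max_depth = 0;
    void on_begin_list() override { max_depth = std::max(max_depth, ++depth); }
    void on_end_list() override { --depth; }
    void on_integer(long value) override { sum += value; }
};
```

Note that `summing_handler` has to keep `depth` as member state between callbacks, which is the contextual-state burden mentioned in the second point.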

For a pull parser framework see:

   https://github.com/breese/trial.protocol

The documentation is somewhat out of date, though.

[1]
http://breese.github.io/trial/protocol/trial_protocol/json/tutorial/push_parser.html


Re: Potential Boost SAX library

Boost - Dev mailing list
> There are two kinds of incremental parsers: push parsers (SAX) and pull
> parsers (approximately StAX). Briefly put, a push parser traverses the
> input automatically and generates an event for each token it finds,
> whereas a pull parser is advanced manually by the caller, like an
> iterator, and the current token can be queried.

My library is kind of a push-pull framework. You can ask the parser to
parse one event (one event being the smallest unit of parsing the input
format allows), and the parser then pushes the result to the output
handler as one or more writes. The trouble is that where the parser stops
is format-dependent. This limits the pull side of the framework to
"event-loop" style parsing right now.

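A minimal sketch of that hybrid, under stated assumptions: `output_handler`, `key_value_parser` and `parse_one` are hypothetical names, not cppdatalib's actual interface, and the "format" is simply `key=value` lines. The caller pulls by requesting one event; the parser pushes that event's results into the handler as two writes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Receives whatever a single parse step produces.
struct output_handler {
    virtual ~output_handler() = default;
    virtual void write(const std::string& value) = 0;
};

// Hybrid parser for "key=value" lines. For this format one event is one
// line; a different format would stop at a different granularity.
class key_value_parser {
public:
    explicit key_value_parser(std::string input) : input_(std::move(input)) {}

    bool done() const { return pos_ >= input_.size(); }

    // Parse exactly one event and push its writes to `out`.
    // Returns false when there is nothing left to parse.
    bool parse_one(output_handler& out) {
        if (done()) return false;
        std::size_t eol = input_.find('\n', pos_);
        if (eol == std::string::npos) eol = input_.size();
        assert(eol >= pos_);
        const std::string line = input_.substr(pos_, eol - pos_);
        pos_ = eol + 1;

        const std::size_t eq = line.find('=');
        out.write(line.substr(0, eq));                          // first write: key
        out.write(eq == std::string::npos ? std::string()
                                          : line.substr(eq + 1)); // second write: value
        return true;
    }

private:
    std::string input_;
    std::size_t pos_ = 0;
};

// Example handler that records every write it receives.
struct collecting_handler : output_handler {
    std::vector<std::string> writes;
    void write(const std::string& value) override { writes.push_back(value); }
};
```

Driving it is the "event-loop" style mentioned above: `while (parser.parse_one(handler)) { /* inspect handler between events */ }`. The caller controls when the next step happens, but not how much input a step consumes.
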
> Pull parsers have some significant advantages over push parsers:

>  * It is straight-forward to implement a push parser on top of a pull
>    parser. This involves a loop and a switch statement (see [1] for a
>    complete example.) Going in the other direction involves the use of
>    coroutines; most likely stateful coroutines.

Most of these features are not currently available in cppdatalib because
it does not expose individual tokens the way a pull parser does. If I
refactored a few things, I might be able to get a full pull parser framework.

>  * Contextual parsing can be done directly, unlike push parsers where
>    you have to maintain contextual state in the event handler.

Right now, contextual parsing is implemented in a base class of the output
handler, so it's still isolated from the end user. It is somewhat hackish,
though, since the parser queries the output handler for the structure of
the data it has already read.
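
Roughly, the arrangement looks like the sketch below. The names (`tracking_handler`, `outline_handler`, `container`) are made up for illustration and do not match cppdatalib's classes. The base class owns the nesting stack, the parser calls the non-virtual entry points, and both the parser and the derived handler can query the context.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class container { array, object };

// Base class that tracks nesting so user handlers do not have to.
class tracking_handler {
public:
    virtual ~tracking_handler() = default;

    // Entry points called by the parser: update context, then forward.
    void begin(container c) { stack_.push_back(c); on_begin(c); }
    void end() {
        assert(!stack_.empty());                  // unbalanced end() is a parser bug
        const container c = stack_.back();
        stack_.pop_back();
        on_end(c);
    }
    void value(const std::string& v) { on_value(v); }

    // Context queries, usable by the parser and by derived handlers.
    std::size_t depth() const { return stack_.size(); }
    bool inside(container c) const { return !stack_.empty() && stack_.back() == c; }

protected:
    // Hooks for the end user; all optional.
    virtual void on_begin(container) {}
    virtual void on_end(container) {}
    virtual void on_value(const std::string&) {}

private:
    std::vector<container> stack_;
};

// User handler: builds an indented outline without keeping any state
// about nesting itself.
struct outline_handler : tracking_handler {
    std::string text;

protected:
    void on_begin(container c) override {
        // depth() already counts the container that was just opened.
        line(c == container::array ? "array" : "object", depth() - 1);
    }
    void on_value(const std::string& v) override { line(v, depth()); }

private:
    void line(const std::string& s, std::size_t indent) {
        text += std::string(2 * indent, ' ') + s + '\n';
    }
};
```

The end user only overrides the hooks; the coupling that feels hackish is that the parser depends on `depth()` and `inside()` of its own output handler.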

>  * Pull parsers can be used directly in Boost.Serialization archives.

>  * Pull parsers are composable. For instance, you could insert a URL
>    pull parser directly into an HTTP pull parser.

Composability is a big issue with push parsers, so removing obstacles to
that would greatly simplify some things. For certain types of information,
though, it doesn't seem like composition is important.

On Jan 13, 2018 5:05 AM, "Bjorn Reese via Boost" <[hidden email]> wrote:

> On 01/09/18 18:36, Oliver Adams via Boost wrote:
>
>> I was wondering if a library I'm developing would be of value to the
>> Boost community. It is basically an event-driven parsing/serialization
>> library for common formats using a standard internal representation or
>> simple pass-through conversions. Would anyone be interested in something
>> like this being added to Boost?
>
> There are two kinds of incremental parsers: push parsers (SAX) and pull
> parsers (approximately StAX). Briefly put, a push parser traverses the
> input automatically and generates an event for each token it finds,
> whereas a pull parser is advanced manually by the caller, like an
> iterator, and the current token can be queried.
>
> Pull parsers have some significant advantages over push parsers:
>
>   * It is straight-forward to implement a push parser on top of a pull
>     parser. This involves a loop and a switch statement (see [1] for a
>     complete example.) Going in the other direction involves the use of
>     coroutines; most likely stateful coroutines.
>
>   * Contextual parsing can be done directly, unlike push parsers where
>     you have to maintain contextual state in the event handler.
>
>   * Pull parsers can be used directly in Boost.Serialization archives.
>
>   * Pull parsers are composable. For instance, you could insert a URL
>     pull parser directly into an HTTP pull parser.
>
> For a pull parser framework see:
>
>   https://github.com/breese/trial.protocol
>
> The documentation is a bit old though.
>
> [1]
> http://breese.github.io/trial/protocol/trial_protocol/json/tutorial/push_parser.html

Re: Potential Boost SAX library

Boost - Dev mailing list
2018-01-13 10:00 GMT-03:00 Oliver Adams via Boost <[hidden email]>:

> >  * Pull parsers are composable. For instance, you could insert a URL
> >    pull parser directly into an HTTP pull parser.
>
> Composability is a big issue with push parsers, so removing obstacles to
> that would greatly simplify some things. For certain types of information,
> though, it doesn't seem like composition is important.
>

Composability is not always important. I've written an HTTP pull parser[1],
but you won't always use the power to compose abstractions.

However, it's really impressive what you can do with pull parsers. For
instance, HTTP is the type of format where you parse incomplete messages.
Reparsing from the beginning is not really feasible because you
usually won't keep past data around. Given these conditions, look at how
powerful an HTTP pull parser is: you can copy the parser and use the copy
to "look ahead" (or backtrack, if you will). I've written an example where
I compose the parser so that you can more or less assume field names and
field values are always present together in the stream:
https://github.com/BoostGSoC14/boost.http/commit/9908fe06d4b2364ce18ea9b4162640b38013c699#diff-3b6fb5abc4fc10fc1535584f183e20fdR1358
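
The idea, reduced to a toy: the sketch below is not the Boost.Http parser from the commit above. `header_parser` and `next_field` are hypothetical, and the "protocol" is just `Name: value\r\n` lines in a buffer that may still be growing. The parser holds only a position into the buffer, so a copy is a cheap checkpoint; `next_field` advances the copy and commits it only if both name and value are already available.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

enum class token { field_name, field_value, need_more };

// Pull parser over a buffer that may contain only part of the message.
class header_parser {
public:
    explicit header_parser(const std::string& buffer) : buffer_(&buffer) {}

    // Advance to the next token, or report that the buffer ends mid-token.
    // On need_more the parser state is left unchanged.
    token next() {
        assert(pos_ <= buffer_->size());
        const char* delimiter = expect_name_ ? ": " : "\r\n";
        const std::size_t end = buffer_->find(delimiter, pos_);
        if (end == std::string::npos) return token::need_more;
        value_ = buffer_->substr(pos_, end - pos_);
        pos_ = end + 2;                           // both delimiters are 2 chars
        const token t = expect_name_ ? token::field_name : token::field_value;
        expect_name_ = !expect_name_;
        return t;
    }

    const std::string& value() const { return value_; }

private:
    const std::string* buffer_;   // not owned; the caller may append to it
    std::size_t pos_ = 0;
    bool expect_name_ = true;
    std::string value_;
};

// Composed operation: yields a complete (name, value) pair or nothing.
// Works on a copy of the parser and commits only on success.
bool next_field(header_parser& parser, std::string& name, std::string& value) {
    header_parser probe = parser;                 // copy = checkpoint
    if (probe.next() != token::field_name) return false;
    std::string probed_name = probe.value();
    if (probe.next() != token::field_value) return false;
    name = std::move(probed_name);
    value = probe.value();
    parser = probe;                               // commit the look-ahead
    return true;
}
```

Callers of `next_field` never see a name without its value: if the value has not arrived yet, the original parser is untouched and the call can simply be retried after more data is appended to the buffer.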

I don't really care about this specific example. I just want to point out how
powerful this style really is. If I were to advocate one style over
another (and we're talking about HTTP parsers), I'd emphasize other
characteristics.

[1] https://vinipsmaker.github.io/asiohttpserver/

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
