Idea Suggestion for GsOC'21


Boost - Dev mailing list
Hey! This is Sanyam Bhaskar.

I read through the default ideas provided to us, and the XML parser really
caught my eye. I would like to contribute to it but don’t know where to
start.

Also, as I understand it, XML is relatively outdated compared to data
formats like JSON. So in addition to the XML parser, I think adding a JSON
parser alongside it would boost the library’s utility in modern industry.



I would appreciate it if someone could tell me how to get started and give
some feedback on the suggestion. Looking forward to contributing to the project.



Yours,

Sanyam Bhaskar

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: Idea Suggestion for GsOC'21

Boost - Dev mailing list
On Wed, Mar 10, 2021 at 14:01, Sanyam Bhaskar via Boost
<[hidden email]> wrote:
> Hey! This is Sanyam Bhaskar.

Hi Sanyam,

> I read over the default Ideas provided to us and XML Parser really caught
> my eye.

Glad to know. I'm the potential mentor for this project.

> Also, as per my understanding, XML is relatively outdated, when compared
> to data languages like JSON. So in addition to this being an XML Parser, I
> think adding a JSON parser alongside it would boost the library’s utility
> in the modern day industry.

We held a review for a JSON library not long ago and the library got
accepted, so we already have a JSON (push) parser. I still see room
for a JSON pull parser, but I'd not be willing to spearhead this
effort, so unless someone else shows up to mentor it I don't think
we'd have such a project.
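
For readers unfamiliar with the distinction, here is a minimal sketch of
the two API shapes. It uses a toy whitespace tokenizer rather than JSON,
and the names pull_tokenizer and push_tokenize are made up for
illustration; they are not taken from Boost.JSON or any other library.
It assumes C++17.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Pull style: the caller drives and asks for one token at a time.
// (Hypothetical class, for illustration only.)
class pull_tokenizer {
public:
    explicit pull_tokenizer(std::string_view input) : rest_(input) {}

    // Returns the next whitespace-delimited token, or nullopt at the end.
    std::optional<std::string_view> next()
    {
        std::size_t skip = 0;
        while (skip < rest_.size() && is_space(rest_[skip])) ++skip;
        rest_.remove_prefix(skip);
        if (rest_.empty()) return std::nullopt;

        std::size_t len = 0;
        while (len < rest_.size() && !is_space(rest_[len])) ++len;
        std::string_view token = rest_.substr(0, len);
        rest_.remove_prefix(len);
        return token;
    }

private:
    static bool is_space(char c)
    {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }

    std::string_view rest_;
};

// Push style: the parser drives and calls back into the handler.
// Here it is simply layered on top of the pull tokenizer.
template <class Handler>
void push_tokenize(std::string_view input, Handler&& on_token)
{
    pull_tokenizer tokens(input);
    while (auto token = tokens.next()) on_token(*token);
}
```

In this sketch the push interface falls out of the pull interface in a
few lines. Going the other way generally takes buffering, threads, or
coroutines, which is one reason a pull parser can be the more flexible
foundation.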

XML is an old, overengineered and hated format (and rightfully so),
but industry adoption basically forces us to use it for
interoperability with a few services to this day. So that's the value
for XML here, interoperability with legacy software. It's not a value
to be neglected.

I also think it'd be a good project for first-time students as the
basics of the format are really well-known and I believe in my skills
to gradually point the student to its quirks as the project advances.

> I would appreciate it if someone could tell me how to get started and some
> feedback on the suggestion. Looking forward to contributing to the project.

I wrote some of the ideas that you saw in the wiki page for Boost
GSoC. I didn't know which projects would attract students, so I didn't
invest a lot of time detailing each individual project (my bad).

The programming competency test was to write a CSV parser. However, you
can negotiate to write a parser for a different format if you think
it'd better showcase your C++ skills (please choose a simple format
that takes about one afternoon to implement, and negotiate the
alternative target beforehand). Once you're done, send the code
directly to me (don't post it publicly). I'll then request a few
changes here and there to see how well you handle modifying the code,
and I'll share other comments as well.

On top of that, you'll need to write a proposal and submit it through
Google's platform during the student application period (March 29 -
April 13). If you want early feedback, you can post your proposal here
(this time, don't send it to me in private; post it publicly on the
list). If you don't need early feedback, you can skip posting it here
and send it only through Google's platform (in that case your proposal
will only be visible to the Boost GSoC team). I can't suggest specific
strategies, but I advise you to strive to make a good impression.
Early feedback will obviously give you "extra" time to improve your
proposal. Do keep in mind that sending your proposal to this list is
not an official submission. You must always send the final proposal
through Google's platform during the student application period.

Google will eventually announce how many student slots Boost is given
and the accepted students will be announced on May 17.


--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/

Re: Idea Suggestion for GsOC'21

Boost - Dev mailing list
Hi Vinicius,

allow me to jump into this discussion with some thoughts.

On 2021-03-10 2:16 p.m., Vinícius dos Santos Oliveira via Boost wrote:
> XML is an old, overengineered and hated format (and rightfully so),
> but industry adoption basically forces us to use it for
> interoperability with a few services to this day. So that's the value
> for XML here, interoperability with legacy software. It's not a value
> to be neglected.
>
> I also think it'd be a good project for first-time students as the
> basics of the format are really well-known and I believe in my skills
> to gradually point the student to its quirks as the project advances.

I'll give advice very similar to what I shared on the FFT proposals:
please consider not re-implementing a full XML library (which is quite a
daunting task), but rather focus on the C++ API as an *interface* that
can be layered on top of existing XML libraries.

The world already has way too many incomplete and buggy XML libraries.
Please let's not make it worse.

The approach I took (admittedly many years ago) consisted of
defining a C++ API around one of the more popular (and efficient)
implementations at the time: libxml2 (http://www.xmlsoft.org/), with
support for a DOM-like API as well as a SAX-like streaming API.

Of particular importance is that a fully functional XML API needs to
have some support for Unicode, which is sadly still quite difficult to
do in C++. My choice was to parametrize the entire API around the
character type, letting users pick their own Unicode bindings (a simple
trait-like class would be enough to bind to alternative types there).
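
A minimal sketch of what such a parametrization could look like. This is
not the API from the repository linked below; utf8_converter,
basic_element and app_string are hypothetical names, and the sketch
assumes the backend hands out UTF-8 encoded C strings.

```cpp
#include <string>

// Hypothetical trait: converts the backend's UTF-8 bytes into the user's
// string type. The primary template is left undefined on purpose, so an
// unsupported string type fails to compile.
template <class String>
struct utf8_converter;

// Binding for std::string: keep the UTF-8 bytes as they are.
template <>
struct utf8_converter<std::string> {
    static std::string from_utf8(const char* s) { return std::string(s); }
};

// The API is parametrized on the string type rather than tied to one
// Unicode library. (Illustrative class, not a real Boost.XML type.)
template <class String>
class basic_element {
public:
    basic_element(const char* utf8_name, const char* utf8_text)
        : name_(utf8_converter<String>::from_utf8(utf8_name)),
          text_(utf8_converter<String>::from_utf8(utf8_text))
    {
    }

    const String& name() const { return name_; }
    const String& text() const { return text_; }

private:
    String name_;
    String text_;
};

// A user-side binding for an application-specific string type.
struct app_string {
    std::string utf8;
};

template <>
struct utf8_converter<app_string> {
    static app_string from_utf8(const char* s) { return app_string{s}; }
};
```

With this shape, basic_element<std::string> and basic_element<app_string>
share one implementation, and binding another Unicode string type only
requires one more specialization of the trait.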

Anyhow, my code is still online, if anyone wants to have a look:
https://github.com/stefanseefeld/boost.xml

While the libxml2 bindings work very nicely (including xpath support and
some other nice features), I never felt comfortable proposing my work in
its current form for adoption into Boost without having added at least
one other XML library backend (Xerces comes to mind), to make sure the
API itself is robust enough and doesn't accidentally leak libxml2 design
choices.

Best,

Stefan
--

       ...ich hab' noch einen Koffer in Berlin...


Re: Idea Suggestion for GsOC'21

Boost - Dev mailing list
On Wed, Mar 31, 2021 at 10:11 PM Stefan Seefeld via Boost <
[hidden email]> wrote:

> allow me to jump into this discussion with some thoughts.
>
> On 2021-03-10 2:16 p.m., Vinícius dos Santos Oliveira via Boost wrote:
> > XML is an old, overengineered and hated format (and rightfully so),
> > but industry adoption basically forces us to use it for
> > interoperability with a few services to this day. So that's the value
> > for XML here, interoperability with legacy software. It's not a value
> > to be neglected.


> I'll give a very similar advice I shared with FFT proposals: Please
> consider not to re-implement a full XML library (which is quite a
> daunting task), but rather, focus on the C++ API as an *interface* that
> can be layered on top of existing XML libraries.
>

While normally I'd agree with you, by this train of thought,
we wouldn't have Boost.JSON accepted in Boost right now.


> The world already has way too many incomplete and buggy XML libraries.


True. But different people have different tradeoffs. libxml2, Xerces, and
expat may be complete, and as close to bug-free as it gets in C/C++ XML,
but they are certainly not modern C++, often don't do incremental parsing,
and certainly don't allow the kind of allocator support Boost.JSON
introduced. Nor are they the fastest.

So a non-wrapper, Boost.JSON-like Boost.XML would be very interesting.
Perhaps even, like Boost.JSON and controversially, forgoing SAX and only
doing DOM.

The main issue with XML is all the little things to get right, like
character entities, entity includes inherited from DTDs, DTDs themselves
(for validation and default values), whitespace normalization, namespace
support, and related technologies like XSD, XPath, XLink, XInclude,
XQuery, etc. Proper PSVI (post-schema-validation infoset) is also often
problematic, but that assumes a validating parser (via DTD or XSD) in the
first place.
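
To give a sense of even the simplest item on that list, here is a sketch
that decodes only the five entities predefined by the XML specification
(&lt; &gt; &amp; &apos; &quot;). The function name is hypothetical, and
numeric character references and DTD-declared entities are deliberately
left out, so this is nowhere near a complete solution. It assumes C++17.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Replaces the five predefined XML entities with the characters they
// stand for. Anything else starting with '&' is copied through unchanged.
// (Hypothetical helper, for illustration only.)
std::string decode_predefined_entities(std::string_view in)
{
    static constexpr std::array<std::pair<std::string_view, char>, 5> table{{
        {"&lt;", '<'},
        {"&gt;", '>'},
        {"&amp;", '&'},
        {"&apos;", '\''},
        {"&quot;", '"'},
    }};

    std::string out;
    out.reserve(in.size());

    std::size_t i = 0;
    while (i < in.size()) {
        bool matched = false;
        if (in[i] == '&') {
            for (const auto& [name, ch] : table) {
                if (in.substr(i, name.size()) == name) {
                    out += ch;
                    i += name.size();
                    matched = true;
                    break;
                }
            }
        }
        // Not an entity we know: copy the character as-is.
        if (!matched) out += in[i++];
    }
    return out;
}
```

The input is scanned in a single pass, so "&amp;lt;" decodes to "&lt;"
and is not decoded a second time.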

There's definitely space to explore a Boost.JSON-like low-level modern
parser building only a DOM with value semantics and allocator support,
with a modern API. Much could be built on such a foundation, and that's
an interesting GSoC project, even if it never "graduates".

In any case, besides the three mentioned above, there's also rapidxml and
pugixml, the latter still actively maintained. Perhaps they are not as
complete, but they are definitely quite a bit faster than the "old" ones.
--DD

Re: Idea Suggestion for GsOC'21

Boost - Dev mailing list
On Thu, Apr 1, 2021 at 05:29, Dominique Devienne via Boost <
[hidden email]> wrote:

> There's definitely space to explore a Boost.JSON-like low-level modern
> parser building
>

Boost.JSON is anything but low-level parsing. You definitely didn't explore
its parser. I did. Boost.JSON would definitely not be an inspiration to
Boost.XML.

Can we please not hijack this thread for propaganda? Thank you.


--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/

Re: Idea Suggestion for GsOC'21

Boost - Dev mailing list
On Thu, Apr 1, 2021 at 10:30 AM Dominique Devienne <[hidden email]>
wrote:

> On Wed, Mar 31, 2021 at 10:11 PM Stefan Seefeld via Boost <
> [hidden email]> wrote:
>
>> consider not to re-implement a full XML library
>
>

> So a non-wrapper, Boost.JSON-like Boost.XML would be very interesting.
> Perhaps even, like Boost.JSON and controversially, forgoing SAX and only
> doing DOM.
>

One thing I forgot to mention is that an explicit goal of any Boost.XML
API, wrapper or not, should be to replicate Peter's Boost.Describe
"data-binding" examples [1][2] that convert JSON "values" to described
C++ structures, but in the XML space. Can your attempt at an XML API do
that, Stefan? That would be a very compelling Boost.XML IMHO, even w/o
any SAX support. --DD

[1] https://github.com/pdimov/describe/blob/develop/example/to_json.cpp
[2] https://github.com/pdimov/describe/blob/develop/example/from_json.cpp

Re: Idea Suggestion for GsOC'21

Boost - Dev mailing list

On 2021-04-01 4:30 a.m., Dominique Devienne via Boost wrote:
>
> But different people have different tradeoffs. libxml2 and xerces and
> expat may be complete, and as close to bug free as it gets in C/C++ XML,
> but they are certainly not modern C++,

Stylistic questions ("modern C++") are secondary to functional correctness.

> often not incremental parsing, and certainly don't allow the kind of
> allocator support Boost.JSON introduced. Nor are they the fastest.

libxml2 offers streaming APIs ("incremental parsing") and is among the
fastest implementations you can get.

As I said in the FFT thread: thinking that you can match such a library
(both in functionality and performance) with a GSoC project is foolish,
so it seems wiser to focus on the interface, then bind that to existing
implementations.


> The main issue with XML are all the little things to get right, like
> character entities, entity includes inherited from DTDs, DTDs themselves,
> for validation and default values, whitespace normalization, namespace
> support, and related techs like XSDs, XPath, XLink, XInclude, XQuery,
> etc... Proper PSVI (post schema validation infoset) is also often
> problematic, but that assumes a validating parser (via DTD or XSD) in
> the first place.

Exactly. How do you propose to handle all of the above?

> There's definitely space to explore a Boost.JSON-like low-level modern
> parser building only a DOM with value semantic and allocator support,
> with a modern API. Much could be built on such a foundation, and that's
> an interesting GSOC project, even if it never "graduates".
>
> In any case, beside the 3 mentioned above, there's also rapidxml and
> pugixml, the latter still actively maintained. Perhaps they are not as
> complete, but they are definitely quite a bit faster than the "old"
> ones. --DD

This is not about which XML library is better. Quite the opposite, in
fact: I want to make an argument for establishing a modern C++ API that
can be bound to any such library. We don't need more half-baked partial
XML implementations, we need a standard C++ API for XML.
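
As a rough illustration of that separation (not the API from my
repository), the sketch below puts a single user-facing class on top of
an abstract backend. All names here (xml_backend, document, stub_backend)
are hypothetical, and the stub stands in for a real adapter that would
forward to libxml2, Xerces, or another library.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical backend interface: each existing XML library would get
// one adapter implementing it.
class xml_backend {
public:
    virtual ~xml_backend() = default;

    // Returns the name of the root element of the given document text.
    virtual std::string root_name(const std::string& xml) const = 0;
};

// User-facing API: depends only on the interface, never on a backend.
class document {
public:
    document(std::shared_ptr<const xml_backend> backend, std::string xml)
        : backend_(std::move(backend)), xml_(std::move(xml))
    {
    }

    std::string root_name() const { return backend_->root_name(xml_); }

private:
    std::shared_ptr<const xml_backend> backend_;
    std::string xml_;
};

// Stand-in adapter for illustration only. It does no parsing at all; a
// real adapter would delegate the work to an existing XML library.
class stub_backend final : public xml_backend {
public:
    std::string root_name(const std::string&) const override
    {
        return "stub-root";
    }
};
```

Writing a second adapter against the same interface is the kind of check
that shows whether the API leaks one backend's design choices.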

Stefan
--

       ...ich hab' noch einen Koffer in Berlin...


Re: Idea Suggestion for GsOC'21

Boost - Dev mailing list

On 2021-04-01 4:43 a.m., Dominique Devienne via Boost wrote:
> One thing I forgot to mention, is that an explicit goal of any Boost.XML
> API, wrapper or not, should be to replicate Peter's Boost.Describe
> "data-binding" examples to convert JSON "values" to described C++
> structures, but in the XML space. Can your attempt at an XML API do that
> Stefan?

That's an orthogonal piece of functionality, which can be implemented on
top of Boost.XML and Boost.Describe.

Stefan
--

       ...ich hab' noch einen Koffer in Berlin...

