[review] [json] My JSON review

Boost - Dev mailing list
Hi,

This is my review of Boost.JSON by Vinnie Falco and Krystian Stasiowski.

I believe that the library should be REJECTED, though it would be a
much better candidate with just a few minor tweaks here and there.

The library provides the following components:
- a parser framework
- a DOM to represent/construct/edit trees of JSON-like values (json::value)
- a serializer for that type

Unfortunately, none of these three components is quite satisfactory
when it comes to addressing the problem domain of JSON as an
interchange format.
JSON is a ubiquitous technology, and I feel this library does not
deserve a name such as Boost.JSON, which would suggest "the right
way to do JSON as seen by the C++ community".

Parser:
--------
Parsing is arguably the most important feature of any such library.
However with Boost.JSON it really feels like it is an afterthought,
and only its coupling with json::value was considered.

These are the main issues:
- The lack of proper number support from the DOM apparently propagates
to the parser as well, even though there is no cost for the parser to
properly support them.
- The interface is closely tied to a text format and doesn't support
JSON-inspired binary formats. This kind of forward thinking is what
would really elevate those parsers.
- The fact that only a push parser is provided. Parsing into one's own
types usually requires a pull parser in order for nested members'
parsers to combine into the outer parser.

The way the parser interface handles numbers is not only inconsistent
with the rest of its interface but also limiting.
            bool on_number_part( string_view, error_code& ) { return true; }
            bool on_int64( std::int64_t, string_view, error_code& ) { return true; }
            bool on_uint64( std::uint64_t, string_view, error_code& ) { return true; }
            bool on_double( double, string_view, error_code& ) { return true; }

I would expect an interface that looks like
            bool on_number_part( string_view, std::size_t n, error_code& ) { return true; }
            bool on_number( string_view, std::size_t n, error_code& ) { return true; }
            bool on_int64( std::int64_t, error_code& ) { return true; }
            bool on_uint64( std::uint64_t, error_code& ) { return true; }
            bool on_double( double, error_code& ) { return true; }

On the plus side, the push parser integrates well with Beast/Asio
segmented buffering and incremental parsing.

DOM:
------
The DOM provides a canonical or vocabulary type to represent any JSON
document, which is useful in itself.
The DOM does not support numbers that are neither 64-bit integral
numbers nor double-precision floating-point. This is a common
limitation, but still quite disappointing, since the proposal of it
being a canonical type is quite tainted by the fact it can't represent
JSON data losslessly. I think it would be better if another type was
provided for numbers that don't fit in the previous two categories.

It uses a region allocator model based on polymorphic memory
resources, which is what the new polymorphic allocator model from
Lakos et al. was specifically intended for. It layers a kind of
reference counting for lifetime-extension semantics on top, which I'm
not sure I'm a big fan of, but that seems optional.

Querying capabilities for that type are pretty limited and quite
un-C++ like. Examples use a C-style switch/extractor paradigm instead
of a much safer variant-like visitor approach.
There is no direct nested xpath-style querying like with boost
property tree or other similar libraries.
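The variant-style alternative can be sketched in standard C++. This is a hypothetical `json_value` type, not the Boost.JSON API; it only illustrates the visitation style the review prefers over switch-on-kind extraction:

```cpp
#include <string>
#include <variant>

// Hypothetical JSON-ish scalar type, for illustration only.
using json_value = std::variant<std::nullptr_t, bool, double, std::string>;

// Helper to build an overload set from lambdas (C++17).
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

// Every alternative must be handled or this fails to compile, unlike
// a switch over a kind() enum, which can silently fall through.
std::string describe(const json_value& v)
{
    return std::visit(overloaded{
        [](std::nullptr_t)     { return std::string("null"); },
        [](bool b)             { return std::string(b ? "true" : "false"); },
        [](double)             { return std::string("number"); },
        [](const std::string&) { return std::string("string"); }
    }, v);
}
```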

It feels like the interface is only a marginal improvement on that of rapidjson.

Serializer:
-----------
There is no serializer framework to speak of: just a serializer for
the DOM type json::value, i.e. the only way to generate a JSON
document is to generate a json::value and have it be serialized.
A serializer interface symmetric to that of the parser could be
provided instead, allowing arbitrary types to be serialized to JSON
(or any other similar format) without conversion to json::value.

When it comes to serializing floating-point numbers, it seems it
always uses scientific notation. Most libraries have an option so that
you get to choose a digit threshold at which to switch.
Integers always use a fixed representation, which makes this all quite
inconsistent.
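A sketch of what such a formatting option could look like on top of std::to_chars (C++17), where the caller picks fixed, scientific, or shortest round-trip formatting; this is an assumed interface, not the Boost.JSON serializer API:

```cpp
#include <charconv>
#include <string>

// Hypothetical serializer knob: format a double as fixed or scientific
// text, the choice most libraries expose via a threshold or flag.
std::string format_double(double v, std::chars_format fmt)
{
    char buf[64];
    auto res = std::to_chars(buf, buf + sizeof buf, v, fmt);
    return std::string(buf, res.ptr);
}

// Default to_chars: shortest text that round-trips to the same double.
std::string shortest(double v)
{
    char buf[64];
    auto res = std::to_chars(buf, buf + sizeof buf, v);
    return std::string(buf, res.ptr);
}
```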

Review questions:
----------------------
>    - What is your evaluation of the design?
I can see the design is mostly concerned with receiving arbitrary JSON
documents over the network, incrementally building a DOM as the data
is received, and then doing the opposite on the sending path.
It is good at doing that, I suppose, but it is otherwise quite
limited, as it is unsuitable for any use of JSON as a serialization
format for otherwise known messages.

>    - What is your evaluation of the implementation?
I only took a quick glance, but it seemed ok. Can't say I'm a fan of
this header-only idea, but some people like it.

>    - What is your evaluation of the documentation?
It is quite lacking when it comes to usage of the parser beyond the DOM.
It also doesn't really say how numbers are formatted in the serializer.

>    - What is your evaluation of the potential usefulness of the library?
Limited. Given the scope of the library and the quality of its
interface, I don't see any reason to use it instead of RapidJSON,
an established library for which I already have converters and Asio
stream adapters.

>    - Did you try to use the library? With which compiler(s)? Did you have any problems?
I didn't.

>    - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
Maybe half a day reading through the documentation and the mailing list.

>    - Are you knowledgeable about the problem domain?
I have used a lot of different JSON libraries and I have written
several serialization/deserialization frameworks using JSON, some of
which are deeply tied into everything I do every day.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [review] [json] My JSON review

Mathias Gaunard wrote:

> - The lack of proper number support from the DOM apparently propagates to
> the parser as well, even though there is no cost for the parser to
> properly support them.

In what way does it apparently propagate?

> The way the parser interfaces handles numbers is not only inconsistent
> with the rest of its interface but also limiting.

What are the limitations, and what is `n` in your suggested replacement?



Re: [review] [json] My JSON review

On Mon, Sep 21, 2020 at 6:11 AM Mathias Gaunard via Boost
<[hidden email]> wrote:
> I believe that the library should be REJECTED

Thank you for taking the time to review the library.

> - The interface is closely tied to a text format and doesn't support
> JSON-inspired binary formats.

"JSON-inspired binary formats" are not JSON. CBOR for example is
"JSON-inspired", yet neither Node.js nor any web browser support it.
They do however support standard JSON, and their respective DOMs have
numerical limitations similar to the ones in Boost.JSON.

> - The fact that only a push parser is provided. Parsing into own types
> usually requires a pull parser in order for nested member's parsers to
> combine into the outer parser.

It seems to me that there are two non-overlapping use-cases:

1. Parsing and serializing standard JSON to and from a
library-provided DOM container

2. Parsing and serializing standard JSON to and from user-defined
types, with no intermediate DOM container representation.

Boost.JSON addresses number 1 above, and in my opinion it accomplishes
that goal with the level of performance and interface clarity that the
world expects of a Boost library. Number 2 above is still useful
functionality, but there is not a lot of implementation that could be
shared in a library that attempts to do both (you could probably share
the escaping of strings and output of numbers).

I do not think it is possible for a single parser API and single
serializer API to address both of these use-cases well. Optimizing for
one will necessarily come at the expense of the other. Perhaps there
is a hypothetical not-yet-written library just waiting to be
discovered which will prove me wrong, by doing everything well. I have
my doubts.

It seems to me that these are two separate libraries.

> The DOM does not support numbers that are neither 64-bit integral
> numbers nor double-precision floating-point. This is a common
> limitation, but still quite disappointing, since the proposal of it
> being a canonical type is quite tainted by the fact it can't represent
> JSON data losslessly. I think it would be better if another type was
> provided for numbers that don't fit in the previous two categories.

I disagree. Serialization of a canonical type should produce JSON that
is interoperable with the majority of receivers. Neither Node.js nor
browser implementations of JSON understand arbitrary precision
integers out of the box. Unless the receiver is configured to process
it, emitting an arbitrary precision number is certain to cause a
compatibility problem.

In fact the entire goal of RFC 7493
(https://tools.ietf.org/html/rfc7493) is to specify additional
constraints on JSON values to maximize interoperability. Boost.JSON
loosely follows these guidelines by restricting itself to 64 bit
integers and floats.

> Querying capabilities for that type are pretty limited and quite
> un-C++ like. Examples use a C-style switch/extractor paradigm instead
> of a much safer variant-like visitor approach.

visit() is a planned feature which is already implemented:

<https://github.com/vinniefalco/json/tree/visit>

> There is no direct nested xpath-style querying like with boost
> property tree or other similar libraries.

JSON Path and JSON Pointer are planned features:

<https://goessner.net/articles/JsonPath/>

<https://tools.ietf.org/html/rfc6901>

> When it comes to serializing floating-point numbers, it seems it
> always uses scientific notation. Most libraries have an option so that
> you get to choose a digit threshold at which to switch.

Yes the conversion from floating point to string is an area of
research and improvement.

> is otherwise quite
> limited as it is unsuitable for any use of JSON as a serialization
> format for otherwise known messages.

That is by design. Boost.JSON is not a serialization library. It
addresses the use case for a JSON DOM with accompanying parser and
serializer. This is not an insignificant use-case; witness the
popularity of RapidJSON and nlohmann's JSON for Modern C++. In fact
didn't you say you are already using RapidJSON?

Thanks


Re: [review] [json] My JSON review

On Mon, 21 Sep 2020 at 14:44, Peter Dimov via Boost
<[hidden email]> wrote:
>
> Mathias Gaunard wrote:
>
> > - The lack of proper number support from the DOM apparently propagates to
> > the parser as well, even though there is no cost for the parser to
> > properly support them.
>
> In what way does it apparently propagate?

Assuming you have the number 12345678909876543210, which straddles
two buffers, with 1234567890 then 9876543210, the parser will effectively do

uint64 value = 0;

// first buffer
value = value * pow10(size("1234567890")) + 1234567890;
on_number_part("1234567890", ec);

// second buffer
value = value * pow10(size("9876543210")) + 9876543210;
on_uint64(value, "9876543210", ec);

(the actual implementation works character by character and is slower
than this)
This will not really work once the value gets one digit larger, as the
first argument will be garbage.

The text to binary conversion shouldn't even be in the JSON parser;
that's an orthogonal concern that's only needed for json::value.
If you want that, you can easily implement it in json::parser, the
handler in charge of building json::value (for which on_number_part is
currently a no-op).

> > The way the parser interfaces handles numbers is not only inconsistent
> > with the rest of its interface but also limiting.
>
> What are the limitations, and what is `n` in your suggested replacement?

Same as for on_string_part/on_string, on_key_part/on_key,
on_comment_part/on_comment, etc.
I'm arguing it should be there for consistency; it's not strictly necessary.


Re: [review] [json] My JSON review

> Assuming you have the number 12345678909876543210, which is striding two
> buffers, with 1234567890 then 9876543210, the parser will call
>
> uint64 value = 0;
>
> value = value * pow10(size("1234567890")) + 1234567890;
> on_number_part("1234567890", ec)
>
> value = value * pow10(size("9876543210")) + 9876543210;
> on_uint64(value, "9876543210", ec)
>
> (the actual implementation works on character per character basis and is
> slower than this)
>
> This will not really work once the value gets one digit larger, as the
> first argument will be garbage.

But if the value gets larger, the parser will not call on_uint64, it will
call on_double.

> The text to binary conversion shouldn't even be in the JSON parser,

OK, if you don't need the text to binary conversion, what's stopping you
from ignoring it, and only using the text part? You do get all the info,
it's just that the final part doesn't arrive via "on_number", but via the
other three callbacks, to save one call.
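The "buffer it yourself" option can be sketched as a handler fragment. The callback names mirror the review's listing and the class is hypothetical, not a complete Boost.JSON handler:

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical handler fragment: ignore the pre-converted binary value
// and recover the exact decimal text by concatenating the pieces the
// parser delivers across buffer boundaries.
struct decimal_text_handler
{
    std::string number_text;   // accumulates the full literal

    bool on_number_part(std::string_view piece)
    {
        number_text.append(piece);   // non-final chunk of the literal
        return true;
    }

    bool on_uint64(std::uint64_t /*ignored*/, std::string_view last)
    {
        number_text.append(last);    // final chunk: number_text is now complete
        return true;
    }
};
```

The same pattern applies to on_int64/on_double; the binary argument is simply left unused.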

> > What are the limitations, and what is `n` in your suggested replacement?
>
> Same as for on_string_part/on_string, on_key_part/on_key,
> on_comment_part/on_comment etc.
>
> I'm arguing it should be there for consistency, it's not strictly
> necessary.

Yeah, fair point.



Re: [review] [json] My JSON review

On Mon, 21 Sep 2020 at 15:23, Vinnie Falco <[hidden email]> wrote:
>
> "JSON-inspired binary formats" are not JSON. CBOR for example is
> "JSON-inspired", yet neither Node.js nor any web browser support it.
> They do however support standard JSON, and their respective DOMs have
> numerical limitations similar to the ones in Boost.JSON.

The main reason I suggested this is that you might also want to be
able to walk your in-memory representation through some kind of
visitation mechanism and push it directly to the serializer.
You cannot, since the numbers have already been converted to binary,
so there is no string left.
So I'd design the parser callbacks so that numbers can be stored
either as text or as binary in the source.


> I do not think it is possible for a single parser API and single
> serializer API to address both of these use-cases well. Optimizing for
> one will necessarily come at the expense of the other. Perhaps there
> is a hypothetical not-yet-written library just waiting to be
> discovered which will prove me wrong, by doing everything well. I have
> my doubts.

Doing a pull parser that supports data being pushed incrementally is
difficult as it requires the ability to suspend arbitrary code that is
trying to pull data beyond what is available.
That would be doable, maybe with a coroutine design, but adds a lot of
complexity.

To make people happy I think it's easier to provide a separate
incremental push parser and a non-incremental pull parser.
I don't think those should be in separate libraries; consistency is key.


> > The DOM does not support numbers that are neither 64-bit integral
> > numbers nor double-precision floating-point. This is a common
> > limitation, but still quite disappointing, since the proposal of it
> > being a canonical type is quite tainted by the fact it can't represent
> > JSON data losslessly. I think it would be better if another type was
> > provided for numbers that don't fit in the previous two categories.
>
> I disagree. Serialization of a canonical type should produce JSON that
> is interoperable with the majority of receivers. Neither Node.js nor
> browser implementations of JSON understand arbitrary precision
> integers out of the box. Unless the receiver is configured to process
> it, emitting an arbitrary precision number is certain to cause a
> compatibility problem.

You are writing the receiver: it is the JSON parser. Making it deal
with numbers correctly is important so that you don't corrupt the
work of whatever you are consuming.
If you know you are writing data for a consumer with certain
limitations, then it's up to you to transform your numbers into
something else.
Note that your code will still happily write 64-bit integers as
numbers, even though JavaScript can't exactly represent integers
beyond 53 bits, and will round those numbers and cause all sorts of
problems.
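The 53-bit issue is easy to demonstrate with standard C++ alone: the first positive integer a double cannot represent is 2^53 + 1, so any consumer that stores JSON numbers as doubles (as JavaScript does) silently rounds it:

```cpp
#include <cstdint>

// 2^53 is the largest N such that every integer in [0, N] is exactly
// representable in an IEEE-754 double (53-bit significand).
constexpr std::int64_t max_safe = std::int64_t{1} << 53;  // 9007199254740992

// Converting to double and back loses the +1: the double rounds to 2^53.
constexpr bool round_trips(std::int64_t v)
{
    return static_cast<std::int64_t>(static_cast<double>(v)) == v;
}
```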

Another thing is that I don't buy the argument that web browsers,
Node.js and the rest of the JavaScript ecosystem are the only things
that are relevant.
The web is a domain with a reputation of not being quite as rigorous
as systems programming, which is the main application domain of C++.
It is ok if script kiddies want to do whatever with their webby stuff,
but let's please do things properly when we're in C++-land.


> > Querying capabilities for that type are pretty limited and quite
> > un-C++ like. Examples use a C-style switch/extractor paradigm instead
> > of a much safer variant-like visitor approach.
>
> visit() is a planned feature which is already implemented:
>
> <https://github.com/vinniefalco/json/tree/visit>

Good.


> JSON Path and JSON Pointer are planned features:
>
> <https://goessner.net/articles/JsonPath/>
>
> <https://tools.ietf.org/html/rfc6901>

Nice.


> That is by design. Boost.JSON is not a serialization library. It
> addresses the use case for a JSON DOM with accompanying parser and
> serializer. This is not an insignificant use-case; witness the
> popularity of RapidJSON and nlohmann's JSON for Modern C++. In fact
> didn't you say you are already using RapidJSON?

I use RapidJSON because it's the established industry standard to do
precisely what this library does.
As I said I don't believe the value proposition of this library is
sufficient to replace existing things when all it does is the same
thing but with a Boost logo on it. There is no guarantee at this stage
this library will get any traction or maintenance on the same level as
one which is used by thousands of projects, so it's not worth the risk
to me.


Re: [review] [json] My JSON review

On Mon, 21 Sep 2020 at 15:59, Peter Dimov <[hidden email]> wrote:

> But if the value gets larger, the parser will not call on_uint64, it will
> call on_double.

I didn't see that in the implementation, but it doesn't really
change my point (though at least it avoids an undefined value,
replacing it with an approximation instead).

The value I gave is a valid uint64 value, but isn't exactly
representable as a double. Adding one digit would make it neither a
valid uint64 nor exactly representable as a double.


> OK, if you don't need the text to binary conversion, what's stopping you
> from ignoring it, and only using the text part? You do get all the info,
> it's just that the final part doesn't arrive via "on_number", but via the
> other three callbacks, to save one call.

Separation of concerns, wasted cycles, etc.
Parsing numbers is the slowest part of all this parsing business, and
the part that's most dependent on how you want to represent numbers
into memory.

For example I typically represent numbers as decimals myself.


Re: [review] [json] My JSON review

On Mon, Sep 21, 2020 at 8:26 AM Mathias Gaunard
<[hidden email]> wrote:
> To make people happy I think it's easier to provide a separate
> incremental push parser and a non-incremental pull parser.
> I don't think those should be in separate libraries, consistency is key.

I disagree. I don't have any special expertise to bring to the table
when it comes to pull parsers. I don't need them for the applications
and libraries that I am developing. However I am proficient at writing
the kind of code needed for push parsing, and implementing containers
(having already done so many times in the past). If I were to
implement a JSON serialization library (which is what you are asking
for) I would be doing so basically from a fresh start. What I am
saying is that this type of implementation does not play to my
strength. There have already been individuals who proclaim to be
subject matter experts in this area - they should write this library.

"Consistency" is a vague goal. But here's what's not vague. Having
JSON serialization in its own library means it gets its own Travis
limits for individual forks. CI turnaround is faster. Tests run
faster. The documentation is focused only on the serialization
aspects. Users who need only JSON serialization can use the
particular library without also bringing in a JSON DOM library. And
vice versa. To me, this separation of concerns has more value than
"consistency."

I also have to wonder, to what consistency do you refer? The parser
will not be consistent. One is pull, the other is push. The DOM will
not be consistent. One has it, the other doesn't. Serialization is in
a similar situation. In fact, when you compare a JSON DOM library to a
JSON Serialization library, they are about as different as you can
imagine, sharing only the nominal serialization format. Not much code
would be shared between a DOM and a serialization library.

I see nothing wrong with Boost.JSON being the library that users go to
when they want a container that is suitable as an interchange type.
And I see nothing wrong with someone proposing a new library that
implements JSON serialization to and from user-defined types, with
pull parsers, no DOM, and maybe even quotes from Aleister Crowley in
the documentation. Having fine-grained libraries which target narrow
user needs is something we should all agree is good for Boost (and C++
in general).

Regards


Re: [review] [json] My JSON review

Mathias Gaunard wrote:

> > OK, if you don't need the text to binary conversion, what's stopping you
> > from ignoring it, and only using the text part? You do get all the info,
> > it's just that the final part doesn't arrive via "on_number", but via
> > the other three callbacks, to save one call.
>
> Separation of concerns, wasted cycles, etc.
> Parsing numbers is the slowest part of all this parsing business, and the
> part that's most dependent on how you want to represent numbers into
> memory.
>
> For example I typically represent numbers as decimals myself.

Right. So you acknowledge that the parser as-is would work for your use
case, but you refuse to use it, and prefer Boost not to have it, because
separation of concerns and wasted cycles. You intuitively feel that it would
be too slow because it converts the numbers into int64/double, but haven't
measured it against a parser that doesn't. (If you do, you may be
surprised.)

FWIW, there's a reason concerns aren't separated here. It's that the number
can be arbitrarily long, so if you're going to separate the concerns, you'd
need to buffer the entire number in string form before being able to convert
it to int64/uint64/double. But since these have limited precision, if you do
the conversion on the fly, you no longer need to buffer the entire number in
string form. If the user wants the entire number, he can buffer it himself.
If not, that part is unnecessary and isn't done.

Since the majority of users are more interested in int64/uint64/double than
in infinite precision, the parser chooses to favor this case at the expense
of the other, and not vice versa.



Re: [review] [json] My JSON review

On 21/09/2020 16:25, Mathias Gaunard via Boost wrote:

>> That is by design. Boost.JSON is not a serialization library. It
>> addresses the use case for a JSON DOM with accompanying parser and
>> serializer. This is not an insignificant use-case; witness the
>> popularity of RapidJSON and nlohmann's JSON for Modern C++. In fact
>> didn't you say you are already using RapidJSON?
>
> I use RapidJSON because it's the established industry standard to do
> precisely what this library does.

For me RapidJSON is a legacy design useful only for legacy codebases.
sajson and simdjson are the current state of the art for new C++ code.
If you don't mind modifying the JSON being parsed (i.e. it's a mutable
buffer and you will throw away the JSON afterwards in any case),
refactoring around sajson's in-JSON AST generation delivers very
sizeable performance gains.

> As I said I don't believe the value proposition of this library is
> sufficient to replace existing things when all it does is the same
> thing but with a Boost logo on it. There is no guarantee at this stage
> this library will get any traction or maintenance on the same level as
> one which is used by thousands of projects, so it's not worth the risk
> to me.

Boost has a long history of duplicating something already widely used in
the C++ ecosystem, but "Boostified" so it has STL naming and design
patterns.

And that's fine by me. People who like a standard C++ API design over
other considerations will choose the Boost duplicate.

Niall


Re: [review] [json] My JSON review

On Mon, 21 Sep 2020 at 16:48, Vinnie Falco <[hidden email]> wrote:

>
> On Mon, Sep 21, 2020 at 8:26 AM Mathias Gaunard
> <[hidden email]> wrote:
> > To make people happy I think it's easier to provide a separate
> > incremental push parser and a non-incremental pull parser.
> > I don't think those should be in separate libraries, consistency is key.
>
> I disagree. I don't have any special expertise to bring to the table
> when it comes to pull parsers. I don't need them for the applications
> and libraries that I am developing. However I am proficient at writing
> the kind of code needed for push parsing, and implementing containers
> (having already done so many times in the past). If I were to
> implement a JSON serialization library (which is what you are asking
> for) I would be doing so basically from a fresh start. What I am
> saying is that this type of implementation does not play to my
> strength. There have already been individuals who proclaim to be
> subject matter experts in this area - they should write this library.
>
> "Consistency" is a vague goal. But here's what's not vague. Having
> JSON serialization in its own library means it gets its own Travis
> limits for individual forks. CI turnaround is faster. Tests run
> faster. The documentation is focused only on the serialization
> aspects. Users who need only JSON serialization, can use the
> particular library without also bringing in a JSON DOM library. And
> vice versa. To me, this separation of concerns has more value than
> "consistency."
>
> I also have to wonder, to what consistency do you refer? The parser
> will not be consistent. One is pull, the other is push. The DOM will
> not be consistent. One has it, the other doesn't. Serialization is in
> a similar situation. In fact, when you compare a JSON DOM library to a
> JSON Serialization library, they are about as different as you can
> imagine, sharing only the nominal serialization format. Not much code
> would be shared between a DOM and a serialization library.

Maybe pull parser is not the right terminology; the parser in question
would not have any DOM.

It would be something like:

template<class T>
struct deserialize_impl;

template<class T, class Deserializer>
T deserialize(Deserializer& s)
{
    return deserialize_impl<T>::impl(s);
}

struct Bar
{
    int value;
};

struct Baz
{
};

struct Foo
{
    Bar bar;
    std::string baz;
};

template<>
struct deserialize_impl<Bar>
{
    template<class Deserializer>
    static Bar impl(Deserializer& s)
    {
        s.object_begin();
        s.key("value");
        int value = std::stoi(s.number());
        s.object_end();
        return Bar{std::move(value)};
    }
};

template<>
struct deserialize_impl<Foo>
{
    template<class Deserializer>
    static Foo impl(Deserializer& s)
    {
        s.object_begin();
        s.key("bar");
        Bar bar = deserialize<Bar>(s);
        s.key("baz");
        std::string baz = s.string();
        s.object_end();
        return Foo{std::move(bar), std::move(baz)};
    }
};

(possibly with shorthand syntax to simplify redundant patterns, such
as automatic object_begin/object_end and deserialize function variant
with automatic key, which would allow constructing directly)

You can see that the names and sequence of the events are still
perfectly in line with the push parser's.
Serialization could also be written in a symmetric fashion.
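To make the sketch concrete, here is a toy event-queue stand-in for the hypothetical Deserializer above, plus a driver matching the Bar example; a real implementation would pull these events from a JSON lexer rather than a pre-filled queue, and every name here is illustrative:

```cpp
#include <cassert>
#include <queue>
#include <string>

// Toy pull-style source: events are pre-loaded instead of lexed.
struct toy_deserializer
{
    std::queue<std::string> events;

    void object_begin()            { expect("{"); }
    void object_end()              { expect("}"); }
    void key(const std::string& k) { expect("key:" + k); }
    std::string number()           { return next(); }  // digits as text

private:
    std::string next()
    {
        std::string e = events.front();
        events.pop();
        return e;
    }
    void expect(const std::string& e) { assert(next() == e); }
};

// Same event sequence as the Bar deserializer in the sketch.
int parse_bar_value(toy_deserializer& s)
{
    s.object_begin();
    s.key("value");
    int v = std::stoi(s.number());
    s.object_end();
    return v;
}
```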

>
> I see nothing wrong with Boost.JSON being the library that users go to
> when they want a container that is suitable as an interchange type.
> And I see nothing wrong with someone proposing a new library that
> implements JSON serialization to and from user-defined types, with
> pull parsers, no DOM, and maybe even quotes from Aleister Crowley in
> the documentation. Having fine-grained libraries which target narrow
> user needs is something we should all agree is good for Boost (and C++
> in general).

The problem with this is that you end up having to pick a million
different things to solve very small problems, all in slightly
inconsistent ways. There is a cost to adopting any new library or
API, both technical and mental.
In 98% of cases, when I write a websocket interface (with Boost.Beast,
kudos to you), I define my messages as C++ types, and I parse incoming
JSON into those types. I could parse the incoming JSON into
json::value and either use it directly, losing static naming and
typing, or convert it into my types, an unnecessary step which can
even introduce loss of data.
In the remaining 2% of cases, I write arbitrary data to some log or
record and I need to be able to visualize it or export it to a
database. In that case, I don't know the schema of the information
ahead of time, and json::value is therefore what I want to use.
In both cases, I'd like to read/write my data from/to JSON with the
same framework.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [review] [json] My JSON review

Boost - Dev mailing list
On Mon, Sep 21, 2020 at 9:44 AM Mathias Gaunard
<[hidden email]> wrote:
>         s.object_begin();
>         s.key("bar");
>         Bar bar = deserialize<Bar>(s);
>         s.key("baz");
>         string baz = s.string();
>         s.object_end();
>         return Foo{std::move(bar), std::move(baz)};

This looks very much like it depends on the order of appearance of
keys in the JSON, which is non-standard.

> In both cases, I'd like to read/write my data from/to JSON with the
> same framework.

Why? What, specifically, is the requirement here?

Thanks

Re: [review] [json] My JSON review

Boost - Dev mailing list
On Mon, 21 Sep 2020 at 17:01, Niall Douglas via Boost
<[hidden email]> wrote:

> For me RapidJSON is a legacy design useful only for legacy codebases.
> sajson and simdjson are the current state of the art for new C++ code.
> If you don't mind modifying the JSON being parsed (i.e. it's a mutable
> buffer and you will throw away the JSON afterwards in any case),
> refactoring around sajson's in-JSON AST generation delivers very
> sizeable performance gains.

Both of these only work with a mutable contiguous buffer backing the
data. The way I think of it is that those libraries convert JSON into
a flat indexed JSON format which can simply be mapped to memory and
accessed as-is without any decoding.

RapidJSON and this library are different in that they support
arbitrary streams as input and decouple the serialized format from
the in-memory representation.

Re: [review] [json] My JSON review

Boost - Dev mailing list
On Mon, 21 Sep 2020 at 17:49, Vinnie Falco <[hidden email]> wrote:

> This looks very much like it depends on the order of appearance of
> keys in the JSON, which is non-standard.

You'd parse the object until you reach the desired key, building an
index of where the various keys you meet along the way are located
(possibly with extra information such as where the associated value
starts and ends). Then, for the next key, either it is already in the
index, so you can use it and then discard it, or it is further ahead,
so you continue with the same algorithm.
That would be optimized for keys arriving in the expected order and
would ignore extra keys, while still supporting out-of-order keys
(albeit with an extra storage requirement that grows with your
nesting level).
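The scheme above can be sketched with a toy model in which the token
stream is reduced to a pre-split list of (key, value) pairs. The
ObjectCursor type, find_value, and the data layout are all
illustrative, not part of any library:

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy model of the described scheme: the "parser" is a cursor over
// already-tokenized (key, value) pairs. find_value() scans forward to
// the requested key, indexing every key it skips, so out-of-order
// requests can still be served later.
struct ObjectCursor {
    std::vector<std::pair<std::string, std::string>> pairs; // stand-in for the token stream
    std::size_t pos = 0;
    std::map<std::string, std::string> index;               // keys seen but not yet consumed

    std::string find_value(const std::string& key) {
        auto it = index.find(key);
        if (it != index.end()) {                 // already passed: serve from index, discard
            std::string v = std::move(it->second);
            index.erase(it);
            return v;
        }
        while (pos < pairs.size()) {             // otherwise scan ahead
            auto& kv = pairs[pos++];
            if (kv.first == key)
                return kv.second;
            index.emplace(kv.first, kv.second);  // remember skipped keys
        }
        throw std::runtime_error("key not found: " + key);
    }
};
```

When keys arrive in the expected order, every lookup hits the scan
immediately and the index stays empty, which is the optimized path
described above.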

>
> > In both cases, I'd like to read/write my data from/to JSON with the
> > same framework.
>
> Why? What specifically, is the requirement here?

What I'd like is a way to describe how my C++ types map to a key-value
structure with normalized types so that I can easily convert my
objects back and forth through a structured self-describing and
human-readable interchange format.
Ideally I'd like to do so without repeating myself with all kinds of
converters, serializers, etc. and reasonably efficiently, which is why
I'd like these things to be reasonably unified.

Boost.Serialization doesn't do this, since it takes the approach of
designing a data format to serialize any C++ structure, rather than
mapping a C++ structure to a simpler text-based structure.

Re: [review] [json] My JSON review

Boost - Dev mailing list
On Mon, 21 Sep 2020 at 19:30, Mathias Gaunard via Boost <
[hidden email]> wrote:

> On Mon, 21 Sep 2020 at 17:49, Vinnie Falco <[hidden email]> wrote:
>
> > This looks very much like it depends on the order of appearance of
> > keys in the JSON, which is non-standard.
>
> You'd parse the object until you reach the desired key, building an
> index of where the various keys you meet on the way are (with possibly
> extra information of where the associated value starts/end etc.), then
> for the next one either it's in the index so you can use it then
> discard it, or it's ahead, so you can continue with the same
> algorithm.
> That would be optimized for keys being ordered as expected, ignoring
> extra keys, while still supporting out-of-order keys (albeit with some
> extra storage requirements that grows with your nesting level).
>
> > > In both cases, I'd like to read/write my data from/to JSON with the
> > > same framework.
> >
> > Why? What specifically, is the requirement here?
>
> What I'd like is a way to describe how my C++ types map to a key-value
> structure with normalized types so that I can easily convert my
> objects back and forth through a structured self-describing and
> human-readable interchange format.
> Ideally I'd like to do so without repeating myself with all kinds of
> converters, serializers, etc. and reasonably efficiently, which is why
> I'd like these things to be reasonably unified.

A custom handler for basic_parser ought to be able to do this, coupled
with a simple way of describing reflection in terms of an element name
and a reference to the value (or get/set functions).
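One possible shape for such a reflection description, sketched with
the standard library only. Point, Field, and assign are all made-up
names for illustration; this is not the Boost.JSON basic_parser
handler API, just the name-to-member mapping idea:

```cpp
#include <string>
#include <tuple>

struct Point { int x = 0; int y = 0; };

// A field is described by its JSON element name and a pointer to the
// member it fills (the "reference to value" part).
template<class T, class M>
struct Field { const char* name; M T::*member; };

const auto point_fields = std::make_tuple(
    Field<Point, int>{"x", &Point::x},
    Field<Point, int>{"y", &Point::y});

// Dispatch one (key, value) parse event to the matching member, if any.
// Returns false for unknown keys so the caller can skip or reject them.
template<class T, class Fields>
bool assign(T& obj, const Fields& fields, const std::string& key, int value) {
    bool hit = false;
    auto try_one = [&](auto const& f) {
        if (key == f.name) { obj.*(f.member) = value; hit = true; }
    };
    std::apply([&](auto const&... f) { (try_one(f), ...); }, fields);
    return hit;
}
```

A handler sitting on top of a push parser could call assign() from its
key/value events, so the struct fills up regardless of key order.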




> Boost.Serialization doesn't do this, since it takes the approach of
> designing a data format to serialize any C++ structure, rather than
> mapping a C++ structure to a simpler text-based structure.

--
Richard Hodges
[hidden email]
office: +442032898513
home: +376841522
mobile: +376380212

Re: [review] [json] My JSON review

Boost - Dev mailing list
On Mon, Sep 21, 2020 at 10:29 AM Mathias Gaunard
<[hidden email]> wrote:
> > > In both cases, I'd like to read/write my data from/to JSON with the
> > > same framework.
> >
> > Why? What specifically, is the requirement here?
>
> What I'd like is a way to describe how my C++ types map to a key-value
> structure with normalized types so that I can easily convert my
> objects back and forth through a structured self-describing and
> human-readable interchange format.

Right, but what I'm asking you is: *specifically* in what way would a
framework that offers both a JSON DOM and JSON serialization be
"consistent"? Can you show, in terms of declarations, what that would
look like? In other words, I'm asking you to show, using example code,
how putting these two separate concerns in a single library would
offer benefits over having them in separate libraries.

Thanks

Re: [review] [json] My JSON review

Boost - Dev mailing list

> On 21. Sep 2020, at 19:44, Vinnie Falco via Boost <[hidden email]> wrote:
>
> On Mon, Sep 21, 2020 at 10:29 AM Mathias Gaunard
> <[hidden email]> wrote:
>>>> In both cases, I'd like to read/write my data from/to JSON with the
>>>> same framework.
>>>
>>> Why? What specifically, is the requirement here?
>>
>> What I'd like is a way to describe how my C++ types map to a key-value
>> structure with normalized types so that I can easily convert my
>> objects back and forth through a structured self-describing and
>> human-readable interchange format.
>
> Right, but what I'm asking you is: *specifically* in what way would a
> framework that offers both JSON DOM and JSON Serialization be
> "consistent?" Can you show in terms of declarations, what that would
> look like? In other words I'm asking you to show using example code
> how putting these two separate concerns in a single library would
> offer benefits over having them in separate libraries.

The Boost.Serialization framework is able to do both. The two things
needed are:
a) a JSON archive that follows the Archive concept
b) a serialize function for json::value

Once you have these two, you can convert the DOM type and/or any custom type to JSON and back.


Re: [review] [json] My JSON review

Boost - Dev mailing list

> On 21. Sep 2020, at 19:57, Hans Dembinski <[hidden email]> wrote:
>
>>
>> On 21. Sep 2020, at 19:44, Vinnie Falco via Boost <[hidden email]> wrote:
>>
>> On Mon, Sep 21, 2020 at 10:29 AM Mathias Gaunard
>> <[hidden email]> wrote:
>>>>> In both cases, I'd like to read/write my data from/to JSON with the
>>>>> same framework.
>>>>
>>>> Why? What specifically, is the requirement here?
>>>
>>> What I'd like is a way to describe how my C++ types map to a key-value
>>> structure with normalized types so that I can easily convert my
>>> objects back and forth through a structured self-describing and
>>> human-readable interchange format.
>>
>> Right, but what I'm asking you is: *specifically* in what way would a
>> framework that offers both JSON DOM and JSON Serialization be
>> "consistent?" Can you show in terms of declarations, what that would
>> look like? In other words I'm asking you to show using example code
>> how putting these two separate concerns in a single library would
>> offer benefits over having them in separate libraries.
>
> The Boost.Serialization framework is able to do both. The two things needs are
> a) a JSON archive that follows the Archive concept
> b) a serialize function for json::value
>
> Once you have these two, you can convert the DOM type and/or any custom type to JSON and back.

On second thought, the Archive concept probably needs an extension to work with json::value. While writing is no problem, reading is a challenge.

For reading, Boost.Serialization usually relies on the fact that the type that is going to be read next from the archive is statically known. In the case of a boost::variant, a polymorphic type similar to json::value, it reads a type id (some integer) first and then dispatches to the appropriate code based on that value.

While that would also work for the json::value type, we obviously don't want to store explicit type ids in JSON. So the type info would have to come from the Archive itself in this case, which probably could do some look-ahead into the stream to provide info like "the next type in the stream is a number/string/array of 3 elements".

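That look-ahead could be sketched as follows. peek_type and NextType
are made-up names; classifying by the first significant character only
reports the kind of the next value (reporting "array of 3 elements"
would additionally require a bounded scan ahead):

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class NextType { Object, Array, String, Number, Bool, Null, End };

// Classify the next JSON value in the stream by its first significant
// character, without consuming anything. Assumes well-formed input.
NextType peek_type(const std::string& json, std::size_t pos) {
    while (pos < json.size() &&
           std::isspace(static_cast<unsigned char>(json[pos])))
        ++pos;
    if (pos == json.size()) return NextType::End;
    switch (json[pos]) {
        case '{': return NextType::Object;
        case '[': return NextType::Array;
        case '"': return NextType::String;
        case 't': case 'f': return NextType::Bool;
        case 'n': return NextType::Null;
        default:  return NextType::Number; // digit or '-'
    }
}
```

A reading archive could use something like this to pick which
primitive overload to invoke before actually decoding the value.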
Bottom line, it is not as easy as I thought.

Re: [review] [json] My JSON review

Boost - Dev mailing list
On Mon, 21 Sep 2020 at 16:19, Hans Dembinski via Boost
<[hidden email]> wrote:
> On second thought, the Archive concept probably needs an extension to work with json::value.
> While writing is no problem, reading is a challenge.

It depends on the lens you use. The Archive concept treats primitive
types as directly serializable; other serializables are always built
on top of the primitives for which the underlying archive has special
code.

Through one lens, you're using json::value against json::archive. In
that case, the archive has special code for it, so everything works
as described in the previous emails.

Through the other lens, an external archive, you have a problem, and
there the rest of the rationale you described in your email applies.

You do need extensions in a separate concept, though. If you want to
control beautiful JSON serialization, you need to expose a few
extensions, like Bjørn has done in his serialization code. I disagree
with a few of the things he has done, but we can reach consensus in
the future.

One thing I want to develop here is integration with Boost.Hana
reflection. If you have integration with the Boost.Hana Struct
concept, it helps a ton. First, creating new serializables becomes
very easy. Second, you avoid the whole
"index/buffer/backtrack/lookahead" problem you guys mentioned. This
won't work for push parsers, but pull parsers will benefit a lot.

> Bottom line, it is not so easy as I thought.

True. I would like to discuss this subject, but I need to organize my
own thoughts first. It would be nice if you, Mathias Gaunard, Robert
Ramey, and other stakeholders could get together to explore this
topic in detail after the Boost.JSON review.


--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
