[review][json] My review

Boost - Dev mailing list
- Do you think the library should be accepted as a Boost library?

Yes, my vote is to ACCEPT this library.

- What is your evaluation of the design?

Much has been debated recently on the merits of push parsers vs. pull
parsers.  Boost.JSON provides both a push parser (SAX-like interface)
and a DOM parser (which is somewhat pull-like but I don't think is
technically either kind).

It does not provide a "real" pull parser, which are sometimes more
efficient for deserialisation into custom types; the SAX parser can
somewhat do that but is not composable, which makes it harder to use in
practice, unless I'm missing something (this is usually why I end up
never using SAX parsers).  If so, I'd like to see a more complete
example where the handler delegates part of the parsing to a class that
in turn delegates part to another class (as with reading typed
sub-values).  I don't think this is currently possible.

So I think I can understand some of the debate around this point; while
I don't think it's strictly necessary that this particular library
implement that, if it doesn't then another library will end up having
potentially competing types and the natural Boost.JSON name will already
be taken.  Personally, however, I don't think this is sufficient reason
to block acceptance of this library.

Direct pull parsing doesn't mesh well with JSON, due to the requirement
to tolerate reordered keys, so any non-defective parser would have to
have something DOM-like under the hood anyway.  (Some kind of cache, if
not necessarily a traditional DOM.)  As such it should be sufficient to
use json::value itself as part of a pull-like parsing interface.
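
To illustrate that last point with my own sketch (not library code):
typed extraction can go through json::value, so reordered keys cost
nothing extra:

    #include <boost/json.hpp>
    namespace json = boost::json;

    struct Point { double x; double y; };

    // Key order in the input is irrelevant, because the DOM has
    // already absorbed any reordering.  (Assumes the numbers were
    // written as doubles, e.g. {"x":1.5,"y":2.5}; a robust version
    // would branch on kind().)
    Point to_point(json::value const& jv) {
        json::object const& obj = jv.as_object();
        return Point{ obj.at("x").as_double(), obj.at("y").as_double() };
    }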


Regarding the SAX parser specifically, I dislike the *_part methods in
the callback interface.  I feel that these should be removed and only
the final form retained, called only when the complete text of the value
has been determined.  I don't feel it is ever sensible to call back with
a partially-parsed value (unless I am misunderstanding what these are
for, but there is little explanation in the docs, and I wasn't able to
get them called at all in my test).

I also don't see the point of supplying a size argument to on_key* and
on_string, as this can be read from the string_view argument instead.
The various number methods don't pass this separately.  Similarly,
having a bool return as well as an error_code output argument seems
peculiar; one or the other should be sufficient; as it is, it just
invites inconsistency.
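
For concreteness, the callbacks in question have roughly this shape
(reconstructed from the reviewed handler exemplar; names and exact
signatures approximate):

    #include <boost/json.hpp>
    #include <cstddef>
    namespace json = boost::json;

    struct handler_shape {
        // sv holds the current text; n is a separate size argument
        // even though sv.size() is already available
        bool on_key(json::string_view sv, std::size_t n,
                    json::error_code& ec);
        // failure is signalled twice: via the bool return *and* ec
        bool on_string(json::string_view sv, std::size_t n,
                       json::error_code& ec);
    };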

parser.write() is incremental but basic_parser.write() is not, which
seems odd.  I assume this means that parser is storing an incrementally
populated buffer of the full document rather than _really_ incrementally
parsing.

parse/parser do not appear to support consuming only part of the
supplied input as a complete JSON value (returning how much was
consumed), which is useful for streams where multiple objects are
concatenated without explicit separation.


Regarding other aspects of design, I haven't drilled down in great
detail but the basic interface seems reasonably sound, other than the
concerns raised above.

I have mixed feelings on the initializer-list construction method -- I
like that there is some method of doing this, and it seems like
idiomatic C++ (particularly regarding how key-value pairs appear), but
also very non-JSON -- an array of objects ends up having three nested
braces.  But I don't have any specific suggestion for improvement.
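
For example (my own snippet, following the documented initializer-list
rules), an array of two objects reads three brace levels deep:

    #include <boost/json.hpp>
    namespace json = boost::json;

    // outer braces: array; middle: object; inner: key-value pair
    json::value servers = {
        { { "host", "alpha" }, { "port", 443 } },
        { { "host", "beta" },  { "port", 8080 } }
    };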

- What is your evaluation of the implementation?

It seems unfortunate (though understandable, due to heap allocation)
that you cannot declare a json::value / json::object as constexpr, for
cases where you do want some kind of constant reference object (e.g. a
schema description).  You can somewhat work around this by declaring the
JSON as a constexpr string_view and later parsing it to a non-constexpr
json::value, but that seems a bit awkward (and potentially a
performance trap if someone parses it multiple times).
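
A sketch of that workaround (the once-only caching wrapper is mine, to
sidestep the repeated-parse cost just mentioned):

    #include <boost/json.hpp>
    namespace json = boost::json;

    // The JSON text can be constexpr...
    constexpr json::string_view schema_text =
        R"({ "name": { "type": "string" } })";

    // ...but the parsed value cannot, so parse once and cache.
    json::value const& schema() {
        static json::value const v = json::parse(schema_text);
        return v;
    }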

Otherwise I didn't really look into the implementation at all.

- What is your evaluation of the documentation?

Overall, good, if perhaps a bit sparse on some aspects.

One pet peeve however is that most of the examples appear to be written
as if "using namespace boost::json" is in effect, which I feel both
encourages bad practice and makes it harder to distinguish which types
are provided by the library vs. exist only for the example.  This is
especially confusing in the examples for tag_invoke, but not only there
-- another example of confusion is the aliases for types present in
both the boost and std namespaces.  I strongly recommend going through and
changing these to explicitly use a json:: namespace prefix on all
library types (including the boost/std aliases).  Brevity is useful, but
not at the expense of clarity.

The "Using Numbers" section of the docs appears to be entirely blank.

"serializer can be used to incrementally serialize a value
incrementally"  (redundant words are redundant)

(On a related note, an example use case for serializer is listed as
"ensure a fixed amount of work is performed in each cycle", but the only
means of control is the length of the provided buffer.  I guess this is
equating work to number of characters output, which seems dubious.  But
I have no specific objections.)

- What is your evaluation of the potential usefulness of the library?

The library seems well suited for JSON as inter-process/network data
interchange, other than perhaps the omission of partial input parsing.

It seems less suited for JSON as configuration file -- while it can
parse such files (via common extensions such as ignoring comments and
trailing commas) it lacks means to pretty-print and to round-trip the
comments, which would be essential if writing back a config file
intended for human use.  However while supporting this use case would be
nice, I don't consider it mandatory.

- Did you try to use the library? With what compiler? Did you have any
problems?

I tried it out with MSVC 2017 in standalone header-only mode, though
only with a few small examples.

The basic_parser examples do not compile out of the box -- with the
suggested json.hpp and src.hpp includes there are link errors with
basic_parser symbols.  Additionally including basic_parser.hpp resolves
these but starts complaining about missing
max_{key,string,array,object}_size members of the handler type, which
are not documented.  After these are supplied then it works as
expected.  I don't see why these should be needed, however.
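
For the record, the members I had to supply looked like this (the
size_t(-1) values are mine and effectively disable the limits):

    #include <cstddef>

    struct handler {
        // required by basic_parser but not documented in the reviewed docs
        static constexpr std::size_t max_object_size = std::size_t(-1);
        static constexpr std::size_t max_array_size  = std::size_t(-1);
        static constexpr std::size_t max_key_size    = std::size_t(-1);
        static constexpr std::size_t max_string_size = std::size_t(-1);
        // ...event callbacks as in the null_parser example...
    };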

- How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?

Roughly four hours over a few days reviewing docs and trying some small
experiments.

- Are you knowledgeable about the problem domain?

I have some apps that use JSON as both interchange and configuration
storage formats, although most of these do not use C++.  So, "somewhat".


Re: [review][json] My review

Boost - Dev mailing list
On Tue, Sep 22, 2020 at 05:12, Gavin Lambert via Boost
<[hidden email]> wrote:
> Direct pull parsing doesn't mesh well with JSON, due to the requirement
> to tolerate reordered keys, so any non-defective parser would have to
> have something DOM-like under the hood anyway.  (Some kind of cache, if
> not necessarily a traditional DOM.)  As such it should be sufficient to
> use json::value itself as part of a pull-like parsing interface.

Not really. We can debate this topic when the 10-day review period
pressure is over. Check my previous emails and you'll find hints on
where this is going. And it's not hypothetical as some suggest.
There's actual code lying around these concepts already.



--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/


Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Tue, Sep 22, 2020 at 1:13 AM Gavin Lambert via Boost
<[hidden email]> wrote:
> Regarding the SAX parser specifically, I dislike the *_part methods in
> the callback interface.  I feel that these should be removed and only
> the final form retained, called only when the complete text of the value
> has been determined.

The parser (and serializer) support incremental / streaming operation.
To provide the "complete" text would require allocating memory and
saving the partial result until enough input has been supplied to
finish the text. basic_parser does not currently allocate memory,
which was a design goal, and this would change that.

> I don't feel it is ever sensible to call back with
> a partially-parsed value (unless I am misunderstanding what these are
> for, but there is little explanation in the docs, and I wasn't able to
> get them called at all in my test).

In order to see partial results, you have to invoke the parser with
partial JSON. In other words, break up the input JSON into two or more
consecutive buffers of characters, and supply them in order.

> I also don't see the point of supplying a size argument to on_key* and
> on_string, as this can be read from the string_view argument instead.

Right, so this size argument is actually the total number of
characters, while the string_view::size is the number of characters in
the current piece.

> having a bool return as well as an error_code output argument seems
> peculiar; one or the other should be sufficient, as is it just invites
> inconsistency.

Peculiar, but fast :) this was profiled and optimized. The bool return
plus ec is superior to having only ec.

> parser.write() is incremental but basic_parser.write() is not, which
> seems odd.  I assume this means that parser is storing an incrementally
> populated buffer of the full document rather than _really_ incrementally
> parsing.

Everything is incremental. parser::write() has the condition that if
there are any leftover characters after receiving a complete JSON, an
error is returned. Contrast this with parser::write_some which does
not produce an error in this case. basic_parser::write is more similar
to parser::write_some. We will be renaming it to
basic_parser::write_some, as another reviewer suggested, to make this
more clear. There is also a synergy with asio's naming with that
change.

> parse/parser do not appear to support consuming only part of the
> supplied input as a complete JSON value (returning how much was
> consumed), which is useful for streams where multiple objects are
> concatenated without explicit separation.

parse() the free function does not, but parser does - use parser::write_some.
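
For instance (my sketch, assuming the reviewed API where
parser::write_some returns the number of characters consumed):

    #include <boost/json.hpp>
    #include <stdexcept>
    namespace json = boost::json;

    // Extract the first complete JSON document from a buffer that may
    // contain further concatenated documents.  (Depending on the
    // snapshot, an initial p.reset() may be required.)
    json::value parse_prefix(json::string_view buf, std::size_t& used) {
        json::parser p;
        json::error_code ec;
        used = p.write_some(buf.data(), buf.size(), ec);
        if (ec)
            throw std::runtime_error(ec.message());
        return p.release();  // DOM of the first document only
    }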

> The "Using Numbers" section of the docs appears to be entirely blank.

This is fixed in develop.

> (On a related note, an example use case for serializer is listed as
> "ensure a fixed amount of work is performed in each cycle", but the only
> means of control is the length of the provided buffer.  I guess this is
> equating work to number of characters output, which seems dubious.

Well, yeah, that's exactly what it is. Consider a network server that
delivers JSON to clients. We don't want a completion handler to be
stuck serializing 100MB of JSON while another client is waiting who
only needs 1MB serialized. The way to ensure fairness here is to
serialize incrementally with some reasonable buffer size, say 32kb.
The light client will finish in ~33 i/o cycles while the heavy client
still makes progress.
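
A sketch of that pattern (mine; async_send stands in for whatever the
application does with each chunk):

    #include <boost/json.hpp>
    namespace json = boost::json;

    void async_send(json::string_view chunk);  // hypothetical I/O hook

    void send_json(json::value const& jv) {
        json::serializer sr;
        sr.reset(&jv);
        char buf[32768];  // bounds the work performed per cycle
        while (!sr.done())
            async_send(sr.read(buf, sizeof(buf)));
    }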

> It seems less suited for JSON as configuration file -- while it can
> parse such files (via common extensions such as ignoring comments and
> trailing commas) it lacks means to pretty-print and to round-trip the
> comments, which would be essential if writing back a config file
> intended for human use.

To store the comments would mean storing them in the DOM, which would
have a significant impact on performance as the space for everything
is tightly controlled.

> The basic_parser examples do not compile out of the box -- with the
> suggested json.hpp and src.hpp includes there are link errors with
> basic_parser symbols.  Additionally including basic_parser.hpp resolves
> these but starts complaining about missing
> max_{key,string,array,object}_size members of the handler type, which
> are not documented.  After these are supplied then it works as expected.
>   I don't see why these should be needed, however.

Those max sizes should be documented in the handler exemplar:

<https://master.json.cpp.al/json/ref/boost__json__basic_parser.html#json.ref.boost__json__basic_parser.handler0>

The reason they are required is for optimization. Handling the limit
in the basic_parser leads to performance gains.

Thanks


Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Tue, Sep 22, 2020 at 1:13 AM Gavin Lambert via Boost
<[hidden email]> wrote:
> Yes, my vote is to ACCEPT this library.

And, thank you for taking the time to look at this library!

Regards


Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 23/09/2020 05:55, Vinnie Falco wrote:

> On Tue, Sep 22, 2020 at 1:13 AM Gavin Lambert wrote:
>> Regarding the SAX parser specifically, I dislike the *_part methods in
>> the callback interface.  I feel that these should be removed and only
>> the final form retained, called only when the complete text of the value
>> has been determined.
>
> The parser (and serializer) support incremental / streaming operation.
> To provide the "complete" text would require allocating memory and
> saving the partial result until enough input has been supplied to
> finish the text. basic_parser does not currently allocate memory,
> which was a design goal, and this would change that.

Not necessarily.  A small buffer can be statically allocated for a
single value-in-progress.  It's trivial for all numeric values (except
perhaps pathological precision, but that can be ignored).  It's harder
for string values; that might have to support dynamic allocation, it's true.

I just don't see the point in having an interface that can return half
of a value -- this just moves the problem of allocation and keeping
track of the full value to the parser handler, which seems like an SRP
violation.

>> I don't feel it is ever sensible to call back with
>> a partially-parsed value (unless I am misunderstanding what these are
>> for, but there is little explanation in the docs, and I wasn't able to
>> get them called at all in my test).
>
> In order to see partial results, you have to invoke the parser with
> partial JSON. In other words, break up the input JSON into two or more
> consecutive buffers of characters, and supply them in order.

I tried that, but the parse simply failed at the end of the first
partial buffer and it completely ignored the second buffer.  Hence my
statement that basic_parser did not operate incrementally.  Though see
below; this appears to be a failure of example rather than of
implementation.

>> I also don't see the point of supplying a size argument to on_key* and
>> on_string, as this can be read from the string_view argument instead.
>
> Right, so this size argument is actually the total number of
> characters, while the string_view::size is the number of characters in
> the current piece.

Which is a poor design choice, as I said.

>> parser.write() is incremental but basic_parser.write() is not, which
>> seems odd.  I assume this means that parser is storing an incrementally
>> populated buffer of the full document rather than _really_ incrementally
>> parsing.
>
> Everything is incremental. parser::write() has the condition that if
> there are any leftover characters after receiving a complete JSON, an
> error is returned. Contrast this with parser::write_some which does
> not produce an error in this case. basic_parser::write is more similar
> to parser::write_some. We will be renaming it to
> basic_parser::write_some, as another reviewer suggested, to make this
> more clear. There is also a synergy with asio's naming with that
> change.

I wrote a print_parser (which is just like the null_parser example but
prints the method calls and parameters) and invoked it like so:

        json::error_code ec;
        print_parser parser;
        std::string data1 = R"j({"foo":14,"bar":1)j";
        std::string data2 = R"j(8,"baz":"bo)j";
        std::string data3 = R"j(b"})j";
        parser.write(&data1[0], data1.size(), ec);
        parser.write(&data2[0], data2.size(), ec);
        parser.write(&data3[0], data3.size(), ec);

It stopped after calling on_int64(1, "1") and did not continue parsing,
completely ignoring the second buffer, never calling any _part methods
and reporting the wrong integer value.  It parses correctly only if the
complete document is supplied in one buffer.  Hence, as I said, it is
not behaving incrementally.

However I have since noticed that the missing link is that it must pass
'true' as the 'more' parameter to basic_parser.write, which is not
demonstrated in any example; it may be helpful to show that.  (In fact
this resolves my other concern about multi-document parsing as well,
which also could use an explicit example.)
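
For the record, the working form of the calls was roughly this
(assuming the 'more' flag leads the parameter list, as in later
revisions; treat the exact signature as approximate):

    parser.write(true,  &data1[0], data1.size(), ec);  // more input follows
    parser.write(true,  &data2[0], data2.size(), ec);
    parser.write(false, &data3[0], data3.size(), ec);  // final fragment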

Once this is done, then (as expected by the docs) it reports these calls
(in part):

     on_number_part("1")
     on_int64(18, "8")

     on_string_part("bo", 2)
     on_string("b", 3)

So I reiterate that the number and string handling is inconsistent and
these part methods are more confusing than helpful -- numbers are parsed
into the complete value but strings are not, and neither give you the
full text of the value (which may be important if the goal is to parse
into an arbitrary-precision value type).

I do regard this as a very poor design choice, but not sufficiently so
to reject the library.

(I still also regard SAX-style parsing itself as a poor design choice
[vs pull parsing], due to poor composability, but that is not a failure
of the library itself, except perhaps in offering it at all.  But I can
understand why.)

> Well, yeah, that's exactly what it is. Consider a network server that
> delivers JSON to clients. We don't want a completion handler to be
> stuck serializing 100MB of JSON while another client is waiting who
> only needs 1MB serialized. The way to ensure fairness here is to
> serialize incrementally with some reasonable buffer size, say 32kb.
> The light client will finish in ~33 i/o cycles while the heavy client
> still makes progress.

I was assuming that a "deep" JSON (many nested objects) may have a
different cost per output character than a wide but shallow one.  But
it's not a big deal.

> To store the comments would mean storing them in the DOM, which would
> have a significant impact on performance as the space for everything
> is tightly controlled.

I don't see why.  It's just an extra string associated with any value.

FWIW, I don't think it needs to *precisely* round-trip the comments --
it's ok to convert comments to /**/ style (that's mandatory if the
output isn't pretty-printed) and to consolidate all comments associated
with one value to a single comment that's always output after that value
(or perhaps have 'before' and 'after' slots).  This should be much more
manageable.

As it stands, the library is completely useless for the case of writing
a JSON config file intended for human use.  It's ok if you don't want
that to be a goal, but this should be made clear up front so that people
don't waste time considering the library for use cases it's not intended
for.  (Though it would be preferable if you did support it.)

>> The basic_parser examples do not compile out of the box -- with the
>> suggested json.hpp and src.hpp includes there are link errors with
>> basic_parser symbols.  Additionally including basic_parser.hpp resolves
>> these but starts complaining about missing
>> max_{key,string,array,object}_size members of the handler type, which
>> are not documented.  After these are supplied then it works as expected.
>>    I don't see why these should be needed, however.
>
> Those max sizes should be documented in the handler exemplar:
>
> <https://master.json.cpp.al/json/ref/boost__json__basic_parser.html#json.ref.boost__json__basic_parser.handler0>

The docs I am referring to are the ones referenced in the original
pre-announcement post (although now I notice that the main announcement
post has a different link):

 
http://vinniefalco.github.io/doc/json/json/ref/boost__json__basic_parser.html

This is also where the "using numbers" section is entirely blank.

> The reason they are required is for optimization. Handling the limit
> in the basic_parser leads to performance gains.

I don't really understand why.  basic_parser doesn't actually store any
of the keys/values (especially as it doesn't merge value parts), so it
should have no reason to impose any kind of limit.  This just seems like
it's doing an extra count/comparison that could be removed entirely, and
removing it would naturally improve performance.

Unless you're saying that moving the check to basic_parser improves the
performance of parser, in which case: you're doing it wrong.  Imposing
restrictions on all users of basic_parser for something that only parser
cares about is the wrong design choice.


Re: [review][json] My review

Boost - Dev mailing list
On Tue, Sep 22, 2020 at 4:47 PM Gavin Lambert via Boost
<[hidden email]> wrote:
> I just don't see the point in having an interface that can return half
> of a value -- this just moves the problem of allocation and keeping
> track of the full value to the parser handler, which seems like an SRP
> violation.

It is the opposite of SRP violation. basic_parser is concerned only
with parsing and not keeping track of the full value. The
responsibility of tracking full values is moved to the caller. That
way, basic_parser does not impose its opinions on how the full value
should be stored. This was litigated in earlier posts if I recall
(Peter?).

> >> I don't feel it is ever sensible to call back with
> >> a partially-parsed value (unless I am misunderstanding what these are
> >> for, but there is little explanation in the docs, and I wasn't able to
> >> get them called at all in my test).

The interfaces are built up in layers. At the lowest layer you have
basic_parser which concerns itself only with processing the input
buffer and turning it into semantic components. It does _not_ have the
responsibility of buffering input to present it as a larger component,
that's the job of the handler. basic_parser is exposed to users so
they can use their own schemes for dealing with the semantic
components.

Above basic_parser, you have parser which uses value_stack. The
value_stack does take responsibility for buffering input, and
producing DOM values. As with basic_parser, both of these components
are exposed to users so they can write their own layers on top.

On top of parser we have the parse() free functions, which address the
most popular use-case of parsing. These functions will satisfy most
needs. And for the needs that are not satisfied, users can easily
implement their own parse functions with different behaviors, as the
library exposes all the intermediate lower levels.

> > Right, so this size argument is actually the total number of
> > characters, while the string_view::size is the number of characters in
> > the current piece.
>
> Which is a poor design choice, as I said.

No, the information is already there so we are just passing it to the
handler which can use it, or not. If the handler doesn't use it, then
it is optimized away.

>         parser.write(&data1[0], data1.size(), ec);
>         parser.write(&data2[0], data2.size(), ec);
>         parser.write(&data3[0], data3.size(), ec);

You have to call basic_parser::reset in order to start parsing a new "document."

> So I reiterate that the number and string handling is inconsistent and
> these part methods are more confusing than helpful -- numbers are parsed
> into the complete value but strings are not, and neither give you the
> full text of the value (which may be important if the goal is to parse
> into an arbitrary-precision value type).
>
> I do regard this as a very poor design choice, but not sufficiently so
> to reject the library.

No it is not a poor design choice, it is the correct design choice. If
basic_parser were to buffer the input, then users who have no need of
buffered input (such as value_stack) would be paying for it for
nothing. This interface is driven first and foremost for performance
and the incremental/streaming functionality. If callers want to see
full text, they can append the partial strings into a data member
until the final on_string or on_int64 / on_uint64 / on_double and then
interact with it that way. But they are not obliged to do so.

Again I repeat, this is not a poor design choice. json::string for
example is perfectly capable of being constructed from two separate
buffers of characters (an implementation detail):

<https://github.com/CPPAlliance/json/blob/6ddddfb16f9b54407d153abdda11a29f74642854/include/boost/json/impl/value_stack.ipp#L428>

If we were to follow your "correct design" to make basic_parser buffer
the input, the implementation would be needlessly allocating memory
and copying characters.
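
Concretely, the append-into-a-member approach looks like this in a
handler (a sketch only; signatures approximate per the handler
exemplar):

    #include <boost/json.hpp>
    #include <string>
    namespace json = boost::json;

    struct accumulating_handler {
        std::string buf_;  // pays for buffering only when wanted

        bool on_string_part(json::string_view sv, std::size_t,
                            json::error_code&) {
            buf_.append(sv.data(), sv.size());
            return true;
        }

        bool on_string(json::string_view sv, std::size_t,
                       json::error_code&) {
            buf_.append(sv.data(), sv.size());
            // buf_ now holds the complete string value; consume it here
            buf_.clear();
            return true;
        }
    };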

> I was assuming that a "deep" JSON (many nested objects) may have a
> different cost per output character than a wide but shallow one.  But
> it's not a big deal.

Yes, it isn't perfect of course but over time and with a reasonable
distribution of possible JSON inputs it averages out to be good
enough. Certainly better than not having streaming at all (which would
be terrible).

> > To store the comments would mean storing them in the DOM, which would
> > have a significant impact on performance as the space for everything
> > is tightly controlled.
>
> I don't see why.  It's just an extra string associated with any value.

Yes exactly. sizeof(value)==16 on 32bit platforms, and
sizeof(value)==24 on 64-bit platforms. When arrays and objects are
constructed during parsing, the implementation performs a memcpy after
allocation. Every additional byte in the sizeof(value) slows down the
creation of arrays and objects. Adding a pointer to value would cost
25% on 32-bit and 33% on 64-bit, and make Boost.JSON unable to ever
beat RapidJSON in performance.

> This just seems like
> it's doing an extra count/comparison that could be removed entirely, and
> removing it would naturally improve performance.

If you don't want a limit then just set the max to `size_t(-1)` and
the compiler will elide the check completely.

> Unless you're saying that moving the check to basic_parser improves the
> performance of parser, in which case: you're doing it wrong.  Imposing
> restrictions on all users of basic_parser for something that only parser
> cares about is the wrong design choice.

The design of these components balances usability with performance. If
you don't care about limits you can just set the max to size_t(-1).
And I think this is the right choice. The library makes common things
easy, e.g. call parse(). If you need more complexity, you instantiate
a `parser` whose API requires a few more steps. If you really want to
go low level, you use `basic_parser` which being low level can seem
esoteric.

And I think this is okay, because the percentage of users who will use
basic_parser is incredibly tiny. The percentage of `parser` users is
much greater. While the percentage of users of the parse() free function
will be closer to 100% than anything else. In other words the library
favors performance and utility at the lowest levels over theoretical
interface correctness, where the number of users is naturally smaller
(and their domain-expertise naturally higher).

Regards


Re: [review][json] My review

Boost - Dev mailing list
On 23/09/2020 12:52, Vinnie Falco wrote:
> It is the opposite of SRP violation. basic_parser is concerned only
> with parsing and not keeping track of the full value. The
> responsibility of tracking full values is moved to the caller. That
> way, basic_parser does not impose its opinions on how the full value
> should be stored. This was litigated in earlier posts if I recall
> (Peter?).

Except it does, because it parses into int/uint/double, which is
inherently a storage decision.  You might have a valid argument if it
left that step to the handler to determine as well.  (i.e. only
providing on_number_part and on_number, just the text without any
attempt to parse into a numeric type.)

Making that change would probably make the arbitrary-precision-parsing
people happier as well.

> The interfaces are built up in layers. At the lowest layer you have
> basic_parser which concerns itself only with processing the input
> buffer and turning it into semantic components. It does _not_ have the
> responsibility of buffering input to present it as a larger component,
> that's the job of the handler. basic_parser is exposed to users so
> they can use their own schemes for dealing with the semantic
> components.
>
> Above basic_parser, you have parser which uses value_stack. The
> value_stack does take responsibility for buffering input, and
> producing DOM values. As with basic_parser, both of these components
> are exposed to users so they can write their own layers on top.

My point is that this seems like the wrong choice of layers.  As it
stands, consumers should never care about the basic_parser layer, and
instead will actually want a different layer (not currently exposed)
that is between the two, which exposes complete values that are not in
any DOM.

The current form of basic_parser could still exist, if you like, but it
should be made an implementation detail and the complete-value
basic_parser actually exposed to consumers.

>>          parser.write(&data1[0], data1.size(), ec);
>>          parser.write(&data2[0], data2.size(), ec);
>>          parser.write(&data3[0], data3.size(), ec);
>
> You have to call basic_parser::reset in order to start parsing a new "document."

This is not parsing new documents, but new document fragments.  As such
it must not reset.

> No it is not a poor design choice, it is the correct design choice. If
> basic_parser were to buffer the input, then users who have no need of
> buffered input (such as value_stack) would be paying for it for
> nothing. This interface is driven first and foremost for performance
> and the incremental/streaming functionality. If callers want to see
> full text, they can append the partial strings into a data member
> until the final on_string or on_int64 / on_uint64 / on_double and then
> interact with it that way. But they are not obliged to do so.

There should never be any such consumers.  Partial values are not useful
to anyone.  If they exist in the library, that is because things are
structured improperly.

> If we were to follow your "correct design" to make basic_parser buffer
> the input, the implementation would be needlessly allocating memory
> and copying characters.

It has to happen somewhere, regardless.  Why are you pushing that
responsibility onto the consumer?

> And I think this is the right choice. The library makes common things
> easy, e.g. call parse(). If you need more complexity, you instantiate
> a `parser` whose API requires a few more steps. If you really want to
> go low level, you use `basic_parser` which being low level can seem
> esoteric.
>
> And I think this is okay, because the percentage of users who will use
> basic_parser is incredibly tiny. The percentage of `parser` users much
> greater. While the percentage of users of the parse() free function
> will be closer to 100% than anything else. In other words the library
> favors performance and utility at the lowest levels over theoretical
> interface correctness, where the number of users is naturally smaller
> (and their domain-expertise naturally higher).

So why provide basic_parser at all, if its interface is deliberately
incorrect?


Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list


> On Sep 22, 2020, at 4:12 AM, Gavin Lambert via Boost <[hidden email]> wrote:
>
> It seems less suited for JSON as configuration file -- while it can parse such files (via common extensions such as ignoring comments and trailing commas) it lacks means to pretty-print and to round-trip the comments, which would be essential if writing back a config file intended for human use.  However while supporting this use case would be nice, I don't consider it mandatory.

Out of curiosity, why would you want to round-trip the comments from a config file?

Wouldn’t you want to write the comments anew each time you write the file, in case your code version had newer or corrected info? (I’m assuming the comments are like “// the following setting enables self-destruct (default=false)”)

I would think you’d want such “comments” to come from a schema, not the DOM. (Of course that would require schema-based serialization, or some form of serialization control, but that’s a separate topic)

Or do you want the human to be able to change comments and/or write their own, that get saved back? We have something like that at my job, but it’s a first-class/normal citizen in the DOM, using normal json-object string field entries named “description”. That way users can set descriptions for config objects regardless whether they edit config files by hand, or use a GUI, or whatever.

-hadriel



Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Mere moments ago, quoth I:

> On 23/09/2020 12:52, Vinnie Falco wrote:
>> It is the opposite of SRP violation. basic_parser is concerned only
>> with parsing and not keeping track of the full value. The
>> responsibility of tracking full values is moved to the caller. That
>> way, basic_parser does not impose its opinions on how the full value
>> should be stored. This was litigated in earlier posts if I recall
>> (Peter?).
>
> Except it does, because it parses into int/uint/double, which is
> inherently a storage decision.  You might have a valid argument if it
> left that step to the handler to determine as well.  (i.e. only
> providing on_number_part and on_number, just the text without any
> attempt to parse into a numeric type.)

It's not just a storage *decision*; in order to implement this interface
with the observed behaviour it also means that basic_parser must
currently actually have *storage* for partial numbers updated across
multiple fragments.  This is inconsistent with how it treats keys and
strings.

Hence I reiterate: it would be more consistent to delete those methods
and only have on_number_part + on_number that provide just the text.
(Make it parser's responsibility to do the actual text-to-numeric-type
parsing, ideally via a helper class that user code can call too.  Since
it's the layer that's dictating the storage format via json::value.)

> The current form of basic_parser could still exist, if you like, but it
> should be made an implementation detail and the complete-value
> basic_parser actually exposed to consumers.

I still think that it would be better to instead provide an interface
that precomposes the text fragments, but that is a separate argument
from the prior one, that can be treated separately.

Perhaps you could provide both -- basic_parser as-is (but with the
number parsing/storage removed), and another layer on top
(custom_parser, perhaps) that is nearly identical but omits the _part
methods to only provide precomposed keys/values.  This is the one that
most consumers of a SAX-style interface would be expected to actually use.


Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 23/09/2020 13:56, Hadriel Kaplan wrote:
> Out of curiosity, why would you want to round-trip the comments from a config file?
>
> Wouldn’t you want to write the comments anew each time you write the file, in case your code version had newer or corrected info? (I’m assuming the comments are like “// the following setting enables self-destruct (default=false)”)
>
> I would think you’d want such “comments” to come from a schema, not the DOM. (Of course that would require schema-based serialization, or some form of serialization control, but that’s a separate topic)

No, I wasn't thinking of comments as software-provided documentation.
But that could still work if the serialiser set the comment before
writing the file.  Though I guess it'd have to be more careful if there
were human-specified comments as well.

> Or do you want the human to be able to change comments and/or write their own, that get saved back? We have something like that at my job, but it’s a first-class/normal citizen in the DOM, using normal json-object string field entries named “description”. That way users can set descriptions for config objects regardless whether they edit config files by hand, or use a GUI, or whatever.

Yes, I was envisaging the human to be the one creating the comments,
either as notes for themselves, or more commonly commenting out a
particular setting because they don't want it to take effect but don't
want to actually delete it.

You can work around these things in strict JSON by altering the key name
(or using a reserved key name, as you suggested), and that's fine,
provided that it round-trips unknown keys.  But it's more awkward than
comments.


Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Tue, Sep 22, 2020 at 7:48 PM Gavin Lambert via Boost
<[hidden email]> wrote:

> On 23/09/2020 05:55, Vinnie Falco wrote:
> >> I also don't see the point of supplying a size argument to on_key* and
> >> on_string, as this can be read from the string_view argument instead.
> >
> > Right, so this size argument is actually the total number of
> > characters, while the string_view::size is the number of characters in
> > the current piece.
>
> Which is a poor design choice, as I said.
>

How? The goal is to provide as much information as possible to the handler.
Forcing users to track it themselves would introduce redundancy.

> I don't really understand why.  basic_parser doesn't actually store any
> of the keys/values (especially as it doesn't merge value parts), so it
> should have no reason to impose any kind of limit.  This just seems like
> it's doing an extra count/comparison that could be removed entirely, and
> removing it would naturally improve performance.

basic_parser only checks the limit; it does not enforce it. The handler is
responsible for determining what that limit is. I suppose that we could add
a little TMP that elides the check statically, but I don't see too much of
a point in doing so.



Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Wednesday, September 23, 2020 at 14:08 +1200, Gavin Lambert via
Boost wrote:

> Mere moments ago, quoth I:
> > On 23/09/2020 12:52, Vinnie Falco wrote:
> > > It is the opposite of SRP violation. basic_parser is concerned only
> > > with parsing and not keeping track of the full value. The
> > > responsibility of tracking full values is moved to the caller. That
> > > way, basic_parser does not impose its opinions on how the full value
> > > should be stored. This was litigated in earlier posts if I recall
> > > (Peter?).
> >
> > Except it does, because it parses into int/uint/double, which is
> > inherently a storage decision.  You might have a valid argument if it
> > left that step to the handler to determine as well.  (i.e. only
> > providing on_number_part and on_number, just the text without any
> > attempt to parse into a numeric type.)
>
> It's not just a storage *decision*; in order to implement this interface
> with the observed behaviour it also means that basic_parser must
> currently actually have *storage* for partial numbers updated across
> multiple fragments.  This is inconsistent with how it treats keys
> and strings.

The base inconsistency is the fact that storing a number, however
large it is, takes a fixed amount of storage (talking about doubles or
longs, which are the only types handled by Boost.JSON), whereas for a
string it is not the case. A non-allocating parser can safely handle
integers and doubles itself (and it can do it in a more efficient way
than if it's done in an upper layer), but has no other choice than
splitting strings (unless there is an upper limit on string size).

> I still think that it would be better to instead provide an interface
> that precomposes the text fragments, but that is a separate argument
> from the prior one, that can be treated separately.
>
> Perhaps you could provide both -- basic_parser as-is (but with the
> number parsing/storage removed), and another layer on top
> (custom_parser, perhaps) that is nearly identical but omits the _part
> methods to only provide precomposed keys/values.  This is the one that
> most consumers of a SAX-style interface would be expected to actually
> use.

From what I've seen, such an interface would be quite easy to
implement. That would defeat the purpose of the non-allocating parser,
though, so I'm not sure it would be that useful.

Regards,
Julien



Re: [review][json] My review

Boost - Dev mailing list
On 23/09/2020 18:31, Julien Blanc wrote:
> The base inconsistency is the fact that storing a number, whatever
> large it is, takes a fixed amount of storage (talking about doubles or
> longs, which are the only types handled by boost.json), whereas for a
> string it is not the case. A non-allocating parser can safely handle
> integers and double itself (and it can do it in a more efficient way
> than if it's done in an upper layer), but has no other choice than
> splitting strings (unless there is an upper limit on string size).

I'm aware of that (although it's not entirely true).  My point is that
it's the wrong thing to do, especially given that JSON technically
allows for arbitrarily large numbers (even though it is typically more
limited in practice).

This could be avoided if basic_parser didn't try to dictate the number
storage format and treated these the same as strings.  That would allow
both for arbitrary length/precision numbers to be handled as well as
embedded clients that want to use 16-bit integer storage rather than
64-bit, for example.  And it'd be more consistent with how keys and
strings are handled.

Let the class that actually does want to dictate the storage format
(i.e. json::parser, which dictates use of json::value) do so, and keep
that away from json::basic_parser.  It just makes sense.

And if anything, it should improve performance (by not doing it when not
needed); I don't see why you think it'd be less efficient in an upper layer.


Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 23/09/2020 15:45, Krystian Stasiowski wrote:

>> On 23/09/2020 05:55, Vinnie Falco wrote:
>>>> I also don't see the point of supplying a size argument to on_key* and
>>>> on_string, as this can be read from the string_view argument instead.
>>>
>>> Right, so this size argument is actually the total number of
>>> characters, while the string_view::size is the number of characters in
>>> the current piece.
>>
>> Which is a poor design choice, as I said.
>
> How? The goal is to provide as much information as possible to the handler.
> Forcing users to track it themselves would introduce redundancy.

That was in the context of stating that requiring users of
basic_parser to self-assemble multiple fragments of text was a poor
design choice, vs. handing them the complete preassembled value text.

I've walked back on that one a little bit; I now think that it's ok for
basic_parser to do that (to retain a zero-allocation guarantee) but in
that case it should treat numbers the same way (which it currently
doesn't), and ideally there should be a standard slightly-higher-level
parser that has a similar interface but does preassemble the values,
eliminating the _part callbacks (for people who prefer ease of use over
raw performance).

>> I don't really understand why.  basic_parser doesn't actually store any
>> of the keys/values (especially as it doesn't merge value parts), so it
>> should have no reason to impose any kind of limit.  This just seems like
>> it's doing an extra count/comparison that could be removed entirely, and
>> removing it would naturally improve performance.
>
> basic_parser only checks the limit; it does not enforce it. The handler is
> responsible for determining what that limit is. I suppose that we could add
> a little TMP that elides the check statically, but I don't see too much of
> a point in doing so.

I don't see the point in checking the limit in the first place.  Either
the handler cares, in which case it can easily count calls to value
callbacks within an on_array_begin/on_array_end block itself (or just
check the count at the end), or it doesn't care, in which case it's a
waste of time for basic_parser to check.

It also assumes that all parts of the object tree have the same limits,
which seems odd (in the case where something does care).  I guess the
main issue is that nowhere in the docs can I find any justification for
their existence or why you might ever want to set them.

Overall, basic_parser itself feels more like an interface that fell out
of json::parser rather than one that was deliberately designed on its own.


Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Wed, 23 Sep 2020 at 02:53 Vinnie Falco via Boost <[hidden email]>
wrote:

> On Tue, Sep 22, 2020 at 4:47 PM Gavin Lambert via Boost
> <[hidden email]> wrote:
> > I just don't see the point in having an interface that can return half
> > of a value -- this just moves the problem of allocation and keeping
> > track of the full value to the parser handler, which seems like an SRP
> > violation.
>
> It is the opposite of SRP violation. basic_parser is concerned only
> with parsing and not keeping track of the full value. The
> responsibility of tracking full values is moved to the caller. That
> way, basic_parser does not impose its opinions on how the full value
> should be stored. This was litigated in earlier posts if I recall
> (Peter?).
>
> > >> I don't feel it is ever sensible to call back with
> > >> a partially-parsed value (unless I am misunderstanding what these are
> > >> for, but there is little explanation in the docs, and I wasn't able to
> > >> get them called at all in my test).
>
> The interfaces are built up in layers. At the lowest layer you have
> basic_parser which concerns itself only with processing the input
> buffer and turning it into semantic components. It does _not_ have the
> responsibility of buffering input to present it as a larger component,
> that's the job of the handler. basic_parser is exposed to users so
> they can use their own schemes for dealing with the semantic
> components.
>
> Above basic_parser, you have parser which uses value_stack. The
> value_stack does take responsibility for buffering input, and
> producing DOM values. As with basic_parser, both of these components
> are exposed to users so they can write their own layers on top.
>
> On top of parser we have the parse() free functions, which address the
> most popular use-case of parsing. These functions will satisfy most
> needs. And for the needs that are not satisfied, users can easily
> implement their own parse functions with different behaviors, as the
> library exposes all the intermediate lower levels.
>

Maybe this explanation is worth putting in the documentation.
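Something like this sketch of the three layers might be a good starting
point (usage approximated from this thread, not checked against the
actual headers):

    #include <boost/json.hpp>
    #include <cstring>

    void three_layers()
    {
        char const* doc = R"({"k":[1,2,3]})";

        // 1) Top layer: the parse() free function, the common case.
        boost::json::value v1 = boost::json::parse(doc);

        // 2) Middle layer: parser + value_stack, accepts incremental
        //    input and produces a DOM.
        boost::json::parser p;
        boost::json::error_code ec;
        p.write(doc, std::strlen(doc), ec);  // may be called repeatedly
        if (!ec)
        {
            boost::json::value v2 = p.release();
        }

        // 3) Bottom layer: basic_parser<Handler>, which only reports the
        //    semantic components and leaves storage to the handler.
    }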

Regards,
&rzej;


> > > Right, so this size argument is actually the total number of
> > > characters, while the string_view::size is the number of characters in
> > > the current piece.
> >
> > Which is a poor design choice, as I said.
>
> No, the information is already there so we are just passing it to the
> handler which can use it, or not. If the handler doesn't use it, then
> it is optimized away.
>
> >         parser.write(&data1[0], data1.size(), ec);
> >         parser.write(&data2[0], data2.size(), ec);
> >         parser.write(&data3[0], data3.size(), ec);
>
> You have to call basic_parser::reset in order to start parsing a new
> "document."
>
> > So I reiterate that the number and string handling is inconsistent and
> > these part methods are more confusing than helpful -- numbers are parsed
> > into the complete value but strings are not, and neither give you the
> > full text of the value (which may be important if the goal is to parse
> > into an arbitrary-precision value type).
> >
> > I do regard this as a very poor design choice, but not sufficiently so
> > to reject the library.
>
> No it is not a poor design choice, it is the correct design choice. If
> basic_parser were to buffer the input, then users who have no need of
> buffered input (such as value_stack) would be paying for it for
> nothing. This interface is driven first and foremost for performance
> and the incremental/streaming functionality. If callers want to see
> full text, they can append the partial strings into a data member
> until the final on_string or on_int64 / on_uint64 / on_double and then
> interact with it that way. But they are not obliged to do so.
>
> Again I repeat, this is not a poor design choice. json::string for
> example is perfectly capable of being constructed from two separate
> buffers of characters (an implementation detail):
>
> <https://github.com/CPPAlliance/json/blob/6ddddfb16f9b54407d153abdda11a29f74642854/include/boost/json/impl/value_stack.ipp#L428>
>
> If we were to follow your "correct design" to make basic_parser buffer
> the input, the implementation would be needlessly allocating memory
> and copying characters.
>
> > I was assuming that a "deep" JSON (many nested objects) may have a
> > different cost per output character than a wide but shallow one.  But
> > it's not a big deal.
>
> Yes, it isn't perfect of course but over time and with a reasonable
> distribution of possible JSON inputs it averages out to be good
> enough. Certainly better than not having streaming at all (which would
> be terrible).
>
> > > To store the comments would mean storing them in the DOM, which would
> > > have a significant impact on performance as the space for everything
> > > is tightly controlled.
> >
> > I don't see why.  It's just an extra string associated with any value.
>
> Yes exactly. sizeof(value)==16 on 32bit platforms, and
> sizeof(value)==24 on 64-bit platforms. When arrays and objects are
> constructed during parsing, the implementation performs a memcpy after
> allocation. Every additional byte in the sizeof(value) slows down the
> creation of arrays and objects. Adding a pointer to value would cost
> 25% on 32-bit and 33% on 64-bit, and make Boost.JSON unable to ever
> beat RapidJSON in performance.
>
> > This just seems like
> > it's doing an extra count/comparison that could be removed entirely, and
> > removing it would naturally improve performance.
>
> If you don't want a limit then just set the max to `size_t(-1)` and
> the compiler will elide the check completely.
>
> > Unless you're saying that moving the check to basic_parser improves the
> > performance of parser, in which case: you're doing it wrong.  Imposing
> > restrictions on all users of basic_parser for something that only parser
> > cares about is the wrong design choice.
>
> The design of these components balances usability with performance. If
> you don't care about limits you can just set the max to size_t(-1).
> And I think this is the right choice. The library makes common things
> easy, e.g. call parse(). If you need more complexity, you instantiate
> a `parser` whose API requires a few more steps. If you really want to
> go low level, you use `basic_parser` which being low level can seem
> esoteric.
>
> And I think this is okay, because the percentage of users who will use
> basic_parser is incredibly tiny. The percentage of `parser` users is much
> greater. While the percentage of users of the parse() free function
> will be closer to 100% than anything else. In other words the library
> favors performance and utility at the lowest levels over theoretical
> interface correctness, where the number of users is naturally smaller
> (and their domain-expertise naturally higher).
>
> Regards
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>
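(For what it's worth, that reset requirement implies usage like the
following sketch for two separate documents arriving back to back --
signatures as used earlier in this thread:)

    #include <boost/json.hpp>
    #include <string>

    boost::json::value parse_second(
        std::string const& doc1, std::string const& doc2)
    {
        boost::json::parser p;
        boost::json::error_code ec;

        p.write(doc1.data(), doc1.size(), ec);  // first complete document
        boost::json::value v1 = p.release();

        p.reset();                              // required before a new document
        p.write(doc2.data(), doc2.size(), ec);
        return p.release();
    }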

Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 2020-09-23 10:17, Gavin Lambert via Boost wrote:
> On 23/09/2020 18:31, Julien Blanc wrote:

> And if anything, it should improve performance (by not doing it when
> not needed); I don't see why you think it'd be less efficient in an
> upper layer.

The number would need to be parsed twice: first in the JSON parser, to
check that it is a valid number, and second in the handler, to produce
an actual number. Given that some JSON parsing libraries implement their
own number parsing precisely to avoid this double parse, constructing
the value on the fly, I'm not alone in thinking it makes a difference
(from what I've seen, the boost::json authors are also in that boat,
which makes their interface choice a logical one).

Regards,

Julien

Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Wed, Sep 23, 2020 at 1:40 AM Gavin Lambert via Boost
<[hidden email]> wrote:
> I've walked back on that one a little bit; I now think that it's ok for
> basic_parser to do that (to retain a zero-allocation guarantee) but in
> that case it should treat numbers the same way (which it currently
> doesn't), and ideally there should be a standard slightly-higher-level
> parser that has a similar interface but does preassemble the values,
> eliminating the _part callbacks (for people who prefer ease of use over
> raw performance).

I had that originally, it was called number_parser, it was a public
interface, and it was one of the first things to go during our 3-month
journey of optimization. You can fish it out of the commit log if you
care to see it.

Thanks

Re: [review][json] My review

Boost - Dev mailing list
At 00:22 24/09/2020, Vinnie Falco wrote:
> On Wed, Sep 23, 2020 at 1:40 AM Gavin Lambert wrote:
>> I've walked back on that one a little bit; I now think that it's ok for
>> basic_parser to do that (to retain a zero-allocation guarantee) but in
>> that case it should treat numbers the same way (which it currently
>> doesn't), and ideally there should be a standard slightly-higher-level
>> parser that has a similar interface but does preassemble the values,
>> eliminating the _part callbacks (for people who prefer ease of use over
>> raw performance).
>
> I had that originally, it was called number_parser, it was a public
> interface, and it was one of the first things to go during our 3-month
> journey of optimization. You can fish it out of the commit log if you
> care to see it.

I didn't fish out the code, but given the class name, that sounds
like the complete opposite of what I was suggesting.

To restate again, my suggestion is:

1. Remove on_int64/uint64/double from basic_parser's handler type.
2. Add on_number(string_view, size_t) to basic_parser's handler
type -- having exactly the same semantics as on_key/string and
their _part companions.
3. Add custom_parser, which wraps a basic_parser but has internal
(dynamically allocated if needed) storage for partial
keys/strings/numbers so it never calls _part methods on its own
handler type.  It still uses on_number(string_view) but note that
this (and on_key/string) have no reason to pass a size_t any more,
because it can never be any different from the size() of the
string_view.
4. The int64/uint64/double parsing code that used to be in
basic_parser should be moved to a dedicated helper class, in turn
used by parser and *optionally* used by the user's handler
implementation for basic_parser/custom_parser.

This design has the proper separation of concerns between the
parser layers (not dictating a numeric storage format, and keeping
number parsing separate from JSON structural parsing).  It should
not negatively impact performance of parser, and should positively
affect performance of basic/custom_parser consumers who didn't
want the default conversion anyway.
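A rough sketch of the handler surface I have in mind (hypothetical; none
of this exists in the library today):

    #include <string_view>
    #include <system_error>

    // custom_parser's handler sees only complete values; no _part methods.
    struct proposed_handler
    {
        bool on_key(std::string_view key, std::error_code& ec);
        bool on_string(std::string_view value, std::error_code& ec);
        bool on_number(std::string_view text, std::error_code& ec);
        // Structural callbacks (on_object_begin, on_array_end, ...) as today.
        // The int64/uint64/double conversion lives in a separate helper
        // class that a handler may apply to `text` when it wants the
        // default conversion.
    };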


Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Vinnie Falco wrote:

> I had that originally, it was called number_parser, it was a public
> interface, and it was one of the first things to go during our 3-month
> journey of optimization.

I suspect that something similar would have happened to the hypothetical
pull parser lowest level, if there were one. It seems to me that the pull
parser would be very similar to the current parser, except it would need to
suspend on each token, instead of on each buffer boundary. This probably has
a performance cost.
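(Hypothetically, the consumer side of such a pull interface might look
like this -- nothing of the sort exists in Boost.JSON:)

    enum class token_kind
    {
        object_begin, object_end, array_begin, array_end,
        key, string, number, boolean, null, done
    };

    struct pull_parser
    {
        token_kind next();  // must suspend/resume per token -- the likely cost
    };

    void consume(pull_parser& p)
    {
        for (auto t = p.next(); t != token_kind::done; t = p.next())
        {
            // dispatch on t...
        }
    }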


Re: [review][json] My review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Wed, Sep 23, 2020 at 7:21 AM Gavin Lambert via Boost
<[hidden email]> wrote:
> given the class name, that sounds
> like the complete opposite of what I was suggesting.
> ...
> 4. The int64/uint64/double parsing code that used to be in
> basic_parser should be moved to a dedicated helper class, in turn
> used by parser and *optionally* used by the user's handler

Yes, you have described number_parser. Again.

Thanks
