[review][JSON] json::value as a vocabulary type

Boost - Dev mailing list
Hi Everyone,
I have heard this claim a number of times: json::value is suitable as a
vocabulary type. I am not sure what that actually means: no templates?
A guarantee that the layout or mangled symbol will never change? Anything else?

My understanding of a "vocabulary type" is that it should be usable (not
necessarily with maximum efficiency) for *any* usage. In the case of JSON
that would mean that I should be able to represent any value that
corresponds to a valid JSON when converted to text. I do not think that
json::value can claim that without the ability to serialize arbitrarily big
numbers.

I understand that the goal of the library is to address the most common
cases, and big numbers do not fall into this category. I am just saying
that the name "vocabulary type" may not be accurate here.

Regards,
&rzej;

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [review][JSON] json::value as a vocabulary type

> Sent: Thursday, 24 September 2020 at 11:57
> From: "Andrzej Krzemienski via Boost" <[hidden email]>
>
> Hi Everyone,
> I have heard this claim a number of times: json::value is suitable for a
> vocabulary type. I am not sure what that actually means: no templates?
> Guarantee that layout or mangled symbol will never change? Anything else?

I'm sure there exist different interpretations of that term, but the most
important aspect for me has *nothing* to do with implementation stability.
What matters is that it is the common way to represent some type of
information in the various interfaces across the larger C++ ecosystem,
i.e. it is part of the common vocabulary used. E.g. std::string_view
can be considered the vocabulary type for string parameters, and even
if I use my custom string implementation internally for some reason,
I'll make damn sure that my interface accepts std::string_view and returns
something that can at least be implicitly converted to std::string_view.
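
As a minimal sketch of that last point: a custom string type can stay
internal while the interface speaks std::string_view. MyString and
count_spaces are hypothetical names invented for illustration, not part of
any library mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical in-house string type, standing in for "my custom string
// implementation".
class MyString {
public:
    explicit MyString(std::string s) : data_(std::move(s)) {}

    // Implicit conversion lets callers pass a MyString to any interface
    // that takes the vocabulary type std::string_view.
    operator std::string_view() const noexcept { return data_; }

private:
    std::string data_;
};

// A library boundary that knows nothing about MyString: it only accepts
// the vocabulary type.
inline std::size_t count_spaces(std::string_view text) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == ' ') ++n;
    }
    return n;
}
```

Both count_spaces("plain literal") and count_spaces(MyString{"a b c"})
compile against the same signature, which is the whole point of agreeing on
one interface type.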

Having such common vocabulary types that represent more complex
data than just numbers and strings could greatly facilitate the integration
and composition of multiple different libraries, because I don't have to
translate the data from the "language" spoken by lib Foo to
the "language" used by lib Bar in order to hand the output from one
to the other.

E.g. we unfortunately don't have a real vocabulary type for vector data
(in the mathematical sense). So, if I want to use GLM to do some linear
algebra computation and then display the result in a Qt GUI,
I'll - at some point - have to translate from glm::vec2 to QVector2D,
or some such.  If both libs used the same vocabulary (at least
in their interface), like a std::vec2, I could just pass the result from
the glm computation directly to my GUI code without translation
(and the associated danger of introducing bugs or losing data).
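
To make the translation step concrete, here is a small sketch. The
foo::vec2 and bar::Vector2D types are hypothetical stand-ins (they are not
the real glm::vec2 or QVector2D definitions), deliberately given different
element types so that the data loss is visible.

```cpp
#include <cassert>

namespace foo { struct vec2 { double x; double y; }; }     // hypothetical math library type
namespace bar { struct Vector2D { float x; float y; }; }   // hypothetical GUI library type

// Hand-written glue between the two "languages". The narrowing casts
// silently drop precision, which is the kind of problem a shared
// vocabulary type would avoid.
inline bar::Vector2D to_bar(const foo::vec2& v) {
    return bar::Vector2D{static_cast<float>(v.x), static_cast<float>(v.y)};
}
```

Values such as 0.5 survive the conversion exactly, but 0.1 does not come
back as the same double once it has been squeezed through a float.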

One of the big strengths of C++ is the ability to create data types that
"feel" the same as native types (a.k.a. value types), but every time we
want to hand data from one library to another we either have to
decompose the data types to their fundamental components (and even
strings cannot always be forwarded directly) or write everything as a
template.

Whether c++ is in need of a JSON vocabulary type and if Boost.JSON
does provide a good one is a question I unfortunately can't answer yet
(otherwise I'd have written a review), but imho the worth of that library
should not just be measured by whether it is suited as a general
vocabulary type for the whole c++ eco system, but if it provides a
sound (doesn't necessarily need to be optimal) basis for building
higher level libs on top of it in the future (inside and outside of boost).
(e.g. implementing JSON based internet protocols).


Best

Mike

P.S.: one word about templates as vocabulary types:
The danger is that you effectively get not one type to represent
e.g. JSON data, but one for each possible instantiation and you are
back to square one (think about std::string, vs std::wstring, vs
std::u8string). For vector data on the other hand, 2D and 3D
vectors represent different kinds of data, so having different types
is OK and using a single template instead of repeating the same logic
N times makes sense. The chrono duration types are a bit in-between
in this regard as there are many different types, but at least conversion
between them is relatively easy or even implicit.

Re: [review][JSON] json::value as a vocabulary type

On Thu, Sep 24, 2020 at 2:58 AM Andrzej Krzemienski via Boost
<[hidden email]> wrote:
> My understanding of a "vocabulary type" is that it should be usable (not
> necessarily with maximum efficiency) for *any* usage. In the case of JSON

When I use the term I refer to the ability to build higher level
abstractions. Here's a perfect example:

<https://github.com/arun11299/cpp-jwt>

This library implements RFC-7519 and uses objects of type
nlohmann::json in its public interface.

I argue that boost::json::value would be a superior type to what this
library currently uses. That is what is meant when Boost.JSON claims
to be a "vocabulary type." It certainly does not mean that arbitrary
precision numbers are supported, that every possible use-case is
supported, or that it can store any payload with perfect fidelity.

Thanks

Re: [review][JSON] json::value as a vocabulary type

> On Sep 24, 2020, at 7:06 AM, Mike via Boost <[hidden email]> wrote:
>
> Whether c++ is in need of a JSON vocabulary type and if Boost.JSON
> does provide a good one is a question I unfortunately can't answer yet
> (otherwise I'd have written a review), but imho the worth of that library
> should not just be measured by whether it is suited as a general
> vocabulary type for the whole c++ eco system, but if it provides a
> sound (doesn't necessarily need to be optimal) basis for building
> higher level libs on top of it in the future (inside and outside of boost).
> (e.g. implementing JSON based internet protocols).

This is anecdotal, of course, but within my company’s codebase the equivalent JSON-based variant structure is indeed used as a vocabulary type and passed between libraries - although of course they’re our own libraries, so it’s not really what you mean. It’s extremely convenient and its usage has become somewhat viral.

We use Facebook’s `folly::dynamic` for that variant type today, and out of an average size (1M+ LOC) code base, the string “folly::dynamic” appears over 7,600 times. A lot of that usage is in unit test code+libraries, where we use the type for various purposes, but a lot of it is also in production code. It is *not* only used for when we need parsing or serialization to/from JSON, although certainly that’s a big usage too; and makes it even more convenient as a value type because we can serialize it to logs for debugging, or parse from strings/files for unit testing library APIs.

Of course the downside of using such a dynamically-typed structure as a vocab type is that it won’t be as efficient as statically-typed ones, and if you put the wrong stuff in it you won’t get compile-time failures. But that’s an acceptable trade-off for some people/use-cases.

-hadriel


Re: [review][JSON] json::value as a vocabulary type

On Thu, 24 Sep 2020 at 10:57, Andrzej Krzemienski via Boost
<[hidden email]> wrote:

> My understanding of a "vocabulary type" is that it should be usable (not
> necessarily with maximum efficiency) for *any* usage. In the case of JSON
> that would mean that I should be able to represent any value that
> corresponds to a valid JSON when converted to text. I do not think that
> json::value can claim that without the ability to serialize arbitrarily big
> numbers.

I fully agree with this statement.
json::value *needs* to support arbitrary numbers. It's incomplete without it.
Maybe the author of multiprecision can advise on the best type to use
there (gmp or mpfr?).

Re: [review][JSON] json::value as a vocabulary type

On Fri, Sep 25, 2020 at 6:11 AM Mathias Gaunard via Boost
<[hidden email]> wrote:

>
> On Thu, 24 Sep 2020 at 10:57, Andrzej Krzemienski via Boost
> <[hidden email]> wrote:
>
> > My understanding of a "vocabulary type" is that it should be usable (not
> > necessarily with maximum efficiency) for *any* usage. In the case of JSON
> > that would mean that I should be able to represent any value that
> > corresponds to a valid JSON when converted to text. I do not think that
> > json::value can claim that without the ability to serialize arbitrarily big
> > numbers.
>
> I fully agree with this statement.
> json::value *needs* to support arbitrary numbers. It's incomplete without it.
> Maybe the author of multiprecision can advise on the best type to use
> there (gmp or mpfr?).

This is not a reasonable requirement.  std::string is the canonical
C++ vocabulary type.  On 32-bit systems, it cannot represent 5GB-long
strings.  Depending on platform limitations, it usually cannot even
represent more than 2GB-long strings.  Computers are limited to finite
resources.  Putting finite limits on the representation of all kinds
of values is normal, not unexpected -- this is especially true of
numeric values.
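
For what it's worth, the standard library reports its own finite limit
directly. A small sketch follows; string_limit and representable are
hypothetical helper names, and the actual value of max_size() is
implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Largest length this implementation's std::string says it can hold.
inline std::size_t string_limit() {
    return std::string().max_size();
}

// True if a string of `bytes` characters is within that limit.
// (Whether the allocation would actually succeed is a separate question.)
inline bool representable(unsigned long long bytes) {
    return bytes <= string_limit();
}
```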

Zach

Re: [review][JSON] json::value as a vocabulary type

On Fri, 25 Sep 2020 at 15:10, Zach Laine via Boost <[hidden email]> wrote:

> This is not a reasonable requirement.  std::string is the canonical
> C++ vocabulary type.  On 32-bit systems, it cannot represent 5GB-long
> strings.  Depending on platform limitations, it usually cannot even
> represent more than 2GB-long strings.  Computers are limited to finite
> resources.  Putting finite limits on the representation of all kinds
> of values is normal, not unexpected -- this is especially true of
> numeric values.
>

I am wondering. If I have a small web service for generating prime numbers,
and I need to return them in a JSON file, is my only option to pass them as
strings?
Prime numbers of this kind are bigger than uint64_t can hold. But they are
not as big as 1MB. Is such a use case for a number so unusual that it cannot
be stored as a JSON number?
Are JSON numbers only good for storing int-based identifiers?

Regards,
&rzej;

Re: [review][JSON] json::value as a vocabulary type

On Fri, Sep 25, 2020 at 9:07 AM Andrzej Krzemienski via Boost
<[hidden email]> wrote:

> I am wondering. If I have a small web service for generating prime numbers,
> and I need to return them in a JSON file, is my only option to pass it as
> string?
> Prime numbers of this kind are bigger than uint64_t. But they are not as
> big as 1MB. Is such a use case for a number so unusual that it cannot be
> stored as a JSON number?
> Are JSON numbers only good for storing int-based identifiers?

I don't know what an int-based identifier is, but I do know that the
use cases for machine-representable ints (that is, an int that is the
size of an int, fits in a register, etc.) are >99% and the use cases
for a web service that generates prime numbers are <1%.  That's what
should drive the design.

Zach

Re: [review][JSON] json::value as a vocabulary type

On 9/25/20 4:06 PM, Andrzej Krzemienski via Boost wrote:

> I am wondering. If I have a small web service for generating prime numbers,
> and I need to return them in a JSON file, is my only option to pass it as
> string?
> Prime numbers of this kind are bigger than uint64_t. But they are not as
> big as 1MB. Is such a use case for a number so unusual that it cannot be
> stored as a JSON number?


If you have to interoperate with Javascript at some point, then I think the
answer is yes, use a string.  JS only knows about double (not even int64 or
uint64).
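
A quick way to see that limit is the sketch below. It assumes double is an
IEEE 754 binary64 type (true on mainstream platforms, and checked at compile
time); survives_double is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");

// True if n comes back unchanged after a round trip through double,
// which is what a double-only JSON consumer effectively does to it.
inline bool survives_double(std::uint64_t n) {
    const double d = static_cast<double>(n);
    // 2^64 does not fit in uint64_t; converting it back would be
    // undefined behaviour, so treat it as "did not survive".
    if (d >= 18446744073709551616.0) return false;
    return static_cast<std::uint64_t>(d) == n;
}
```

Under these assumptions 2^53 (9007199254740992) survives the round trip,
while 2^53 + 1 and the largest uint64_t value do not.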



> Are JSON numbers only good for storing int-based identifiers?

Re: [review][JSON] json::value as a vocabulary type

>> I am wondering. If I have a small web service for generating prime numbers,
>> and I need to return them in a JSON file, is my only option to pass it as
>> string?
>> Prime numbers of this kind are bigger than uint64_t. But they are not as
>> big as 1MB. Is such a use case for a number so unusual that it cannot be
>> stored as a JSON number?
>> Are JSON numbers only good for storing int-based identifiers?
> I don't know what an int-based identifier is, but I do know that the
> use cases for machine-representable ints (that is, an int that is the
> size of an int, fits in a register, etc.) is >99% and the use cases
> for a web service that generates prime numbers is <1%.  That's what
> should drive the design.
To add to that: JSON is essentially JavaScript-based. For a long time
JavaScript had no ints, only doubles. And the new BigInt can't be
converted to JSON.

So to answer the question: yes, your only option is to pass it as a string.
Anything else is, first and foremost, non-portable.
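
A minimal sketch of the string approach, using only the standard library.
The function name prime_as_json and the "prime" field name are hypothetical;
a real service would build the document with its JSON library of choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <string_view>

// Wrap an arbitrarily long decimal integer in a JSON string so that
// double-only consumers cannot silently round it.
inline std::string prime_as_json(std::string_view digits) {
    const bool ok = !digits.empty() &&
        std::all_of(digits.begin(), digits.end(),
                    [](unsigned char c) { return std::isdigit(c) != 0; });
    if (!ok) throw std::invalid_argument("expected decimal digits");

    std::string out = "{\"prime\":\"";
    out.append(digits);   // digits only, so no JSON escaping is needed
    out += "\"}";
    return out;
}
```

The receiver then has to parse the digits with a big-integer type of its
own, which is the price of staying portable.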




Re: [review][JSON] json::value as a vocabulary type

On Fri, 25 Sep 2020 at 11:12, Zach Laine via Boost
<[hidden email]> wrote:
> I don't know what an int-based identifier is, but I do know that the
> use cases for machine-representable ints (that is, an int that is the
> size of an int, fits in a register, etc.) is >99% and the use cases
> for a web service that generates prime numbers is <1%.  That's what
> should drive the design.

The designs are not conflicting at all. std::string may be the
vocabulary type, but it's only a typedef for std::basic_string.
json::basic_value could also exist. Their implementations don't need
to be shared (so it wouldn't conflict with the performance claims).
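
A rough sketch of what that could look like. Everything here is
hypothetical: Boost.JSON does not provide a json::basic_value, and
decimal_text is only a placeholder for an arbitrary-precision number type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for an arbitrary-precision number
// (it just carries the decimal text).
struct decimal_text { std::string digits; };

// Hypothetical template, parameterised on the number representation.
template <class Number>
struct basic_value {
    std::variant<std::nullptr_t, bool, Number, std::string> storage;

    bool is_number() const { return std::holds_alternative<Number>(storage); }
};

using value = basic_value<double>;            // the one type interfaces agree on
using big_value = basic_value<decimal_text>;  // opt-in for unusual needs

// Each instantiation is a distinct type, so only the agreed-upon alias
// can serve as shared vocabulary.
static_assert(!std::is_same_v<value, big_value>,
              "instantiations do not interoperate by themselves");
```

The static_assert is the flip side of the idea: the template gives
flexibility, but two libraries still have to pick the same alias to
communicate.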

Having said that, I do find the int64/uint64/double choices the right
ones (for json::value). They aren't choices you can derive from the JSON
spec itself; the justification comes from outside it. That's a discussion
that I try to avoid for a number of reasons.

My 2 cents.


--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/

Re: [review][JSON] json::value as a vocabulary type

Andrzej Krzemienski wrote:

> My understanding of a "vocabulary type" is that it should be usable (not
> necessarily with maximum efficiency) for *any* usage.

This is not at all what a vocabulary type is. A vocabulary type is a type
via which two libraries can communicate, without that type being defined by
either of them. E.g. std::size_t is a vocabulary type. It's obviously not
usable for *any* usage.
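
A tiny sketch of that definition, with two hypothetical libraries (liba and
libb are invented names) that never mention each other and still compose
through std::size_t:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

namespace liba {   // hypothetical library A: measures things
inline std::size_t length_of(std::string_view s) { return s.size(); }
}

namespace libb {   // hypothetical library B: allocates things, unaware of liba
inline std::vector<char> make_buffer(std::size_t n) {
    return std::vector<char>(n);
}
}
```

A caller can write libb::make_buffer(liba::length_of("hello")) with no glue
code, because neither library had to define the type that carries the value
between them.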


Re: [review][JSON] json::value as a vocabulary type

On Fri, Sep 25, 2020 at 7:07 AM Andrzej Krzemienski via Boost
<[hidden email]> wrote:
> Are JSON numbers only good for storing int-based identifiers?

The JSON specification is silent on the limits and precision of the
range of numbers. All that we know is that it is a "light-weight data
interchange format." However, we can gather quite a bit of anecdotal
evidence simply by looking at the various languages which have
built-in support for JSON.

From RFC7159 (https://tools.ietf.org/html/rfc7159)

   This specification allows implementations to set limits on the range
   and precision of numbers accepted.  Since software that implements
   IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
   generally available and widely used, good interoperability can be
   achieved by implementations that expect no more precision or range
   than these provide, in the sense that implementations will
   approximate JSON numbers within the expected precision.  A JSON
   number such as 1E400 or 3.141592653589793238462643383279 may indicate
   potential interoperability problems, since it suggests that the
   software that created it expects receiving software to have greater
   capabilities for numeric magnitude and precision than is widely
   available.

Note the phrase "widely available."

From <https://stackoverflow.com/questions/13502398/json-integers-limit-on-size>

    As a practical matter, Javascript integers are limited to about 2^53
    (there are no integers; just IEEE floats).

From <https://developers.google.com/discovery/v1/type-format>

    ...a 64-bit integer cannot be represented in JSON (since JavaScript
    and JSON support integers up to 2^53).

From <https://github.com/josdejong/lossless-json>

    When to use? Only in some special cases. For example when you
    have to create some sort of data processing middleware which has
    to process arbitrary JSON without risk of screwing up. JSON objects
    containing big numbers are rare in the wild.

From <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number>

    The JavaScript Number type is a double-precision 64-bit binary format
    IEEE 754 value, like double in Java or C#....When parsing data that has
    been serialized to JSON, integer values falling outside of this range can
    be expected to become corrupted when JSON parser coerces them to
    Number type. A possible workaround is to use String instead.

From <https://docs.python.org/3/library/json.html#implementation-limitations>

    When serializing to JSON, beware any such limitations in applications
    that may consume your JSON. In particular, it is common for JSON
    numbers to be deserialized into IEEE 754 double precision numbers
    and thus subject to that representation’s range and precision limitations.

I am actually now starting to wonder if even 64-bit integer support
was a good idea, as it can produce numbers which most implementations
cannot read with perfect fidelity.

It is true that there are some JSON implementations which support
arbitrary-precision numbers, but these are rare and all come with the
caveat that their output will likely be incorrectly parsed or rejected
by the majority of implementations. This is quite an undesirable
feature for an "interoperable, data-exchange format" or a vocabulary
type. Support for arbitrary precision numbers would not come without
cost. The library would be bigger, in a way that the linker can't
strip (because of switch statements on the variant's kind). Everyone
would pay for this feature (e.g. embedded) but only a handful of users
would use it.
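
The point about switch statements can be illustrated with a sketch. The
kind enum, the number struct and format_bignum below are hypothetical and
much simpler than Boost.JSON's real internals; they only show why the extra
code stays reachable.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class kind { int64, uint64, bignum };

// Hypothetical tagged number; `digits` stands in for an
// arbitrary-precision payload.
struct number {
    kind k;
    std::int64_t i;
    std::uint64_t u;
    std::string digits;
};

// Stand-in for what would be a large arbitrary-precision formatter.
inline std::string format_bignum(const std::string& digits) { return digits; }

inline std::string to_text(const number& n) {
    switch (n.k) {
    case kind::int64:  return std::to_string(n.i);
    case kind::uint64: return std::to_string(n.u);
    // The kind is only known at run time, so this call is reachable from
    // every program that serializes a number. The linker has to keep
    // format_bignum even if no bignum is ever created.
    case kind::bignum: return format_bignum(n.digits);
    }
    return {};
}
```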

There is overwhelming evidence that the following statement is false:

    "json::value *needs* to support arbitrary numbers. It's incomplete
    without it."

Thanks

Re: [review][JSON] json::value as a vocabulary type

> On Sep 25, 2020, at 7:05 AM, Mathias Gaunard via Boost <[hidden email]> wrote:
>
> On Thu, 24 Sep 2020 at 10:57, Andrzej Krzemienski via Boost
> <[hidden email]> wrote:
>
>> My understanding of a "vocabulary type" is that it should be usable (not
>> necessarily with maximum efficiency) for *any* usage. In the case of JSON
>> that would mean that I should be able to represent any value that
>> corresponds to a valid JSON when converted to text. I do not think that
>> json::value can claim that without the ability to serialize arbitrarily big
>> numbers.
>
> I fully agree with this statement.
> json::value *needs* to support arbitrary numbers. It's incomplete without it.

Empirical evidence would suggest otherwise. nlohmann, RapidJSON, folly::dynamic, etc. do not support that. How can it *need* to support it, when other popular and useful libraries haven’t?

Even in JavaScript land, while there’s spec support for BigInt as a value within JavaScript’s language types, there’s no ECMA spec for how to encode or decode it to JSON that I know of. There are several libraries that do custom things to encode/decode BigInts to/from JSON, but none of them are interoperable. Chrome’s V8 engine has BigInt support I believe, for example, but does not support encoding it to JSON.

If Boost.JSON were to choose its own syntax for encoding such things, it wouldn’t be interoperable with anything other than itself for that value.

And if ECMA ever does specify how to encode it, Boost.JSON would have to either change its encoding to match that and thereby break backward-compatibility with previous versions of Boost.JSON... or it would have to offer serialization+parsing options to choose the encoding, which would suck because you’d have to know which JSON encoding style you’re using for any given file/socket.

Regardless, I wouldn’t be surprised if Boost.JSON added support for holding values of larger range/precision a la Boost.Multiprecision someday, but that day does not need to be now, in my opinion.

I fully expect/hope that if Boost.JSON takes off, that more features will be added to it in the future based on demand and submitted PRs.

-hadriel


Re: [review][JSON] json::value as a vocabulary type

On Fri, 25 Sep 2020 at 15:17, Alexander Grund via Boost
<[hidden email]> wrote:

> To add to that: JSON is essentially JavaScript-based. For a long time
> JavaScript had no ints, only doubles. And the new BigInt can't be
> converted to JSON.
>
> So to answer the question: yes, your only option is to pass it as a string.
> Anything else is, first and foremost, non-portable.

JSON is not JavaScript and JavaScript is not JSON.

Re: [review][JSON] json::value as a vocabulary type

On Fri, Sep 25, 2020 at 11:19 AM Mathias Gaunard via Boost
<[hidden email]> wrote:
> JSON is not JavaScript and JavaScript is not JSON.

JSON literally stands for "JavaScript Object Notation" and while these
two aren't the same, there is certainly a relationship between the two
that must factor into any discussion of its use-cases.

Thanks

Re: [review][JSON] json::value as a vocabulary type

On Fri, 25 Sep 2020 at 16:48, Vinnie Falco <[hidden email]> wrote:

> On Fri, Sep 25, 2020 at 7:07 AM Andrzej Krzemienski via Boost
> <[hidden email]> wrote:
> > Are JSON numbers only good for storing int-based identifiers?
>
> The JSON specification is silent on the limits and precision of the
> range of numbers. All that we know is that it is a "light-weight data
> interchange format." However, we can gather quite a bit of anecdotal
> evidence simply by looking at the various languages which have
> built-in support for JSON.
>
> From RFC7159 (https://tools.ietf.org/html/rfc7159)
>
>    This specification allows implementations to set limits on the range
>    and precision of numbers accepted.  Since software that implements
>    IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
>    generally available and widely used, good interoperability can be
>    achieved by implementations that expect no more precision or range
>    than these provide, in the sense that implementations will
>    approximate JSON numbers within the expected precision.  A JSON
>    number such as 1E400 or 3.141592653589793238462643383279 may indicate
>    potential interoperability problems, since it suggests that the
>    software that created it expects receiving software to have greater
>    capabilities for numeric magnitude and precision than is widely
>    available.
>
> Note the phrase "widely available."
>
> From <
> https://stackoverflow.com/questions/13502398/json-integers-limit-on-size>
>
>     As a practical matter, Javascript integers are limited to about 2^53
>     (there are no integers; just IEEE floats).
>
> From <https://developers.google.com/discovery/v1/type-format>
>
>     ...a 64-bit integer cannot be represented in JSON (since JavaScript
>     and JSON support integers up to 2^53).
>
> From <https://github.com/josdejong/lossless-json>
>
>     When to use? Only in some special cases. For example when you
>     have to create some sort of data processing middleware which has
>     to process arbitrary JSON without risk of screwing up. JSON objects
>    containing big numbers are rare in the wild.
>
> From <
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number
> >
>
>     The JavaScript Number type is a double-precision 64-bit binary format
>     IEEE 754 value, like double in Java or C#....When parsing data that has
>     been serialized to JSON, integer values falling outside of this range
> can
>     be expected to become corrupted when JSON parser coerces them to
>     Number type. A possible workaround is to use String instead.
>
> From <
> https://docs.python.org/3/library/json.html#implementation-limitations>
>
>     When serializing to JSON, beware any such limitations in applications
>     that may consume your JSON. In particular, it is common for JSON
>     numbers to be deserialized into IEEE 754 double precision numbers
>     and thus subject to that representation’s range and precision
> limitations.
>
> I am actually now starting to wonder if even 64-bit integer support
> was a good idea, as it can produce numbers which most implementations
> cannot read with perfect fidelity.
>
> It is true that there are some JSON implementations which support
> arbitrary-precision numbers, but these are rare and all come with the
> caveat that their output will likely be incorrectly parsed or rejected
> by the majority of implementations. This is quite an undesirable
> feature for an "interoperable, data-exchange format" or a vocabulary
> type. Support for arbitrary precision numbers would not come without
> cost. The library would be bigger, in a way that the linker can't
> strip (because of switch statements on the variant's kind). Everyone
> would pay for this feature (e.g. embedded) but only a handful of users
> would use it.
>
> There is overwhelming evidence that the following statement is false:
>
>     "json::value *needs* to support arbitrary numbers. It's incomplete
>     without it."
>

I accidentally replied privately to Vinnie. I am now pasting my reply here:

> Thanks. This is really useful background. This explains why the JSON format
> conflates integer and floating point numbers: in fact, originally this was
> only floating point numbers. Number 1 is just a different representation of
> a floating-point number. But if we adopt this view, bearing in mind that
> JavaScript JSON libraries may not be able to parse big uint64_t values,
> indeed Boost.JSON might have made the wrong trade-off by adding support for
> the full range of uint64_t. The cost is: (1) some values generated by
> Boost.JSON cannot be parsed by JavaScript JSON libraries, and (2) the
> complication of the interface (number_cast). And one could say that big
> uint64_t values constitute the 1% of the use cases that are not worth the
> costs.
> On the other hand there is one quite natural use case for the full range
> of uint64_t: hash values: they are naturally stored as size_t and the
> biggest values are equally likely to appear as the smallest. And libraries
> like rapidjson handle this case, so when they are able to serialize it,
> Boost.JSON should be able to parse it. It looks like the two following goals
> are not compatible:
> 1. Parse losslessly every value produced by rapidjson.
> 2. Generate only values losslessly parsable by JavaScript JSON libraries.
>
> So, I guess the choice made in Boost.JSON is the good one. You will
> potentially produce values not parsable by some JSON libraries, and if goal
> 2 is important for some use cases the user has to make sure that she is
> only putting doubles as numbers.
>
> By the way, when I learned about these issues with numbers/doubles, it
> occurred to me that Boost.JSON must have somewhere a flaw in handling
> numbers given that it stores three different types and provides equality
> operator. So I tried to break it. And I couldn't. The mechanism for storing
> int, uint and double is very well designed and thought through: you
> always prefer ints to doubles when parsing, you always add a decimal point or
> exponent when serializing floats, you compare ints with uints correctly,
> and you always compare ints and floats as unequal. This is
> really consistent. I think it deserves a mention in the documentation.
>
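
To make the comparison rules above concrete, here is a minimal sketch. The `number` struct and its factory functions are hypothetical stand-ins invented for illustration; this is not Boost.JSON's actual type or implementation, only one way the described behaviour could be expressed.

```cpp
#include <cstdint>

// Hypothetical stand-in for a JSON number that remembers how it was stored.
struct number
{
    enum class kind { int64, uint64, double_ };
    kind k;
    union { std::int64_t i; std::uint64_t u; double d; };

    static number from_int(std::int64_t v)   { number n; n.k = kind::int64;   n.i = v; return n; }
    static number from_uint(std::uint64_t v) { number n; n.k = kind::uint64;  n.u = v; return n; }
    static number from_double(double v)      { number n; n.k = kind::double_; n.d = v; return n; }
};

inline bool operator==(number const& a, number const& b) noexcept
{
    using kind = number::kind;

    // A double only ever equals another double, so 1 and 1.0 compare unequal.
    if (a.k == kind::double_ || b.k == kind::double_)
        return a.k == b.k && a.d == b.d;

    // Same integer kind: compare the stored values directly.
    if (a.k == b.k)
        return a.k == kind::int64 ? a.i == b.i : a.u == b.u;

    // Mixed signed/unsigned: a negative value can never equal an unsigned one.
    // Checking the sign first avoids the wraparound of an implicit conversion.
    std::int64_t const  i = a.k == kind::int64  ? a.i : b.i;
    std::uint64_t const u = a.k == kind::uint64 ? a.u : b.u;
    return i >= 0 && static_cast<std::uint64_t>(i) == u;
}
```

With a plain `i == u` the value -1 would be converted to 2^64-1 first and compare equal to it, which is the kind of flaw I was looking for.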

Regards,
&rzej;


Re: [review][JSON] json::value as a vocabulary type

Boost - Dev mailing list
On Fri, Sep 25, 2020 at 3:53 PM Andrzej Krzemienski <[hidden email]> wrote:

>> I am actually now starting to wonder if even 64-bit integer support
>> was a good idea, as it can produce numbers which most implementations
>> cannot read with perfect fidelity.
>> ...
> 1. Parse losslessly every value produced by rapidjson.
> 2. Generate only values losslessly parsable by JavaScript JSON libraries.
>
> So, I guess the choice made in Boost.JSON is the good one. You will potentially
> produce values not parsable by some JSON libraries, and if goal 2 is important
> for some use cases the user has to make sure that she is only putting doubles
> as numbers.

Thanks for the kind words.

So, I think at some point I will want to introduce options for
serialization, and one of the options could be the treatment of
integers outside the range ~+/-2^53. We could:

1. serialize them as-is (current implementation)
2. serialize them as the nearest representable IEEE double
3. throw an exception

I know some people might find #3 weird, I'm open to feedback.
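
A rough sketch of how the three options might look, using only the standard library. The names `large_int_policy`, `fits_double_exactly` and `serialize_int` are hypothetical and nothing like this exists in Boost.JSON today. The sketch assumes the cutoff is a magnitude of exactly 2^53, which is the largest range in which every integer converts to double without loss.

```cpp
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical serialization option, mirroring the three choices listed above.
enum class large_int_policy { as_is, nearest_double, throw_error };

// Every integer with magnitude up to 2^53 converts to double without loss.
inline bool fits_double_exactly(std::int64_t v) noexcept
{
    constexpr std::int64_t limit = std::int64_t(1) << 53;
    return v >= -limit && v <= limit;
}

inline std::string serialize_int(std::int64_t v, large_int_policy p)
{
    // Option 1, and also every in-range value regardless of policy.
    if (fits_double_exactly(v) || p == large_int_policy::as_is)
        return std::to_string(v);

    // Option 3: refuse to emit a number that a double-based reader would alter.
    if (p == large_int_policy::throw_error)
        throw std::range_error("integer outside +/-2^53");

    // Option 2: round to the nearest double, then print 17 significant
    // digits, which is enough to round-trip any IEEE double.
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.17g", static_cast<double>(v));
    return buf;
}
```

For example, 2^53+1 would be written as 9007199254740993 under option 1 and as 9007199254740992 under option 2.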

Regards


Re: [review][JSON] json::value as a vocabulary type

Boost - Dev mailing list
Vinnie Falco wrote:

> So, I think at some point I will want to introduce options for
> serialization, and one of the options could be the treatment of integers
> outside the range ~+/-2^53. We could:
>
> 1. serialize them as-is (current implementation)
> 2. serialize them as the nearest representable IEEE double
> 3. throw an exception

I can't think of a reason to ever prefer #2 over #1. "As is" is already a
legitimate serialization of the nearest representable IEEE double, so #1 is
a valid implementation of #2, except it doesn't needlessly throw away
information.
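
A small check of that claim, as a sketch: the helper name `as_is_matches_nearest_double` is made up for illustration, and the result assumes a `strtod` that rounds correctly to nearest (as common IEEE 754 implementations do). It writes the integer out in full and lets a double-based reader parse the text, then compares against converting the integer to double directly.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: does a double-based reader of the "as is" text end up
// with the same value as rounding the integer to the nearest double up front?
inline bool as_is_matches_nearest_double(std::int64_t v)
{
    std::string const text = std::to_string(v);                 // option 1 output
    double const parsed = std::strtod(text.c_str(), nullptr);   // what a double-only parser sees
    return parsed == static_cast<double>(v);                    // option 2 value
}
```

Under that assumption the function returns true for values such as 9007199254740993 (2^53+1), so a reader limited to doubles loses nothing by receiving the full digits, while a reader with 64-bit integers gets the exact value.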

There's no need to innovate here; we already know that preserving 64 bit
integers is what's useful in practice.



Re: [review][JSON] json::value as a vocabulary type

Boost - Dev mailing list
On Fri, 25 Sep 2020 at 23:55, Vinnie Falco via Boost
<[hidden email]> wrote:

>
> So, I think at some point I will want to introduce options for
> serialization, and one of the options could be the treatment of
> integers outside the range ~+/-2^53. We could:
>
> 1. serialize them as-is (current implementation)
> 2. serialize them as the nearest representable IEEE double
> 3. throw an exception
>
> I know some people might find #3 weird, I'm open to feedback.

What's the problem with storing it as a string or as an arbitrary
number when it's not representable as int64 or a double?
It doesn't cost anything to do this, it's a pure extension with no
impact on people that don't need it.
