porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file

15 messages

Jeff Flinn-2
I've ported a classic grammar to qi which was pretty straightforward. At
run-time I'm hitting the BOOST_ASSERT(isascii_(ch)) in isspace when at
the end of the last line in the input with ch == -1.

I've replaced the unary operator '!' with '-' for zero-or-one (optional) matching. I'm
using boost::spirit::qi::parse with const char* iterator types.

I thought I remember reading that some of the semantics of space, blank
and/or eol changed, but can't locate that info. Is there a summary of
these sorts of changes? Or do I need to check
http://www.boost.org/doc/libs/1_52_0/libs/spirit/doc/html/spirit/what_s_new/spirit_2_1.html
and the subsequent change descriptions?

Thanks,

Jeff


------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and
their applications. This 200-page book is written by three acclaimed
leaders in the field. The early access version is available now.
Download your free book today! http://p.sf.net/sfu/neotech_d2d_may
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file

Jeff Flinn-2
On 5/9/2013 11:14 AM, Jeff Flinn wrote:
> I've ported a classic grammar to qi which was pretty straightforward. At
> run-time I'm hitting the BOOST_ASSERT(isascii_(ch)) in isspace when at
> the end of the last line in the input with ch ==-1.

Well I see the -1 at the end of my input data was generated by some code
that improperly streamed data from a file using:

        while (ifs.good()) os << (char) ifs.get();

Fixing this gets rid of the assert and the parsing passes.
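The loop fails because `good()` is checked before `get()`: the call that actually reaches end-of-file returns `EOF` (-1), and the cast turns it into a `char` that is appended to the output. A sketch reproducing the bug and two standard-library fixes, using an `std::istream` in place of the original `ifs` (the function names are hypothetical):

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// The loop from the post: good() is tested *before* get(), so the get() that
// finally hits end-of-file returns EOF (-1), which is cast to char and appended.
std::string read_all_buggy(std::istream& is)
{
    std::ostringstream os;
    while (is.good()) os << (char) is.get();
    return os.str();
}

// Fix 1: test the value returned by get() before using it.
std::string read_all_checked(std::istream& is)
{
    std::string out;
    for (int c = is.get(); c != std::char_traits<char>::eof(); c = is.get())
        out.push_back(static_cast<char>(c));
    return out;
}

// Fix 2: copy the raw bytes through the stream buffer; no EOF sentinel can leak.
std::string read_all(std::istream& is)
{
    return std::string(std::istreambuf_iterator<char>(is),
                       std::istreambuf_iterator<char>());
}
```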

So my question is what if I ended up with a stream of data with a -1 in
it? Should qi really assert here? By the way this is with boost 1.52.0.

Thanks, Jeff

Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file

sehe
On 05/09/2013 06:53 PM, Jeff Flinn wrote:
> So my question is what if I ended up with a stream of data with a -1 in
> it? Should qi really assert here? By the way this is with boost 1.52.0.
Yes, unless you use a different encoding (i.e. not `ascii::char_` and/or
`ascii::space` etc.).

This is obviously to allow Unicode operations.

May I suggest the `boost::spirit::istream_iterator` by the way, which
completely handles the stream extraction for you?

Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file

Jeroen Habraken
In reply to this post by Jeff Flinn-2
On 9 May 2013 11:53, Jeff Flinn <[hidden email]> wrote:
> On 5/9/2013 11:14 AM, Jeff Flinn wrote:
>> I've ported a classic grammar to qi which was pretty straightforward. At
>> run-time I'm hitting the BOOST_ASSERT(isascii_(ch)) in isspace when at
>> the end of the last line in the input with ch == -1.
>
> Well I see the -1 at the end of my input data was generated by some code
> that improperly streamed data from a file using:
>
>         while (ifs.good()) os << (char) ifs.get();
>
> Fixing this gets rid of the assert and the parsing passes.
>
> So my question is what if I ended up with a stream of data with a -1 in
> it? Should qi really assert here? By the way this is with boost 1.52.0.
>
> Thanks, Jeff

ASCII involves only the lower seven bits. If you want all characters to be accepted, have a look at the "standard" namespace as described at http://www.boost.org/doc/libs/1_53_0/libs/spirit/doc/html/spirit/qi/reference/basics.html.

Jeroen 

Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file

Jeff Flinn-2
In reply to this post by sehe
On 5/9/2013 12:59 PM, Seth Heeren wrote:

> On 05/09/2013 06:53 PM, Jeff Flinn wrote:
>> So my question is what if I ended up with a stream of data with a -1 in
>> it? Should qi really assert here? By the way this is with boost 1.52.0.
> Yes. Unless you used a different encoding (so not `ascii::char_` and or
> `ascii::space` etc.)
>
> This is obviously to allow UNICODE operations.
>
> May I suggest the `boost::spirit::istream_iterator` by the way, which
> completely handles the stream extraction for you?

Yep, heading in that direction. I assume you are referring to the
multi_pass iterator that Hartmut recently fixed. Has this made it into the
1.54.0 release yet? There were issues with it not clearing its backing
store, which caused performance and memory consumption problems with very
large input files.

Thanks, Jeff

Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file

Jeff Flinn-2
In reply to this post by Jeroen Habraken
On 5/9/2013 1:12 PM, Jeroen Habraken wrote:

> On 9 May 2013 11:53, Jeff Flinn <[hidden email]> wrote:
>
>> On 5/9/2013 11:14 AM, Jeff Flinn wrote:
>>> I've ported a classic grammar to qi which was pretty straightforward. At
>>> run-time I'm hitting the BOOST_ASSERT(isascii_(ch)) in isspace when at
>>> the end of the last line in the input with ch == -1.
>>
>> Well I see the -1 at the end of my input data was generated by some code
>> that improperly streamed data from a file using:
>>
>>         while (ifs.good()) os << (char) ifs.get();
>>
>> Fixing this gets rid of the assert and the parsing passes.
>>
>> So my question is what if I ended up with a stream of data with a -1 in
>> it? Should qi really assert here? By the way this is with boost 1.52.0.
>>
>> Thanks, Jeff
>
>
> ASCII involves only the lower seven bits, if you want all characters to
> be accepted have a look at the "standard" namespace as described at
> http://www.boost.org/doc/libs/1_53_0/libs/spirit/doc/html/spirit/qi/reference/basics.html.

So does that imply that one should not use the ASCII encoding namespace
in the wild, as corrupt input files will assert rather than fail to
parse or throw an exception that can be dealt with at runtime? I realize
the assert is not *generally* present in release builds. How do others
deal with this issue in a robust manner?

Thanks, Jeff

Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file

Joel de Guzman
On 5/10/13 2:34 AM, Jeff Flinn wrote:

> On 5/9/2013 1:12 PM, Jeroen Habraken wrote:
>> On 9 May 2013 11:53, Jeff Flinn <[hidden email]> wrote:
>>
>>> On 5/9/2013 11:14 AM, Jeff Flinn wrote:
>>>> I've ported a classic grammar to qi which was pretty straightforward. At
>>>> run-time I'm hitting the BOOST_ASSERT(isascii_(ch)) in isspace when at
>>>> the end of the last line in the input with ch == -1.
>>>
>>> Well I see the -1 at the end of my input data was generated by some code
>>> that improperly streamed data from a file using:
>>>
>>>         while (ifs.good()) os << (char) ifs.get();
>>>
>>> Fixing this gets rid of the assert and the parsing passes.
>>>
>>> So my question is what if I ended up with a stream of data with a -1 in
>>> it? Should qi really assert here? By the way this is with boost 1.52.0.
>>>
>>> Thanks, Jeff
>>
>>
>> ASCII involves only the lower seven bits, if you want all characters to
>> be accepted have a look at the "standard" namespace as described at
>> http://www.boost.org/doc/libs/1_53_0/libs/spirit/doc/html/spirit/qi/reference/basics.html.
>
> So does that imply that one should not use the ASCII encoding namespace
> in the wild, as corrupt input files will assert rather than fail to
> parse or throw an exception that can be dealt with at runtime? I realize
> the assert is not *generally* present in release builds. How do others
> deal with this issue in a robust manner?

The thing is, ascii encoding uses a 128-element LUT (lookup table). If you try to match
space from a char(-1), that will obviously go beyond the bounds of the
LUT and cause havoc. Hence, the assert. It is possible to make it fail,
but that would lead to performance degradation for this edge case. If
input goes beyond ASCII, I'd use a corresponding encoding such as iso8859_1.
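A simplified, stand-alone model of the lookup described here (not Spirit's actual source; `make_ascii_table` and `ascii_isspace` are hypothetical names). With only 128 entries, every value outside 0..127 is out of range: `char(-1)` converts to -1 where `char` is signed, and to 255 through `unsigned char`, so both miss the table. The only guard is the debug assertion.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical 128-entry classification table; bit 0 marks whitespace.
constexpr unsigned char kSpace = 0x01;

std::array<unsigned char, 128> make_ascii_table()
{
    std::array<unsigned char, 128> t{}; // zero-initialized: nothing classified yet
    for (char c : {' ', '\t', '\n', '\v', '\f', '\r'})
        t[static_cast<unsigned char>(c)] = kSpace;
    return t;
}

// The precondition is checked only by assert; with NDEBUG an out-of-range ch
// (such as char(-1) converted to int) would index outside the table.
bool ascii_isspace(int ch)
{
    static const std::array<unsigned char, 128> table = make_ascii_table();
    assert(ch >= 0 && ch < 128 && "ascii encoding requires 7-bit input");
    return (table[static_cast<std::size_t>(ch)] & kSpace) != 0;
}
```

Built with `NDEBUG`, the assertion disappears and a call such as `ascii_isspace(-1)` reads outside the array, which is the undefined behaviour the assert is there to catch.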

Regards,
--
Joel de Guzman
http://www.ciere.com
http://boost-spirit.com
http://www.cycfi.com/

Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file

Jeff Flinn-2
On 5/9/2013 7:12 PM, Joel de Guzman wrote:

> On 5/10/13 2:34 AM, Jeff Flinn wrote:
>> On 5/9/2013 1:12 PM, Jeroen Habraken wrote:
>>>
>>> ASCII involves only the lower seven bits, if you want all characters to
>>> be accepted have a look at the "standard" namespace as described at
>>> http://www.boost.org/doc/libs/1_53_0/libs/spirit/doc/html/spirit/qi/reference/basics.html.
>>
>> So does that imply that one should not use the ASCII encoding namespace
>> in the wild, as corrupt input files will assert rather than fail to
>> parse or throw an exception that can be dealt with at runtime? I realize
>> the assert is not *generally* present in release builds. How do others
>> deal with this issue in a robust manner?
>
> The thing is, ascii encoding uses a 128 element LUT. If you try to match
> space from a char(-1), that will obviously go beyond the bounds of the
> LUT and cause havoc. Hence, the assert. It is possible to make it fail,
> but that would lead to performance degradation for this edge case. If
> input goes beyond ASCII, I'd use a corresponding encoding such as iso8859_1.

Thanks Joel. I'm just trying to sort through in my mind how to proceed.
The parsers are used in a desktop GUI application, where in some cases the
user can select a file to open and parse from just about anywhere. If
they happen across a corrupt file, the app would assert and they could
possibly lose all they've been working on.

Options to avoid this situation include using iso8859_1 (rather than
ASCII encoding), or create a checked_ascii_istream via boost::iostreams,
or maybe adapting boost::spirit::istream_iterator. What are the costs of
using iso8859_1? The latter two obviously incur the cost of comparison
for each char.
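The "checked" options amount to validating the input before the ascii-encoded parsers see it. A minimal sketch of that idea as a separate pass over an in-memory buffer, rather than as a boost::iostreams filter (`is_7bit_ascii` is a hypothetical helper). It costs one comparison per character, as noted above, and lets the application report a corrupt file instead of asserting.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: true if every byte is in the 7-bit ASCII range 0..127.
// Converting through unsigned char makes the test independent of char signedness.
bool is_7bit_ascii(const std::string& buf)
{
    return std::all_of(buf.begin(), buf.end(), [](char c) {
        return static_cast<unsigned char>(c) < 0x80;
    });
}
```

A caller would run this check first and reject the file, rather than handing non-ASCII bytes to a grammar built on `ascii::space` and friends.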

Thanks, Jeff

PS. By the way thanks for merging the fusion iomanipulator fix!

Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file

Joel de Guzman
On 5/10/13 11:25 PM, Jeff Flinn wrote:

> Options to avoid this situation include using iso8859_1(rather than
> ASCII encoding), or create a checked_ascii_istream via boost::iostreams,
> or maybe adapting boost::spirit::istream_iterator. What are the costs of
> using iso8859_1? The latter two obviously incur the cost of comparison
> for each char.

The cost for using iso8859_1 is the additional 128 entries in the LUT.
Performance-wise, it should be the same.

BTW> Are you in Aspen?

Regards,
--
Joel de Guzman
http://www.ciere.com
http://boost-spirit.com
http://www.cycfi.com/

Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file

Jeff Flinn-2
On 5/11/2013 10:43 PM, Joel de Guzman wrote:

> On 5/10/13 11:25 PM, Jeff Flinn wrote:
>
>> Options to avoid this situation include using iso8859_1(rather than
>> ASCII encoding), or create a checked_ascii_istream via boost::iostreams,
>> or maybe adapting boost::spirit::istream_iterator. What are the costs of
>> using iso8859_1? The latter two obviously incur the cost of comparison
>> for each char.
>
> The cost for using iso8859_1 is the additional 128 entries in the LUT.
> Performance wise, it should be the same.

Thanks for the info, Joel. Oh, by the way, discounting the corrupt input
issue, it took less than 15 minutes to convert the classic version to v2,
including re-familiarizing myself with the docs.

> BTW> Are you in Aspen?

Unfortunately not. I was looking forward to your talk. Have a great
time in Aspen!

Jeff

ascii encoding assert on invalid input [ was Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file ]

Joel de Guzman
In reply to this post by Jeff Flinn-2
Hi Y'all,

The issue of the char parsers with ascii encoding firing an assert
when the char exceeds 127 (7 bits) bites again. This time it is with the
json-parser from Ciere.

   https://github.com/cierelabs/json_spirit

The issue is that the input can be corrupt (or plain invalid) and
an assert is very unwelcome. I'm considering failing the parse instead.
Thoughts?
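A sketch of what "failing the parse instead" could look like at the classification level, using a simplified stand-alone table rather than Spirit's code (`make_space_table` and `checked_isspace` are hypothetical names). A range check in front of the lookup turns an out-of-range character into a plain "no match", so any parser built on it simply fails; the price is one extra comparison per character tested.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 128-entry whitespace table (true = whitespace).
std::array<bool, 128> make_space_table()
{
    std::array<bool, 128> t{};
    t[' '] = t['\t'] = t['\n'] = t['\v'] = t['\f'] = t['\r'] = true;
    return t;
}

// Out-of-range input is reported as "not a space" instead of asserting,
// so a parser built on this test fails to match rather than aborting.
bool checked_isspace(int ch)
{
    static const std::array<bool, 128> table = make_space_table();
    if (ch < 0 || ch >= 128)
        return false;
    return table[static_cast<std::size_t>(ch)];
}
```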

Below is a quote from the original post from Jeff Flinn related to
this issue:

On 5/10/13 11:25 PM, Jeff Flinn wrote:

> On 5/9/2013 7:12 PM, Joel de Guzman wrote:
>> On 5/10/13 2:34 AM, Jeff Flinn wrote:
>>> On 5/9/2013 1:12 PM, Jeroen Habraken wrote:
>>>>
>>>> ASCII involves only the lower seven bits, if you want all characters to
>>>> be accepted have a look at the "standard" namespace as described at
>>>> http://www.boost.org/doc/libs/1_53_0/libs/spirit/doc/html/spirit/qi/reference/basics.html.
>>>
>>> So does that imply that one should not use the ASCII encoding namespace
>>> in the wild, as corrupt input files will assert rather than fail to
>>> parse or throw an exception that can be dealt with at runtime? I realize
>>> the assert is not *generally* present in release builds. How do others
>>> deal with this issue in a robust manner?
>>
>> The thing is, ascii encoding uses a 128 element LUT. If you try to match
>> space from a char(-1), that will obviously go beyond the bounds of the
>> LUT and cause havoc. Hence, the assert. It is possible to make it fail,
>> but that would lead to performance degradation for this edge case. If
>> input goes beyond ASCII, I'd use a corresponding encoding such as iso8859_1.
>
> Thanks Joel. I'm just trying to sort through in my mind how to proceed.
> The parsers are used in desktop GUI application, where in some cases the
> user can select a file to open and parse from just about anywhere. If
> they happen across a corrupt file, the app would assert and they could
> possibly lose all they've been working on.
>
> Options to avoid this situation include using iso8859_1(rather than
> ASCII encoding), or create a checked_ascii_istream via boost::iostreams,
> or maybe adapting boost::spirit::istream_iterator. What are the costs of
> using iso8859_1? The latter two obviously incur the cost of comparison
> for each char.
>
> Thanks, Jeff
>
> PS. By the way thanks for merging the fusion iomanipulator fix!

--
Joel de Guzman
http://www.ciere.com
http://boost-spirit.com
http://www.cycfi.com/

Re: ascii encoding assert on invalid input [ was Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file ]

Michael Powell-2


On November 1, 2015 8:40:56 PM EST, Joel de Guzman <[hidden email]> wrote:

>Hi Y'all,
>
>The issue of the char parsers, ascii encoding firing an assert
>when the char exceeds 127 (7bits) bites again. This time with the
>json-parser from Ciere.
>
>   https://github.com/cierelabs/json_spirit
>
>The issue is that the input can be corrupt (or plain invalid) and
>an assert is very unwelcome. I'm considering failing the parse instead.
>Thoughts?

Why not fail? Fail fast, fail early, and fail often.

Re: ascii encoding assert on invalid input [ was Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file ]

Jeff Flinn-2
In reply to this post by Joel de Guzman
On 11/1/15 8:40 PM, Joel de Guzman wrote:

> Hi Y'all,
>
> The issue of the char parsers, ascii encoding firing an assert
> when the char exceeds 127 (7bits) bites again. This time with the
> json-parser from Ciere.
>
>     https://github.com/cierelabs/json_spirit
>
> The issue is that the input can be corrupt (or plain invalid) and
> an assert is very unwelcome. I'm considering failing the parse instead.
> Thoughts?

Seems prudent to me. ;-) I assume the run-time cost is minimal?

Jeff

Re: ascii encoding assert on invalid input [ was Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file ]

Joel de Guzman
On 11/10/15 11:06 AM, Jeff Flinn wrote:

> On 11/1/15 8:40 PM, Joel de Guzman wrote:
>> Hi Y'all,
>>
>> The issue of the char parsers, ascii encoding firing an assert
>> when the char exceeds 127 (7bits) bites again. This time with the
>> json-parser from Ciere.
>>
>>      https://github.com/cierelabs/json_spirit
>>
>> The issue is that the input can be corrupt (or plain invalid) and
>> an assert is very unwelcome. I'm considering failing the parse instead.
>> Thoughts?
>
> Seems prudent to me. ;-) I assume it's minimal run time cost?

Instead of a runtime cost, I thought it was best to simply provide a complete
256-byte LUT (instead of the 128). CPU cycles are more precious. Memory is
cheap.
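A simplified sketch of the full-table idea (a stand-alone model, not the actual Spirit change; `make_full_space_table` and `full_table_isspace` are hypothetical names). Indexing through `unsigned char` maps every possible `char` value into 0..255, so a 256-entry table needs no range check; entries 128..255 are left as "not a space", so non-ASCII bytes make the parse fail rather than assert.

```cpp
#include <array>
#include <cassert>

// Hypothetical 256-entry whitespace table: one entry per possible byte value.
// Entries 128..255 stay false, so non-ASCII bytes never match.
std::array<bool, 256> make_full_space_table()
{
    std::array<bool, 256> t{};
    t[' '] = t['\t'] = t['\n'] = t['\v'] = t['\f'] = t['\r'] = true;
    return t;
}

// Converting through unsigned char maps any char, signed or not, into 0..255,
// so the lookup is always in bounds without a per-character comparison.
bool full_table_isspace(char c)
{
    static const std::array<bool, 256> table = make_full_space_table();
    return table[static_cast<unsigned char>(c)];
}
```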

Regards,
--
Joel de Guzman
http://www.ciere.com
http://boost-spirit.com
http://www.cycfi.com/

Re: ascii encoding assert on invalid input [ was Re: porting classic to qi - hitting BOOST_ASSERT(isascii_(ch)); at end-of-file ]

Jeff Flinn-2
On 11/10/15 2:27 PM, Joel de Guzman wrote:

> On 11/10/15 11:06 AM, Jeff Flinn wrote:
>> On 11/1/15 8:40 PM, Joel de Guzman wrote:
>>> Hi Y'all,
>>>
>>> The issue of the char parsers, ascii encoding firing an assert
>>> when the char exceeds 127 (7bits) bites again. This time with the
>>> json-parser from Ciere.
>>>
>>>       https://github.com/cierelabs/json_spirit
>>>
>>> The issue is that the input can be corrupt (or plain invalid) and
>>> an assert is very unwelcome. I'm considering failing the parse instead.
>>> Thoughts?
>>
>> Seems prudent to me. ;-) I assume it's minimal run time cost?
>
> Instead of a runtime cost, I thought it was best to simply provide a complete
> 256-byte LUT (instead of the 128). CPU cycles are more precious. Memory is
> cheap.
>
> Regards,
>
Perfect!

Thanks, Jeff