X3: no case directive implementation difficulties

teajay-2
Hello,

I saw the pull request from xylosper, and the comment from Joel
regarding the no_case directive. I agree with Joel that it's not the
optimal way to implement no_case, so I thought I'd take a shot at it.

I've come pretty far and am at the point where I face a pretty nifty
architectural issue. Here's the situation.

In X3, directives such as no_skip propagate the necessary information
down to the subject parsers through the parser context. This should also
be the preferred way to pass the no_case information down to the subject
parsers, so that the subjects change their behavior, because it also
works when used with rules.

When using contexts, we can have:

rule1 = char('a');
rule2 = no_case[rule1];

When parsing rule2, rule1 should become case insensitive.
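
A minimal sketch of how a context can carry the no_case information across a rule boundary. This is not X3 code: `context`, `has_no_case`, `char_parser`, `rule` and `no_case_directive` are hypothetical, heavily simplified stand-ins, and a "parse" here only looks at a single character.

```cpp
#include <cassert>
#include <cctype>
#include <type_traits>

struct no_case_tag {};
struct empty_context {};

// One link in a compile-time chain of context tags (hypothetical).
template <typename Tag, typename Next>
struct context {};

// True if no_case_tag appears anywhere in the context chain.
template <typename Context>
struct has_no_case : std::false_type {};
template <typename Next>
struct has_no_case<context<no_case_tag, Next>> : std::true_type {};
template <typename Tag, typename Next>
struct has_no_case<context<Tag, Next>> : has_no_case<Next> {};

struct char_parser
{
    char ch;

    static int lower(char c) { return std::tolower(static_cast<unsigned char>(c)); }

    // The behavior depends only on the type of the context passed in.
    template <typename Context>
    bool parse(char input, Context const&) const
    {
        if (has_no_case<Context>::value)
            return lower(input) == lower(ch);
        return input == ch;
    }
};

// A rule simply forwards whatever context it receives to its subject.
template <typename Subject>
struct rule
{
    Subject subject;

    template <typename Context>
    bool parse(char input, Context const& ctx) const { return subject.parse(input, ctx); }
};

// The directive wraps the incoming context with a no_case link.
template <typename Subject>
struct no_case_directive
{
    Subject subject;

    template <typename Context>
    bool parse(char input, Context const&) const
    {
        return subject.parse(input, context<no_case_tag, Context>{});
    }
};

inline void demo_context_propagation()
{
    rule<char_parser> rule1{ char_parser{'a'} };
    no_case_directive<rule<char_parser>> rule2{ rule1 };

    assert(rule1.parse('a', empty_context{}));
    assert(!rule1.parse('A', empty_context{}));  // rule1 alone stays case sensitive
    assert(rule2.parse('A', empty_context{}));   // through rule2 it is not
}
```

Because the context type changes, `char_parser::parse` is instantiated separately for the case-sensitive and the case-insensitive use, while rule1 itself is left untouched.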

This leads to changing the parsing function of char depending on the
context passed to it. In this modified parse function, to avoid
converting the expected literal to the other case type on every parse
call (which would be equivalent to the already proposed implementation),
it needs to be computed once and stored.

I thought, that's easy, just use a static in the parse function. Except
that doesn't work, because there is only one single instantiation of the
parse function for a given encoding inside a rule if no context changes
are present, leading to wrong results as soon as a second parser gets
called. In the second call, the static still holds the value from the
preceding parser as the parse function's signature is identical.

Example:

rule1 = no_case[char('a') >> char('C')];

With a function-local static, the outcome is:

char('a') accepts 'a' and 'A' (correct).
char('C') accepts 'C' and 'A' (wrong: the static still holds the 'A'
computed for the first parser).
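
The pitfall can be reproduced outside of X3. The following is a sketch under stated assumptions: `char_parser`, `parse_no_case` and `no_case_context` are hypothetical names, and the parse function only looks at a single character.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical stand-in for a char parser; not the real X3 type.
struct char_parser
{
    char ch;

    static char flip_case(char c)
    {
        unsigned char u = static_cast<unsigned char>(c);
        return static_cast<char>(std::isupper(u) ? std::tolower(u) : std::toupper(u));
    }

    // There is one instantiation of this function per Context type, so the
    // static below is shared by every char_parser object that is parsed
    // with the same Context.
    template <typename Context>
    bool parse_no_case(char input, Context const&) const
    {
        // Initialized on the first call only, from whichever parser runs first.
        static char const other_case = flip_case(ch);
        return input == ch || input == other_case;
    }
};

struct no_case_context {};

inline void demo_shared_static()
{
    char_parser a{'a'};
    char_parser c{'C'};
    no_case_context ctx;

    assert(a.parse_no_case('A', ctx));   // fine: the static is now 'A'
    assert(c.parse_no_case('A', ctx));   // wrong: 'A' is accepted by char('C')
    assert(!c.parse_no_case('c', ctx));  // wrong: 'c' is rejected by char('C')
}
```

The static is keyed on the function instantiation rather than on the parser object, which is why the second parser inherits the first parser's alternate character.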

One way to get around this would be to reintroduce a context switch on
every parse rule level to make a differentiation between char('a') and
char('C') parse function calls, but that would lead to the template
explosion problem.

If there is a way to get this variation to work, I'm all ears. I haven't
come up with anything yet.

There's another solution I thought of, which would be to propagate the
no_case information on construction of the subject parsers.
This would mean passing some context information on construction and
extending the as_parser templates with something like this:

template <typename Derived, typename Context>
struct as_parser
{...};

This would make it possible to switch the char implementation on the
class level and not on the function level, making the definition scope
of the other literal value clear, and the parsing efficient.

There is, however, one drawback to this approach: the no_case directive
wouldn't work across rule barriers. Hence the first example I gave
(rule2 = no_case[rule1]) wouldn't work.
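
A rough sketch of the construction-time variant and of its limitation. All names (`basic_char_parser`, the policy tags, `rule`, the `no_case` overloads) are hypothetical. The rule is type-erased with `std::function` purely to model an already-built, opaque subject; that is an assumption of this sketch, not how X3 rules are implemented.

```cpp
#include <cassert>
#include <cctype>
#include <functional>

struct case_sensitive {};
struct case_insensitive {};

inline char flip_case(char c)
{
    unsigned char u = static_cast<unsigned char>(c);
    return static_cast<char>(std::isupper(u) ? std::tolower(u) : std::toupper(u));
}

// The char implementation is switched at the class level.
template <typename CasePolicy>
struct basic_char_parser;

template <>
struct basic_char_parser<case_sensitive>
{
    explicit basic_char_parser(char c) : ch(c) {}
    bool parse(char input) const { return input == ch; }
    char ch;
};

template <>
struct basic_char_parser<case_insensitive>
{
    // The alternate case is computed once, at construction.
    explicit basic_char_parser(char c) : ch(c), other(flip_case(c)) {}
    bool parse(char input) const { return input == ch || input == other; }
    char ch;
    char other;
};

// An already-built rule whose subject can no longer be inspected.
struct rule
{
    std::function<bool(char)> parse;
};

// no_case over a plain parser: rebuild it as a case-insensitive one.
inline basic_char_parser<case_insensitive>
no_case(basic_char_parser<case_sensitive> const& p)
{
    return basic_char_parser<case_insensitive>(p.ch);
}

// no_case over a rule: nothing can be rebuilt, the rule comes back unchanged.
inline rule no_case(rule const& r) { return r; }

inline void demo_construction_time()
{
    basic_char_parser<case_sensitive> a('a');

    auto direct = no_case(a);
    assert(direct.parse('a') && direct.parse('A'));

    rule rule1{ [a](char input) { return a.parse(input); } };
    rule rule2 = no_case(rule1);
    assert(rule2.parse('a'));
    assert(!rule2.parse('A'));  // the rule barrier: still case sensitive
}
```

Parsing is cheap here (two comparisons, no conversion per call), but the information only reaches parsers that are constructed inside the directive.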

To sum it up:
1. A solution using Contexts can't be runtime optimal
2. A solution using the context propagation on construction doesn't work
across rules.

I hope other wise people can share some wisdom on this!

Regards,

Thomas Bernard


_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Re: X3: no case directive implementation difficulties

Joel de Guzman
Pardon the delay, I've been swamped.

On 9/28/14, 6:35 AM, Thomas Bernard wrote:

> Hello,
>
> I saw the pull request from xylosper, and the comment from Joel
> regarding the no_case directive. I agree with Joel that it's not the
> optimal way to implement no_case, so I thought I'd take a shot at it.
>
> I've come pretty far and am at the point where I face a pretty nifty
> architectural issue. Here's the situation.
>
> In X3, directives such as no_skip propagate the necessary information
> down to the subject parsers through the parser context. This should also
> be the preferred way to pass the no_case information down to the subject
> parsers, so that the subjects change their behavior, because it also
> works when used with rules.

Yes, that's the basic strategy.

> When using contexts, we can have:
>
> rule1 = char('a');
> rule2 = no_case[rule1];
>
> When parsing rule2, rule1 should become case insensitive.
>
> This leads to changing the parsing function of char depending on the
> context passed to it. In this modified parse function, to avoid
> converting the expected literal to the other case type on every parse
> call (which would be equivalent to the already proposed implementation),

Not quite. If you do this only for the affected primitive parsers (e.g.
literals, char parsers, strings, symbols, etc.), then the effect will
be minimized. It won't be as fast as V2 (which stores the no-case
versions in the parser at construction time), but I'm not sure if
I want to repeat the complexity for this special case.

> it needs to be computed once and stored.

Indeed.

> I thought, that's easy, just use a static in the parse function. Except
> that doesn't work, because there is only one single instantiation of the
> parse function for a given encoding inside a rule if no context changes
> are present, leading to wrong results as soon as a second parser gets
> called. In the second call, the static still holds the value from the
> preceding parser as the parse function's signature is identical.

No, static is bad. A heap-allocated member variable is possible, but
I'm not sure if it's worth it.

[snip]

> There's another solution I thought of, which would be to propagate the
> no_case information on construction of the subject parsers.
> This would mean passing some context information on construction and
> extending the as_parser templates with something like this:
>
> template <typename Derived, typename Context>
> struct as_parser
> {...};
>
> This would make it possible to switch the char implementation on the
> class level and not on the function level, making the definition scope
> of the other literal value clear, and the parsing efficient.
>
> There is, however, one drawback to this approach: the no_case directive
> wouldn't work across rule barriers. Hence the first example I gave
> (rule2 = no_case[rule1]) wouldn't work.
>
> To sum it up:
> 1. A solution using Contexts can't be runtime optimal
> 2. A solution using the context propagation on construction doesn't work
> across rules.

This is how Qi does it. It works, but is a complex mechanism. It's not
even 100% effective though. There is at least one case (the symbols)
where you can't store all permutations of case and no case versions.

I think it's best to keep it simpler than Qi and simply do the filtering
in the parse function of only the relevant parsers. A nocase switch,
passed as context, will signal all parsers to do the filtering. This
is similar to how 'classic' spirit does it.
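
One way to read this suggestion, as a hedged sketch: the parse function asks the context for a comparison policy and filters each character through it. The names `get_compare`, `exact_compare`, `folded_compare`, `literal_string` and the two context types are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical context types: the "nocase switch" is encoded in the type.
struct case_sensitive_context {};
struct no_case_context {};

struct exact_compare
{
    bool operator()(char input, char expected) const { return input == expected; }
};

struct folded_compare
{
    static int fold(char c) { return std::tolower(static_cast<unsigned char>(c)); }
    // The conversion happens on every comparison; nothing is stored.
    bool operator()(char input, char expected) const { return fold(input) == fold(expected); }
};

// Overload resolution picks the comparison policy from the context type.
inline exact_compare get_compare(case_sensitive_context const&) { return exact_compare(); }
inline folded_compare get_compare(no_case_context const&) { return folded_compare(); }

// Only primitive parsers such as this one need to consult the switch.
struct literal_string
{
    std::string str;

    // On success, advances pos past the matched text.
    template <typename Context>
    bool parse(std::string const& input, std::size_t& pos, Context const& ctx) const
    {
        auto compare = get_compare(ctx);
        if (pos > input.size() || input.size() - pos < str.size())
            return false;
        for (std::size_t i = 0; i < str.size(); ++i)
            if (!compare(input[pos + i], str[i]))
                return false;
        pos += str.size();
        return true;
    }
};

inline void demo_filtering()
{
    literal_string keyword{"Select"};
    std::string input = "SELECT x";
    std::size_t pos = 0;

    assert(!keyword.parse(input, pos, case_sensitive_context{}));
    assert(pos == 0);
    assert(keyword.parse(input, pos, no_case_context{}));
    assert(pos == 6);
}
```

The case-sensitive path pays nothing extra, and the case-insensitive path pays one conversion per compared character, which is the cost being traded against the complexity of storing precomputed variants.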

Regards,
--
Joel de Guzman
http://www.ciere.com
http://boost-spirit.com
http://www.cycfi.com/


Re: X3: no case directive implementation difficulties

teajay-2

On 04/10/2014 03:31, Joel de Guzman wrote:

> I think it's best to keep it simpler than Qi and simply do the filtering
> in the parse function of only the relevant parsers. A nocase switch,
> passed as context, will signal all parsers to do the filtering. This
> is similar to how 'classic' spirit does it.
>
> Regards,
I know which way to go. I'll try to finish this as soon as I can.

Regards,

Thomas
