[Spirit] Qi lexeme only taking the first word

[Spirit] Qi lexeme only taking the first word

Boost - Users mailing list
Hello,

I've got a couple of rules that are perplexing to me. First,

rule<It, std::string(), St> id %= lexeme[qi::alpha >> *char_("A-Za-z0-9_")];

In and of itself, id is working fine. Then I've got a "full id":

rule<It, full_id_t(), St> full_id %= id >> *(char_('.') >> id);

Where:

struct full_id_t {
    std::string val;
};

full_id_t::val is quite intentional for reasons elsewhere in the grammar.

The perplexity comes in, it seems lexeme is only shaving off the first
word as the val.

For instance, parsing "two.oranges.red.test", I receive back "two" in the AST.

Perhaps I should defer specifying the lexeme part of id until later?

Thoughts? Suggestions?

Thank you!

Best regards,

Michael Powell
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users

Re: [Spirit] Qi lexeme only taking the first word

Boost - Users mailing list
On 7/11/2018 11:01, Michael Powell wrote:

> I've got a couple of rules that are perplexing to me. First,
>
> rule<It, std::string(), St> id %= lexeme[qi::alpha >> *char_("A-Za-z0-9_")];
>
> In and of itself, id is working fine. Then I've got a "full id":
>
> rule<It, full_id_t(), St> full_id %= id >> *(char_('.') >> id);
>
> Where:
>
> struct full_id_t {
>      std::string val;
> };
>
> full_id_t::val is quite intentional for reasons elsewhere in the grammar.
>
> The perplexity comes in, it seems lexeme is only shaving off the first
> word as the val.
>
> For instance, parsing "two.oranges.red.test", I receive back "two" in the AST.

Again, I don't really know anything about Spirit, but it's reasonable to
assume that "lexeme" will group its input sequence into a single token
output, which is the result of id as a single std::string.

Meanwhile in full_id you're specifying a sequence of input tokens, so it
will also output a sequence of tokens (which can presumably be captured
as a std::vector<std::string>, not simply a std::string).

Most likely (though again this is just a guess) given the input
"two.oranges.red.test" you should end up with std::vector<std::string> {
"two", "oranges", "red", "test" }.

This is probably what you want (as it will simplify later use of
subcomponents), especially if the language allows whitespace around the ".".

If you want to disallow whitespace around the "." and get it as a single
string token, then yes, you will probably have to make full_id call
lexeme.  I don't know whether that will require extracting the inner
part of id to a separate rule so that lexeme only ends up being called
once or if you can "nest" uses of lexeme.

Re: [Spirit] Qi lexeme only taking the first word

Boost - Users mailing list
In reply to this post by Boost - Users mailing list
On Tue, Nov 6, 2018 at 5:01 PM Michael Powell <[hidden email]> wrote:

>
> Hello,
>
> I've got a couple of rules that are perplexing to me. First,
>
> rule<It, std::string(), St> id %= lexeme[qi::alpha >> *char_("A-Za-z0-9_")];
>
> In and of itself, id is working fine. Then I've got a "full id":
>
> rule<It, full_id_t(), St> full_id %= id >> *(char_('.') >> id);
>
> Where:
>
> struct full_id_t {
>     std::string val;
> };
>
> full_id_t::val is quite intentional for reasons elsewhere in the grammar.
>
> The perplexity comes in, it seems lexeme is only shaving off the first
> word as the val.
>
> For instance, parsing "two.oranges.red.test", I receive back "two" in the AST.
>
> Perhaps I should defer specifying the lexeme part of id until later?

I elaborated a little on the "simple" full id sub-grammar, but I
cannot repro using the GCC compiler. I'm wondering if this has
anything to do with the VS2017 fpos issue?

http://coliru.stacked-crooked.com/a/adeb42ce2f19b0fd

Or there may be insufficient context in the web compiler to adequately demo.

> Thoughts? Suggestions?
>
> Thank you!
>
> Best regards,
>
> Michael Powell

Re: [Spirit] Qi lexeme only taking the first word

Boost - Users mailing list
On Tue, Nov 6, 2018 at 5:40 PM Michael Powell <[hidden email]> wrote:

>
> On Tue, Nov 6, 2018 at 5:01 PM Michael Powell <[hidden email]> wrote:
> >
> > Hello,
> >
> > I've got a couple of rules that are perplexing to me. First,
> >
> > rule<It, std::string(), St> id %= lexeme[qi::alpha >> *char_("A-Za-z0-9_")];
> >
> > In and of itself, id is working fine. Then I've got a "full id":
> >
> > rule<It, full_id_t(), St> full_id %= id >> *(char_('.') >> id);
> >
> > Where:
> >
> > struct full_id_t {
> >     std::string val;
> > };
> >
> > full_id_t::val is quite intentional for reasons elsewhere in the grammar.
> >
> > The perplexity comes in, it seems lexeme is only shaving off the first
> > word as the val.
> >
> > For instance, parsing "two.oranges.red.test", I receive back "two" in the AST.
> >
> > Perhaps I should defer specifying the lexeme part of id until later?
>
> I elaborated a little on the "simple" full id sub-grammar, but I
> cannot repro using the GCC compiler. I'm wondering if this has
> anything to do with the VS2017 fpos issue?
>
> http://coliru.stacked-crooked.com/a/adeb42ce2f19b0fd
>
> Or there may be insufficient context in the web compiler to adequately demo.

I got a repro:

http://coliru.stacked-crooked.com/a/069a44296240be7e

Although the reasons as to why I do not know.

It is a difference in attribute synthesis. When full_id synthesizes a
std::string(), the conversion to full_id_t() "just works" magically.
I'm guessing by happy accident based on the std::string val being the
only member (adaptation, etc).

But when I change the synthesis to be its "true" type, that is,
AST::full_id_t(), suddenly I see the same behavior.

Really and truly, I do not know why. Everything else being equal why
would one approach be any different than the other?

Anyone with some Spirit, Fusion, AST, insights?

Thanks!

For now, I'll run with it as has been exposed here, but it's a bit
troubling to me not knowing the difference.

> > Thoughts? Suggestions?
> >
> > Thank you!
> >
> > Best regards,
> >
> > Michael Powell

Re: [Spirit] Qi lexeme only taking the first word

Boost - Users mailing list
It's been a long while since I've used spirit::qi, but what it looks like is happening in your setup is something like this:

When you have:

qi::rule<It, AST::full_id_t()> full_id;

the rule's attribute, once adapted, looks to Spirit like a Fusion vector<string>.

When it matches

id >> *(char_('.') >> id)

the parser has an attribute of vector<string,vector<tuple<char,std::string>>> or something similar.

Spirit appears to compare your target attribute with the synthesised attribute of the parser, and for any (trailing?) members
of the synthesised attribute that have no counterpart in your attribute, it marks them as unused_type and they are not assigned.

You can see which overload of assign_to is used in your example if you set a breakpoint at boost\spirit\home\qi\detail\assign_to.hpp line 399.

The discarding appears to happen in boost\spirit\home\qi\operator\sequence_base.hpp line 74, where the predicate
traits::attribute_not_unused<Context, Iterator> is passed to spirit::any_if (boost\spirit\home\support\algorithm\any_if.hpp line 186).
It will basically discard attributes where the LHS sequence is not matched with the RHS.

You can see this in your example by adding an additional member to

    struct full_id_t {
        std::string val;
        std::vector<std::string> others;
    };
   
    BOOST_FUSION_ADAPT_STRUCT(AST::full_id_t, val, others)
   
Your missing bits will appear in this std::vector, as they are now not silently discarded.
http://coliru.stacked-crooked.com/a/51f16c6deff45309

I think the fundamental problem is that attribute propagation differs between a string target and a vector<string> target, as in your two examples.
The first kicks in whatever logic exists to flatten the synthesised attribute into a string; the second takes the first element, assigns it,
and marks the rest as unused.

One thing you can do is use qi::as<std::string>()[ id >> *(char_('.') >> id) ] to force conversion of synthesised attribute to a string to happen
before it is assigned to your attribute.
http://coliru.stacked-crooked.com/a/6a060343a390f037

I've only had a quick look, and this is a pretty half-hearted analysis. You'll really have to dig deep to find out exactly what is going on, but I suspect
this is somewhat along the right lines.


Re: [Spirit] Qi lexeme only taking the first word

Boost - Users mailing list
On Tue, Nov 6, 2018 at 8:12 PM rmawatson rmawatson
<[hidden email]> wrote:

>
> It's been a long while since I've used spirit::qi, but what it looks like is happening in your setup is something like this:
>
> When you have:
>
> qi::rule<It, AST::full_id_t()> full_id;
>
> the attribute is vector<string>
>
> When it matches
>
> id >> *(char_('.') >> id)
>
> this has an attribute of vector<string,vector<tuple<char,std::string>>> or something similar.

Where are you getting that from? It makes no sense whatsoever given
the struct full_id_t { std::string val; }, which is similarly mapped,
and ruled, etc.

> spirit appears to compare your target attribute with the synthesised attribute of the parser and for any (trailing?) members
> of the synthesised attribute that do not match in your attribute, it marks them as unused_type and they are not assigned.

Would I need to do some grouping or something to persuade Spirit to
treat the struct as I've defined and adapted it?


Re: [Spirit] Qi lexeme only taking the first word

Boost - Users mailing list
On 7/11/2018 15:08, Michael Powell wrote:
>> When it matches
>>
>> id >> *(char_('.') >> id)
>>
>> this has an attribute of vector<string,vector<tuple<char,std::string>>> or something similar.
>
> Where are you getting that from? It makes no sense whatsoever given
> the struct full_id_t { std::string val; }, which is similarly mapped,
> and ruled, etc.

This might be wrong, but it's how I read the docs:

The output of parsing is a Fusion sequence of the attributes that were
parsed.

So the output of

   id >> *(char_('.') >> id)

is something like (but not exactly)

   tuple<string>
   tuple<string, char, string>
   tuple<string, char, string, char, string>
   etc

string because that's the output attribute declared for id.
char because you've used char_ instead of using '.' by itself (otherwise
it would just disappear).
And the latter two can be repeated zero or more times because you've used *.

When you assign this to a rule with %=, it tries to best-fit this
against the rule's declared output attribute.

full_id_t contains a single string field, so the Fusion adaptation makes
it equivalent to tuple<string>, and apparently this results in any
additional values being discarded, not in concatenating as you expect.

You can probably use an explicit semantic action to build a single
string instead of using %=.

Or you can make full_id_t contain vector<string> as rmawatson and I
previously suggested, which should give you all the values.

Another possibility, which I can't test because coliru appears to be
grumpy at present, is to try using:

   full_id %= as_string[lexeme[id >> *(char_('.') >> id)]];

Re: [Spirit] Qi lexeme only taking the first word

Boost - Users mailing list
On Tue, Nov 6, 2018 at 10:28 PM Gavin Lambert via Boost-users
<[hidden email]> wrote:

>
> On 7/11/2018 15:08, Michael Powell wrote:
> >> When it matches
> >>
> >> id >> *(char_('.') >> id)
> >>
> >> this has an attribute of vector<string,vector<tuple<char,std::string>>> or something similar.
> >
> > Where are you getting that from? It makes no sense whatsoever given
> > the struct full_id_t { std::string val; }, which is similarly mapped,
> > and ruled, etc.
>
> This might be wrong, but it's how I read the docs:
>
> The output of parsing is a Fusion sequence of the attributes that were
> parsed.
>
> So the output of
>
>    id >> *(char_('.') >> id)
>
> is something like (but not exactly)
>
>    tuple<string>
>    tuple<string, char, string>
>    tuple<string, char, string, char, string>
>    etc
>
> string because that's the output attribute declared for id.
> char because you've used char_ instead of using '.' by itself (otherwise
> it would just disappear).
> And the latter two can be repeated zero or more times because you've used *.
>
> When you assign this to a rule with %=, it tries to best-fit this
> against the rule's declared output attribute.
>
> full_id_t contains a single string field, so the Fusion adaptation makes
> it equivalent to tuple<string>, and apparently this results in any
> additional values being discarded, not in concatenating as you expect.
>
> You can probably use an explicit semantic action to build a single
> string instead of using %=.
>
> Or you can make full_id_t contain vector<string> as rmawatson and I
> previously suggested, which should give you all the values.
>
> Another possibility, which I can't test because coliru appears to be
> grumpy at present, is to try using:
>
>    full_id %= as_string[lexeme[id >> *(char_('.') >> id)]];

This approach works for me. And remains true to the AST. +1 Thanks!


Re: [Spirit] Qi lexeme only taking the first word

Boost - Users mailing list
In reply to this post by Boost - Users mailing list
On 7/11/2018 16:28, I wrote:
> Another possibility, which I can't test because coliru appears to be
> grumpy at present, is to try using:
>
>    full_id %= as_string[lexeme[id >> *(char_('.') >> id)]];

Actually, since you're consuming a consecutive sequence of input
characters without skipping any whitespace, you could probably use this
instead, which might be faster (though that's just a guess; measure it!):

     full_id %= as_string[raw[id >> *('.' >> id)]];

(I was half expecting as_string to not be needed here, but apparently it
still is.)

Re: [Spirit] Qi lexeme only taking the first word

Boost - Users mailing list
In reply to this post by Boost - Users mailing list
On 11/6/18 4:40 PM, Michael Powell via Boost-users wrote:

> On Tue, Nov 6, 2018 at 5:01 PM Michael Powell <[hidden email]> wrote:
>>
>> Hello,
>>
>> I've got a couple of rules that are perplexing to me. First,
>>
>> rule<It, std::string(), St> id %= lexeme[qi::alpha >> *char_("A-Za-z0-9_")];
>>
>> In and of itself, id is working fine. Then I've got a "full id":
>>
>> rule<It, full_id_t(), St> full_id %= id >> *(char_('.') >> id);
>>
>> Where:
>>
>> struct full_id_t {
>>      std::string val;
>> };
>>
>> full_id_t::val is quite intentional for reasons elsewhere in the grammar.
>>
>> The perplexity comes in, it seems lexeme is only shaving off the first
>> word as the val.
>>
>> For instance, parsing "two.oranges.red.test", I receive back "two" in the AST.
>>
>> Perhaps I should defer specifying the lexeme part of id until later?
[snip]
The following simplification:

https://coliru.stacked-crooked.com/a/1adacde1a472d7a7

shows the full_id_t has the full attributes; however,
it does *not* join  them with the '.' char.  Instead,
it's a vector<std::string>.

Unfortunately, I don't know how to automatically combine
into a single string, but maybe this simplification will
give you a starting point to figure that out.

-regards,
Larry
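If the vector<std::string> form is kept, combining the parts afterwards is a small post-processing step outside the grammar. A minimal sketch in plain C++ follows; join_ids is a hypothetical helper name, not part of Spirit, and the '.' separator is simply the one used in this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins identifier parts with a separator character.
inline std::string join_ids(const std::vector<std::string>& parts, char sep = '.') {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) {
            out += sep;  // separator goes between parts, never before the first
        }
        out += parts[i];
    }
    return out;
}
```

This keeps the subcomponents available for later use while still making the joined name available when a single string is needed.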


Re: [Spirit] Qi lexeme only taking the first word

Boost - Users mailing list
In reply to this post by Boost - Users mailing list
On 11/6/18 7:12 PM, rmawatson rmawatson via Boost-users wrote:

> It's been a long while since I've used spirit::qi, but what it looks like is happening in your setup is something like this:
>
> When you have:
>
> qi::rule<It, AST::full_id_t()> full_id;
>
> the attribute is vector<string>
>
> When it matches
>
> id >> *(char_('.') >> id)
>
> this has an attribute of vector<string,vector<tuple<char,std::string>>> or something similar.
[snip]

> One thing you can do is use qi::as<std::string>()[ id >> *(char_('.') >> id) ] to force conversion of synthesised attribute to a string to happen
> before it is assigned to your attribute.
[snip]

rmawatson's as<std::string> suggestion works:

https://coliru.stacked-crooked.com/a/a2c9435ee9e88bad

Yeah rmawatson!



Re: [Spirit] Qi lexeme only taking the first word

Boost - Users mailing list
In reply to this post by Boost - Users mailing list
>> this has an attribute of vector<string,vector<tuple<char,std::string>>> or something similar.
>
> Where are you getting that from? It makes no sense whatsoever given
> the struct full_id_t { std::string val; }, which is similarly mapped,
> and ruled, etc.

I've just had a look and the synthesized attribute is actually
boost::fusion::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::vector<boost::fusion::vector<char,std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<boost::fusion::vector<char,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > > >

Cleaned up, that is:

boost::fusion::vector<std::string,std::vector<boost::fusion::vector<char,std::string>>>

So almost exactly what I said. You can get this using spirit::traits::attribute_of.

It makes perfect sense. This is the synthesized attribute of the RHS parser, not the attribute you have passed to it. Your struct full_id_t { std::string val; } appears to Spirit as boost::fusion::vector<std::string>;
this is precisely what BOOST_FUSION_ADAPT_STRUCT is for. The attribute of the RHS follows from the various parsers you combine. In your case:

Sequence Parser (a >> b)
Expression --> Attribute
a: A, b: B --> (a >> b): tuple<A, B>

Kleene Parser (*a)
Expression --> Attribute
a: A --> *a: vector<A>

Character Parser (char_, lit)
Expression --> Attribute
ns::char_ --> The character type of the Character Encoding Namespace, ns.

With your expression, where id's attribute is string, you have

id >> *(char_('.') >> id)

gives

tuple<string,vector<tuple<char,string>>>

This is then assigned to the target attribute of the LHS rule, which in your setup is boost::fusion::vector<std::string>.
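To make the mismatch concrete, here is a plain C++ model of the two shapes involved. It is only an illustration, with std::tuple standing in for the Fusion types; it does not call Spirit, and the names synthesized, first_element_only, flattened, and example_attribute are invented for this sketch.

```cpp
#include <string>
#include <tuple>
#include <vector>

// Stand-in for the synthesized attribute of  id >> *(char_('.') >> id).
typedef std::tuple<char, std::string> tail_item;
typedef std::tuple<std::string, std::vector<tail_item> > synthesized;

// Models a one-member target: only the first element has somewhere to go.
inline std::string first_element_only(const synthesized& attr) {
    return std::get<0>(attr);
}

// Models the desired result: every piece concatenated in input order.
inline std::string flattened(const synthesized& attr) {
    std::string out = std::get<0>(attr);
    const std::vector<tail_item>& tail = std::get<1>(attr);
    for (std::vector<tail_item>::const_iterator it = tail.begin(); it != tail.end(); ++it) {
        out += std::get<0>(*it);  // the '.' captured by char_
        out += std::get<1>(*it);  // the following id
    }
    return out;
}

// The value the parser would synthesize for "two.oranges.red.test".
inline synthesized example_attribute() {
    std::vector<tail_item> tail;
    tail.emplace_back('.', "oranges");
    tail.emplace_back('.', "red");
    tail.emplace_back('.', "test");
    return synthesized("two", tail);
}
```

Seen this way, a target with a single string slot can only receive the leading "two" unless something (such as as<std::string>()) collapses the whole structure into one string first.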



Re: [Spirit] Qi lexeme only taking the first word

Boost - Users mailing list
In reply to this post by Boost - Users mailing list
On Tue, Nov 6, 2018 at 11:46 PM Michael Powell <[hidden email]> wrote:

>
> On Tue, Nov 6, 2018 at 10:28 PM Gavin Lambert via Boost-users
> <[hidden email]> wrote:
> >
> > On 7/11/2018 15:08, Michael Powell wrote:
> > >> When it matches
> > >>
> > >> id >> *(char_('.') >> id)
> > >>
> > >> this has an attribute of vector<string,vector<tuple<char,std::string>>> or something similar.
> > >
> > > Where are you getting that from? It makes no sense whatsoever given
> > > the struct full_id_t { std::string val; }, which is similarly mapped,
> > > and ruled, etc.
> >
> > This might be wrong, but it's how I read the docs:
> >
> > The output of parsing is a Fusion sequence of the attributes that were
> > parsed.
> >
> > So the output of
> >
> >    id >> *(char_('.') >> id)
> >
> > is something like (but not exactly)
> >
> >    tuple<string>
> >    tuple<string, char, string>
> >    tuple<string, char, string, char, string>
> >    etc
> >
> > string because that's the output attribute declared for id.
> > char because you've used char_ instead of using '.' by itself (otherwise
> > it would just disappear).
> > And the latter two can be repeated zero or more times because you've used *.
> >
> > When you assign this to a rule with %=, it tries to best-fit this
> > against the rule's declared output attribute.
> >
> > full_id_t contains a single string field, so the Fusion adaptation makes
> > it equivalent to tuple<string>, and apparently this results in any
> > additional values being discarded, not in concatenating as you expect.
> >
> > You can probably use an explicit semantic action to build a single
> > string instead of using %=.
> >
> > Or you can make full_id_t contain vector<string> as rmawatson and I
> > previously suggested, which should give you all the values.
> >
> > Another possibility, which I can't test because coliru appears to be
> > grumpy at present, is to try using:
> >
> >    full_id %= as_string[lexeme[id >> *(char_('.') >> id)]];
>
> This approach works for me. And remains true to the AST. +1 Thanks!

Boy, wow... I'll qualify that with this: in "this" case I was able to
persuade Spirit/Fusion to produce what I wanted.

In other cases, not so much. It really, I mean **REALLY**, wants to
produce that std::vector<...>, doesn't it?

It will take a bit of digesting to adjust the AST, etc, to that, but
it's good (no, GREAT) to know about.
