Unintuitive behavior of skipping in qi

Unintuitive behavior of skipping in qi

Sanchay Harneja
Why does +qi::char_("a-z") fully match "ab cd" with the qi::blank skipper?
Why do the tokens "ab" and "cd" get merged? I just spent 2h+ chasing this.

program:
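#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;
using namespace std;

int main() {
  string s = "ab cd";
  auto f = s.begin();
  auto l = s.end();
  string o;  // receives the matched characters
  // qi::blank is skipped before every char_ match, including between "ab" and "cd"
  bool rc = qi::phrase_parse(f, l, +qi::char_("a-z"), qi::blank, o);

  cout << (f == l) << " " << rc << " " <<
      ((rc && (f==l)) ? "success" : "fail") << endl;
  cout << o << endl;
}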

output:
1 1 success
abcd



------------------------------------------------------------------------------
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: Unintuitive behavior of skipping in qi

chris beck
Can you clarify how this is unintuitive?

What do you intuitively expect that the action of the skipper should be in the code that you posted? That it only trims spaces from the extremes of the input string, and doesn't do anything else?

It could work that way but I think it would be a lot less useful.

In qi, skippers apply to rules, and the skipper runs before each primitive parser in the rule's expression.
You can have different skippers for different rules. In your example, though, there is only one expression. I think it's pretty reasonable to expect that if you apply `qi::blank` to `+qi::char_("a-z")`, the results will be the same as `+qi::char_("a-z")` acting on a string from which all the blanks (spaces and tabs) were deleted beforehand.

In pure qi there is not a notion of tokens. If you are trying to write a (typical) tokenizer using qi, you probably don't want to use `qi::blank` as a skipper for that. Or you could, but then you should use the "lexeme" directive for the part of the grammar which matches tokens, as that will prevent this "token merging" issue you raise.

Cf. http://www.boost.org/doc/libs/1_59_0/libs/spirit/doc/html/spirit/qi/reference/directive/lexeme.html


Re: Unintuitive behavior of skipping in qi

Mario Lang
In reply to this post by Sanchay Harneja
Sanchay Harneja <[hidden email]> writes:

> Why does +qi::char_("a-z") fully match "ab cd" with the qi::blank skipper?

Because you are using phrase_parse.  Most parsers in Spirit start by
invoking the skipper.  The char parser is no different.
Since you are wrapping the char parser in +, it will be called
repeatedly, and for each invocation, it first calls the skipper:

    template <typename Derived, typename Char, typename Attr = Char>
    struct char_parser : primitive_parser<Derived>
    {
        ...

        template <typename Iterator, typename Context, typename Skipper, typename Attribute>
        bool parse(Iterator& first, Iterator const& last
                  , Context& context, Skipper const& skipper, Attribute& attr_) const
        {
            qi::skip_over(first, last, skipper);

            if (first != last && this->derived().test(*first, context))
                ...
        }
    };

> Why do the tokens "ab" and "cd" get merged?

They are not treated like separate tokens at all.  It just happens that
the space is being consumed by the char parser calling your skipper
before it actually sees 'c'.
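
To make the mechanism concrete, here is a deliberately simplified model in plain C++ with no Spirit involved. The function names `match_lower` and `one_or_more_lower` are made up for this sketch, and the real library is far more general; the point is only that a primitive which skips blanks before testing a character will, when repeated, glue "ab" and "cd" together.

```cpp
#include <string>

// Simplified stand-in for a skipping primitive parser (not Spirit's code):
// first skip blanks, then try to match a single lowercase letter.
inline bool match_lower(std::string::const_iterator& first,
                        std::string::const_iterator last,
                        std::string& out) {
    while (first != last && (*first == ' ' || *first == '\t'))
        ++first;                       // plays the role of skip_over()
    if (first != last && *first >= 'a' && *first <= 'z') {
        out += *first++;               // consume and record the letter
        return true;
    }
    return false;
}

// Stand-in for operator+: one mandatory match, then repeat until failure.
inline bool one_or_more_lower(std::string const& input, std::string& out) {
    auto first = input.begin();
    if (!match_lower(first, input.end(), out))
        return false;
    while (match_lower(first, input.end(), out)) {}
    return true;
}
```

With the input "ab cd" this model succeeds and leaves "abcd" in `out`, matching the output reported in the original post.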

> I just spent 2h+ chasing this.

I consider skippers a different approach to lexing.
However, I don't have any active experience with them, since the project
I am working on is whitespace sensitive, therefore I opted to always
make spacing explicit in my grammar and not use phrase_parse.

--
CYa,
  ⡍⠁⠗⠊⠕


Re: Unintuitive behavior of skipping in qi

Sanchay Harneja
Yes I understood the way it works, after being bitten by it.
My post was just a bit of a rant :-)  after spending the whole day reading
spirit documentation and still not getting my simple parser to work.




Re: Unintuitive behavior of skipping in qi

Joel de Guzman
On 10/30/15 2:53 AM, Sanchay Harneja wrote:
> Yes I understood the way it works, after being bitten by it.
> My post was just a bit of a rant :-)  after spending the whole day reading
> spirit documentation and still not getting my simple parser to work.

In such cases, I rely on #define BOOST_SPIRIT_QI_DEBUG to help me
visualize what's going on.

Regards,
--
Joel de Guzman
http://www.ciere.com
http://boost-spirit.com
http://www.cycfi.com/

