Boost Spirit X3 and lexers

Boost Spirit X3 and lexers

Peter Huene
Hello all,

I am considering migrating a parser that makes use of Spirit.Lex and Spirit.Qi to X3.

We're currently using a static lexer that has to handle what our language calls "heredocs", which are string literals that appear on lines following the current one, up to some "end of string" identifier.

Here's an example in our language:

```
function(@(FIRST), @(SECOND))
  This is
  all part of the first string
  |- FIRST
  This is
  all part of the second string
  |- SECOND
# Here is where lexing continues after newline following close paren token
```

This input should tokenize as: identifier, open paren, string literal, comma, string literal, close paren.

To accomplish this, a custom input iterator to the lexer is used that is capable of remembering where to "skip to" when a new line character is encountered.  A lexer semantic action is used to tokenize the heredoc into a single token and this works great for our purposes.
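For readers who have not seen the technique, here is a minimal standalone sketch of the "skip to" idea. It is not the Spirit.Lex code described above: it swaps the custom iterator and lexer semantic action for a hand-written scanner built on the standard library only, and the names `Kind`, `Token`, `trim`, and `tokenize` are hypothetical. It also assumes the end marker is a line containing only `|- TAG`, and it keeps the body's indentation as-is.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class Kind { Identifier, OpenParen, CloseParen, Comma, String };

struct Token {
    Kind kind;
    std::string text;
};

// Strip leading and trailing blanks from one line.
static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

std::vector<Token> tokenize(const std::string& in) {
    constexpr auto npos = std::string::npos;
    std::vector<Token> out;
    std::size_t pos = 0;
    std::size_t resume = npos;  // where scanning continues after the next newline

    while (pos < in.size()) {
        const char c = in[pos];
        if (c == '\n') {
            // Jump past any heredoc bodies already consumed for this line.
            pos = (resume != npos) ? resume : pos + 1;
            resume = npos;
        } else if (c == '#') {
            pos = in.find('\n', pos);  // comment: skip to the end of the line
            if (pos == npos) break;
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            ++pos;
        } else if (c == '(') {
            out.push_back({Kind::OpenParen, "("});
            ++pos;
        } else if (c == ')') {
            out.push_back({Kind::CloseParen, ")"});
            ++pos;
        } else if (c == ',') {
            out.push_back({Kind::Comma, ","});
            ++pos;
        } else if (c == '@' && pos + 1 < in.size() && in[pos + 1] == '(') {
            const auto close = in.find(')', pos);
            if (close == npos) throw std::runtime_error("unterminated heredoc tag");
            const std::string tag = in.substr(pos + 2, close - pos - 2);

            // The body starts after earlier heredocs, else on the next line.
            std::size_t line = resume;
            if (line == npos) {
                const auto nl = in.find('\n', close);
                if (nl == npos) throw std::runtime_error("heredoc has no body");
                line = nl + 1;
            }
            assert(line > close);  // bodies always lie below the current line

            std::string body;
            for (;;) {
                if (line >= in.size())
                    throw std::runtime_error("missing heredoc end: " + tag);
                auto eol = in.find('\n', line);
                if (eol == npos) eol = in.size();
                const std::string text = in.substr(line, eol - line);
                line = (eol < in.size()) ? eol + 1 : eol;
                if (trim(text) == "|- " + tag) break;  // end marker reached
                body += text;
                body += '\n';
            }
            resume = line;  // remember where to "skip to"
            out.push_back({Kind::String, body});
            pos = close + 1;
        } else if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            const auto start = pos;
            while (pos < in.size() &&
                   (std::isalnum(static_cast<unsigned char>(in[pos])) || in[pos] == '_'))
                ++pos;
            out.push_back({Kind::Identifier, in.substr(start, pos - start)});
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
    return out;
}
```

The single `resume` offset plays the role of the iterator's memory: each heredoc on a line pushes it further down, and the newline branch is the only place that consumes it.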

However, from everything I've seen of X3, I'm not sure how to integrate it with Spirit.Lex.  Can we continue to use the `token` and `raw_token` terminals with an X3 rule?  Are there replacements for them instead?

Given the apparent speed of X3 in parsing, is using a separate lexer even a worthwhile endeavor?  Should I instead use a parser semantic action to accomplish the same task as what is currently being done in the lexer?

Apologies in advance for the onslaught of questions and thanks in advance for any insights.  I'd love to migrate our parser to X3 as it seems to provide many benefits over Qi.

Cheers,
Peter

------------------------------------------------------------------------------

_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: Boost Spirit X3 and lexers

sehe
On 10/19/2015 07:55 PM, Peter Huene wrote:
>
> Given the apparent speed of X3 in parsing, is using a separate lexer
> even a worthwhile endeavor?  Should I instead use a parser semantic
> action to accomplish the same task as what is currently being done in
> the lexer?

I'd vouch for this in the absence of profiling data that tells you you
can't do this.

Re: Boost Spirit X3 and lexers

Joel de Guzman
In reply to this post by Peter Huene
Hi Peter,

Pardon the delay. I almost missed this post.

On 10/20/15 1:55 AM, Peter Huene wrote:

> Hello all,
>
> I am considering migrating a parser that makes use of Spirit.Lex and Spirit.Qi to X3.
>
> We're currently using a static lexer that has to handle what our language calls
> "heredocs", which are string literals that appear on lines following the current one, up
> to some "end of string" identifier.
>
> Here's an example in our language:
>
> ```
> function(@(FIRST), @(SECOND))
>    This is
>    all part of the first string
>    |- FIRST
>    This is
>    all part of the second string
>    |- SECOND
> # Here is where lexing continues after newline following close paren token
> ```
>
> This input should tokenize as: identifier, open paren, string literal, comma, string
> literal, close paren.
>
> To accomplish this, a custom input iterator to the lexer is used that is capable of
> remembering where to "skip to" when a new line character is encountered.  A lexer semantic
> action is used to tokenize the heredoc into a single token and this works great for our
> purposes.
>
> However, from everything I've seen of X3, I'm not sure how to integrate it with
> Spirit.Lex.  Can we continue to use the `token` and `raw_token` terminals with an X3
> rule?  Are there replacements for them instead?
>
> Given the apparent speed of X3 in parsing, is using a separate lexer even a worthwhile
> endeavor?  Should I instead use a parser semantic action to accomplish the same task as
> what is currently being done in the lexer?

There are no replacements for the lexer and I doubt X3 will ever have one,
unless someone contributes some time and effort. I personally don't really use
lexers, preferring pure Qi instead. If you do it right, a pure Qi parser can
be on par with or even outperform one with a lexer (anecdotal evidence only,
not fully substantiated).

The real advantages of X3 over Qi are 1) compile time and 2) AST building.
Parsing speed should be more or less the same. But, again in my experience, the
most time-consuming operations are in AST building, not parsing and not
lexing.

> Apologies in advance for the onslaught of questions and thanks in advance for any
> insights.  I'd love to migrate our parser to X3 as it seems to provide many benefits over Qi.

Again, pardon the delay in replying.

Regards,
--
Joel de Guzman
http://www.ciere.com
http://boost-spirit.com
http://www.cycfi.com/


Re: Boost Spirit X3 and lexers

Peter Huene
On Sun, Oct 25, 2015 at 5:20 PM, Joel de Guzman <[hidden email]> wrote:
> Hi Peter,
>
> Pardon the delay. I almost missed this post.

No worries at all; thanks for the response!

I ended up porting the AST and parser to X3, but kept the Spirit.Lex lexer since it works for our purposes.  I've implemented my own semantically-equivalent "token" and "raw_token" parsers in X3 as it was so much simpler to implement vs. how custom parsers were written in Qi.  Porting to X3 has allowed me to simplify my grammar and free it of all semantic actions (something I should have done a long time ago).  I've also figured out how to better format expectation failure messages with partial specializations of get_info, along with providing parse context allowing me to annotate the AST with back pointers to the root (the root stores some useful information for error messages when evaluating the expressions later on).

I'm really pleased with how it has turned out in terms of compilation time and, once the porting work fully completes, I'll be able to benchmark the runtime performance.

Keep up the excellent work!

Cheers,
Peter
 



Re: Boost Spirit X3 and lexers

Joel de Guzman
On 10/26/15 9:16 AM, Peter Huene wrote:

> On Sun, Oct 25, 2015 at 5:20 PM, Joel de Guzman <[hidden email]> wrote:
>
>     Hi Peter,
>
>     Pardon the delay. I almost missed this post.
>
>
> No worries at all; thanks for the response!
>
> I ended up porting the AST and parser to X3, but kept the Spirit.Lex lexer since it works
> for our purposes.  I've implemented my own semantically-equivalent "token" and "raw_token"
> parsers in X3 as it was so much simpler to implement vs. how custom parsers were written
> in Qi.  Porting to X3 has allowed me to simplify my grammar and free it of all semantic
> actions (something I should have done a long time ago).  I've also figured out how to
> better format expectation failure messages with partial specializations of get_info, along
> with providing parse context allowing me to annotate the AST with back pointers to the
> root (the root stores some useful information for error messages when evaluating the
> expressions later on).
>
> I'm really pleased with how it has turned out in terms of compilation time and, once the
> porting work fully completes, I'll be able to benchmark the runtime performance.
>
> Keep up the excellent work!

Wonderful! If you can provide a narrative of what you've done, I'd love
to publish it. Or if you have a blog, I can just link to it. It would
be a good use-case that would help others who want to do something similar.

Regards,
--
Joel de Guzman
http://www.ciere.com
http://boost-spirit.com
http://www.cycfi.com/


Re: Boost Spirit X3 and lexers

sehe
On 10/26/2015 04:05 AM, Joel de Guzman wrote:
> Wonderful! If you can provide a narrative of what you've done, I'd love
> to publish it. Or if you have a blog, I can just link to it. It would
> be a good use-case that would help others who want to do something similar.

+1 for that; it sounds really interesting, Peter.
