Upgrading the lexer


Boost - Build mailing list
AMDG

I've implemented a new lexer that handles whitespace
more intelligently.
See https://github.com/boostorg/build/tree/scanner-upgrade

Example:
import testing;
rule mytest(sources*:requirements*)
{
  sources+=[glob x.cpp];
  requirements+=<link>shared:<define>YY;
  run $(sources):::$(requirements);
}
mytest test.cpp;

Details:

The following symbols are always their own tokens
when not quoted or escaped:
'{', '}', ';'
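
As a rough illustration of this first rule, here is a Python sketch (not
the actual scanner from the scanner-upgrade branch). The function name
split_always_tokens and the simplified handling of double quotes and
backslash escapes are assumptions made for this example; the
context-sensitive symbols listed next are not handled.

```python
ALWAYS_TOKENS = {"{", "}", ";"}

def split_always_tokens(text):
    """Split text on whitespace, additionally emitting '{', '}' and ';'
    as separate tokens unless they are quoted or backslash-escaped."""
    tokens = []
    current = []          # characters of the token being built
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # An escaped character is taken literally.
            current.append(text[i + 1])
            i += 2
            continue
        if ch == '"':
            # Quotes toggle quoting and are not part of the token.
            in_quotes = not in_quotes
            i += 1
            continue
        if not in_quotes and (ch.isspace() or ch in ALWAYS_TOKENS):
            # Whitespace or an always-token ends the current token.
            if current:
                tokens.append("".join(current))
                current = []
            if ch in ALWAYS_TOKENS:
                tokens.append(ch)
            i += 1
            continue
        current.append(ch)
        i += 1
    if current:
        tokens.append("".join(current))
    return tokens

print(split_always_tokens("mytest test.cpp;"))
```

With this sketch, "mytest test.cpp;" yields the three tokens mytest,
test.cpp and ; even though no space precedes the semicolon.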

These symbols are independent tokens in contexts where
the grammar allows them:
'<', '>', '>=', '<=', '=', '[', ']', '*', '+', '?', '+=', '?=', ':'

Spaces will not break tokens inside variable expansions
such as $(x:J= ).  This is not a breaking change, because
such input currently causes a hard error.

In order to reduce the amount of breakage I've
also added the following special rules:
- A ':' is not a keyword when it appears in a token
  that looks like either a conditional property
  such as <link>shared:<define>X_DLL or a Windows absolute
  path such as C:\\Users
- A '>' is not a keyword if it closes a matching '<',
  to allow uses like:
  if <link>shared in $(properties)

The majority of issues appear in regular expressions
which must be quoted in most cases:
WRONG: [ MATCH ([.]) : $(x) ]
RIGHT: [ MATCH "([.])" : $(x) ]
I don't want to work around this, because it's too
ambiguous and, unlike conditional properties, unquoted
regular expressions appear relatively rarely in Jamfiles.

This is a major breaking change, so I'm planning to
split the transition into three steps:

Step 1. Issue a warning for all tokens that will be handled differently.
Step 2. Turn the warning into an error.
Step 3. Enable the new lexer.

The scanner-upgrade branch is currently set to step 1.
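
To make the three steps concrete, here is a small Python sketch of the
idea, not the branch's implementation. It assumes a much simplified
"old" lexer that splits on whitespace only and a "new" lexer that also
splits out '{', '}' and ';'; quoting and escaping are ignored, and all
function names are hypothetical.

```python
import re

def old_tokens(line):
    """Simplified old behavior: tokens are delimited by whitespace only."""
    return line.split()

def new_tokens(line):
    """Simplified new behavior: '{', '}' and ';' are also their own tokens."""
    return re.findall(r"[{};]|[^\s{};]+", line)

def changed_tokens(line):
    """Old-style tokens that the simplified new lexer would split up."""
    return [tok for tok in old_tokens(line) if new_tokens(tok) != [tok]]

def scan(line, step):
    """Return (tokens, warnings) for migration step 1, 2 or 3."""
    if step == 3:
        # Step 3: the new lexer is enabled.
        return new_tokens(line), []
    changed = changed_tokens(line)
    if step == 2 and changed:
        # Step 2: the warning has become an error.
        raise SyntaxError("tokens would be scanned differently: "
                          + " ".join(changed))
    # Step 1: keep the old tokens, but warn about each affected token.
    warnings = ["warning: '%s' will be scanned differently" % tok
                for tok in changed]
    return old_tokens(line), warnings

print(scan("mytest test.cpp;", 1))
print(scan("mytest test.cpp;", 3))
```

In step 1 the line still scans as before but produces a warning for
test.cpp; and in step 3 the same line scans as three tokens.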

Thoughts?

In Christ,
Steven Watanabe
_______________________________________________
Unsubscribe & other changes: https://lists.boost.org/mailman/listinfo.cgi/boost-build

Re: Upgrading the lexer

Boost - Build mailing list
Hi Steven,

As the whitespace handling in the bjam language is AFAICT a major
source of criticism, this looks like a tremendous improvement.
Your transition steps look reasonable. Would you mind announcing it
at http://www.boost.org/build/index.html when the scanner-upgrade
branch is merged?
Preferably under "Latest release", though sadly it seems to be unmaintained.

Incidentally, while we are on the subject of major sources of
criticism, and while you are working on the lexer: I routinely hear
people complain about the way arguments are passed to rules.
You have to know by heart that 'requirements' is the third argument
of the 'lib' rule, that 'usage-requirements' is the fifth, and so on.
However, 'project' has its own way of receiving named arguments.
Moreover, Bazel, a new build system from Google that looks a lot like
Bjam, uses named arguments for its equivalents of the 'lib' and 'exe'
rules. I have seen this perceived as a real advantage.

I was wondering if it were possible to extend the bjam grammar so that:

rule lib ( name : sources * : requirements * )

can be called either (possibly using named arguments coming from the
rule definition):
lib foo : a.C b.C : requirements <define>XXX ;
lib foo : sources a.C b.C : requirements <define>XXX ;
lib foo : requirements <define>XXX : sources a.C b.C ;
lib sources a.C b.C : name foo : requirements <define>XXX ;

What do you think ?

Cheers,

--
Fabien

2018-01-13 23:57 GMT+01:00 Steven Watanabe via Boost-build
<[hidden email]>:

> AMDG
>
> I've implemented a new lexer that handles whitespace
> more intelligently.
> See https://github.com/boostorg/build/tree/scanner-upgrade

Re: Upgrading the lexer

Boost - Build mailing list
On Fri, Feb 9, 2018 at 7:07 AM, Fabien Chêne via Boost-build <[hidden email]> wrote:

> I was wondering if it were possible to extend the bjam grammar so that:
>
> rule lib ( name : sources * : requirements * )
>
> can be called either (possibly using named arguments coming from the
> rule definition):
> lib foo : a.C b.C : requirements <define>XXX ;
> lib foo : sources a.C b.C : requirements <define>XXX ;
> lib foo : requirements <define>XXX : sources a.C b.C ;
> lib sources a.C b.C : name foo : requirements <define>XXX ;
>
> What do you think ?

IIRC we discussed this a long time ago. Your example usage is not "easy" to accomplish in a backward-compatible way. The best alternative that I remember was giving the argument names a special format. For example:

lib foo : a.C b.C :requirements <define>XXX ;
lib foo :sources a.C b.C :requirements <define>XXX ;
lib foo :requirements <define>XXX :sources a.C b.C ;
lib :sources a.C b.C :name foo :requirements <define>XXX ;

That is, named arguments would be written as ":<arg-name> arg .." (colon-prefixed), which would make it possible to distinguish them and their list values from all the other arguments. How does that sound to you?

--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net



Re: Upgrading the lexer

Boost - Build mailing list
AMDG

On 02/09/2018 06:25 AM, Rene Rivera via Boost-build wrote:

> On Fri, Feb 9, 2018 at 7:07 AM, Fabien Chêne via Boost-build <
> [hidden email]> wrote:
>
>>
>> I was wondering if it were possible to extend the bjam grammar so that:
>>
>> rule lib ( name : sources * : requirements * )
>>
>> can be called either (possibly using named arguments coming from the
>> rule definition):
>> lib foo : a.C b.C : requirements <define>XXX ;
>> lib foo : sources a.C b.C : requirements <define>XXX ;
>> lib foo : requirements <define>XXX : sources a.C b.C ;
>> lib sources a.C b.C : name foo : requirements <define>XXX ;
>>
>> What do you think ?
>>
>

  I already implemented this in develop.  It is implemented
as a library extension rather than a language extension,
so rules must explicitly make use of it:

param.handle-named-params sources requirements
  default-build usage-requirements ;

I applied it to all main-target rules.
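
The general idea of per-rule named parameters can be sketched in Python
as follows. This is only an illustration of the concept, not how
param.handle-named-params is implemented; the function bind_arguments
and its signature are hypothetical.

```python
def bind_arguments(signature, named, args):
    """Bind colon-separated argument lists to parameter names.

    signature: parameter names in positional order.
    named:     names that may be given explicitly as the first element
               of an argument list.
    args:      one list of strings per colon-separated argument.
    """
    bound = {}
    for position, arg in enumerate(args):
        if not arg:
            continue  # an empty argument binds nothing
        if arg[0] in named:
            # Named form: the first element selects the parameter.
            key, value = arg[0], arg[1:]
        else:
            # Positional form: the position selects the parameter.
            if position >= len(signature):
                raise TypeError("too many arguments")
            key, value = signature[position], arg
        if key in bound:
            raise TypeError("argument '%s' given more than once" % key)
        bound[key] = value
    for name in signature:
        bound.setdefault(name, [])  # parameters not supplied are empty
    return bound

# Corresponds to: lib foo : requirements <define>XXX : sources a.C b.C ;
print(bind_arguments(
    ["name", "sources", "requirements"],
    {"sources", "requirements"},
    [["foo"], ["requirements", "<define>XXX"], ["sources", "a.C", "b.C"]],
))
```

In a scheme like this, an argument list whose first element happens to
equal a parameter name (for example a source file literally called
"sources") is interpreted as a named argument, so compatibility with
existing calls is not perfect.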

> IIRC we discussed this a long time ago. Your example usage is not "easy"
> to accomplish in a backward-compatible way.

  It's true that it isn't perfectly backwards compatible,
but I didn't find any conflicts in Boost.  Also, applying
it on a per-rule basis is much safer than baking it into
the language.

> The best alternative that I
> remember was having the argument names be a special format. For example:
>
> lib foo : a.C b.C :requirements <define>XXX ;
> lib foo :sources a.C b.C :requirements <define>XXX ;
> lib foo :requirements <define>XXX :sources a.C b.C ;
> lib :sources a.C b.C :name foo :requirements <define>XXX ;
>
> That is having named arguments be ":<arg-name> arg .." (colon prefixed).
> Which would allow to distinguish them and the list value from all the other
> arguments. How does that sound to you?
>

That syntax conflicts with changing the lexer.

In Christ,
Steven Watanabe

Re: Upgrading the lexer

Boost - Build mailing list
On Fri, Feb 9, 2018 at 10:05 AM, Steven Watanabe via Boost-build <[hidden email]> wrote:
>   I already implemented this in develop.  It is implemented
> as a library extension rather than a language extension,
> so rules must explicitly make use of it:
>
> param.handle-named-params sources requirements
>   default-build usage-requirements ;
>
> I applied it to all main-target rules.
>
>> IIRC we discussed this a long time ago. Your example usage is not "easy"
>> to accomplish in a backward-compatible way.
>
>   It's true that it isn't perfectly backwards compatible,
> but I didn't find any conflicts in Boost.  Also, applying
> it on a per-rule basis is much safer than baking it into
> the language.

That's a reasonable solution too. 

>> The best alternative that I
>> remember was having the argument names be a special format. For example:
>>
>> lib foo : a.C b.C :requirements <define>XXX ;
>> lib foo :sources a.C b.C :requirements <define>XXX ;
>> lib foo :requirements <define>XXX :sources a.C b.C ;
>> lib :sources a.C b.C :name foo :requirements <define>XXX ;
>>
>> That is having named arguments be ":<arg-name> arg .." (colon prefixed).
>> Which would allow to distinguish them and the list value from all the other
>> arguments. How does that sound to you?
>
> That syntax conflicts with changing the lexer.

Good point. It would still be nice to have some syntax that applies at the language level, as it would be useful in general coding as well.

--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net

