New feature in Spirit.Lex

Julien Peeters
Hi readers,

Apologies to anyone who receives this email twice; I did not know
which list was the best place to post it.

Yesterday I started using Spirit.Lex to build a simple
source-code comment separator. For a given source file, the scanner
separates code from comments that begin with a special start pattern
(here "/*>"). I previously wrote it using OCamllex, which is a brilliant
lexer generator from the OCaml toolset.

Writing such a comment separator with OCamllex is very easy thanks to a
specific feature that I would call a "multi-context" lexer (I don't know
the real name of this feature). But it is quite hard to implement using
Spirit, at least with my current knowledge of it. As far as I
understand, implementing it with Spirit would require a state variable
to select the context the lexer is in when it consumes a token. However,
this does not prevent the lexer from matching (and consuming) tokens
that do not belong to the current context.

Finally, I aim to use Spirit.Lex because OCamllex has some limitations.
In particular, it does not allow patterns to be added dynamically, like
Flex, if I am not wrong. As a result, one lexer must be generated for
each comment pattern (e.g. when adding support for languages other than
C/C++).

To illustrate, here is a code sample of what I did using
OCamllex:


{
   type expr = Code of string | SpecialComment of string

   (* Shared buffer that accumulates the text of the current chunk. *)
   let buffer = Buffer.create 32
}

let blank        = [' ' '\013' '\009' '\012']
let nl           = ['\010']
let not_nl       = [^ '\010']
let blank_nl     = [' ' '\013' '\009' '\012' '\010']
let not_blank_nl = [^ ' ' '\013' '\009' '\012' '\010']

rule main state = parse
   (* Empty special comment: skip it. *)
   | blank* "/*>" blank_nl* "*/"
       { main state lexbuf }

   (* Start of a special comment: flush pending code, switch rule. *)
   | blank* "/*>" blank_nl*
       { let state2 =
           if Buffer.length buffer > 0 then begin
             let code = Code (Buffer.contents buffer) in
             let _ = Buffer.reset buffer in
               code :: state
           end else
             state in
         let state3 = (special_comment lexbuf) :: state2 in
           main state3 lexbuf }

   | _ as c
       { Buffer.add_char buffer c;
         main state lexbuf }

   | eof
       { if Buffer.length buffer > 0 then
           let code = Buffer.contents buffer in
             Buffer.reset buffer;
             (Code code)::state
         else
           state }

and special_comment = parse
   (* End of the special comment: return it to the main rule. *)
   | "*/" nl+
       { let com = SpecialComment (Buffer.contents buffer) in
           Buffer.reset buffer;
           com }

   | nl nl+ blank_nl*
       { Buffer.add_string buffer "\n\n";
         special_comment lexbuf }

   | blank_nl+
       { Buffer.add_char buffer ' ';
         special_comment lexbuf }

   | not_blank_nl as c
       { Buffer.add_char buffer c;
         special_comment lexbuf }

{
   let main () =
     let lexbuf = Lexing.from_channel stdin in
     (* The lexer builds the list in reverse order. *)
     let result = main [] lexbuf in
       List.iter
         ( fun item ->
             match item with
               | Code str -> Printf.fprintf stdout "<code>%s</code>" str
               | SpecialComment str -> Printf.fprintf stderr
                   "<special>%s</special>" str )
         (List.rev result)

   let _ = main ()
}


There are two rules: main and special_comment. The lexer starts in the
main rule when it is first called. When the pattern "/*>" is matched,
the lexer switches to the special_comment rule, and it returns to the
main rule when the pattern "*/" is matched.
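
For readers less familiar with OCamllex, the same switching behaviour
can be sketched in plain OCaml as two mutually recursive functions, one
per context. This is only a simplified illustration (it does no
whitespace normalisation), and the helper names starts_with_at, flush
and scan are made up for this sketch:

```ocaml
type expr = Code of string | SpecialComment of string

(* True if [pat] occurs in [s] starting at index [i]. *)
let starts_with_at s i pat =
  let n = String.length pat in
  i + n <= String.length s && String.sub s i n = pat

let scan (s : string) : expr list =
  let buf = Buffer.create 32 in
  let len = String.length s in
  (* Empty the buffer and wrap its contents, skipping empty chunks. *)
  let flush wrap acc =
    if Buffer.length buf = 0 then acc
    else begin
      let item = wrap (Buffer.contents buf) in
      Buffer.clear buf;
      item :: acc
    end
  in
  (* Main context: only the start pattern is special here. *)
  let rec main i acc =
    if i >= len then List.rev (flush (fun c -> Code c) acc)
    else if starts_with_at s i "/*>" then
      special_comment (i + 3) (flush (fun c -> Code c) acc)
    else begin
      Buffer.add_char buf s.[i];
      main (i + 1) acc
    end
  (* Comment context: only the end pattern is special here. *)
  and special_comment i acc =
    if i >= len then List.rev (flush (fun c -> SpecialComment c) acc)
    else if starts_with_at s i "*/" then
      main (i + 2) (flush (fun c -> SpecialComment c) acc)
    else begin
      Buffer.add_char buf s.[i];
      special_comment (i + 1) acc
    end
  in
  main 0 []
```

The point is that each context only ever tries its own patterns, so a
"*/" seen in the main context is treated as ordinary code.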

The purpose of this post is to ask the Spirit developers whether the
current implementation allows such a "multi-context" construct. If it
does, I would appreciate any help with it. If it does not, I would
consider starting and/or participating in the implementation of this
feature, if you consider it relevant. However, I would need some help
at the beginning because I am new to Spirit.Lex internals.

In terms of implementation, I think it amounts to building as many macro
states as there are rules, in the OCamllex sense of the word. Each macro
state is itself a state machine in the Spirit.Lex sense. The semantic
actions of the lexer can then jump (or switch) between macro states.
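
A minimal OCaml sketch of that idea, under simplifying assumptions:
patterns are plain literal prefixes rather than regular expressions, and
each "macro state" is just a named list of rules in a table. All names
here (rule, contexts, add_rule, run) are hypothetical and say nothing
about how Spirit.Lex is or should be implemented:

```ocaml
(* A rule: a literal prefix to match and an action that receives the
   matched text and names the context to continue in. *)
type rule = { prefix : string; action : string -> string }

(* Context name -> rules; rules can be added while the program runs. *)
let contexts : (string, rule list) Hashtbl.t = Hashtbl.create 8

let add_rule ctx prefix action =
  if prefix = "" then invalid_arg "add_rule: empty prefix";
  let old = Option.value (Hashtbl.find_opt contexts ctx) ~default:[] in
  Hashtbl.replace contexts ctx (old @ [ { prefix; action } ])

(* True if [pat] occurs in [s] starting at index [i]. *)
let starts_with_at s i pat =
  let n = String.length pat in
  i + n <= String.length s && String.sub s i n = pat

(* At each position only the rules of the current context are tried;
   characters that match no rule are handed to [other]. *)
let run ~start ~other s =
  let len = String.length s in
  let rec loop ctx i =
    if i < len then begin
      let rules = Option.value (Hashtbl.find_opt contexts ctx) ~default:[] in
      match List.find_opt (fun r -> starts_with_at s i r.prefix) rules with
      | Some r -> loop (r.action r.prefix) (i + String.length r.prefix)
      | None -> other ctx s.[i]; loop ctx (i + 1)
    end
  in
  loop start 0

let () =
  add_rule "main" "/*>" (fun _ -> "comment");
  add_rule "comment" "*/" (fun _ -> "main");
  (* Added later, for a hypothetical second comment syntax. *)
  add_rule "main" "(*>" (fun _ -> "ml_comment");
  add_rule "ml_comment" "*)" (fun _ -> "main")
```

Because the rule table is ordinary data, supporting another comment
syntax only requires more add_rule calls at run time instead of
generating a new lexer.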

Feel free to ask any questions if I was not clear enough or if you are
not familiar with OCaml/OCamllex syntax and semantics.

Best regards,


Spirit-devel mailing list