Compile-Time parser

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Compile-Time parser

Felipe Magno de Almeida
I've readed in this mail list about an idea of a compile-time parser.
Turned out the idea wasnt exactly what I was starting to find a good
idea.
I'm not so sure about where would someone need a compile-time parser,
but anyway, it seems to be possible, and interesting,
I think it could use the preprocessor(boost.preprocessor), mpl and
BOOST_TYPEOF for that.

I think it could be possible to write almost exactly like an spirit
grammar and have it work on compile-time.

I was thinking something like this:

template <typename T>
T helper(T)
{}

// define some operators that uses "expression templates"(but, no need
to have state in the classes at all...

#define PARSER(x) BOOST_TYPEOF(helper(x))::type

typedef PARSER(( numbers >> ch_p<'a'> )) parser;

And have this parser matches(or not) an C-string using a macro to
convert the C-string in an mpl::vector_c that could be traversed in
compile-time.
this way it could be possible to create compile-time bools that could
be used in SFINAE and etc like this:

MATCHPARSER(parser, "asd")

Opinions? Is it really possible or am I missing something? Is it useful?
Worth the trouble?

Thanks in advance,
--
   Felipe Magno de Almeida
Developer from synergy and Computer Science student from State
University of Campinas(UNICAMP).
Unicamp: http://www.ic.unicamp.br
Synergy: http://www.synergy.com.br
"There is no dark side of the moon really. Matter of fact it's all dark."


-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Compile-Time parser

Seyed H. HAERI (Hossein)
Felipe, (I'm not sure whether I'm using the correct name.)

> I've readed in this mail list about an idea of a compile-time parser.
> Turned out the idea wasnt exactly what I was starting to find a good
> idea.

Thank you for your interest. (The issue has been started by me in fact.)

> I'm not so sure about where would someone need a compile-time parser,
> but anyway, it seems to be possible, and interesting,
> I think it could use the preprocessor(boost.preprocessor), mpl and
> BOOST_TYPEOF for that.
>
> I think it could be possible to write almost exactly like an spirit
> grammar and have it work on compile-time.
>
> I was thinking something like this:
>
> template <typename T>
> T helper(T)
> {}
>
> // define some operators that uses "expression templates"(but, no need
> to have state in the classes at all...
>
> #define PARSER(x) BOOST_TYPEOF(helper(x))::type
>
> typedef PARSER(( numbers >> ch_p<'a'> )) parser;

If I've understood correctly, you're still using operator (>>) overloading
which is (mainly) a run-time construct. AFAI'm aware, most of the
information needed for operator overloading is not present until run-time.
There are very few of them present at compile-time...

You're leading me into new ideas, however... The "," operator is ready at
compile-time and can not be overloaded. It is also easy to use the boost
stuff which gives us comma-separated stuff... (Speaking off-the-top of my
head.)

> And have this parser matches(or not) an C-string using a macro to
> convert the C-string in an mpl::vector_c that could be traversed in
> compile-time.
> this way it could be possible to create compile-time bools that could
> be used in SFINAE and etc like this:
>
> MATCHPARSER(parser, "asd")
>
> Opinions? Is it really possible or am I missing something?

Don't know in fact. Be it possible even, it's just part of what I
originally named a compile-time parser. It, furthermore, doesn't solve my
problem. At least, I can't see how.

> Is it useful? Worth the trouble?

Sure. Consider my application for example.

All of the Best,
--Hossein

P.S. I hope this is the first posting of you on the topic. If not, please
re-send them to me as I seem to have deleted them.


-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Compile-Time parser

Felipe Magno de Almeida
On 9/9/05, Seyed H. HAERI (Hossein) <[hidden email]> wrote:
> Felipe, (I'm not sure whether I'm using the correct name.)

Is correct :P

>
> > I've readed in this mail list about an idea of a compile-time parser.
> > Turned out the idea wasnt exactly what I was starting to find a good
> > idea.
>
> Thank you for your interest. (The issue has been started by me in fact.)

Probably I should start another thread, I dont think your problem can
really be solved, but, IMO, the compile-time idea can solve other
problems...

>

[snip]

>
> If I've understood correctly, you're still using operator (>>) overloading
> which is (mainly) a run-time construct. AFAI'm aware, most of the
> information needed for operator overloading is not present until run-time.
> There are very few of them present at compile-time...

The operators are only to look good and use the type that the operator
returns, something like the sizeof trick, but using the BOOST_TYPEOF
that returns the type of the expression(like __typeof__ of gcc, but
works for MSVC too, and others through emulation). That would be
essentially in compile-time.

>
> You're leading me into new ideas, however... The "," operator is ready at
> compile-time and can not be overloaded. It is also easy to use the boost
> stuff which gives us comma-separated stuff... (Speaking off-the-top of my
> head.)

In fact the ',' operator can be overloaded(see boost.assign).

>
> > And have this parser matches(or not) an C-string using a macro to
> > convert the C-string in an mpl::vector_c that could be traversed in
> > compile-time.
> > this way it could be possible to create compile-time bools that could
> > be used in SFINAE and etc like this:
> >
> > MATCHPARSER(parser, "asd")
> >
> > Opinions? Is it really possible or am I missing something?
>
> Don't know in fact. Be it possible even, it's just part of what I
> originally named a compile-time parser. It, furthermore, doesn't solve my
> problem. At least, I can't see how.

I dont know if it is really possible to solve your problem, since I
think the compile-time parser should parse C-strings in compile-time
instead of classes. But maybe it is possible for you to rewrite your
pattern to use C-strings instead of macros and use the compile-time
parser that I'm suggesting here. That's essentialy different from what
you suggested. I just used the "compile-time" idea.

>
> > Is it useful? Worth the trouble?
>
> Sure. Consider my application for example.
>
> All of the Best,
> --Hossein
>
> P.S. I hope this is the first posting of you on the topic. If not, please
> re-send them to me as I seem to have deleted them.
>

Yes, it is.

Thanks,
--
   Felipe Magno de Almeida
Developer from synergy and Computer Science student from State
University of Campinas(UNICAMP).
Unicamp: http://www.ic.unicamp.br
Synergy: http://www.synergy.com.br
"There is no dark side of the moon really. Matter of fact it's all dark."


-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Slow parsing times

Albert Wessel
All,

Short question. (?? long answer ??)

Situation: created an ast_tree generating parser for a scripting language.
It's slow in creating the ast_tree, and if I say slow, I mean
slooooooooooow. 7 seconds for a 4,5 kb file. 22 seconds for the attached
testfile.txt of 8 kb. (it also seems its exponential in parsing times..)

What am I doing wrong here? I've been messing around with this for more than
a month now and I can't seem to find the answer. At first I thought it's the
amount of rules I'm using, so I decided to remove all rules and just create
a simple parser for a comma and point separated list. No luck, same parsing
speed. Also, using a different ast_node_factory didn't help.

I _MUST_ be doing something fundamentally wrong here.

If someone could have a look at the prototype code and tell me which error
I'm making here. Btw; is not the parser for the scripting stuff, it's just
to illustrate my issue. It parses a comma or point separated list. It takes
around 22 seconds to parse the 8.5kb testfile.txt (!) on my P4 2,8Ghz 1GB
ram. I'm using MSVC .Net 2003, 7.1 compiler, boost 1.32 . And again, it
makes no difference at all how many rules there are in the grammar or which
node factory I'm using (or whether or not using an error reporter), it just
is slow.

(btw: I basically copied a lot of stuff from the examples in spirit if
anything looks familiar to you guys)

I'd really appreciate any usefull comments.

Regards,
Albert Wessel






testfile.txt (8K) Download Attachment
parse_invoke_snippet.cpp (600 bytes) Download Attachment
st_spirit.cpp (98 bytes) Download Attachment
st_spirit.h (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Slow parsing times

Joel de Guzman-2
Albert Wessel wrote:

> All,
>
> Short question. (?? long answer ??)
>
> Situation: created an ast_tree generating parser for a scripting language.
> It's slow in creating the ast_tree, and if I say slow, I mean
> slooooooooooow. 7 seconds for a 4,5 kb file. 22 seconds for the attached
> testfile.txt of 8 kb. (it also seems its exponential in parsing times..)
>
> What am I doing wrong here? I've been messing around with this for more than
> a month now and I can't seem to find the answer. At first I thought it's the
> amount of rules I'm using, so I decided to remove all rules and just create
> a simple parser for a comma and point separated list. No luck, same parsing
> speed. Also, using a different ast_node_factory didn't help.
>
> I _MUST_ be doing something fundamentally wrong here.
>
> If someone could have a look at the prototype code and tell me which error
> I'm making here. Btw; is not the parser for the scripting stuff, it's just
> to illustrate my issue. It parses a comma or point separated list. It takes
> around 22 seconds to parse the 8.5kb testfile.txt (!) on my P4 2,8Ghz 1GB
> ram. I'm using MSVC .Net 2003, 7.1 compiler, boost 1.32 . And again, it
> makes no difference at all how many rules there are in the grammar or which
> node factory I'm using (or whether or not using an error reporter), it just
> is slow.
>
> (btw: I basically copied a lot of stuff from the examples in spirit if
> anything looks familiar to you guys)
>
> I'd really appreciate any usefull comments.

Could you please post a main driver that I can use out of the box?
Also, have you timed the parse time of the non-AST based spirit
code? I'd be interested in seeing the difference.

Cheers,
--
Joel de Guzman
http://www.boost-consulting.com
http://spirit.sf.net



-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Slow parsing times

Carl Barron
In reply to this post by Albert Wessel

On Sep 9, 2005, at 5:10 PM, Albert Wessel wrote:

> All,
>
> Short question. (?? long answer ??)
>
> Situation: created an ast_tree generating parser for a scripting
> language.
> It's slow in creating the ast_tree, and if I say slow, I mean
> slooooooooooow. 7 seconds for a 4,5 kb file. 22 seconds for the
> attached
> testfile.txt of 8 kb. (it also seems its exponential in parsing
> times..)
>
> What am I doing wrong here? I've been messing around with this for
> more than
> a month now and I can't seem to find the answer. At first I thought
> it's the
> amount of rules I'm using, so I decided to remove all rules and just
> create
> a simple parser for a comma and point separated list. No luck, same
> parsing
> speed. Also, using a different ast_node_factory didn't help.
>
> I _MUST_ be doing something fundamentally wrong here.
>
> If someone could have a look at the prototype code and tell me which
> error
> I'm making here. Btw; is not the parser for the scripting stuff, it's
> just
> to illustrate my issue. It parses a comma or point separated list. It
> takes
> around 22 seconds to parse the 8.5kb testfile.txt (!) on my P4 2,8Ghz
> 1GB
> ram. I'm using MSVC .Net 2003, 7.1 compiler, boost 1.32 . And again, it
> makes no difference at all how many rules there are in the grammar or
> which
> node factory I'm using (or whether or not using an error reporter), it
> just
> is slow.
>
> (btw: I basically copied a lot of stuff from the examples in spirit if
> anything looks familiar to you guys)
>
> I'd really appreciate any usefull comments.
>
> Regards,
> Albert Wessel
>
    If I find that ast_parse is ASTRONOMICALLY SLOWER than parse,
I used a computer generated string literal from the testdata.txt file
to remove
any concern of file io. and used this grammar which similiar to what
you are
testing,

USE_TREES is used so the same file with or without tree generation,
without defining it it is instantaneous to the eye in execution time,
when
USE_TREES is defined I can rebuild the project [with CW 9.6/macosx]
and still wait for the results.   A definite difference between the two
builds.
input is the same and only the parsing function differs in the driver.
And
the only differences in the grammar are below, very interesting  indeed,


namespace SP=boost::spirit;

struct list_g:SP::grammar<list_g>
{
        char sep;
        list_g(char a):sep(a){}
       
        template <class Scan>
        struct definition
        {
                definition(const list_g &s)
                {
                        delim = SP::ch_p(s.sep);
                       
                        item =
#ifdef USE_TREES
                                SP::token_node_d[+SP::alnum_p]
#else
                                SP::lexeme_d[+SP::alnum_p]
#endif
                                ;
                               
                        list = item
#ifdef USE_TREES
                                >> !(*(SP::root_node_d[delim] >> item )
#else
                                >> !(*(delim >> item)
#endif
                                >> delim
                                        )
                                ;
                }
                SP::rule<Scan> delim,item,list;
                SP::rule<Scan> const & start() const
                { return list;}
        };
};

struct comma_list_g:list_g
{
        comma_list_g():list_g(','){}
};



-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Compile-Time parser -- Revisited

Seyed H. HAERI (Hossein)
In reply to this post by Felipe Magno de Almeida
Flipe,

> Probably I should start another thread, I dont think your problem can
> really be solved, but, IMO, the compile-time idea can solve other
> problems...

I did that for you ;)

> The operators are only to look good and use the type that the operator
> returns, something like the sizeof trick, but using the BOOST_TYPEOF
> that returns the type of the expression(like __typeof__ of gcc, but
> works for MSVC too, and others through emulation). That would be
> essentially in compile-time.

I see.

>> You're leading me into new ideas, however... The "," operator is ready
>> at
>> compile-time and can not be overloaded. It is also easy to use the boost
>> stuff which gives us comma-separated stuff... (Speaking off-the-top of
>> my
>> head.)
>
> In fact the ',' operator can be overloaded(see boost.assign).

Oops! Thanks! A fantstic corner of boost I wasn't aware so far.

>> Don't know in fact. Be it possible even, it's just part of what I
>> originally named a compile-time parser. It, furthermore, doesn't solve
>> my
>> problem. At least, I can't see how.
>
> I dont know if it is really possible to solve your problem, since I
> think the compile-time parser should parse C-strings in compile-time
> instead of classes. But maybe it is possible for you to rewrite your
> pattern to use C-strings instead of macros and use the compile-time
> parser that I'm suggesting here. That's essentialy different from what
> you suggested. I just used the "compile-time" idea.

It actually is not! It is always to stringise all the things, isn't it?

TTFN,
--Hossein


-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Re: Slow parsing times

Albert Wessel
In reply to this post by Joel de Guzman-2

----- Original Message -----
From: "Joel de Guzman" <[hidden email]>
To: <[hidden email]>
Sent: Saturday, September 10, 2005 12:43 AM
Subject: [Spirit-general] Re: Slow parsing times


> Albert Wessel wrote:
> > All,
> >
> > Short question. (?? long answer ??)
> >
> > Situation: created an ast_tree generating parser for a scripting
language.
> > It's slow in creating the ast_tree, and if I say slow, I mean
> > slooooooooooow. 7 seconds for a 4,5 kb file. 22 seconds for the attached
> > testfile.txt of 8 kb. (it also seems its exponential in parsing times..)
> >
>
> Could you please post a main driver that I can use out of the box?
> Also, have you timed the parse time of the non-AST based spirit
> code? I'd be interested in seeing the difference.

Ok, I have put a complete VC++ win32 console project on my webspace, if you
don't use VC++ then at least you can get the main driver from there and
compile it in your environment .

Also, I included the same parser without ast parse tree, and it's lightening
fast (we're speaking of a fact 100 here!!, 32ms again 3200ms for the
testfile included in the project!!!!) I understand that generating a parse
tree creates overhead, but not a factor 100!!

Click and download:

http://www.xs4all.nl/~awessel/speedtestconsole.rar

Any help appreciated a lot!

Regards,
Albert




-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Re: Slow parsing times

Dan Nuffer-2
Albert Wessel wrote:

> ----- Original Message -----
> From: "Joel de Guzman" <[hidden email]>
> To: <[hidden email]>
> Sent: Saturday, September 10, 2005 12:43 AM
> Subject: [Spirit-general] Re: Slow parsing times
>
>
>
>>Albert Wessel wrote:
>>
>>>All,
>>>
>>>Short question. (?? long answer ??)
>>>
>>>Situation: created an ast_tree generating parser for a scripting
>
> language.
>
>>>It's slow in creating the ast_tree, and if I say slow, I mean
>>>slooooooooooow. 7 seconds for a 4,5 kb file. 22 seconds for the attached
>>>testfile.txt of 8 kb. (it also seems its exponential in parsing times..)
>>>
>>
>>Could you please post a main driver that I can use out of the box?
>>Also, have you timed the parse time of the non-AST based spirit
>>code? I'd be interested in seeing the difference.
>
>
> Ok, I have put a complete VC++ win32 console project on my webspace, if you
> don't use VC++ then at least you can get the main driver from there and
> compile it in your environment .
>
> Also, I included the same parser without ast parse tree, and it's lightening
> fast (we're speaking of a fact 100 here!!, 32ms again 3200ms for the
> testfile included in the project!!!!) I understand that generating a parse
> tree creates overhead, but not a factor 100!!
>
> Click and download:
>
> http://www.xs4all.nl/~awessel/speedtestconsole.rar
>
> Any help appreciated a lot!
>

Use token_node_d (or no_node_d) to avoid creating a node for every
character. The memory allocation & construction for a node for every
char is costly.  Also if you have only one rule, all the nodes are
inserted into one vector and once you get a significant number, the
vector growth may also be significant.

Do you know how to use a profiler? gprof or kcachegrind are free and
work reasonably well. Run it on your program, then you can know exactly
what is using all the time. My bet is that most of the time is spent in new.

--
Dan Nuffer



-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Re: Slow parsing times

Stuart Dootson
On 9/11/05, Dan Nuffer <[hidden email]> wrote:

<snip>

> Do you know how to use a profiler? gprof or kcachegrind are free and
> work reasonably well. Run it on your program, then you can know exactly
> what is using all the time. My bet is that most of the time is spent in new.
>
> --
> Dan Nuffer

Or, as the OP is using VC 7.1, Compuware have a community edition of
their DevPartner Profiler that's a free download (see
http://www.compuware.com/products/devpartner/1969_ENG_HTML.htm) - that
also works well...

Stuart Dootson


-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Re: Slow parsing times

Carl Barron
In reply to this post by Dan Nuffer-2

On Sep 11, 2005, at 1:20 AM, Dan Nuffer wrote:

> Use token_node_d (or no_node_d) to avoid creating a node for every
> character. The memory allocation & construction for a node for every
> char is costly.  Also if you have only one rule, all the nodes are
> inserted into one vector and once you get a significant number, the
> vector growth may also be significant.
>
>
   using a short functor parser to handle +SP::alnum_p directly via
<cctype> and not copying any data
signifigantly shortens the run times, also using a final biold with
heavy inlining speeds the result by another
factor of 5, [ not profiled but timed[]  I used a C style string to
hold the testdata to hold the data instead of a file
read as well.
also used a node_iter_data_factory<> to build the tree.

This functor parser should do the same thing as lexeme_d[+alnum_p]
correct?

struct item_impl
{
        typedef boost::spirit::nil_t result_t;
        template <class Scan>
        std::ptrdiff_t operator () (const Scan &scan,result_t &)const
        {
                std::ptrdiff_t out = 0;
                while(!scan.at_end())
                {
                        unsigned char s = *scan;
                        if(!std::isalnum(s))
                                break;
                        ++scan;
                        ++out;
                }
                return out ?out:-1;
        }
};

typedef boost::spirit::functor_parser<item_impl> item_parser;





-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Re: Slow parsing times

Albert Wessel
In reply to this post by Dan Nuffer-2
Dan Nuffer wrote:

>
> Use token_node_d (or no_node_d) to avoid creating a node for every
> character. The memory allocation & construction for a node for every
> char is costly.  Also if you have only one rule, all the nodes are
> inserted into one vector and once you get a significant number, the
> vector growth may also be significant.
>
> Do you know how to use a profiler? gprof or kcachegrind are free and
> work reasonably well. Run it on your program, then you can know exactly
> what is using all the time. My bet is that most of the time is spent in
new.

I've used the profiler suggested by Stuart.

Created a new testfile basically containing 1230 comma separated items with
each item of length 1  (i.e.: 1,2,3,4,5,1,2,3,4,5,1,2,3,4,5 etc...) Parsed
it with
an improved grammar (not creating a node for the comma, using token_node_d)
ran the program with a profiler as suggested.

"Method Name","% in Method","% with Children","Called","Average","Average
with Children","Real","Perceived"
"RtlFreeHeap","70,4","70,4","43.315","32,3","32,3","32,3","33,2"
"RtlAllocateHeap","27,1","27,1","45.793","11,8","11,8","11,8","12,7"
"HeapValidate","1,5","1,5","43.326","0,7","0,7","0,7","1,5"
"IsBadReadPtr","0,8","0,8","43.326","0,4","0,4","0,4","1,3"
"WriteFile","0,2","0,2","49","67,7","67,7","67,7","68,7"
"CreateFileA","0,0","0,0","1","103,0","103,0","103,0","104,1"
"GetFileType","0,0","0,0","4","16,2","16,2","16,2","17,1"
"ReadFile","0,0","0,0","2","30,8","30,8","30,8","31,7"
"HeapCreate","0,0","0,0","1","25,8","25,8","25,8","27,6"
"fopen","0,0","0,0","1","22,0","129,4","129,4","134,9"
"GetModuleFileNameA","0,0","0,0","1","19,1","19,1","19,1","19,9"

So indeed I think you're right, it's new and delete statements that causes
this to be slow. But I mean for creating 1230 nodes with one char in them is
it really necessary to call ++40000 free and alloc statements?

Just for my own interest I set a breakpoint on create_node of
node_iter_data_factory and stepped into the return node_t(first,last)
statement. I just did it briefly and did not investigate very deep but it
does a lot of stuff to generate one node with just a one character in it....

Then I played around with the node_val_factory  create_node and basically
just returned a node without anything in it, even then it takes 1 second to
parse a 8kb file v.s. 9 seconds with "filled" nodes with one character, -> 8
seconds for the extra overhead for copying one character per node?!?

I think the problem starts during the construction of the node_val_data
objects, and that is also where I get lost in the code. Any idea??

Albert










-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Slow parsing times

Carl Barron
In reply to this post by Albert Wessel

On Sep 9, 2005, at 5:10 PM, Albert Wessel wrote:

> All,
>
> Short question. (?? long answer ??)
>
> Situation: created an ast_tree generating parser for a scripting
> language.
> It's slow in creating the ast_tree, and if I say slow, I mean
> slooooooooooow. 7 seconds for a 4,5 kb file. 22 seconds for the
> attached
> testfile.txt of 8 kb. (it also seems its exponential in parsing
> times..)
>
> What am I doing wrong here? I've been messing around with this for
> more than
> a month now and I can't seem to find the answer. At first I thought
> it's the
> amount of rules I'm using, so I decided to remove all rules and just
> create
> a simple parser for a comma and point separated list. No luck, same
> parsing
> speed. Also, using a different ast_node_factory didn't help.
>
> I _MUST_ be doing something fundamentally wrong here.
>
> If someone could have a look at the prototype code and tell me which
> error
> I'm making here. Btw; is not the parser for the scripting stuff, it's
> just
> to illustrate my issue. It parses a comma or point separated list. It
> takes
> around 22 seconds to parse the 8.5kb testfile.txt (!) on my P4 2,8Ghz
> 1GB
> ram. I'm using MSVC .Net 2003, 7.1 compiler, boost 1.32 . And again, it
> makes no difference at all how many rules there are in the grammar or
> which
> node factory I'm using (or whether or not using an error reporter), it
> just
> is slow.
>
> (btw: I basically copied a lot of stuff from the examples in spirit if
> anything looks familiar to you guys)
>
> I'd really appreciate any usefull comments.
>
> Regards,
> Albert Wessel
>
    Interesing while this test shows pt_parse is slower than parse
ast_parse is extremely slower than either one typical results of the
following program produce these results on macosx /CW 9.6

using SP::parse takes 4 clock ticks and succceds
using SP::pt_parse takes 133 clock ticks and succceds
using SP::ast_parse takes 20109 clock ticks and succceds

the program follows used a macro to insure nothing changes in
the timing code but the function called.

see attached


driver2.cp (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Compile-Time parser -- Revisited

Felipe Magno de Almeida
In reply to this post by Seyed H. HAERI (Hossein)
On 9/10/05, Seyed H. HAERI (Hossein) <[hidden email]> wrote:
> Flipe,
>

Hi,

[snip]

>
> I did that for you ;)

Thanks,

>

[snip]

>
> It actually is not! It is always to stringise all the things, isn't it?

Well, I've made a little more research on this. It seems to only be
possible to have something like this:

typedef COMPILETIME_PARSER(ch_p<'a'>() >> ... the rest of the parser
.. )::type parser;

COMPILETIME_PARSE(parser, p a r s e r f o o d)::type::match // a constant bool

It is not possible to break arbitraty identifiers, that there's only
two solutions:
register the identifiers that could be used in the parser, or use all
that should be parsed separated with spaces. Registering words, IMO,
makes the parser have no meaning, since you can then, write your owns
macros. Using spaces makes the code just unreadable, IMO. So, it seems
it cant be really done...

Who knows someone finds a solution for this...

>
> TTFN,
> --Hossein
>

best regards,
--
   Felipe Magno de Almeida
Developer from synergy and Computer Science student from State
University of Campinas(UNICAMP).
Unicamp: http://www.ic.unicamp.br
Synergy: http://www.synergy.com.br
"There is no dark side of the moon really. Matter of fact it's all dark."


-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Compile-Time parser -- Revisited

Seyed H. HAERI (Hossein)
Flipe,

>> It actually is not! It is always to stringise all the things, isn't it?
>
> Well, I've made a little more research on this. It seems to only be
> possible to have something like this:
>
> typedef COMPILETIME_PARSER(ch_p<'a'>() >> ... the rest of the parser
> .. )::type parser;
>
> COMPILETIME_PARSE(parser, p a r s e r f o o d)::type::match // a constant
> bool
>
> It is not possible to break arbitraty identifiers, that there's only
> two solutions:
> register the identifiers that could be used in the parser, or use all
> that should be parsed separated with spaces. Registering words, IMO,
> makes the parser have no meaning, since you can then, write your owns
> macros. Using spaces makes the code just unreadable, IMO. So, it seems
> it cant be really done...

Pursuig your discussion at the Boost mailing list, I'm unfortunately
inclined that this really is the case. :(

> Who knows someone finds a solution for this...

Hopefully! :)

Thanks for spending time on my issue.

All of the Best,
--Hossein


-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Re: Slow parsing times

Albert Wessel
In reply to this post by Dan Nuffer-2
Dan Nuffer wrote:
 >
> Use token_node_d (or no_node_d) to avoid creating a node for every
> character. The memory allocation & construction for a node for every
> char is costly.  Also if you have only one rule, all the nodes are
> inserted into one vector and once you get a significant number, the
> vector growth may also be significant.
>
> Do you know how to use a profiler? gprof or kcachegrind are free and
> work reasonably well. Run it on your program, then you can know exactly
> what is using all the time. My bet is that most of the time is spent in
new.
>
> --
> Dan Nuffer

I have been replying to your post before for my windows testing but now I've
tried this, in an ultimate attempt, on my Debian Linux box instead(gcc3.4.4,
boost_1_33)

Same result (no suprise): very slow. I have attached a tar file with all my
testing code in there (including make file and a target to compile for
gprof) I have been using gprof also and basically same result as on windows.
(a lof of activity on mt_alloc kind of calls, though in windows it's called
RtlAllocateHeap) So indeed a lot of time spent in new etc..

To my best knowledge my only conclusion can be this is a bug. Functionally
it works, but in practice not as it is too slow to parse. Can the Spirit
developers (Dan?) do something about this and take this up in the next
release? Or is it going to be left alone and do I have to resort to a
different solution? Or is there still something else that can be done?

Albert



Output of gprof flat profile

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
  6.30      0.44     0.44  5357992     0.00     0.00
__gnu_cxx::__mt_alloc<boost::spirit::tree_node<boost::spirit::node_val_data<
boost::spirit::position_iterator<char*, boost:
:spirit::file_position, boost::spirit::nil_t>,
boost::spirit::position_iterator<char*, boost::spirit::file_position,
boost::spirit::nil_t> > > >::allocate(unsigned int, void co
nst*)
  4.92      0.78     0.34  5358078     0.00     0.00
__gnu_cxx::__mt_alloc<char>::allocate(unsigned int, void const*)
  3.76      1.04     0.26  5477689     0.00     0.00
boost::spirit::position_iterator<char*, boost::spirit::file_position,
boost::spirit::nil_t>::position_iterator(boost::spir
it::position_iterator<char*, boost::spirit::file_position,
boost::spirit::nil_t> const&)
  3.47      1.28     0.24  5358079     0.00     0.00
std::_Vector_base<char, std::allocator<char>
>::_Vector_impl::_Vector_impl(std::allocator<char> const&)
  3.04      1.49     0.21  5358079     0.00     0.00
std::_Vector_base<char, std::allocator<char> >::~_Vector_base()
  2.82      1.68     0.20  5354732     0.00     0.00  std::vector<char,
std::allocator<char> >::vector(std::vector<char, std::allocator<char> >
const&)
  2.61      1.86     0.18  5390781     0.00     0.00
std::_Vector_base<boost::spirit::tree_node<boost::spirit::node_val_data<boos
t::spirit::position_iterator<char*, boost::spi
rit::file_position, boost::spirit::nil_t>,
boost::spirit::position_iterator<char*, boost::spirit::file_position,
boost::spirit::nil_t> > >, std::allocator<boost::spirit::tree_n
ode<boost::spirit::node_val_data<boost::spirit::position_iterator<char*,
boost::spirit::file_position, boost::spirit::nil_t>,
boost::spirit::position_iterator<char*, boost::spi
rit::file_position, boost::spirit::nil_t> > > >
>::_Vector_impl::_Vector_impl(std::allocator<boost::spirit::tree_node<boost:
:spirit::node_val_data<boost::spirit::position_itera
tor<char*, boost::spirit::file_position, boost::spirit::nil_t>,
boost::spirit::position_iterator<char*, boost::spirit::file_position,
boost::spirit::nil_t> > > > const&)
  2.46      2.03     0.17  5358078     0.00     0.00
__gnu_cxx::__mt_alloc<char>::deallocate(char*, unsigned int)
  2.32      2.19     0.16  5354732     0.00     0.00
boost::spirit::node_val_data<boost::spirit::position_iterator<char*,
boost::spirit::file_position, boost::spirit::nil_t>,
boost::spirit::position_iterator<char*, boost::spirit::file_position,
boost::spirit::nil_t>
>::node_val_data(boost::spirit::node_val_data<boost::spirit::position_iterat
or<char*
, boost::spirit::file_position, boost::spirit::nil_t>,
boost::spirit::position_iterator<char*, boost::spirit::file_position,
boost::spirit::nil_t> > const&)
  2.24      2.35     0.16  5351386     0.00     0.00
std::vector<boost::spirit::tree_node<boost::spirit::node_val_data<boost::spi
rit::position_iterator<char*, boost::spirit::f
ile_position, boost::spirit::nil_t>, boost::spirit::position_iterator<char*,
boost::spirit::file_position, boost::spirit::nil_t> > >,
std::allocator<boost::spirit::tree_node<bo
ost::spirit::node_val_data<boost::spirit::position_iterator<char*,
boost::spirit::file_position, boost::spirit::nil_t>,
boost::spirit::position_iterator<char*, boost::spirit::f
ile_position, boost::spirit::nil_t> > > >
>::vector(std::vector<boost::spirit::tree_node<boost::spirit::node_val_data<
boost::spirit::position_iterator<char*, boost::spirit::fil
e_position, boost::spirit::nil_t>, boost::spirit::position_iterator<char*,
boost::spirit::file_position, boost::spirit::nil_t> > >,
std::allocator<boost::spirit::tree_node<boos
t::spirit::node_val_data<boost::spirit::position_iterator<char*,
boost::spirit::file_position, boost::spirit::nil_t>,
boost::spirit::position_iterator<char*, boost::spirit::fil
e_position, boost::spirit::nil_t> > > > > const&)

speedtest.tar.gz (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Re: Slow parsing times

Carl Barron

On Sep 15, 2005, at 3:53 PM, Albert Wessel wrote:

> Dan Nuffer wrote:
>>
>> Use token_node_d (or no_node_d) to avoid creating a node for every
>> character. The memory allocation & construction for a node for every
>> char is costly.  Also if you have only one rule, all the nodes are
>> inserted into one vector and once you get a significant number, the
>> vector growth may also be significant.
>>
>> Do you know how to use a profiler? gprof or kcachegrind are free and
>> work reasonably well. Run it on your program, then you can know
>> exactly
>> what is using all the time. My bet is that most of the time is spent
>> in
> new.
>>
>> --
>> Dan Nuffer
>
> I have been replying to your post before for my windows testing but
> now I've
> tried this, in an ultimate attempt, on my Debian Linux box
> instead(gcc3.4.4,
> boost_1_33)
>
> Same result (no suprise): very slow. I have attached a tar file with
> all my
> testing code in there (including make file and a target to compile for
> gprof) I have been using gprof also and basically same result as on
> windows.
> (a lof of activity on mt_alloc kind of calls, though in windows it's
> called
> RtlAllocateHeap) So indeed a lot of time spent in new etc..
>
> To my best knowledge my only conclusion can be this is a bug.
> Functionally
> it works, but in practice not as it is too slow to parse. Can the
> Spirit
> developers (Dan?) do something about this and take this up in the next
> release? Or is it going to be left alone and do I have to resort to a
> different solution? Or is there still something else that can be done?
>
> Albert
>
>
    The biggest gain is to remove the leaf node construction +alnum_p
from
the aitomatic tree construction since the end is a single node. via a
no node
scanner or a functor_parser for +alnum_p.



-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Re: Slow parsing times

Carl Barron

On Sep 15, 2005, at 5:09 PM, Carl Barron wrote:

>
> On Sep 15, 2005, at 3:53 PM, Albert Wessel wrote:
>
>> Dan Nuffer wrote:
>>>
>>> Use token_node_d (or no_node_d) to avoid creating a node for every
>>> character. The memory allocation & construction for a node for every
>>> char is costly.  Also if you have only one rule, all the nodes are
>>> inserted into one vector and once you get a significant number, the
>>> vector growth may also be significant.
>>>
>>> Do you know how to use a profiler? gprof or kcachegrind are free and
>>> work reasonably well. Run it on your program, then you can know
>>> exactly
>>> what is using all the time. My bet is that most of the time is spent
>>> in
>> new.
>>>
>>> --
>>> Dan Nuffer
>>
>> I have been replying to your post before for my windows testing but
>> now I've
>> tried this, in an ultimate attempt, on my Debian Linux box
>> instead(gcc3.4.4,
>> boost_1_33)
>>
>> Same result (no suprise): very slow. I have attached a tar file with
>> all my
>> testing code in there (including make file and a target to compile for
>> gprof) I have been using gprof also and basically same result as on
>> windows.
>> (a lof of activity on mt_alloc kind of calls, though in windows it's
>> called
>> RtlAllocateHeap) So indeed a lot of time spent in new etc..
>>
>> To my best knowledge my only conclusion can be this is a bug.
>> Functionally
>> it works, but in practice not as it is too slow to parse. Can the
>> Spirit
>> developers (Dan?) do something about this and take this up in the next
>> release? Or is it going to be left alone and do I have to resort to a
>> different solution? Or is there still something else that can be done?
>>
>> Albert
>>
>>
>    The biggest gain is to remove the leaf node construction +alnum_p
> from
> the aitomatic tree construction since the end is a single node. via a
> no node
> scanner or a functor_parser for +alnum_p.
>
        if this functor used with access_node_d will convert a parse tree to
an ast tree it is much faster to create a parse tree [via pt_parse] but
have the top rule modify the tree into an ast tree using swap and not
assignment.
        The double swap effects the assignment and leaves an empty
vector do destruct. This avoids massive copying as vector::swap
swaps the inard pointers, and integral variables.

   Here is the functor for access_node_d , the static function
do_it does the work recursively.

struct optimize_tree
{
        template <class Node,class Iter>
        void operator () (Node &node,Iter,Iter) const
        {
                do_it(node);
        }

        template <class Node>
        static void do_it(Node &node)
        {
                Node tmp;
                switch(node.children.size())
                {
                case 0:
                        break;
                case 1:
                        tmp.swap(node.children.front());
                        node.swap(tmp);
                        // fall thru intentional
                default:
                        std::for_each
                                (
                                        node.children.begin(),
                                        node.children.end(),
                                        optimize_tree::do_it<Node>
                                );
                        break;
                }
        }
};

                               
                       



-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Re: Slow parsing times

Carl Barron

On Sep 16, 2005, at 1:58 AM, Carl Barron wrote:

> case 1:
> tmp.swap(node.children.front());
> node.swap(tmp);
> // fall thru intentional
> default:
>
        opps

        case 1:
                        tmp.swap(node.children.front());
                        node.swap(tmp);
                        do_it(node);
                        break;
                default:



-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
Reply | Threaded
Open this post in threaded view
|

Re: Re: Slow parsing times

Albert Wessel
In reply to this post by Carl Barron

> ----- Original Message -----
> From: "Carl Barron"
> Sent: Friday, September 16, 2005 7:58 AM
> >    The biggest gain is to remove the leaf node construction +alnum_p
> > from
> > the aitomatic tree construction since the end is a single node. via a
> > no node
> > scanner or a functor_parser for +alnum_p.
> >
> if this functor used with access_node_d will convert a parse tree to
> an ast tree it is much faster to create a parse tree [via pt_parse] but
> have the top rule modify the tree into an ast tree using swap and not
> assignment.
> The double swap effects the assignment and leaves an empty
> vector do destruct. This avoids massive copying as vector::swap
> swaps the inard pointers, and integral variables.
>
>    Here is the functor for access_node_d , the static function
> do_it does the work recursively.
>
> struct optimize_tree
> {
> template <class Node,class Iter>
> void operator () (Node &node,Iter,Iter) const
> {
> do_it(node);
> }
>
> template <class Node>
> static void do_it(Node &node)
> {
> Node tmp;
> switch(node.children.size())
> {
> case 0:
> break;
> case 1:
> tmp.swap(node.children.front());
> node.swap(tmp);
> // fall thru intentional
> default:
> std::for_each
> (
> node.children.begin(),
> node.children.end(),
> optimize_tree::do_it<Node>
> );
> break;
> }
> }
> };
>
You mean, I have to use it like this in my grammar ?
access_node_d[+(alnum_p)][optimze_tree()]
Could be that I totally miss the point here, but that doesn't do anything
for the parsing time, it's still slow (I did include the change of your
second posting about this). Do you have some sample code where this works?

Albert



-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
Spirit-general mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/spirit-general
12