[regex] Boost.Regex + ICU vs. standalone ICU

AJG-2
Hi there,

Is there a document out there describing any substantial differences
(in particular, w.r.t. semantics and correctness) between using
Boost.Regex with ICU support baked in as opposed to ICU's built-in
RegexMatcher/RegexPattern classes?

I do realize, for example, that the former's API has a 'modern C++'
style, whereas the latter is modeled on Java's -- but these differences
seem to be mostly cosmetic. Is there anything else I should be aware of
before deciding to use one or the other -- performance, functionality,
ease-of-use, etc.?

Thanks!

PS: It doesn't look like xpressive has Unicode support yet, which is
why I'm not considering it, but I'd love to know if this impression
is false or soon-to-be false.




_______________________________________________
Boost-users mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Re: [regex] Boost.Regex + ICU vs. standalone ICU

John Maddock-3
> Is there a document out there describing any substantial differences
> (in particular, w.r.t. semantics and correctness) between using
> Boost.Regex with ICU support baked in as opposed to ICU's built-in
> RegexMatcher/RegexPattern classes?
>
> I do realize, for example, that the former's API has a 'modern C++'
> style, whereas the latter is modeled on Java's -- but these differences
> seem to be mostly cosmetic. Is there anything else I should be aware of
> before deciding to use one or the other -- performance, functionality,
> ease-of-use, etc.?

To a large extent, all regex engines are created remarkably equal.

Looks like ICU hasn't caught up with some of Perl-5.10's additions yet
(recursive expressions for example).

The other advantage of Boost.Regex is that, being iterator-based, it lets you
search text in non-contiguous storage.

Those were the only two that jumped out at me from a quick look at ICU.
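
As a rough illustration of the iterator-based point above, here is a minimal sketch that searches text held in a std::deque<char>, which is not stored as one contiguous buffer. It uses std::regex as a stand-in so that it needs only the standard library; boost::regex_search offers analogous iterator-pair overloads. The helper names first_match and demo_first_match are hypothetical, chosen for this example only.

```cpp
#include <cassert>
#include <deque>
#include <regex>
#include <string>

// Hypothetical helper: returns the first match of `pattern` in text held in a
// std::deque<char>. Returns an empty string when nothing matches.
std::string first_match(const std::deque<char>& text, const std::string& pattern)
{
    using iterator = std::deque<char>::const_iterator;

    const std::regex re(pattern);          // std::regex used as a stand-in for boost::regex
    std::match_results<iterator> what;     // match results expressed in deque iterators

    // The engine only needs bidirectional iterators, not a contiguous buffer.
    if (std::regex_search(text.begin(), text.end(), what, re)) {
        return what.str();
    }
    return std::string();
}

// Usage sketch: the deque is filled one character at a time, then searched in place.
inline void demo_first_match()
{
    std::deque<char> text;
    for (char c : std::string("id=12345;")) {
        text.push_back(c);
    }
    assert(first_match(text, "[0-9]+") == "12345");
}
```

No copy of the text into a std::string is needed before searching; only the matched range is copied out at the end.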

HTH, John.


Re: [regex] Boost.Regex + ICU vs. standalone ICU

AJG-2
John Maddock <boost.regex <at> virgin.net> writes:

> To a large extent, all regex engines are created remarkably equal.
>
> Looks like ICU hasn't caught up with some of Perl-5.10's additions yet
> (recursive expressions for example).
>
> The other advantage of Boost.Regex is that, being iterator-based, it lets you
> search text in non-contiguous storage.
>
> Those were the only two that jumped out at me from a quick look at ICU.
>
> HTH, John.

Thanks, it does :). Indeed, it doesn't sound like there's a ton of difference.

I actually didn't know that Boost.Regex supported recursive regexen. Would you
mind pointing me to the documentation for it, and/or what the syntax looks like?

Thanks again.






Re: [regex] Boost.Regex + ICU vs. standalone ICU

John Maddock-3
> I actually didn't know that Boost.Regex supported recursive regexen. Would you
> mind pointing me to the documentation for it, and/or what the syntax looks like?

The new features in Perl-5.10 that are supported in Boost.Regex are:

Named sub-expressions:
http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.named_subexpressions

Branch resets:
http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.branch_reset

Recursion:
http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.recursive_expressions

Conditional on recursion or subexpression match:
http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.conditional_expressions

The (*OPERATOR) syntax introduced in Perl-5.10 is not currently supported.

HTH, John.


Re: [regex] Boost.Regex + ICU vs. standalone ICU

AJG-2
John Maddock <boost.regex <at> virgin.net> writes:
> The new features in Perl-5.10 that are supported in Boost.Regex are:
>
> Named sub-expressions:
> http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.named_subexpressions
>
> Branch resets:
> http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.branch_reset
>
> Recursion:
> http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.recursive_expressions
>
> Conditional on recursion or subexpression match:
> http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.conditional_expressions
>
> The (*OPERATOR) syntax introduced in Perl-5.10 is not currently supported.
>
> HTH, John.

That's fantastic, thank you. I have an unrelated Boost.Regex question: how
expensive are basic_regex objects to copy? Say, relative to constructing (and
thus reparsing) from a string pattern, anew?

The reason I ask is because I have a variant type with several value types,
including a regex type. I was wondering whether it makes more sense to store the
regex pattern in that variant as a string, and to construct the actual
basic_regex from that string every time I need to operate on it, or to store it
as a basic_regex and hope that copies are not too expensive. One other
alternative is to store the regex on the heap using a smart pointer, but I'd
rather keep value semantics so long as performance isn't an issue.

(Oh, incidentally, does Boost.Regex have support for Boost.Serialization? Right
now what I do is serialize it [the pattern] as an std::string -- is that the
recommended approach?)


Thanks!






Re: [regex] Boost.Regex + ICU vs. standalone ICU

John Maddock-3
> That's fantastic, thank you. I have an unrelated Boost.Regex question: how
> expensive are basic_regex objects to copy? Say, relative to constructing (and
> thus reparsing) from a string pattern, anew?

Copying is a *lot* more efficient: basic_regex is a pimpl, so copying a
basic_regex is just a shared_ptr copy.

> The reason I ask is because I have a variant type with several value types,
> including a regex type. I was wondering whether it makes more sense to store the
> regex pattern in that variant as a string, and to construct the actual
> basic_regex from that string every time I need to operate on it, or to store it
> as a basic_regex and hope that copies are not too expensive. One other
> alternative is to store the regex on the heap using a smart pointer, but I'd
> rather keep value semantics so long as performance isn't an issue.

No, don't construct it every time you need it; that just wastes CPU cycles :-(
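
A minimal sketch of the store-by-value option discussed above, assuming C++17 and using std::variant with std::regex as stand-ins for whatever variant and regex types are actually in use. The Value alias and the function names are hypothetical. Note that the shared_ptr copy described above is a property of Boost's basic_regex; the copy cost of std::regex depends on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <variant>
#include <vector>

// Hypothetical variant type: one of the alternatives is a compiled regex,
// stored by value rather than as a pattern string.
using Value = std::variant<int, std::string, std::regex>;

// Counts how many inputs are fully matched by the regex held in `v`.
// Returns 0 if `v` holds one of the other alternatives.
std::size_t count_matches(const Value& v, const std::vector<std::string>& inputs)
{
    const std::regex* re = std::get_if<std::regex>(&v);
    if (re == nullptr) {
        return 0;
    }
    std::size_t n = 0;
    for (const std::string& s : inputs) {
        if (std::regex_match(s, *re)) {
            ++n;  // the same compiled regex is reused for every input
        }
    }
    return n;
}

// Usage sketch: the pattern is parsed once, then the variant is copied by value.
inline void demo_value_semantics()
{
    Value original = std::regex("[a-z]+[0-9]*");  // pattern string is parsed here only
    Value copy = original;                        // copies the compiled regex object

    const std::vector<std::string> inputs = {"abc", "abc42", "42abc"};
    assert(count_matches(copy, inputs) == 2);     // "abc" and "abc42" match in full
    assert(count_matches(Value(7), inputs) == 0); // not a regex alternative
}
```

The alternative under discussion, storing the pattern as a std::string and constructing a regex inside count_matches on every call, would reparse the pattern each time.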

> (Oh, incidentally, does Boost.Regex have support for Boost.Serialization? Right
> now what I do is serialize it [the pattern] as an std::string -- is that the
> recommended approach?)

Yes, serialize the pattern as a string; there's no explicit serialization support.
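
The serialize-as-string approach could look roughly like the sketch below. It is an illustration under stated assumptions, not Boost.Serialization code: it writes to plain iostreams, uses std::regex as a stand-in, and the StoredRegex, save and load names are hypothetical. The wrapper keeps the pattern text alongside the compiled object because std::regex cannot return its pattern; Boost's basic_regex exposes the pattern through its str() member, so no wrapper would be needed there. Error handling and syntax flags are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical wrapper: pattern text plus the regex compiled from it.
struct StoredRegex {
    std::string pattern;
    std::regex compiled;

    explicit StoredRegex(const std::string& p) : pattern(p), compiled(p) {}
};

// Writes only the pattern text, length-prefixed so that spaces and other
// characters inside the pattern survive the round trip.
void save(std::ostream& out, const StoredRegex& r)
{
    out << r.pattern.size() << ' ' << r.pattern;
}

// Reads the pattern text back and recompiles it once, at load time.
StoredRegex load(std::istream& in)
{
    std::size_t n = 0;
    in >> n;
    in.get();  // skip the single separator space written by save()
    std::string p(n, '\0');
    in.read(&p[0], static_cast<std::streamsize>(n));
    return StoredRegex(p);
}

// Usage sketch: round trip through an in-memory stream.
inline void demo_round_trip()
{
    std::stringstream buf;
    save(buf, StoredRegex("a+ b"));

    StoredRegex restored = load(buf);
    assert(restored.pattern == "a+ b");
    assert(std::regex_match("aaa b", restored.compiled));
}
```

Recompiling happens once per load, which is consistent with the advice above to avoid constructing the regex on every use.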

HTH, John.
