RegEx Exception

Michael Primeaux
I'm using the latest version of the boost regex libraries and am receiving an exception when using the regex_replace function with the following regular expression:
 
(?i)(?<!')((<\s*OBJECT\s*[^>]+classid="clsid:[^>]+?(?<!\s*/\s*)>[^*]+?<\s*/\s*OBJECT\s*>)|(<\s*OBJECT\s*[^>]+classid=\s*\"clsid:[^/>]+/))
 
...which works in other industry standard regex library routines.
 
I seem to be failing on line 767 of basic_regex_creator.hpp.
 
Any assistance is appreciated.
 
Kindest regards,
Michael Primeaux

_______________________________________________
Boost-users mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Re: RegEx Exception

Eric Niebler
Michael Primeaux wrote:

> I'm using the latest version of the boost regex libraries and am
> receiving an exception when using the regex_replace function with the
> following regular expression:
>  
> (?i)(?<!')((<\s*OBJECT\s*[^>]+classid="clsid:[^>]+?(?<!\s*/\s*)>[^*]+?<\s*/\s*OBJECT\s*>)|(<\s*OBJECT\s*[^>]+classid=\s*\"clsid:[^/>]+/))
>  
> ...which works in other industry standard regex library routines.
>  
> I seem to be failing on line 767 of basic_regex_creator.hpp.
>  
> Any assistance is appreciated.
>  


If I had to guess, I'd say the problem is "(?<!\\s*/\\s*)". This is a
negative look-behind assertion. For most regex engines out there,
look-behinds only work for fixed-width sub-expressions, like (?<!foo),
which asserts that the previous 3 characters are not "foo". Yours is a
variable-width look-behind, and I'm pretty sure Boost.Regex can't handle
that.
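
One way to sidestep the variable-width look-behind is to drop the assertion from the pattern and do the check by hand on each candidate `>`. The sketch below is an illustration only and is not part of Boost.Regex: the helper name `preceded_by_slash` is hypothetical, and it assumes the intent of `(?<!\s*/\s*)>` is "a `>` whose nearest preceding non-whitespace character is not `/`".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost.Regex API): returns true when the text
// ending just before `pos` matches \s*/\s*, i.e. the nearest non-whitespace
// character before `pos` is '/'. A caller would reject a match when this
// returns true, which mirrors what (?<!\s*/\s*) was meant to express.
bool preceded_by_slash(const std::string& text, std::size_t pos) {
    assert(pos <= text.size());  // precondition: pos is a valid offset
    std::size_t i = pos;
    // Walk backwards over any whitespace directly before `pos`.
    while (i > 0 && std::isspace(static_cast<unsigned char>(text[i - 1]))) {
        --i;
    }
    return i > 0 && text[i - 1] == '/';
}
```

With a fixed-width pattern such as `(?<!/)>` the engine only has to look one character back; the loop above is the extra work that the `\s*` parts introduce.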

Which other "industry standard" regex library handles this? Perl doesn't
allow variable-width look-behinds. The only one I'm aware of which
allows this is GRETA (which I wrote many moons ago).

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

Re: RegEx Exception

John Maddock
In reply to this post by Michael Primeaux
>> I'm using the latest version of the boost regex libraries and am
>> receiving an exception when using the regex_replace function with
>> the following regular expression:
>>
>> (?i)(?<!')((<\s*OBJECT\s*[^>]+classid="clsid:[^>]+?(?<!\s*/\s*)>[^*]+?<\s*/\s*OBJECT\s*>)|(<\s*OBJECT\s*[^>]+classid=\s*\"clsid:[^/>]+/))
>>
>> ...which works in other industry standard regex library routines.
>>
>> I seem to be failing on line 767 of basic_regex_creator.hpp.

That happens when you have a lookbehind expression that's not supported: as
Eric has already pointed out, Boost.Regex doesn't support variable-width
lookbehind, and neither do PCRE or Perl as far as I know.  It's technically
possible to do, although doing so efficiently is quite hard, and the gain
is questionable at present.
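
To illustrate why efficiency is the sticking point, here is a deliberately naive emulation of a variable-width look-behind. It is only a sketch under stated assumptions: it uses `std::regex` from the standard library rather than Boost.Regex, the function name `lookbehind_matches` is hypothetical, and it is not how any particular engine implements the feature. It simply tries every possible starting point before the current position.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns true when some substring text[start, pos)
// is matched in full by `sub`. That is the condition a look-behind
// (?<=sub) would test at offset `pos`; a negative look-behind is its negation.
bool lookbehind_matches(const std::string& text, std::size_t pos,
                        const std::regex& sub) {
    assert(pos <= text.size());  // precondition: pos is a valid offset
    const auto end = text.begin() + static_cast<std::ptrdiff_t>(pos);
    // Without a known width, every start offset from 0 to pos is a candidate,
    // so a single check may run the matcher up to pos + 1 times.
    for (std::size_t start = 0; start <= pos; ++start) {
        const auto first = text.begin() + static_cast<std::ptrdiff_t>(start);
        if (std::regex_match(first, end, sub)) {
            return true;
        }
    }
    return false;
}
```

For a fixed-width sub-expression of length n, only the single candidate starting at `pos - n` needs to be tried, which is why engines can support that case cheaply.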

HTH, John.


Re: RegEx Exception

Michael Primeaux
In reply to this post by Eric Niebler
As you suspected, the variable-width negative look-behind was the issue.
The Microsoft .NET Framework 2.0 supports variable width negative
look-behind. Perhaps it's based on your work with GRETA?

Again, I appreciate your time.

Kindest regards,
Michael Primeaux
www.i-dynamics-corporation.com


-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Eric Niebler
Sent: Tuesday, March 28, 2006 12:49 AM
To: [hidden email]
Subject: Re: [Boost-users] RegEx Exception
