A problem with '^' and '$' in boost regex

A problem with '^' and '$' in boost regex

Alan Huang-2
Hi,
 
In the Boost documentation I read that '^' matches all the blank characters at the beginning of a line and '$' matches all the blank characters at the end of a line. Yet regex( "^abc$" ) can't match the string "   abc   ". What's wrong?

--
Yours Sincerely,
Alan Huang

_______________________________________________
Boost-users mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Re: A problem with '^' and '$' in boost regex

Sebastian Redl
Alan Huang wrote:

> Hi,
>  
> In the Boost documentation I read that '^' matches all the blank
> characters at the beginning of a line and '$' matches all the blank
> characters at the end of a line. Yet regex( "^abc$" ) can't match the
> string "   abc   ". What's wrong?

You misunderstood. ^ doesn't match any characters, it simply fails if
it's not the start of the line. In other words, it says, "The expression
after me must match at the start of the line, not simply somewhere." $
does the same for the end of the line.
Therefore, "^cde" matches "cdefg", but not "abcde".

If the boost documentation really says that ^ and $ match any blank
chars, that's a bug and should be corrected.
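
The behaviour described above can be checked with a small sketch. It uses std::regex from the C++ standard library as a stand-in for boost::regex (an assumption made so that it builds without Boost; for these inputs, which contain no line breaks, the two are expected to agree). The helper names first_match_offset and first_match_length are hypothetical and exist only for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: offset of the first match of `pattern` in `text`,
// or -1 when the pattern matches nowhere in the string.
long first_match_offset(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return -1;
    }
    return static_cast<long>(m.position(0));
}

// Hypothetical helper: number of characters consumed by the first match,
// or -1 when there is no match. Anchors contribute nothing to this length.
long first_match_length(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return -1;
    }
    return static_cast<long>(m.length(0));
}
```

With these helpers, "^cde" is found in "cdefg" at offset 0 with length 3 (the anchor itself consumes no characters), while "abcde" gives -1. For the string "   abc   ", "^abc$" gives -1 because the blanks are ordinary characters that the pattern does not mention. Spelling them out, for example as "^\\s*abc\\s*$", gives a match of length 9.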

Sebastian Redl

Re: A problem with '^' and '$' in boost regex

Alan Huang-2


On 1/23/06, Sebastian Redl <[hidden email]> wrote:
> Alan Huang wrote:
>
> > Hi,
> >
> > In the Boost documentation I read that '^' matches all the blank
> > characters at the beginning of a line and '$' matches all the blank
> > characters at the end of a line. Yet regex( "^abc$" ) can't match the
> > string "   abc   ". What's wrong?
>
> You misunderstood. ^ doesn't match any characters, it simply fails if
> it's not the start of the line. In other words, it says, "The expression
> after me must match at the start of the line, not simply somewhere." $
> does the same for the end of the line.
> Therefore, "^cde" matches "cdefg", but not "abcde".
>
> If the boost documentation really says that ^ and $ match any blank
> chars, that's a bug and should be corrected.
>
> Sebastian Redl
 
Thank you Redl,
But I ran an experiment and found that "^cde" matched neither "cdefg" nor "abcde".
 

Anchors:

A '^' character shall match the start of a line.

A '$' character shall match the end of a line.

I still don't understand exactly: if "^cde$" only matches "cde", why do we need to write the regex as "^cde$"?

--
Yours Sincerely,
Alan Huang


Re: A problem with '^' and '$' in boost regex

John Maddock
> But I ran an experiment and found that "^cde" matched neither
> "cdefg" nor "abcde".

^cde will *find* a match in cdefg if you call regex_search, it will not
succeed if you call regex_match however as that requires *all* of the string
to be matched. "abcde" will never be matched by ^cde.
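
A minimal sketch of that difference follows. It assumes std::regex as a stand-in for boost::regex (so that it builds without Boost); regex_search and regex_match carry the same names in both libraries. The wrapper names found_somewhere and matches_entirely are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper: true if `pattern` matches some part of `text`.
bool found_somewhere(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// Hypothetical wrapper: true only if `pattern` matches all of `text`,
// from the first character to the last.
bool matches_entirely(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}
```

For the text "cdefg" and the pattern "^cde", found_somewhere returns true and matches_entirely returns false, because "fg" is left over. For "abcde" both return false. An experiment that calls regex_match would therefore report failure for both strings, which is consistent with the result described in the question.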

>  Anchors:
>
> A '^' character shall match the start of a line.
>
> A '$' character shall match the end of a line.
> I still don't understand exactly: if "^cde$" only matches "cde",
> why do we need to write the regex as "^cde$"?

What?  ^cde$ will match the characters "cde" *only* if:

1) The "c" was preceded by a line break character, or the "c" was the first
character in the string.

*and*

2) The "e" character is followed by a line break character or is the last
character in the string.

So using regex_search the following strings contain matches for ^cde$ :

cde
\ncde
\ncde\n
cde\n

However, using regex_match only the first of those strings would be matched.
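
To make the two conditions concrete without depending on any particular regex library's handling of line breaks, here is a hand-written sketch that applies them directly. It is not how Boost.Regex is implemented; the function name contains_whole_line is hypothetical, and only '\n' is treated as a line break here, which is a simplifying assumption.

```cpp
#include <string>

// Hypothetical helper: true if `word` occurs in `text` such that
//   1) it starts at the beginning of the string or right after '\n', and
//   2) it ends at the end of the string or right before '\n'.
// This mirrors what searching for ^word$ means under the rules above.
bool contains_whole_line(const std::string& text, const std::string& word) {
    std::string::size_type pos = text.find(word);
    while (pos != std::string::npos) {
        const std::string::size_type end = pos + word.size();
        const bool at_line_start = (pos == 0) || (text[pos - 1] == '\n');
        const bool at_line_end = (end == text.size()) || (text[end] == '\n');
        if (at_line_start && at_line_end) {
            return true;
        }
        pos = text.find(word, pos + 1);  // try the next occurrence
    }
    return false;
}
```

Called with the word "cde", this returns true for all four strings listed above and false for "abcde" and "cdefg". The regex_match case corresponds to the stricter check text == word, which only the first string "cde" passes.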

John.
