[xpression] fuzzy on smatch fields

l_d_allan
<alert comment="xpression newbie">

Xpressive looks very promising for some things I'm trying to
implement. Thanks for providing and supporting it.

I was trying to figure out the fields that make up an xpressive
smatch, and expanded Example 1 to be more verbose. Basically, I
tried to "unpack" as much info as I could find in the variable "what"
to clarify some fuzziness on my part. That raised some questions:

* The suffix and prefix info seemed blank. Are there accessors to get
more info to conform to my (possibly flawed) understanding of the
docs?

* The return from regex_id seemed to be an address (like 00323F58). Is
that intended? Is there some accessor to get something more
meaningful? (but I'm not clear what would be meaningful). What is the
purpose of "regex_id" to the user of xpressive?

* With vc7.1, there were warnings unless I cast the lengths and
positions ... is this intended?

#include <iostream>
#include <string>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

void example1_verbose()
{
    std::string hello( "hello world!" );
    sregex rex = sregex::compile( "(\\w+) (\\w+)!" );
    smatch what;
    if( regex_match( hello, what, rex ) )
    {
        // the match as a whole
        std::cout << "Overall Size: " << static_cast<int>(what.size()) << '\n';
        std::cout << "Regex Id: " << what.regex_id() << '\n';

        // sub-match 0 is the whole match; 1 and 2 are the two (\\w+) groups
        std::cout << "what[0]: " << what[0] << '\n';
        std::cout << "Length(0): " << static_cast<int>(what.length(0)) << '\n';
        std::cout << "Position(0): " << static_cast<int>(what.position(0)) << '\n';
        std::cout << "Str(0): " << what.str(0) << '\n';
        std::cout << "what[1]: " << what[1] << '\n';
        std::cout << "Length(1): " << static_cast<int>(what.length(1)) << '\n';
        std::cout << "Position(1): " << static_cast<int>(what.position(1)) << '\n';
        std::cout << "Str(1): " << what.str(1) << '\n';
        std::cout << "what[2]: " << what[2] << '\n';
        std::cout << "Length(2): " << static_cast<int>(what.length(2)) << '\n';
        std::cout << "Position(2): " << static_cast<int>(what.position(2)) << '\n';
        std::cout << "Str(2): " << what.str(2) << '\n';

        // text before and after the match
        std::cout << "Prefix(): " << what.prefix() << '\n';
        std::cout << "Prefix().matched: " << what.prefix().matched << '\n';
        std::cout << "Prefix().length(): " << static_cast<int>(what.prefix().length()) << '\n';
        std::cout << "Prefix().str(): " << what.prefix().str() << '\n';
        std::cout << "Suffix(): " << what.suffix() << '\n';
        std::cout << "Suffix().matched: " << what.suffix().matched << '\n';
        std::cout << "Suffix().length(): " << static_cast<int>(what.suffix().length()) << '\n';
        std::cout << "Suffix().str(): " << what.suffix().str() << '\n';
    }
}

Example 1: Verbose:

Overall Size: 3
Regex Id: 00323F58
what[0]: hello world!
Length(0): 12
Position(0): 0
Str(0): hello world!
what[1]: hello
Length(1): 5
Position(1): 0
Str(1): hello
what[2]: world
Length(2): 5
Position(2): 6
Str(2): world
Prefix():
Prefix().matched: 0
Prefix().length(): 0
Prefix().str():
Suffix():
Suffix().matched: 0
Suffix().length(): 0
Suffix().str():

</alert>


_______________________________________________
Boost-users mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/boost-users
Re: [xpression] fuzzy on smatch fields

l_d_allan
Oops ... disregard .... redface .... Example 1 is related to "match"
rather than "search" .... so I suppose prefix and suffix would not
apply.
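
To make the match-versus-search distinction concrete, here is a minimal sketch. It uses std::regex from the standard library rather than xpressive so that it builds without Boost; that swap is an assumption made for convenience, relying on std::smatch offering the same prefix() and suffix() accessors used above. The Surroundings struct and the helpers unpack, with_match and with_search are hypothetical names introduced only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper type: the text before, inside, and after a match.
struct Surroundings {
    bool found = false;
    std::string prefix;
    std::string match;
    std::string suffix;
};

// Copy the interesting pieces out of a match_results object.
static Surroundings unpack(const std::smatch& what) {
    Surroundings s;
    s.found  = true;
    s.prefix = what.prefix().str();
    s.match  = what.str(0);
    s.suffix = what.suffix().str();
    return s;
}

// regex_match: the pattern has to cover the whole input, so on success
// nothing is left over for prefix() or suffix().
Surroundings with_match(const std::string& text) {
    static const std::regex rex("(\\w+) (\\w+)!");
    std::smatch what;
    return std::regex_match(text, what, rex) ? unpack(what) : Surroundings{};
}

// regex_search: the pattern may match anywhere inside the input, so
// prefix() and suffix() hold the text around the match.
Surroundings with_search(const std::string& text) {
    static const std::regex rex("(\\w+) (\\w+)!");
    std::smatch what;
    return std::regex_search(text, what, rex) ? unpack(what) : Surroundings{};
}
```

With the input "say hello world! now", with_search should report the prefix "say " and the suffix " now", while with_match finds nothing because the pattern does not cover the whole string. With "hello world!" as input, with_match succeeds and both prefix and suffix come back empty, which lines up with the blank fields in the output above.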

But, in example 2, seems like "year" should be repeat<1,4>, but:
year= repeat<1,2> works:

cregex date = (month= repeat<1,2>(_d))           // find the month ...
           >> (delim= (set= '/','-'))            // followed by a delimiter ...
           >> (day=   repeat<1,2>(_d)) >> delim  // and a day followed by the same delimiter ...
           >> (year=  repeat<1,2>(_d >> _d));    // and the year.

actually, repeat<1,3> works for month, day, and year. Am I mixed up on
what "repeat" means?

cregex date = (month= repeat<1,3>(_d))           // find the month ...
           >> (delim= (set= '/','-'))            // followed by a delimiter ...
           >> (day=   repeat<1,3>(_d)) >> delim  // and a day followed by the same delimiter ...
           >> (year=  repeat<1,3>(_d >> _d));    // and the year.





Re: [xpression] fuzzy on smatch fields

Eric Niebler
Lynn Allan wrote:

>
> But, in example 2, seems like "year" should be repeat<1,4>, but:
> year= repeat<1,2> works:
>
> cregex date = (month= repeat<1,2>(_d))           // find the month ...
>            >> (delim= (set= '/','-'))            // followed by a delimiter ...
>            >> (day=   repeat<1,2>(_d)) >> delim  // and a day followed by the same delimiter ...
>            >> (year=  repeat<1,2>(_d >> _d));    // and the year.
>
> actually, repeat<1,3> works for month, day, and year. Am I mixed up on
> what "repeat" means?


repeat<n,m>(X) means to match X between n and m times, inclusive. So
to match a month or a day, you want repeat<1,2>(_d) to match 1 or 2 digit
characters, and to match a year, you want repeat<1,2>(_d >> _d) to match
two digits or four digits. Three digits isn't a common representation of
a year.
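
As a rough illustration of why repeating a two-digit group once or twice gives exactly two or four digits, here is a small sketch. It is written with std::regex instead of xpressive so it compiles without Boost, on the assumption that the pattern "(?:\d\d){1,2}" behaves like repeat<1,2>(_d >> _d) when matched against a whole string. The function name looks_like_year is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole string is a 2- or 4-digit year.
// The group (?:\d\d) consumes two digits at a time, and {1,2} repeats that
// group once or twice, so only lengths 2 and 4 can match.
bool looks_like_year(const std::string& text) {
    static const std::regex year("(?:\\d\\d){1,2}");
    return std::regex_match(text, year);
}
```

Under these assumptions, "73" and "1973" are accepted, while "7", "197" and "19731" are rejected because their lengths are not a multiple of the two-digit group within the allowed range.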

HTH,

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

Re: [xpression] fuzzy on smatch fields

l_d_allan
Eric Niebler wrote:

> repeat<n,m>(X) means to match X between n and m times, inclusive. So
> to match a month or a day, you want repeat<1,2>(_d) to match 1 or 2
> digit characters, and to match a year, you want repeat<1,2>(_d >> _d)
> to match two digits or four digits. Three digits isn't a common
> representation of a year.

Ok .... and thanks for your patient assistance.

I think I see why repeat<1,2> works for yyyy, but AFAICT, the
repeat<1,3> "worked" for day dd and month mm, which seems off. I
changed the sample code "just to see what would happen" and was
scratch-my-head-surprised to get the same results from repeat<1,2> as
for repeat<1,3> .... days and months were "captured".

But I'm probably doing something wrong or "just don't get it" about
xpressive.

Here is the "tweaked" Example 2 using repeat<1,3>:
void example2()
{
    char const *str = "I was born on 5/30/1973 at 7am.";

    // define some custom mark_tags with names more meaningful than s1, s2, etc.
    mark_tag day(1), month(2), year(3), delim(4);

    // this regex finds a date
    cregex date = (month= repeat<1,3>(_d))           // find the month ...
               >> (delim= (set= '/','-'))            // followed by a delimiter ...
               >> (day=   repeat<1,3>(_d)) >> delim  // and a day followed by the same delimiter ...
               >> (year=  repeat<1,3>(_d >> _d));    // and the year.

    cmatch what;

    if( regex_search( str, what, date ) )
    {
        std::cout << "LdaExample2" << '\n'; // label for this run
        std::cout << what[0]     << '\n'; // whole match
        std::cout << what[day]   << '\n'; // the day
        std::cout << what[month] << '\n'; // the month
        std::cout << what[year]  << '\n'; // the year
        std::cout << what[delim] << '\n'; // the delimiter
    }
}




Re: [xpression] fuzzy on smatch fields

Eric Niebler

Lynn Allan wrote:

> Ok .... and thanks for your patient assistance.
>
> I think I see why repeat<1,2> works for yyyy, but AFAICT, the
> repeat<1,3> "worked" for day dd and month mm, which seems off. I
> changed the sample code "just to see what would happen" and was
> scratch-my-head-surprised to get the same results from repeat<1,2> as
> for repeat<1,3> .... days and months were "captured".


repeat<1,3>(_d) will match one digit, or two digits, or three digits.
So, yes, it will match days (which are one or two digits) or months
(which are one or two digits). However, it is overly permissive, because
it will also match three digits, which is not a valid day or month.
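
The difference only shows up on input that the stricter bound would reject. Here is a minimal sketch of that, again using std::regex as a stand-in for xpressive so it builds with just the standard library. The two patterns are assumed equivalents of the repeat<1,2> and repeat<1,3> versions of the date regex, with \2 playing the role of the repeated delim, and the function name looks_like_date is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: whole-string date check with either the strict
// bounds (1 or 2 digits for month and day) or the permissive bounds
// (1 to 3 digits). \2 refers back to the delimiter captured by group 2,
// so both delimiters must be the same character.
bool looks_like_date(const std::string& text, bool permissive) {
    static const std::regex strict(
        "(\\d{1,2})([/-])(\\d{1,2})\\2((?:\\d\\d){1,2})");
    static const std::regex loose(
        "(\\d{1,3})([/-])(\\d{1,3})\\2((?:\\d\\d){1,3})");
    return std::regex_match(text, permissive ? loose : strict);
}
```

Both variants accept "5/30/1973", which is why swapping the bounds in the sample made no visible difference. Only the permissive variant accepts "123/456/1973", and neither accepts "5/30-1973" because the two delimiters differ.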

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

Re: [xpression] fuzzy on smatch fields

l_d_allan
> repeat<1,3>(_d) will match one digit, or two digits, or three
> digits.
> So, yes, it will match days (which are one or two digits) or months
> (which are one or two digits). However, it is overly permissive,
> because it will also match three digits, which is not a valid day or
> month.

Sorry ... VERY red-face on this regex newbie. You are very gracious.

