[Regex]: Can a possible non-matched group be prevented from being reported?

Boost - Users mailing list
Hopefully I'm not posting this again.

Hello, I'm wrapping boost::regex with Cython so that I can call it from Python.

Take the text "Hello World" and the regex (\w+)*.
This results in two matches:

groups ('Hello', 'Hello')
  lastindex 1
  group:0 Hello
    start 0
    end 5
  group:1 Hello
    start 0
    end 5
 
groups ('', '')
  lastindex 1
  group:0
    start 5
    end 5
  group:1
    start -1
    end -1
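
For comparison, the same pattern can be tried with Python's built-in re module. This is only an analogy and not boost::regex: it assumes Python 3.7 or later, and the helper name describe_matches is made up for illustration. A group that did not take part in a match is reported by re with the span (-1, -1), which looks much like the start/end values above.

```python
import re


def describe_matches(pattern, text):
    """Return (group 0 span, group 1 span) for every match re.finditer yields."""
    return [(m.span(0), m.span(1)) for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    # Stand-in for the boost::regex run above; not the wrapper from this post.
    for whole, group1 in describe_matches(r"(\w+)*", "Hello World"):
        print("group:0", whole, "group:1", group1)
```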

I understand that there is always the main match in group 0 and the matches
from capturing groups in group 1 ...

As we can see, the second match, the zero-width one, still reports two groups, but the second group has a start and end of -1.
Is it possible to prevent such non-matched groups from being reported in the first place?
Or do I need to iterate over the groups and eliminate them myself?
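
If filtering on the wrapper side turns out to be necessary, it might look roughly like the sketch below. It uses Python's re module as a stand-in for the Boost wrapper, and the names matched_groups and iter_matches are hypothetical. The only idea it shows is dropping every group whose start is -1; on the Boost side the per-group check would presumably be the matched member of each sub_match rather than the position.

```python
import re


def matched_groups(match):
    """Map group index -> (text, start, end), skipping groups that did not match."""
    result = {}
    for index in range(match.re.groups + 1):
        start, end = match.span(index)
        if start == -1:
            # This group did not take part in the match, so leave it out.
            continue
        result[index] = (match.group(index), start, end)
    return result


def iter_matches(pattern, text):
    """Yield the filtered groups of every match, including zero-width ones."""
    for match in re.finditer(pattern, text):
        yield matched_groups(match)
```

With this, a zero-width match still yields an entry for group 0, but group 1 no longer appears in it.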

In case it helps to see my code logic, this is what I'm currently doing:

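def unicode_research_iter(const wchar_t* text, wchar_t* pattern, int flags):
    # Note: `flags` is not used yet; match_perl is always passed below.
    cdef:
        wcmatch what
        size_t _length
        size_t _position
        wcregex_iterator start, end
    try:
        start = make_regex_iterator(text, <wregex>pattern, match_flags.match_perl)
        end = wcregex_iterator()  # default-constructed iterator marks the end

        while start != end:
            what = <wcmatch>deref(start)
            # empty() is only true when the result holds no sub-matches at all,
            # so a zero-width match still takes the first branch.
            if not what.empty():
                match_object = UnicodeMatch.from_instance(what)
                yield match_object
            else:
                print('Empty match result: ??')
            inc(start)  # advance to the next match
    except Exception as e:
        raise RuntimeError(str(e)) from e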


Thank you
Eren

_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users