String split behaviour


Venkateswara Rao Sanaka
Hi,

I am getting two empty strings from the following program,

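#include <boost/algorithm/string.hpp>
#include <iostream>
#include <string>
#include <vector>

using namespace std;    // assumed by the unqualified names below
using namespace boost;

void boost_split_test() {
    const string text("-");  // input consisting of a single separator
    vector<string> tokens;
    // token_compress_on merges runs of adjacent separators into one
    split(tokens, text, boost::is_any_of("-"), token_compress_on);

    cout << "size of tokens " << tokens.size() << '\n';

    for (auto const &e : tokens)
        cout << e.size() << '\n';
}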

Output:

size of tokens 2
0
0

Is this the expected output? I was expecting zero split parts. Could someone clarify?

--
Thanks,
:) Venki.

_______________________________________________
Boost-Testing mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/boost-testing
Re: String split behaviour

Marshall Clow-2
On Wed, May 20, 2015 at 11:32 AM, Venkateswara Rao Sanaka <[hidden email]> wrote:
Is this the expected output? I was expecting zero split parts. Could someone clarify?


This seems reasonable to me.

You asked it to split the string containing a single dash into parts separated by dashes.
The string gets split into an empty string, a dash (which is not returned to you, being the separator), and an empty string.

Consider splitting the input string "Foo-" (or "-Foo") compared to "Foo".
The first gives two strings (one before the dash and one after it, one of them empty); the second gives one string (because there are no dashes).

Given a string with "n" separators, you should get "n+1" strings back (with the proviso that, with token_compress_on, consecutive separators are collapsed together, so "Foo--" is treated the same as "Foo-").

-- Marshall

P.S. Checking the tests, I notice that there's no coverage for this case (separators at the beginning or the end of the input). I'll put it on my list. Thanks!
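
The n+1 rule described in this reply can be illustrated without Boost. What follows is only a hand-rolled sketch under stated assumptions: split_compress is a hypothetical name, it is not Boost's implementation, and it merely mimics the observable effect of token_compress_on (a run of adjacent separators counts as one separator, but a separator at either end still yields an empty token).

```cpp
#include <string>
#include <vector>

// Illustration only: a hypothetical stand-in, not Boost's split.
// Splits `text` on any character found in `seps`. A run of adjacent
// separators is treated as a single separator, so a string with n
// separator runs always yields n + 1 tokens (some possibly empty).
std::vector<std::string> split_compress(const std::string& text,
                                        const std::string& seps) {
    std::vector<std::string> tokens;
    std::string current;
    std::size_t i = 0;
    while (i < text.size()) {
        if (seps.find(text[i]) != std::string::npos) {
            // Close the token that ended at this separator run.
            tokens.push_back(current);
            current.clear();
            // Skip the whole run of adjacent separators.
            while (i < text.size() &&
                   seps.find(text[i]) != std::string::npos) {
                ++i;
            }
        } else {
            current += text[i];
            ++i;
        }
    }
    // The text after the last separator run (possibly empty) is a token too.
    tokens.push_back(current);
    return tokens;
}
```

With this sketch, split_compress("-", "-") returns two empty strings, split_compress("Foo--", "-") and split_compress("Foo-", "-") both return "Foo" followed by an empty string, and split_compress("Foo", "-") returns the single token "Foo".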

Re: String split behaviour

Venkateswara Rao Sanaka
Thanks Marshall for the reply.

In our code I hit a strange error when splitting a string. The hyphen symbol was used to represent null data, so when splitting a string containing only a hyphen I expected a result of zero tokens (I was wrong here). Even dynamic languages behave the same way; see the Python sample below:

Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "-Foo"
>>> tokens = s.split("-")
>>> print tokens
['', 'Foo']
>>> 

An example in the Boost documentation would help users.

Even the following command-line example shows the same behaviour:

$ echo "a-b" | awk -F "-" '{for (i=1; i <= NF; i++) printf "%s:", $i}' ---> This will print a:b:
$ echo "-" | awk -F "-" '{for (i=1; i <= NF; i++) printf "%s:", $i}'   ---> This will print :: (i.e. two empty strings on screen)

In fact, the second command-line example was the reason behind my confusion :)

Thanks to all the Boost developers. Great work.
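
For the use case described in this message, where a lone hyphen stands for null data and zero tokens are wanted, one possible approach is to discard empty tokens after splitting. This is a minimal sketch, assuming the tokens already sit in a std::vector<std::string>; drop_empty_tokens is a hypothetical helper name and is not part of Boost.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not a Boost function): removes every empty
// string from `tokens` in place, keeping the order of the rest.
void drop_empty_tokens(std::vector<std::string>& tokens) {
    tokens.erase(std::remove_if(tokens.begin(), tokens.end(),
                                [](const std::string& s) { return s.empty(); }),
                 tokens.end());
}
```

Applied to the two empty strings produced by splitting "-", this leaves an empty vector; applied to the result of splitting "-Foo", it leaves only "Foo". Note that it also drops empty tokens from the middle of the input, which may or may not be what a given format needs.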

On Thu, May 21, 2015 at 1:59 AM, Marshall Clow <[hidden email]> wrote:
This seems reasonable to me.

--
Thanks,
:) Venki.