[nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review


Artyom Beilis
Hello all Boosters,

I would like to get comments on a library that I want to submit for a formal review.

The library provides implementations of standard C and C++ library
functions whose inputs are UTF-8 aware on Windows, so a program can
handle Unicode there without calling the Wide API directly.

Library:    Boost.Nowide
Download:    http://cppcms.com/files/nowide/nowide.zip
Documents:   http://cppcms.com/files/nowide/html/
Features:    http://cppcms.com/files/nowide/html/index.html#main_the_solution


Tested On:

OS: Windows 7 32/64 bit, Linux
Compilers: GCC-4.6, MSVC-10


 
Artyom Beilis
--------------
CppCMS - C++ Web Framework:   http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Yakov Galka
On Mon, May 28, 2012 at 3:33 PM, Artyom Beilis <[hidden email]> wrote:

> [...]
> The library provides an implementation of standard C and C++ library
> functions such that their inputs are UTF-8 aware on Windows without
> requiring using Wide API to make program work on Windows.
>

Hi,

I'm happy that this is being proposed to Boost.

My comments:

* I find the way you handle the main() arguments elegant.

* I don't like that the convert function is overloaded for both narrow and
wide conversions.
Rationale: Consider the following real-world scenario:

    // Some existing overloaded function, like the std::fstream constructor
    // in Dinkumware's library
    void f(const std::string &s); // 3rd party 'ANSI' codepage
    void f(const std::wstring &s); // 3rd party 'UNICODE'

    std::string str = get_utf8_string();
    f(convert(str)); // we want to call the wide string version

Now during development we may change it to:

    std::wstring str = get_string_from_windows(); // we changed only this line
    f(convert(str)); // and forgot to change this one. oops...

Solution: This is an error that can be caught at compile time, we just have
to state the intent clearly. Use alternative names? (narrow/widen)
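
A minimal sketch of that naming idea, assuming hypothetical free functions `widen` and `narrow` (these are not the library's actual signatures, and the bodies handle ASCII only as a stand-in for real UTF-8/UTF-16 conversion). Because each name accepts exactly one string type, the forgotten call from the scenario above would no longer compile:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-in: real code would convert UTF-8 to UTF-16.
// This body handles ASCII only, to keep the sketch short.
inline std::wstring widen(const std::string &s)
{
    std::wstring out;
    for (std::size_t i = 0; i < s.size(); ++i)
        out += static_cast<wchar_t>(static_cast<unsigned char>(s[i]));
    return out;
}

// Hypothetical stand-in: real code would convert UTF-16 to UTF-8.
inline std::string narrow(const std::wstring &s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
        out += static_cast<char>(s[i]);
    return out;
}

// The 3rd party overloads from the scenario above (return values only
// identify which overload was chosen).
inline int f(const std::string &) { return 1; }   // 'ANSI' version
inline int f(const std::wstring &) { return 2; }  // 'UNICODE' version

// std::wstring does not convert to std::string, so widen(some_wstring)
// is rejected by the compiler instead of silently picking an overload.
static_assert(!std::is_convertible<std::wstring, std::string>::value,
              "widen() cannot be called with a wide string");
```

With these names, `f(widen(str))` keeps calling the wide overload while `str` is a `std::string`, and stops compiling once `str` becomes a `std::wstring`.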


Cheers,
--
Yakov


Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Artyom Beilis


>________________________________
> From: Yakov Galka <[hidden email]>
>On Mon, May 28, 2012 at 3:33 PM, Artyom Beilis <[hidden email]> wrote:
>
>> [...]
>> The library provides an implementation of standard C and C++ library
>> functions such that their inputs are UTF-8 aware on Windows without
>> requiring using Wide API to make program work on Windows.
>>
>
>Hi,
>
>I'm happy that this is getting to be proposed to boost.
>

Also note, it is different from the old version of my nowide library
I published once: added argv, argc, env and cin/cout/cerr/log such that you
can actually write and read Unicode characters to/from console...

>My comments:
>
>* I find the way you handle the main() arguments elegant.
>
>* I don't like that the convert function is overloaded for both narrow and
>wide conversions.
>Rationale: Consider the following real-world scenario:
>
>    // Some existing overloaded function, like std::fstream constructor on
>dinkumware
>    void f(const std::string &s); // 3rd party 'ANSI' codepage
>    void f(const std::wstring &s); // 3rd party 'UNICODE'
>
>    std::string str = get_utf8_string();
>    f(convert(str)); // we want to call the wide string version
>
>Now during development we may change it to:
>
>    std::wstring str = get_string_from_windows(); // we changed only this
>line
>    f(convert(str)); // and forgot to change this one. oops...
>
>Solution: This is an error that can be caught at compile time, we just have
>to state the intent clearly. Use alternative names? (narrow/widen)

Very good point. I'll change them to widen/narrow.


>
>
>Cheers,
>--
>Yakov
>

 
Artyom Beilis
--------------
CppCMS - C++ Web Framework:   http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/


Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Artyom Beilis
In reply to this post by Artyom Beilis
Hello,

To make the purpose of Boost.Nowide clearer, I'll add an example from the docs.


Let's write a simple program that conforms to the C++2011 or C++2003 standard
and counts the number of lines in a file:


    #include <fstream>
    #include <iostream>

    int main(int argc,char **argv)
    {
        if(argc!=2) {
            std::cerr << "Usage: file_name" << std::endl;
            return 1;
        }

        std::ifstream f(argv[1]);
        if(!f) {
            std::cerr << "Can't open a file " << argv[1] << std::endl;
            return 1;
        }
        int total_lines = 0;
        while(f) {
            if(f.get() == '\n')
                total_lines++;
        }
        f.close();
        std::cout << "File " << argv[1] << " has " << total_lines << " lines" << std::endl;
        return 0;
    }

Any bugs?

This trivial program would not work on Windows if the file name is a Unicode
file name: argv does not hold a Unicode string, std::ifstream can't open a Unicode file name, and
std::cout can't print Unicode characters to the console...

Boost.Nowide provides alternatives for common standard library functions and suggests a general
pattern for handling Unicode strings in a cross-platform program:

    #include <boost/nowide/fstream.hpp>
    #include <boost/nowide/iostream.hpp>
    #include <boost/nowide/args.hpp>

    int main(int argc,char **argv)
    {
        //
        // Fix arguments - argv holds Unicode string (UTF-8)
        //
        boost::nowide::args a(argc,argv);
        if(argc!=2) {
            boost::nowide::cerr << "Usage: file_name" << std::endl;
            return 1;
        }

        //
        // Fix fstream - it can open a file using a Unicode file name (UTF-8)
        //
        boost::nowide::ifstream f(argv[1]);
        if(!f) {
            //
            // cerr can print Unicode characters to the console regardless of the console code page
            //
            boost::nowide::cerr << "Can't open a file " << argv[1] << std::endl;
            return 1;
        }
        int total_lines = 0;
        while(f) {
            if(f.get() == '\n')
                total_lines++;
        }
        f.close();
        //
        // cout can print Unicode characters to the console regardless of the console code page
        //
        boost::nowide::cout << "File " << argv[1] << " has " << total_lines << " lines" << std::endl;
        return 0;
    }

This is the general approach. The library also provides glue conversion functions to handle Unicode at the API
boundary where needed:


   #ifdef _WIN32
   bool copy_file(std::string const &src,std::string const &tgt)
   {
       return CopyFileW(boost::nowide::convert(src).c_str(),
                        boost::nowide::convert(tgt).c_str(),
                        TRUE);
   }
   #else
   bool copy_file(std::string const &src,std::string const &tgt)
   {
      // POSIX implementation
   }
   #endif


Waiting for Comments:

Download:    http://cppcms.com/files/nowide/nowide.zip
Documents:   http://cppcms.com/files/nowide/html/


Artyom Beilis
--------------
CppCMS - C++ Web Framework:   http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/




Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Klaim - Joël Lamotte
Hi, I just read the documentation, so far this library looks nice.

On Tue, May 29, 2012 at 11:25 PM, Artyom Beilis <[hidden email]> wrote:

> boost::nowide::args a(argc,argv);


args is an object maintaining the lifetime of the new values that argv will
point to.

Is my understanding correct?


Joel Lamotte


Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Artyom Beilis


----- Original Message -----

> From: Klaim - Joël Lamotte <[hidden email]>
> To: [hidden email]
> Cc:
> Sent: Tuesday, May 29, 2012 5:38 PM
> Subject: Re: [boost] [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review
>
> Hi, I just read the documentation, so far this library looks nice.
>
> On Tue, May 29, 2012 at 11:25 PM, Artyom Beilis <[hidden email]>
> wrote:
>
>>  boost::nowide::args a(argc,argv);
>
>
> args is an object maintaining the lifetime of the new values that argv will
> point to.
>
> Is my understanding correct?
>
>
> Joel Lamotte
>
>


Yes, you understand correctly.

So the main function


  int main(int argc,char **argv[,char **env])
  {
     ...

  }


is simply changed to


  int main(int argc,char **argv[,char **env])
  {
     boost::nowide::args a(argc,argv[,env]);

     ...

  }

where a is the args instance that holds the "replaced" values.



Artyom Beilis
--------------
CppCMS - C++ Web Framework:   http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/



Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Klaim - Joël Lamotte
On Tue, May 29, 2012 at 11:46 PM, Artyom Beilis <[hidden email]> wrote:


>   int main(int argc,char **argv[,char **env])
>   {
>      boost::nowide::args a(argc,argv[,env])
>
>      ...
>
>   }
>

I know this is somewhat stupid, but isn't it easy to end up in this situation?

int main(int argc,char **argv[,char **env])
{
      {
          boost::nowide::args a(argc,argv[,env]);

       }
       //...
       std::cout << argv[0] ; // crash?

}

Or do you restore argv in the args destructor?


Joel Lamotte


Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

alex_perry
>
> Hi, I just read the documentation, so far this library looks nice.
>
+1

>So main function
>
>
>  int main(int argc,char **argv[,char **env])
>  {
>     ...
>
>  }
>
>
>Simply changed to
>
>
>  int main(int argc,char **argv[,char **env])
>  {
>     boost::nowide::args a(argc,argv[,env])
>
>     ...
>
>  }

I have done things similar to this before (if never with any elegance), and I think that the above won't work in all situations.

The problem is that the CRT may have already applied some conversion to the argv strings based on your current code page if your application was called with UTF-16LE strings. This is rarely the case if the command is typed at a command prompt, but it certainly can happen if the application is launched via a shortcut or by a user clicking on a file with an associated file type.

The solution is to use wmain on Windows rather than main and do any conversion to UTF-8 there, not in main. For example, here is a (rather hacked, I admit) example I've done:

#ifdef WIN32
//Use wmain under windows to get unicode strings which we convert to utf-8 - standard conversion is to map
//to local code page rather than to utf-8
int wmain(int argc, wchar_t* wargv[])
{
    //convert wargv to utf-8 strings
    char ** argv = new char *[argc];
    for ( int i = 0; i< argc; ++i )
    {
        utf8string temp( wargv[i] ); //cvt to utf8

        argv[i] = new char[ temp.getBufferSize() ];
        memcpy( argv[i], temp.c_str(), temp.getBufferSize() );
    }
#else
int main(int argc, char* argv[])
{
#endif


Here utf8string is a class I've used for efficient UTF-8, UTF-16 and UTF-32 conversions; it behaves like a std::string with a few bells and whistles on.

It might be worthwhile adding this wmain workaround information to your library and providing a boost::nowide::args constructor which takes wchar_t.

Hope this is of some use.

Alex

ps apologies if formatting is odd - nabble crashed on me trying to reply so ended up using Outlook and modifying reply manually from message digest which is never great  ....


Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Yakov Galka
On Tue, May 29, 2012 at 7:19 PM, Alex Perry <[hidden email]> wrote:

> Having done things similar to this before (if never with any elegance).  I
> think that the above won't work in all situations.


You guessed the implementation incorrectly. Please see the sources. I've
always done it the way you said (with wmain/main), but this is why I said
that Artyom's solution is more elegant.

args doesn't even read the argc/argv arguments. It uses CommandLineToArgvW
and GetEnvironmentStringsW to get the wide strings from Windows, then
converts them to UTF-8 and assigns the pointer to it back to the local
parameters of main.
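
The conversion step in that description can be sketched portably. This is not the library's code (later in the thread it is said to rely on Boost.Locale's UTF conversion); `utf16_to_utf8` is a hypothetical helper, and replacing malformed input with U+FFFD is an assumption of this sketch. On Windows the input would come from `CommandLineToArgvW`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode a UTF-16 string as UTF-8.
// Unpaired surrogates are replaced with U+FFFD (a choice made for this sketch).
inline std::string utf16_to_utf8(const std::u16string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // lone low surrogate
        }

        // Emit 1 to 4 bytes depending on the code point's magnitude.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```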

--
Yakov


Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Artyom Beilis
In reply to this post by Klaim - Joël Lamotte
> I know this is somewhat stupid but isn't it easy to get in this case?

>
> int main(int argc,char **argv[,char **env])
> {
>       {
>           boost::nowide::args a(argc,argv[,env])
>
>        }
>        //...
>        std::cout << argv[0] ; // crash?
>
> }
>
> Or do you restore argv in the args destructor?
>
>
> Joel Lamotte
>

Actually, that's a good point, and restoring the old argc/argv
parameters is a good idea.

I'll add this!
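
A sketch of how such restoring could look, assuming a hypothetical `scoped_args` class (this is not the library's `args`; here the UTF-8 values are handed in by the caller instead of being fetched from Windows). The object owns the replacement strings, points argc/argv at them, and puts the original values back in its destructor:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical RAII guard, for illustration only.
class scoped_args {
public:
    scoped_args(int &argc, char **&argv, const std::vector<std::string> &utf8_args)
        : argc_ref_(argc), argv_ref_(argv),
          old_argc_(argc), old_argv_(argv),
          storage_(utf8_args)
    {
        // Build a null-terminated pointer array into the owned strings.
        for (std::size_t i = 0; i < storage_.size(); ++i)
            pointers_.push_back(&storage_[i][0]);
        pointers_.push_back(nullptr);
        argc_ref_ = static_cast<int>(storage_.size());
        argv_ref_ = &pointers_[0];
    }

    ~scoped_args()
    {
        // Restore the originals so argv never dangles after this object dies.
        argc_ref_ = old_argc_;
        argv_ref_ = old_argv_;
    }

    scoped_args(const scoped_args &) = delete;
    scoped_args &operator=(const scoped_args &) = delete;

private:
    int &argc_ref_;
    char **&argv_ref_;
    int old_argc_;
    char **old_argv_;
    std::vector<std::string> storage_;
    std::vector<char *> pointers_;
};
```

With this shape, the nested-scope case from the question is harmless: once the inner block ends, argv points at the original CRT strings again.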


 Artyom Beilis
--------------
CppCMS - C++ Web Framework:   http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/


Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

alex_perry
In reply to this post by Yakov Galka
Yakov Galka wrote:
> You guessed the implementation incorrectly. Please see the sources. I've
> always done it the way you said (with wmain/main), but this is why I said
> that Artyom's solution is more elegant.

Doh! - should have looked before posting - very neat!

But maybe a note/explanation in the documentation? - If for nothing else just so those who think they are cleverer than they are (like me) could use this library without having to browse the source ?

Alex

Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Klaim - Joël Lamotte
In reply to this post by Artyom Beilis
On Wed, May 30, 2012 at 3:55 AM, Artyom Beilis <[hidden email]> wrote:

> Actually good point and good idea to restore old argc/argv
> parameters.
>
> I'll add this!
>


Ah yes I only figured now that you didn't have any constructor...
Happy to help.

Joel Lamotte


Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Matus Chochlik
In reply to this post by Artyom Beilis
Hi Artyom,

On Mon, May 28, 2012 at 2:33 PM, Artyom Beilis <[hidden email]> wrote:
>
> I comments on a library that I want to submit for a formal review.
>
> The library provides an implementation of standard C and C++ library
> functions such that their inputs are UTF-8 aware on Windows without
> requiring using Wide API to make program work on Windows.
>

here are my 0.02 Euro:

I completely agree that for general-purpose text storage and handling
(reading lines from text-file/console, reading user input from
GUI, displaying formatted (and localized) messages to the user
in a UI, etc., etc.) UTF-8 should *finally* be adopted.
The other encodings (including UCS-2, UTF-16/32) have their
uses, but should be treated as special cases.

The nowide library is certainly useful within the (limited) scope of working
with text obtained from the OS and passed to the OS where you
can make some assumptions and guess the encoding that the
OS uses and do the conversions from and to UTF8, BUT ...

many text-handling applications also tend to use third-party libraries
which have their own ideas about text encodings, and your library
would be *much* more useful if it allowed applications to "talk" to such libraries
(or devices).

So let me reiterate some points I already mentioned in the earlier
text-related discussions here:

1) Let's use std::string as an encoding-agnostic string, as it has
always been - the encoding of the data stored in the string should
be application dependent.

2) Let's implement a text storage class (and let's call it) text;
This class would store text (internally in whatever encoding
is the "best" at the specific platform) and would have the following
function defined:

/* UTF-8 encoded */ std::string str(text t);
- This function would return a std::string containing the text
stored in t encoded in UTF-8.

template <typename SymbolicEncodingTag>
text text::from(std::basic_string<typename SymbolicEncodingTag::CharT> s)
- This function would convert the string stored in s to text
assuming that s is encoded in the encoding specified
by SymbolicEncodingTag.

template <typename SymbolicEncodingTag>
std::basic_string<typename SymbolicEncodingTag::CharT> text::to(text t);
- This function would convert the text stored in t to
a string encoded in the encoding specified
by SymbolicEncodingTag.

The encoding tags would specify both concrete encodings
like UTF-16 or ISO-8859-2, etc. and symbolic encodings
like OS (which would autodetect the OS's encoding) or
libFoo which would use libFoo's encoding.

Actually the library would not have to specify many
tags for concrete third-party libraries (maybe only the most
popular). Instead it would provide some means to define
the tags to applications based on their needs.
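
A rough sketch of how the tag-based interface proposed above might look. Everything here is hypothetical: the names `text`, `utf8_tag` and `latin1_tag`, the choice of UTF-8 as internal storage, and the `decode`/`encode` tag members are assumptions made for illustration. A real design would delegate the conversions to something like Boost.Locale.

```cpp
#include <cstddef>
#include <string>

// Tag for UTF-8: conversion to and from the internal storage is the identity.
struct utf8_tag {
    typedef char char_type;
    static std::string decode(const std::string &s) { return s; }
    static std::string encode(const std::string &utf8) { return utf8; }
};

// Tag for ISO-8859-1: every byte maps directly to code point U+0000..U+00FF.
struct latin1_tag {
    typedef char char_type;
    static std::string decode(const std::string &s)
    {
        std::string utf8;
        for (std::size_t i = 0; i < s.size(); ++i) {
            unsigned char c = static_cast<unsigned char>(s[i]);
            if (c < 0x80) {
                utf8 += static_cast<char>(c);
            } else {
                utf8 += static_cast<char>(0xC0 | (c >> 6));
                utf8 += static_cast<char>(0x80 | (c & 0x3F));
            }
        }
        return utf8;
    }
    static std::string encode(const std::string &utf8)
    {
        std::string out;
        for (std::size_t i = 0; i < utf8.size(); ++i) {
            unsigned char c = static_cast<unsigned char>(utf8[i]);
            if (c < 0x80) {
                out += static_cast<char>(c);
            } else if ((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size()) {
                // Two-byte sequence encoding U+0080..U+00FF.
                unsigned char next = static_cast<unsigned char>(utf8[++i]);
                out += static_cast<char>(((c & 0x03) << 6) | (next & 0x3F));
            } else {
                // Not representable in ISO-8859-1: emit '?' (sketch only)
                // and skip the rest of the multi-byte sequence.
                out += '?';
                while (i + 1 < utf8.size() &&
                       (static_cast<unsigned char>(utf8[i + 1]) & 0xC0) == 0x80)
                    ++i;
            }
        }
        return out;
    }
};

// Hypothetical text class: stores UTF-8, converts only at the boundaries.
class text {
public:
    template <typename Tag>
    static text from(const std::basic_string<typename Tag::char_type> &s)
    {
        text t;
        t.utf8_ = Tag::decode(s);
        return t;
    }

    template <typename Tag>
    std::basic_string<typename Tag::char_type> to() const
    {
        return Tag::encode(utf8_);
    }

    // UTF-8 view of the stored text, like the str() function proposed above.
    const std::string &str() const { return utf8_; }

private:
    std::string utf8_;
};
```

An application could add its own tag for a third-party library by supplying a struct with the same three members.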

The text class would be used to store text in class members,
functions parameters, variables, etc. and would be converted
to string (in whatever encoding) only when the contents of the
text has to be examined byte-by-byte, CP-by-CP, etc. or
passed to the OS, library or device requiring a specific encoding.

Also initialization of text from c-string-literals should be handled
correctly on various platforms/compilers.

If I'm not terribly mistaken all the code for conversions between
encodings already is part of Boost.Locale.

Then all the useful things like the nowide::args class and
the wrappers around iostreams, etc. could be implemented
on top of that.

Best,

Matus


Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Artyom Beilis


----- Original Message -----

> From: Matus Chochlik <[hidden email]>
> To: [hidden email]
> Cc:
> Sent: Wednesday, May 30, 2012 10:20 AM
> Subject: Re: [boost] [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review
>
> Hi Artyom,
>
> On Mon, May 28, 2012 at 2:33 PM, Artyom Beilis <[hidden email]>
> wrote:
>>
>>  I comments on a library that I want to submit for a formal review.
>>
>>  The library provides an implementation of standard C and C++ library
>>  functions such that their inputs are UTF-8 aware on Windows without
>>  requiring using Wide API to make program work on Windows.
>>
>
> here are my 0.02 Euro:
>
> I completely agree that for general-purpose text storage and handling
> (reading lines from text-file/console, reading user input from
> GUI, displaying formatted (and localized) messages to the user
> in a UI, etc., etc.) UTF-8 should *finally* be adopted.
> The other encodings (including UCS-2, UTF-16/32) have their
> uses, but should be treated as special cases.
>
> The nowide library is certainly useful within the (limited) scope of working
> with text obtained from the OS and passed to the OS where you
> can make some assumptions and guess the encoding that the
> OS uses and do the conversions from and to UTF8, BUT ...
>
> many text-handling applications tend also use third-party libraries
> which also have their own ideas about text encodings and your library
> would be *much* more useful if it allowed to "talk" to such libraries
> (or devices).
>
> So let me reiterate some points I already mentioned in the earlier
> text-related discussions here:
>
> [snip]
>
> 1) Let's use std::string as a encoding-agnostic string... [snip]
>
> 2) Let's implement a text storage class (and let's call it) text;
> This class would store text ... [snip]
>
> [snip]
> The encoding tags would specify both concrete encodings
> like UTF-16 or ISO-8859-2, etc. and symbolic encodings
> [snip]

I want to stop this direction and discussion before it begins.

This library is not a generic library for handling text in all encodings,
dealing with all possible 3rd-party libraries and converting between
them, and it is not intended to be one.

The potential users of this library do not want to handle 101 encodings.
They want to use ONE SINGLE encoding throughout their application,
convert strings to the Wide encoding at Windows library boundaries, and
pass UTF-8 strings as is on Unix.

Note: The developer that uses this library considers ANSI
      API as broken and only Wide API is a valid API on
      Windows.

So no, this library is not Boost.Text. It is:

 "I want to use UTF-8 in my application... and I want to use
  only the Wide API on Windows, as the only correct API to use"


> [snip]
>
> If I'm not terribly mistaken all the code for conversions between
> encodings already is part of Boost.Locale.
>

Yes, and Boost.Nowide uses the UTF-to-UTF conversion part of Boost.Locale
(which is header-only).

>
> Then all the useful things like the nowide::args class and
> the wrappers around iostreams, etc. could be implemented
> on top of that.
>

The library does not reinvent the wheel :-),
it uses boost::locale::utf... (of which I am the author, BTW)


> Best,
>
> Matus
>
 
Artyom Beilis
--------------
CppCMS - C++ Web Framework:   http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/


Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Matus Chochlik
>>
>> 1) Let's use std::string as a encoding-agnostic string... [snip]
>>
>> 2) Let's implement a text storage class (and let's call it) text;
>> This class would store text ... [snip]
>>
>> [snip]
>> The encoding tags would specify both concrete encodings
>> like UTF-16 or ISO-8859-2, etc. and symbolic encodings
>> [snip]
>
> I want to stop this direction and discussion before it begins.

strange, but OK :)

>
> This library is not generic library to handle text in all encodings
> and handle all possible 3rd part libraries and convert between
> them, and this library is not intended to be so.

I know it is not. What I'm saying is that it could be. There are
lots of libraries that pick some subset of text handling, implement
some useful things and then stop. Which is a shame because
text handling su*ks in C++ and Boost is one of the platforms that
have the influence to finally improve things.

>
> The potential user of this library do not want to handle 101 encodings
> one wants to use ONE and SINGLE encoding all over its application
> and convert the strings to Wide encoding on Windows libraries boundaries and
> pass the UTF-8 string as is on Unix programs.

See above.

>
> Note: The developer that uses this library considers ANSI
>       API as broken and only Wide API is a valid API on
>       Windows.

You will get no arguments from me, I agree with you on this point.

>
> So no this library is not Boost.Text it is:
>
>  "I want to use UTF-8 in may application... and I want to use
>   only Wide API on Windows as the only correct API to use"
>

Which limits the usability of the library, because most (Windows
and Linux) applications that I worked on also used third party libraries
which sometimes have their own issues with encodings (similar to the
Windows API)

>>
>> If I'm not terribly mistaken all the code for conversions between
>> encodings already is part of Boost.Locale.
>>
>
> Yes and Boost.Nowide uses UTF-to-UTF conversion part (that is header only one
> in Boost.Locale)
>
>>
>> Then all the useful things like the nowide::args class and
>> the wrappers around iostreams, etc. could be implemented
>> on top of that.
>>
>
> The library does not reinvent the wheel :-),
> it uses boost::locale::utf... (which I BTW the author of it)
>
I never said that it reinvents the wheel (and I know that you are the
author of Boost.Locale, I wrote one of the reviews)

I certainly don't want to push you into something that you don't
want to do. You asked for opinions I just gave you mine.


Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Artyom Beilis


----- Original Message -----

> From: Matus Chochlik <[hidden email]>
> To: [hidden email]
> Cc:
> Sent: Wednesday, May 30, 2012 2:25 PM
> Subject: Re: [boost] [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review
>
>>>
>>>  1) Let's use std::string as a encoding-agnostic string... [snip]
>>>
>>>  2) Let's implement a text storage class (and let's call it)
> text;
>>>  This class would store text ... [snip]
>>>
>>>  [snip]
>>>  The encoding tags would specify both concrete encodings
>>>  like UTF-16 or ISO-8859-2, etc. and symbolic encodings
>>>  [snip]
>>
>>  I want to stop this direction and discussion before it begins.
>
> strange, but OK :)
>

The problem is that there have been plenty of discussions on this topic...
and they have not produced a solution.


>>
>>  This library is not generic library to handle text in all encodings
>>  and handle all possible 3rd part libraries and convert between
>>  them, and this library is not intended to be so.
>
> I know it is not. What I'm saying is that it could be. There are
> lots of libraries that pick some subset of text handling, implement
> some useful things and then stop.
> Which is a shame because
> text handling su*ks in C++ and Boost is one of the platforms that
> have the influence to finally improve things.
>

The problem is that Unicode handling is so broad that it is almost
impossible to cover everything, and generally you need to cut at
some point.

In any case I'd prefer to have a narrowly scoped, useful library
that does what it is supposed to do.

I used this nowide approach in the CppCMS framework and it saved
me a huge number of problems, so I'm sharing it.

Also note that boost::nowide::c(out|in|err) is really something
interesting, as it finally makes things work.

>
> I certainly don't want to push you into something that you don't
> want to do. You asked for opinions I just gave you mine.
>

I see :-)


 
Artyom Beilis
--------------
CppCMS - C++ Web Framework:   http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/



Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Beman Dawes
In reply to this post by Artyom Beilis
On Mon, May 28, 2012 at 8:33 AM, Artyom Beilis <[hidden email]> wrote:
> Hello all Booster,
>
> I comments on a library that I want to submit for a formal review.
>
> The library provides an implementation of standard C and C++ library
> functions such that their inputs are UTF-8 aware on Windows without
> requiring using Wide API to make program work on Windows.

Both the above and the docs seem to focus on the problems of UTF-8
awareness on Windows. That's a problem well worth solving, but...

Am I correct in assuming that the library allows writing portable
programs that handle UTF-8 strings correctly on other operating
systems, too, regardless of whether the native narrow string encoding
is UTF-8 or something different? For example, a POSIX-like operating
system set up to use some legacy Asian character set encoding?

--Beman


Re: [nowide] Easy Unicode For Windows: Request For Comments/Preliminary Review

Artyom Beilis
----- Original Message -----

> From: Beman Dawes <[hidden email]>
> On Mon, May 28, 2012 at 8:33 AM, Artyom Beilis <[hidden email]>
> wrote:
>>  Hello all Booster,
>>
>>  I comments on a library that I want to submit for a formal review.
>>
>>  The library provides an implementation of standard C and C++ library
>>  functions such that their inputs are UTF-8 aware on Windows without
>>  requiring using Wide API to make program work on Windows.
>
> Both the above and the docs seem to focus on the problems of UTF-8
> awareness on Windows. That's a problem well worth solving, but...
>
> Am I correct in assuming that the library allows writing portable
> programs that handle UTF-8 strings correctly on other operating
> systems, too, regardless of whether the native narrow string encoding
> is UTF-8 or something different? For example, a POSIX-like operating
> system set up to use some legacy Asian character set encoding?
>
> --Beman
>

Great question.

No. On POSIX platforms it is actually inherently incorrect
to convert strings to/from locale encodings.

You can create or remove a file named "\xFF\xFF.txt" (invalid UTF-8),
or pass that name as a parameter to a program, and it would work even if the current
locale is a UTF-8 locale. Also, if you change the locale from, let's say,
en_US.UTF-8 to en_US.ISO-8859-1, it would not magically change all
the file names in the OS or the strings a user may pass to the program.
(This would work on all POSIX OSes and even under Mac OS X.)

POSIX OSes treat strings as NUL-terminated cookies.

So altering their content according to the locale would
actually lead to incorrect behavior.
 
For example, if I create a program "rm":


   #include <cstdio>

   int main(int argc,char **argv)
   {
      for(int i=1;i<argc;i++)
        std::remove(argv[i]);
      return 0;
   }

It would work with ANY locale, and changing the strings would
lead to incorrect behavior.

A locale on POSIX platforms does not have the same effect
as a locale on the Windows platform.
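
To make the "\xFF\xFF.txt" example concrete: the byte 0xFF can never appear in UTF-8, yet such a name is still an ordinary byte string to the file APIs. A minimal sketch of a structural UTF-8 check (the helper name `is_valid_utf8` is hypothetical and not part of Boost.Nowide) shows what a converting layer would have to reject, and therefore which file names it would make unreachable:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if s is well-formed UTF-8.
inline bool is_valid_utf8(const std::string &s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;           // 0x80..0xBF or 0xF8..0xFF as lead byte

        if (i + len > s.size())
            return false;            // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;        // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        static const unsigned long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```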

Also few additional points:

- On POSIX platforms a locale sometimes does not have an encoding:
  the frequently used C locale does not actually define one!
- Non-UTF-8 locales are considered deprecated today, and it is common
  practice to require that a program run under a UTF-8 locale,
  especially when this can be trivially changed by setting one
  environment variable.


Bottom line:

1. The situation is not symmetric: on POSIX platforms strings
   are cookies, unlike on the Windows platform.
2. There are good reasons not to alter the encoding.



Artyom Beilis

P.S.: I had already sent this message to the list,
      but it seems to have been lost.

