[nowide] Request for interest (nowide unicode support for windows)

Artyom Beilis
Hello All,

I've recently written a small library that makes it possible to write
platform-independent, Unicode-aware applications transparently.

The problem: basic operations like opening or deleting a file are
generally taken for granted, and indeed the STL provides std::fstream and
the C library provides the FILE* API: std::fopen, std::remove, std::rename.

However, this API breaks down under Windows as soon as any basic
localization is involved, such as using Unicode file names.

Unlike all other operating systems, Windows moved to redesigning its API
around wide characters instead of adopting backward-compatible UTF-8
locales in the core system.

Result: it is a total nightmare to write any kind of Unicode-aware,
cross-platform program.

Even using wide strings and calling _wfopen or _wremove where needed is
not enough, as there is no simple replacement for std::fstream. MSVC
provides a non-standard extension where std::fstream::open() accepts a
wide string, but it is not supported by many other compilers, including
MinGW gcc, whose libstdc++ is shared across multiple platforms. Not to
mention that the standard does not define
std::fstream::open(wchar_t const *,...) (and neither does TR1).

So... proposed solution (in short):

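#include <cstdio>
#include <ios>
#include <streambuf>
#include <string>

namespace boost {
  namespace nowide {
  #if !defined(BOOST_WIN32)

     // Non-Windows: the native narrow API already accepts UTF-8.
     using namespace std;

  #else // Windows: UTF-8 front end over the wide API

     // UTF-8 <-> UTF-16 conversion helpers.
     std::wstring convert(std::string const &);
     std::string convert(std::wstring const &);

     // Same signatures as the C library; all names are UTF-8.
     FILE *fopen(char const *, char const *);
     FILE *freopen(char const *, char const *, FILE *);
     int remove(char const *);
     int rename(char const *, char const *);

     // Implemented over FILE* and _wfopen; open() takes a UTF-8 name.
     template<typename Char, typename Traits = std::char_traits<Char> >
     class basic_filebuf : public std::basic_streambuf<Char, Traits> {
     public:
        basic_filebuf *open(char const *name, std::ios_base::openmode mode);
        // ...
     };

     template<typename Char, typename Traits = std::char_traits<Char> >
     class basic_ifstream;   // std::basic_istream over basic_filebuf
     template<typename Char, typename Traits = std::char_traits<Char> >
     class basic_ofstream;   // std::basic_ostream over basic_filebuf
     template<typename Char, typename Traits = std::char_traits<Char> >
     class basic_fstream;    // std::basic_iostream over basic_filebuf

  #endif
  }
}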


When working on a non-Win32 platform, it would use the native API (and
most POSIX OSes use UTF-8 natively).

On Windows, each of these classes and functions assumes UTF-8 strings as
input and maps the underlying calls to their _w* alternatives; in the case
of basic_filebuf, it implements basic_filebuf over the FILE* API and
_wfopen.
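A minimal sketch of the free-function half of that mapping, assuming MultiByteToWideChar with CP_UTF8 for the conversion and the CRT's _wfopen, _wremove and _wrename on Windows. The namespace sketch_nowide and the helper widen() are hypothetical names for illustration, not the library's actual code.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical nowide-style free functions: callers always pass UTF-8.
namespace sketch_nowide {

#ifdef _WIN32
// UTF-8 -> UTF-16 with the Win32 converter; empty result on failure.
inline std::wstring widen(char const *s) {
    int n = ::MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, nullptr, 0);
    if (n <= 1) return std::wstring();  // conversion failed, or empty input
    std::wstring w(static_cast<std::size_t>(n), L'\0');
    ::MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, &w[0], n);
    w.resize(static_cast<std::size_t>(n - 1));  // drop the terminating NUL
    return w;
}

inline std::FILE *fopen(char const *name, char const *mode) {
    std::wstring wn = widen(name), wm = widen(mode);
    return (wn.empty() || wm.empty()) ? nullptr : ::_wfopen(wn.c_str(), wm.c_str());
}
inline int remove(char const *name) {
    std::wstring wn = widen(name);
    return wn.empty() ? -1 : ::_wremove(wn.c_str());
}
inline int rename(char const *from, char const *to) {
    std::wstring wf = widen(from), wt = widen(to);
    return (wf.empty() || wt.empty()) ? -1 : ::_wrename(wf.c_str(), wt.c_str());
}
#else
// POSIX: file names are plain byte strings, so UTF-8 passes through as-is.
inline std::FILE *fopen(char const *name, char const *mode) { return std::fopen(name, mode); }
inline int remove(char const *name) { return std::remove(name); }
inline int rename(char const *from, char const *to) { return std::rename(from, to); }
#endif

}  // namespace sketch_nowide
```

The calling code is identical on every platform; only the body of each function differs.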

This would make it much easier to write cross-platform applications
using a unified, standard API instead of the non-standard wide API.

Note that the boost::nowide::convert functions would make it possible to
adapt any library to transparently use the widely used UTF-8 API instead of
the Win32 one.
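For illustration, here is a portable sketch of the conversions convert() has to perform. It is a stand-in under stated assumptions, not nowide's code: the names sketch::utf8_to_utf16 and sketch::utf16_to_utf8 are hypothetical, and std::u16string is used instead of std::wstring so the sketch behaves the same where wchar_t is 32 bits.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical, portable stand-ins for the proposed convert() pair.
namespace sketch {

// UTF-8 -> UTF-16. Throws std::invalid_argument on malformed input.
inline std::u16string utf8_to_utf16(std::string const &in) {
    static const char32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (c < 0x80)                { cp = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::invalid_argument("truncated UTF-8");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) throw std::invalid_argument("bad continuation byte");
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong encodings, surrogates and values above U+10FFFF.
        if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // outside the BMP: split into a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}

// UTF-16 -> UTF-8. Throws std::invalid_argument on unpaired surrogates.
inline std::string utf16_to_utf8(std::u16string const &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a low one
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

}  // namespace sketch
```

On Windows, where wchar_t is 16 bits, the same logic could fill a std::wstring directly.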

I have implemented this for my own projects, and I'm asking whether Boost
is interested in something like this at all.

Artyom




     
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [nowide] Request for interest (nowide unicode support for windows)

Steven Watanabe-4
AMDG

Artyom wrote:

> I've recently written a small library that allows writing platform
> independent Unicode aware applications transparently.
>
> Problem, basic stuff like opening a file, deleting it, is generally
> accepted for granted and indeed, STL provides std::fstream and C library
> provides FILE* API like std::fopen, std::remove, std::rename.
>
> However, this API is broken under Windows any time we talk about some
> basic localization like using Unicode file names.
>
> Unlike all other Windows development moved to redesigning API to wide
> characters instead of adapting backward compatible UTF-8 locales into
> the core system.
>
> Result: it is total nightmare to write any kind of Unicode aware cross
> platform programs.
>
> Even when trying to use wide strings and calling _wfopen or _wremove
> where needed is not enough, as there is no simple replacement for
> std::fstream. MSVC provided non-standard extension where
> std::fstream::open() receives wide string, but this is not accepted by
> many other compilers including MinGW gcc that has shared libstdc++
> over multiple platforms. Not talking about that standard does not
> define std::fstream::open(wchar_t const *,...) (and not in Tr1 as well).
>  

What about boost::filesystem::path?

In Christ,
Steven Watanabe

Re: [nowide] Request for interest (nowide unicode support for windows)

Artyom Beilis
> >   
>
> What about boost::filesystem::path?
>
> In Christ,
> Steven Watanabe
>

This has nothing to do with filesystem::path; it is about fixing issues
in the standard library under Windows, where std::fopen or std::fstream
is not capable of opening ordinary files (files with Unicode names).

And BTW, a plain boost::filesystem::path has exactly the same issue
when it is not a "wide path" under Microsoft Windows.

Artyom



     
Re: [nowide] Request for interest (nowide unicode support for windows)

Beman Dawes
On Sun, Jun 13, 2010 at 12:48 PM, Artyom <[hidden email]> wrote:

>> >
>>
>> What about boost::filesystem::path?
>>
>> In Christ,
>> Steven Watanabe
>>
>
> This has nothing to do with filesystem::path, it is about fixing issues
> of standard library under Windows where fopen of std::fstream
> is not capable of opening ordinary files.
>
> And BTW simple boost::filesystem::path has exactly the same issue
> when it is not "wide path" under Microsoft Windows.

Version 3, now in trunk, is totally "wide path" under Windows, at
least with the Microsoft-supplied standard library. And even with
Cygwin, everything is totally wide path, except that wide paths in file
opens are converted to narrow paths for the actual I/O stream call.

--Beman

Re: [nowide] Request for interest (nowide unicode support for windows)

Artyom Beilis
> Version 3, now in trunk, is totally "wide path" under Windows, at
> least with the Microsoft supplied standard library. And even with
> Cygwin, everything is totally wide path except that wide paths in file
> opens are converted to narrow paths for the actual i/o stream call.
>

Question:

Can I write:

  boost::filesystem::fstream f("שלום.txt",std::ios_base::out);

where "שלום.txt" is a UTF-8 string, and have a file with that Unicode name
created? If so, way to go.

If you are suggesting:

  boost::filesystem::fstream f(L"שלום.txt",std::ios_base::out);

Then this is not what I'm talking about.

I'm not talking about "wide" paths; this is exactly why I called it
"nowide": make a library compatible with the C/C++ **standard** functions,
like std::fstream::open(char const *,...) or std::fopen(char const *,...),
but fully Unicode-enabled (UTF-8), as they are on all other operating
systems, without any "wide" API.

Is anybody interested?

Artyom




     
Re: [nowide] Request for interest (nowide unicode support for windows)

Artyom Beilis
In reply to this post by Beman Dawes
>
> Version 3, now in trunk, is totally "wide path" under Windows, at
> least with the Microsoft supplied standard library. And even with
> Cygwin, everything is totally wide path except that wide paths in file
> opens are converted to narrow paths for the actual i/o stream call.
>
> --Beman

A quick glance at boost::filesystem::v3::basic_filebuf:

There is a problem with the MinGW implementation. libstdc++ (in accordance
with the standard) does not support opening files with wide-character
names, so as far as I can see, you will not be able to open even a wide
path with boost::filesystem::ifstream on the MinGW platform.

Correct me if I'm wrong or have missed something.

(I'm not talking about Cygwin, which has native UTF-8 support.)




     
Re: [nowide] Request for interest (nowide unicode support for windows)

Alexander Lamaison
In reply to this post by Artyom Beilis
> On Sun, 13 Jun 2010 10:24:15 -0700 (PDT), Artyom wrote:
>
> I'm not talking about "Wide" path -- this is exectly what I was writing
> "nowide" make a library compatible with C/C++ **standard** functions
> like std::fstream::open(char const *,...) or std::fopen(char const *,...)
> but be fully Unicode enabled (utf-8) as they are on all-other operating
> systems without all "wide" api.
>
> Is somebody interested?

I am.  I don't know if this is the right solution, but it's definitely
worth some thought.

At the moment I'm writing my library interfaces to take basic_path<T>
parameters so that Windows developers can pass an fs::wpath and others can
pass an fs::path. It would be nice if everyone could pass the same thing.

Alex

--
Easy SFTP for Windows Explorer (http://www.swish-sftp.org)

Re: [nowide] Request for interest (nowide unicode support for windows)

Steven Watanabe-4
In reply to this post by Artyom Beilis
AMDG

Artyom wrote:
> Question:
>
> Can I write:
>
>   boost::filesystem::fstream f("שלום.txt",std::ios_base::out);
>
> When "שלום.txt" is UTF-8 string and Unicode file name will be created?
> If so, way to go.
>  

In v3, yes.

In Christ,
Steven Watanabe

Re: [nowide] Request for interest (nowide unicode support for windows)

Beman Dawes
On Tue, Jun 15, 2010 at 10:08 AM, Steven Watanabe <[hidden email]> wrote:

> AMDG
>
> Artyom wrote:
>>
>> Question:
>>
>> Can I write:
>>
>>  boost::filesystem::fstream f("שלום.txt",std::ios_base::out);
>>
>> When "שלום.txt" is UTF-8 string and Unicode file name will be created?
>> If so, way to go.
>>
>
> In v3, yes.

There are some caveats, but it should work, and there are some fairly
similar test cases passing on all compilers. For example, to actually write
the string literal like that, the compiler must accept UTF-8 in string
literals. On Windows, the code page has to be set to UTF-8. Those are
issues that affect any solution, not just Filesystem v3.
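One way around the string-literal caveat is to spell the UTF-8 bytes with escapes, so the literal's bytes no longer depend on how the compiler decodes the source file. A sketch (shalom_txt_utf8 is a hypothetical name):

```cpp
#include <string>

// The UTF-8 bytes of "שלום.txt" written as escapes: U+05E9 U+05DC U+05D5
// U+05DD, two bytes each, followed by ".txt". In C++11/14/17, u8"שלום.txt"
// also yields UTF-8 provided the compiler decodes the source file correctly;
// in C++20, u8 literals have type const char8_t[] instead of const char[].
inline std::string shalom_txt_utf8() {
    return "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D" ".txt";
}
```

This only fixes the bytes in the program; the code page issue on the Windows side remains.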

--Beman

Re: [nowide] Request for interest (nowide unicode support for windows)

Artyom Beilis
In reply to this post by Steven Watanabe-4

> >
> > Can I write:
> >
> >   boost::filesystem::fstream f("שלום.txt",std::ios_base::out);
> >
> > When "שלום.txt" is UTF-8 string and Unicode file name will be created?
> > If so, way to go.
> >   
>
> In v3, yes.
>

Are you sure about this? How will the file be opened?

Can you explain the path the UTF-8 string takes until it reaches the
Win32 API system call or the standard library call?

The C++ standard defines open only with "char const *".

Quoting the latest C++ standard draft (section 27.9.1, std::basic_filebuf):

    // 27.9.1.4 Members:
    bool is_open() const;
    basic_filebuf<charT,traits>* open(const char* s,
        ios_base::openmode mode);
    basic_filebuf<charT,traits>* open(const string& s,
        ios_base::openmode mode);
    basic_filebuf<charT,traits>* close();

And this is how it is defined in GCC's libstdc++.

As far as I can see, you use std::basic_filebuf to implement
boost::filesystem::basic_fstream.

So how do you open a "wide" path or a "UTF-8" path using these
functions?

- The standard library does not accept "wchar_t const *" as a parameter
  to open (with the exception of the MSVC-specific extension).
- The Windows API does not support a UTF-8 code page for its narrow
  functions.

So how do you deal with it?

---------------

In the small library I wrote, I actually implement basic_filebuf over
stdio, and use the CRT function _wfopen to open files with Unicode
file names.

I haven't seen anything like that in boost::filesystem::v3.
So am I missing something?
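A minimal sketch of the "basic_filebuf over stdio" idea, assuming the FILE* has already been opened (for example with _wfopen on Windows), so the file name never goes through std::basic_filebuf::open. stdio_filebuf is a hypothetical name, and this is a simplified illustration rather than nowide's implementation: char only, no seeking, no mixed reading and writing.

```cpp
#include <cstddef>
#include <cstdio>
#include <istream>   // std::istream / std::ostream are used on top of it
#include <ostream>
#include <streambuf>

class stdio_filebuf : public std::streambuf {
public:
    explicit stdio_filebuf(std::FILE *f) : file_(f) {}
    ~stdio_filebuf() override { if (file_) std::fclose(file_); }  // also flushes
    stdio_filebuf(stdio_filebuf const &) = delete;
    stdio_filebuf &operator=(stdio_filebuf const &) = delete;

    bool is_open() const { return file_ != nullptr; }

protected:
    // No put area is set up, so every single-character write lands here.
    int_type overflow(int_type ch) override {
        if (!file_) return traits_type::eof();
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        if (std::fputc(ch, file_) == EOF) return traits_type::eof();
        return ch;
    }

    // Bulk writes go straight to fwrite; stdio does its own buffering.
    std::streamsize xsputn(char const *s, std::streamsize n) override {
        if (!file_) return 0;
        return static_cast<std::streamsize>(
            std::fwrite(s, 1, static_cast<std::size_t>(n), file_));
    }

    // Refill the small get area from the FILE*.
    int_type underflow() override {
        if (!file_) return traits_type::eof();
        std::size_t n = std::fread(buf_, 1, sizeof buf_, file_);
        if (n == 0) return traits_type::eof();
        setg(buf_, buf_, buf_ + n);
        return traits_type::to_int_type(buf_[0]);
    }

    int sync() override {
        return (file_ && std::fflush(file_) == 0) ? 0 : -1;
    }

private:
    std::FILE *file_;
    char buf_[256];
};
```

Wrapping it as `std::ostream os(&buf);` gives ordinary stream syntax, which is roughly what an ofstream-style class would add on top.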

Artyom


     
Re: [nowide] Request for interest (nowide unicode support for windows)

Alexander Lamaison
In reply to this post by Beman Dawes
> On Tue, 15 Jun 2010 11:59:21 -0400, Beman Dawes wrote:
>
> >>
> >> Can I write:
> >>
> >>  boost::filesystem::fstream f("שלום.txt",std::ios_base::out);
> >>
> >> When "שלום.txt" is UTF-8 string and Unicode file name will be created?
> >> If so, way to go.
> >>
> >
> > In v3, yes.
>
> There are some caveats, but it should work, and there are some fairly
> similar test cases passing all compilers.

This is not what I'd understood from our previous discussion.  I was under
the impression filesystem v3 running on Windows would take this narrow path
string and convert it to UTF-16 using the *local code page*.  This means
the example above would only work if the computer in question were set to
Hebrew.  Even that might not work - I'm not sure if the Hebrew code page
contains the necessary characters to represent 'txt'.

Did I misunderstand?

Alex

Re: [nowide] Request for interest (nowide unicode support for windows)

Artyom Beilis
> >
> > >>
> > >> Can I write:
> > >>
> > >>  boost::filesystem::fstream f("שלום.txt",std::ios_base::out);
> > >>
> > >> When "שלום.txt" is UTF-8 string and Unicode file name will be created?
> > >> If so, way to go.
> > >>
> > >
> > > In v3, yes.
> >
> > There are some caveats, but it should work, and there are some fairly
> > similar test cases passing all compilers.
>
> This is not what I'd understood from our previous discussion.  I was under
> the impression filesystem v3 running on Windows would take this narrow path
> string and convert it to UTF-16 using the *local code page*.  This means
> the example above would only work if the computer in question were set to
> Hebrew.
>
> [snip]
>
> Did I misunderstand?

Yes, you did.

When I talk about UTF-8, I mean all of Unicode, not a subset.

For example, in my case I want to open a file:

   std::ofstream f("سلام-שלום-Peace-Мир.txt");

I can't do this on Windows (and only on Windows).

So I open it with:

   nowide::ofstream f("سلام-שלום-Peace-Мир.txt");

And it works on Windows as well.

Windows is the only operating system on which std::fstream::open or
std::fopen cannot open **any** file (that is, a file with an arbitrary
Unicode name), and this is what the library aims to fix.

You can download my code here:

   http://art-blog.no-ip.info/files/nowide.zip

It gives you:

   STL's

   nowide::ifstream
   nowide::ofstream
   nowide::fstream
   nowide::filebuf

   STDlib's
   
   nowide::fopen
   nowide::freopen
   nowide::remove
   nowide::rename

All of them take UTF-8 strings (as things usually work on all modern
operating systems).

Artyom




     
Re: [nowide] Request for interest (nowide unicode support for windows)

Alexander Lamaison
> On Tue, 15 Jun 2010 12:38:06 -0700 (PDT), Artyom wrote:
>
> > > > Can I write:
> > > >
> > > >  boost::filesystem::fstream
> > > >  f("שלום.txt",std::ios_base::out);
> > > >
> > > > When "שלום.txt" is UTF-8 string and Unicode file name will
> > > > be created?  If so, way to go.
> > >
> > > In v3, yes.
> > >
> > > There are some caveats, but it should work, and there are some fairly
> > > similar test cases passing all compilers.
> >
> > This is not what I'd understood from our previous discussion.  I was under
> > the impression filesystem v3 running on Windows would take this narrow path
> > string and convert it to UTF-16 using the *local code page*.  This means
> > the example above would only work if the computer in question were set to
> > Hebrew.
> >
> > [snip]
> >
> > Did I misunderstand?
>
> Yes, you did.
>
> When I was talking about UTF-8 I mean Unicode and not subset.

Me too.

I'm saying that Filesystem v3 on Windows doesn't interpret narrow strings
as UTF-8 by default.  Beman said that it did, but I beg to differ.  Here's
what the comments say:

//  For Windows, wchar_t strings do not undergo conversion. char strings
//  are converted using the "ANSI" or "OEM" code pages, as determined by
//  the AreFileApisANSI() function, or, if a conversion argument is given,
//  using a conversion object modeled on std::wstring_convert.

In other words "שלום.txt" would be interpreted as being in whatever
encoding the local code page is set to and would, therefore, produce a path
containing gibberish for most people.  This is standard Windows behaviour
:P
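To make the "gibberish" concrete: if the UTF-8 bytes of שלום.txt are decoded with a single-byte code page, each byte becomes its own character. A sketch, assuming ISO-8859-1 as a simplified stand-in for the "ANSI" code page (decode_latin1 is a hypothetical helper; real code pages such as 1252 differ in the 0x80-0x9F range):

```cpp
#include <string>

// Decode bytes as ISO-8859-1: every byte maps to the code point of the
// same value, so multi-byte UTF-8 sequences fall apart into several
// unrelated characters.
inline std::u16string decode_latin1(std::string const &bytes) {
    std::u16string out;
    for (unsigned char b : bytes) out.push_back(static_cast<char16_t>(b));
    return out;
}
```

With this, the two bytes of ש come out as "×©", and the 8-character name becomes 12 characters.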

Your problem goes yet another step beyond this.  Assuming fs3 correctly
converted "שלום.txt" to the UTF-16 equivalent, how do you then open a file
using this wide-char name?  Well, MSVC has wchar_t overloads, so this works
fine.  You're right about libstdc++/MinGW, though: fs::fstream will fail
there.  Rather than introducing a nowide library, why don't we just try to
fix this in Boost.Filesystem?

Alex

Re: [nowide] Request for interest (nowide unicode support for windows)

Artyom Beilis
>
> Me too.
>
> I'm saying that Filesystem v3 on Windows doesn't interpret narrow strings
> as UTF-8 by default.  Beman said that it did but I beg to differ.  Here's
> what the comments say:
>
> //  For Windows, wchar_t strings do not undergo conversion. char strings
> //  are converted using the "ANSI" or "OEM" code pages, as determined by
> //  the AreFileApisANSI() function, or, if a conversion argument is given,
> //  using a conversion object modeled on std::wstring_convert.
>
> In other words "שלום.txt" would be interpreted as being in whatever
> encoding the local code page is set to and would, therefore, produce a path
> containing gibberish for most people.  This is standard Windows behaviour
> :P


This standard Windows behavior is exactly **the** problem.

To be honest, have you seen anybody use "wide paths" outside the
Windows world? Do you actually need such "wide paths" on POSIX
platforms?

The answer is no.

Actually, a POSIX OS does not care about the character set of file names;
I can create a file

   std::ofstream f("\xf9\xec\xe5\xed.txt");

whose name is שלום in ISO-8859-8 and therefore invalid UTF-8, yet it is
a perfectly valid file name (even though the locale is a UTF-8 locale).
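To see why those bytes are a valid POSIX file name but not valid UTF-8, here is a minimal validator sketch (is_valid_utf8 is a hypothetical helper): the byte 0xF9 can never start a UTF-8 sequence, while the kernel only cares that the name contains no '/' or NUL.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8: no stray continuation bytes,
// no overlong forms, no surrogates, nothing above U+10FFFF.
inline bool is_valid_utf8(std::string const &s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        char32_t cp;
        if (c < 0x80) { ++i; continue; }                         // ASCII
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0)     { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;  // 0x80-0xC1 and 0xF5-0xFF never start a sequence
        if (i + len > s.size()) return false;
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((len == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) ||
            (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)))
            return false;
        i += len;
    }
    return true;
}
```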

>
> Your problem is yet another step further than this.  Assuming fs3 correctly
> converted "שלום.txt" to the UTF-16 equivalent, how do you then open a file
> using this wide-char name?  Well, MSVC has wchar_t overloads so this works
> fine.  You're right about glibc++/MinGW though.  fs::fstream will fail
> there.  Rather than introducing a nowide library, why don't we just try to
> fix this in Boost.Filesystem?
>

I think that this can be fixed (the way I fixed it in nowide, by
implementing basic_filebuf over stdio + _wfopen):

   http://art-blog.no-ip.info/files/nowide.zip

But this is just one particular problem.

There are more. What about filesystem::remove and others?
From what I see in the code, it supports only path and not wpath.

---------------------

But this is part of a bigger problem.

When I develop cross-platform applications, I have the following options
for operating on files.

For example, when I want to remove, rename, or create a file in a
cross-platform program, I can write it in standard, platform-independent
C++, write it for POSIX operating systems, or write it for MS Windows:


API \ String |  std::string  |  std::wstring
-------------+---------------+--------------
Std C++      |  Ok           |  Not defined!
POSIX        |  Ok           |  Not defined!
WinAPI       |  Not UTF-8    |  Ok

As I see it, I either need to use wide strings, which work only on
Windows and have to be converted to another encoding for file operations
everywhere else.

Or I may use normal strings, as the standard requires, and have problems
on Windows, where they are not fully supported.

Or I need to write two kinds of code:

- One for Windows, using "wide" strings
- One for everything else, using normal strings

All because Windows does not support a UTF-8 code page.

So why? Why do you need all this, if you can just create a tiny layer
that makes Windows support UTF-8 by converting std::string to
std::wstring and calling the appropriate API?


My Opinion:
-----------

- There is neither use nor need for "wide" strings in file system
  operations on any platform but Windows.

- Introducing boost::filesystem::wpath does not help, as it is
  meaningless on other OSes.

- Using wide strings is extremely error-prone in cross-platform
  applications, as on Windows they are UTF-16 and on POSIX they are
  usually UTF-32.
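The UTF-16 versus UTF-32 point is easy to observe: a character outside the Basic Multilingual Plane takes two wchar_t units where wchar_t holds UTF-16 and one where it holds UTF-32. A small sketch (the function name is hypothetical):

```cpp
#include <cstddef>
#include <string>

// Number of wchar_t units used for U+1F600. Typically 2 on Windows
// (16-bit wchar_t, UTF-16 surrogate pair) and 1 on Linux/glibc
// (32-bit wchar_t, UTF-32).
inline std::size_t wide_units_for_emoji() {
    return std::wstring(L"\U0001F600").size();
}
```

Code that indexes or slices a std::wstring therefore behaves differently on the two families of platforms.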

Wide path support just makes our applications more complicated and
error-prone.

So... just create an API that is friendly to UTF-8 strings and
forget about this hell.

-------------

But from what I see, this will never happen in Boost, as it is too
Windows-centric, and Windows is too ignorant of the basic needs of
programmers who want to write portable programs.

Regards.
  Artyom

P.S.: The title of this mail is "request for interest".
      It is OK if there is none.




     
Re: [nowide] Request for interest (nowide unicode support for windows)

Alexander Lamaison
> On Wed, 16 Jun 2010 12:50:19 -0700 (PDT), Artyom wrote:
>
> I think that this can be fixed (the way I fixed it in nowide
> implementing fstreambuf over stdio+_wfopen)
>
>    http://art-blog.no-ip.info/files/nowide.zip
>
> But this is one particular problem.
>
> There are more. What about filesystem::remove and others?
> From what I see in the code, it supports only path and not wpath

Really?  I doubt that.  In FSv2 it takes a template path:

template <class Path>
bool remove(const Path& p, system::error_code & ec = singular );

This delegates to RemoveFileA if passed a path and RemoveFileW if passed a
wpath.  libstdc++/MinGW presumably uses the POSIX remove API, so this does,
again, suffer from the problem.  We could work around it in Boost, though I
can't help but feel this is a MinGW problem: if it wants to work the
Windows way, it should provide wide APIs as well; if it wants to pretend
it's POSIX, it should interpret narrow strings as UTF-8.

> When I develop cross platform applications I have following options
> for operating of files.
>
> For example when I want to remove, rename, create a file
> in a program writing cross platform applications, writing
> using standard platform independent C++, Writing for POSIX operating
> systems and for MS Windows.
>
>
> API \ String |  std::string  |  std::wstring
> -------------+---------------+--------------
> Std C++      |  Ok           |  Not defined!
> POSIX        |  Ok           |  Not defined!
> WinAPI       |  Not UTF-8    |  Ok
>
> What I can see. I need either use wide strings that works only on Windows
> but require me to convert to other encoding for operations on files.
>
> Or I may use normal strings as standard requires and have problems
> with Windows as it is not fully supported.

We could potentially fix this in Filesystem v3 if it interpreted incoming
narrow strings as UTF-8.  Then you could create a 'path' using whichever
type of string you like, and the boost::filesystem functions would 'just
work' (OK, there are issues with MinGW, but nothing we can't work around by
incorporating your code).

> So far? Why? Why do you need all this if you can just
> create a tiny layer that makes Window support UTF-8 code page
> by converting std::string to std::wstring and calling appropriate
> API?

Yep, that's pretty much what I'm saying.

> - Introducing boost::filesystem::wpath does not help as
>   it meaningless on other OSes.

It's gone in v3.

> So... Just create an API that is friendly to UTF-8 strings and
> forget about this hell.

+1 from me, with one modification: don't prevent using wide paths on
Windows.  Often you will need to pass a wide path that you get from
somewhere else, and it would be a pain if we had to convert these to UTF-8
manually.

> But from what I see this will never happen in Boost as it is too
> Windows centric, and Windows is too ignorant to basic programmers
> needs who want to write a portable programs.

Why?  Boost.Filesystem v3 already does almost all of this.  It would need
two changes to make it work exactly as you want:

- Interpret narrow strings as UTF-8 by default on Windows (the user
  could always imbue it with the local code page facet if they really
  wanted to interact with the 'A' versions of the Windows APIs).

- Work around the MinGW 'bug' by incorporating some of your code.
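Interpreting narrow strings as UTF-8 boils down to a UTF-8 to UTF-16 conversion step instead of an ANSI code page conversion. A minimal, standard-library-only sketch of such a step, using std::wstring_convert with std::codecvt_utf8_utf16 (both deprecated since C++17 but still shipped by the major libraries); this is not Boost.Filesystem's API, and from_utf8 is a hypothetical name:

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>  // std::range_error, thrown on malformed input
#include <string>

// UTF-8 -> UTF-16 with a conversion object of the kind the Filesystem v3
// comment mentions ("modeled on std::wstring_convert").
inline std::u16string from_utf8(std::string const &utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}
```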

> P.S.: The title of this mail is request for interest.
>       It is ok not to have one.

I'm very much interested.

Alex

Re: [nowide] Request for interest (nowide unicode support for windows)

Beman Dawes
In reply to this post by Artyom Beilis
On Tue, Jun 15, 2010 at 2:36 PM, Artyom <[hidden email]> wrote:

>
>> >
>> > Can I write:
>> >
>> >   boost::filesystem::fstream f("שלום.txt",std::ios_base::out);
>> >
>> > When "שלום.txt" is UTF-8 string and Unicode file name will be created?
>> > If so, way to go.
>> >
>>
>> In v3, yes.
>>
>
> Are you sure about this? How the file will be open?

It depends on the standard library implementation. The Dinkumware
library, used by Microsoft and some others, has an additional
constructor/open that takes a wide-character string.  The fallback is
to use the standard narrow-character constructor/open.
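A sketch of those two code paths. open_utf8 is a hypothetical helper; the wchar_t overload of open() is the MSVC/Dinkumware extension, and the fallback is the standard narrow open(), which is only correct where the OS takes UTF-8 file names (POSIX), not for MinGW's libstdc++ on Windows.

```cpp
#include <fstream>
#include <string>
#if defined(_MSC_VER)
#include <windows.h>
#endif

// Open an ofstream from a UTF-8 file name; returns whether it succeeded.
inline bool open_utf8(std::ofstream &f, std::string const &utf8_name) {
#if defined(_MSC_VER)
    // Convert to UTF-16 and use the non-standard wchar_t* overload.
    int n = ::MultiByteToWideChar(CP_UTF8, 0, utf8_name.c_str(), -1, nullptr, 0);
    if (n <= 0) return false;
    std::wstring wide(static_cast<std::size_t>(n), L'\0');
    ::MultiByteToWideChar(CP_UTF8, 0, utf8_name.c_str(), -1, &wide[0], n);
    f.open(wide.c_str());
#else
    // Standard narrow open: the bytes reach the OS unchanged.
    f.open(utf8_name.c_str());
#endif
    return f.is_open();
}
```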

> Can you explain what is the path the UTF-8 string passes till
> the Win32API system call or standard library call?
>
> C++ standard defines open only with "char const *"

The wide character overloads are Dinkumware / Microsoft extensions.

> Quoting latest C++ standard draft (section 27.7 std::basic_streambuf)
>
>    // 27.9.1.4 Members:
>    bool is_open() const;
>    basic_filebuf<charT,traits>* open(const char* s,
>        ios_base::openmode mode);
>    basic_filebuf<charT,traits>* open(const string& s,
>        ios_base::openmode mode);
>    basic_filebuf<charT,traits>* close();
>
> And this it is defined on GCC's libstdc++.

Yep, so if that library is in use, the fallback is to use the narrow
character open. And of course that is also what is used on POSIX-like
systems.

>
> As I can see you use std::basic_filebuf for implementing
> boost::filesystem::basic_fstream.
>
> So how do you open "Wide" path or "UTF-8" path using these
> functions?
>
> - Standard library does not accept "wchar_t const *" as parameter to
>  open (with exception of MSVC specific extension)
> - Windows API does not support UTF-8 codepage.
>
> So how do you deal with it?

Use the Microsoft UTF-8 code page, 65001.

HTH,

--Beman

Re: [nowide] Request for interest (nowide unicode support for windows)

Beman Dawes
In reply to this post by Alexander Lamaison
2010/6/16 Alexander Lamaison <[hidden email]>:

> I'm saying that Filesystem v3 on Windows doesn't interpret narrow strings
> as UTF-8 by default.  Beman said that it did...

There is a misunderstanding here. V3, like any Windows program, by
default interprets narrow strings according to the file code page. You
have to configure that yourself if you want it to be UTF-8. Since that
is a pain, if you are using Microsoft or one of the other compilers
that support wide opens, it seems easier just to convert the narrow
string to a wide string yourself. But if you want to fool around
getting the code page support in place, V3 should handle it, AFAIK.

>but I beg to differ.  Here's
> what the comments say:
>
> //  For Windows, wchar_t strings do not undergo conversion. char strings
> //  are converted using the "ANSI" or "OEM" code pages, as determined by
> //  the AreFileApisANSI() function, or, if a conversion argument is given,
> //  using a conversion object modeled on std::wstring_convert.
>
> In other words "שלום.txt" would be interpreted as being in whatever
> encoding the local code page is set to and would, therefore, produce a path
> containing gibberish for most people.  This is standard Windows behaviour
> :P
>
> Your problem is yet another step further than this.  Assuming fs3 correctly
> converted "שלום.txt" to the UTF-16 equivalent, how do you then open a file
> using this wide-char name?  Well, MSVC has wchar_t overloads so this works
> fine.  You're right about glibc++/MinGW though.  fs::fstream will fail
> there.  Rather than introducing a nowide library, why don't we just try to
> fix this in Boost.Filesystem?

Agreed. If anyone wants to submit a patch for libstdc++/MinGW that uses
the wide Windows API, that would be a better solution.

--Beman

Re: [nowide] Request for interest (nowide unicode support for windows)

Andrey Semashev-2
In reply to this post by Artyom Beilis
On 06/16/2010 11:50 PM, Artyom wrote:
>
> To be honest, have you seen anybody using "wide-path" outside
> of Windows scope? Do you actually need such "wide-path" for POSIX
> platforms?

Well, we actually use wide paths all around in our code, and
Boost.Filesystem does a great job at providing a portable API for all
platforms, including POSIX.

Personally, I think that wide paths are more convenient than UTF-8, since
it is easier to apply string processing algorithms to them.
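A small sketch of the trade-off Andrey mentions: in UTF-8 the byte count and the character count differ, so character-oriented processing has to skip continuation bytes. count_code_points is a hypothetical helper that assumes valid UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Counts code points by counting every byte that is not a UTF-8
// continuation byte (continuation bytes have the form 10xxxxxx).
inline std::size_t count_code_points(std::string const &utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8)
        if ((c & 0xC0) != 0x80) ++n;
    return n;
}
```

For the 12-byte UTF-8 name "שלום.txt" this yields 8.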

Re: [nowide] Request for interest (nowide unicode support for windows)

Lars Viklund
In reply to this post by Beman Dawes
On Wed, Jun 16, 2010 at 05:23:19PM -0400, Beman Dawes wrote:
> On Tue, Jun 15, 2010 at 2:36 PM, Artyom <[hidden email]> wrote:
> > - Standard library does not accept "wchar_t const *" as parameter to
> >  open (with exception of MSVC specific extension)
> > - Windows API does not support UTF-8 codepage.
> >
> > So how do you deal with it?
>
> Use the Microsoft UTF-codepage, 65001

Judging by assorted postings by Michael Kaplan (Unicode Grandmaster at
Microsoft), there seems to be much fun to be derived from trying to use
the UTF-8 codepage with narrow APIs.

[1] http://blogs.msdn.com/b/michkap/archive/2006/07/14/665714.aspx
[2] http://blogs.msdn.com/b/michkap/archive/2006/10/11/816996.aspx
[3] http://blogs.msdn.com/b/michkap/archive/2006/03/13/550191.aspx
[4] http://blogs.msdn.com/b/michkap/archive/2007/05/11/2547703.aspx

--
Lars Viklund | [hidden email]

Re: [nowide] Request for interest (nowide unicode support for windows)

Artyom Beilis
In reply to this post by Alexander Lamaison
> > There are more. What about filesystem::remove and others?
> > From what I see in the code, it supports only path and not wpath
>
> Really?  I doubt that.  In FSv2 it takes a template path:

I was talking about v3.

> This delegates to RemoveFileA if passed a path and RemoveFileW if passed a
> wpath.  glibc++/MinGW presumably uses the posix_remove API so this does,
> again, suffer from the problem.  We could work around it in boost though I
> can't help but feel this is a MinGW problem: if it wants to work the
> windows way it should provide wide APIs as well, if it wants to pretend
> it's POSIX it should interpret narrow strings as UTF-8.

It is not about "pretending to work on POSIX".

GCC's libstdc++ goes through the C runtime library (CRT): just as with the
C library itself, calling remove() ends up in DeleteFileA, while calling
_wremove() ends up in DeleteFileW.

And you can call _wremove under MinGW, since it is a CRT function.

This has absolutely nothing to do with POSIX.

> We could potentially fix this in Filesystem v3 if it interpreted incoming
> narrow strings as UTF-8.  Then you could create a 'path' using whichever
> type of string you like and the boost::filesystem functions would 'just
> work' (ok, issues with MinGW but nothing we can't work around by
> incorporating your code).

This would be a very good solution.
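For reference, interpreting narrow strings as UTF-8 means every narrow path has to be decoded to UTF-16 before it reaches a Windows W API. Below is a minimal, portable sketch of that decoding step. `utf8_to_utf16` is a hypothetical helper, not part of Boost.Filesystem or the nowide proposal; it returns `std::u16string` so it behaves the same on every platform (on Windows, where `wchar_t` is 16 bits, a real implementation would more likely produce `std::wstring`, typically via `MultiByteToWideChar` with `CP_UTF8`). Rejecting malformed input instead of substituting U+FFFD is an assumed policy.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 into UTF-16 code units.
// Throws std::invalid_argument on malformed input (assumed policy).
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (c < 0x80)              { cp = c;        len = 1; }
        else if ((c >> 5) == 0x6)  { cp = c & 0x1F; len = 2; }
        else if ((c >> 4) == 0xE)  { cp = c & 0x0F; len = 3; }
        else if ((c >> 3) == 0x1E) { cp = c & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc >> 6) != 0x2) throw std::invalid_argument("bad continuation byte");
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong encodings, UTF-16 surrogates and values above U+10FFFF.
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));          // BMP: one unit
        } else {
            cp -= 0x10000;                                      // supplementary: surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

For example, the UTF-8 bytes of "שלום.txt" decode to eight UTF-16 units, and a four-byte sequence such as U+1F600 becomes a surrogate pair.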

>  It's gone in v3.

Very good.

> > So... Just create an API that is friendly to UTF-8 strings and
> > forget about this hell.
>
> +1 from me with one modification: don't prevent using wide path on Windows.
> Often you will need to pass a wide path that you get from somewhere else
> and it would be a pain if we had to convert these to UTF-8 manually.

Agreed. If Windows users want to use wide paths, let them, but such code
would be Windows-only.

> Why?  Boost.Filesystem v3 almost does all of this already.  It would need
> two changes to make it work exactly as you want:
>
> - Interpret narrow strings as UTF-8 by default on Windows (the user
>   could always imbue it with the local code page facet if they really
>   wanted to interact with the 'A' versions of Windows APIs).
>

This is not a solution:

Windows has never supported UTF-8 as the code page for the narrow APIs, does
not support it now, and, judging by the links Lars Viklund posted, it seems
it never will. See this quote:

>
> Judging by assorted postings by Michael Kaplan (Unicode Grandmaster at
> Microsoft), there seems to be much fun to be derived from trying to use
> the UTF-8 codepage with narrow APIs.
>
> [1] http://blogs.msdn.com/b/michkap/archive/2006/07/14/665714.aspx
> [2] http://blogs.msdn.com/b/michkap/archive/2006/10/11/816996.aspx
> [3] http://blogs.msdn.com/b/michkap/archive/2006/03/13/550191.aspx
> [4] http://blogs.msdn.com/b/michkap/archive/2007/05/11/2547703.aspx
>  
> Lars Viklund


So the only way to do this right is to **always** use the wide API on
Windows and convert narrow (UTF-8) strings to wide ones just before
calling the appropriate API functions.
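The rule above (keep everything as UTF-8 and widen only at the API boundary) can be sketched as thin wrappers. The namespace `nowide_sketch` and the helper `widen` are hypothetical names, not the interface of the proposed library. On Windows the sketch assumes `MultiByteToWideChar` with `CP_UTF8` and the CRT's `_wfopen`/`_wremove`; on other platforms the UTF-8 bytes are passed straight to `std::fopen`/`std::remove`. Error handling is deliberately minimal.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

namespace nowide_sketch {  // hypothetical namespace, not the proposed API

#ifdef _WIN32
// UTF-8 -> UTF-16 via the Win32 API; throws on invalid input.
inline std::wstring widen(const char* utf8) {
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, -1, nullptr, 0);
    if (n == 0) throw std::runtime_error("invalid UTF-8");
    std::wstring w(static_cast<std::size_t>(n), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, -1, &w[0], n);
    w.resize(static_cast<std::size_t>(n - 1));  // n counted the terminating NUL
    return w;
}
#endif

// UTF-8 name in; wide CRT call on Windows, unchanged narrow call elsewhere.
inline std::FILE* fopen(const char* name, const char* mode) {
#ifdef _WIN32
    return ::_wfopen(widen(name).c_str(), widen(mode).c_str());
#else
    return std::fopen(name, mode);
#endif
}

inline int remove(const char* name) {
#ifdef _WIN32
    return ::_wremove(widen(name).c_str());  // reaches DeleteFileW, not DeleteFileA
#else
    return std::remove(name);
#endif
}

}  // namespace nowide_sketch
```

Calling code then uses one spelling everywhere, e.g. `nowide_sketch::fopen(utf8_name, "rb")`, and never reaches the narrow `A` APIs on Windows.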

> - Work around the MinGW 'bug' by incorporating some of your code.
>

I just want to be clear: this is not a bug (I know you put it in quotes).
It is what the C++ standard says: std::basic_filebuf **does not** have an
open() member function that accepts wide strings.
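Since the standard basic_filebuf (and therefore libstdc++'s) accepts no wide file name, one workaround on GCC/MinGW is to open the file through the CRT (for example with `_wfopen`) and hand the resulting `FILE*` to libstdc++'s non-standard `__gnu_cxx::stdio_filebuf`. A minimal sketch, assuming libstdc++; `cfile_istream` is a hypothetical name, not part of the proposal, and the code does not build with MSVC or libc++.

```cpp
#include <cstdio>
#include <ext/stdio_filebuf.h>  // libstdc++ (GCC/MinGW) extension, not standard C++
#include <istream>
#include <memory>
#include <string>

// Hypothetical helper: an std::istream over a FILE* the caller already opened
// by any means, e.g. _wfopen(L"...", L"rb") under MinGW.
class cfile_istream : public std::istream {
public:
    explicit cfile_istream(std::FILE* f)
        : std::istream(nullptr), file_(f), buf_(f, std::ios_base::in) {
        rdbuf(&buf_);                              // attach the buffer; clears badbit
        if (!f) setstate(std::ios_base::failbit);  // the upstream open failed
    }

private:
    struct fcloser {
        void operator()(std::FILE* f) const { std::fclose(f); }
    };
    // Declared before buf_ so it is destroyed after it: stdio_filebuf never
    // fcloses a FILE* it was given, and the FILE* must outlive the buffer.
    std::unique_ptr<std::FILE, fcloser> file_;
    __gnu_cxx::stdio_filebuf<char> buf_;
};
```

On MinGW the caller would obtain the `FILE*` from `_wfopen` on a converted name; with MSVC the non-standard `wchar_t` overload of `open()` makes this detour unnecessary.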


Artyom


     