status of Boost Unicode library/enhancements ?

status of Boost Unicode library/enhancements ?

Chris Pirazzi
Hello,

I just scanned about 300 boost-devel messages with the word "Unicode"
and am very excited about the occasional mentions I see of a Boost
Unicode library.

Is that project still alive?  Is there a prototype or beta of any
sort, or even a simple statement of goals I can look at for the
proposed boost project?

I am about to embark on a large text processing (but _not_ display)
project and could make use of such a library.  (digression: part of it
will even involve the processing of Thai text, which seems to be the
#1 cited example of a weird language as far as i18n is concerned.
Having myself typeset a 283-page bilingual Thai-English book, I have
to agree :)

The last mentions I found were from late 2005, where Graham Barnett
mentioned a Unicode library was under development:

  http://thread.gmane.org/gmane.comp.lib.boost.devel/128403
  http://thread.gmane.org/gmane.comp.lib.boost.devel/129807

I tried searching the vault for 'unicode' but no dice.

I have examined (and would use by default) ICU from IBM:

  http://icu.sourceforge.net/userguide/intro.html

I would use its C++ UnicodeString, CharacterIterator, Locale-based
codepage converters, Normalization support, Collation support, and
regex matching (in particular with regexes that match character
classes like "nonspacing mark").

How do the proposed Boost library's capabilities differ from those
offered by ICU?

I've seen that there is ICU integration in Boost.Regex

  http://www.boost.org/libs/regex/doc/unicode.html

And of course it is possible today to store UTF-16 data in a
std::wstring and convert between UTF-8, UTF-16, and UTF-32 using
various easily available routines.  But as you can see above
I need more capability than just that.
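
As a minimal sketch of the kind of "easily available" conversion routine
mentioned above (this is not taken from ICU or from any proposed Boost
library), the following converts UTF-32 code points to UTF-8. The names
append_utf8 and to_utf8 are hypothetical, and std::u32string assumes a
C++11 compiler, which is newer than this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: append one Unicode code point to a UTF-8 byte string.
inline void append_utf8(std::string& out, std::uint32_t cp)
{
    // Surrogates and values above U+10FFFF are not valid scalar values.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx and three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a whole UTF-32 string to UTF-8.
inline std::string to_utf8(const std::u32string& in)
{
    std::string out;
    for (char32_t cp : in)
        append_utf8(out, cp);
    return out;
}
```

A routine like this only changes the encoding form. It does nothing for
normalization, collation, or character-class matching, which is the extra
capability asked for above.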

ICU is probably sufficient, but I thought it might be nice to use
something that fits in with the rest of boost and STL more nicely.
Something that used/extended existing string mechanisms, iteration
mechanisms, and conversion mechanisms (e.g. those "code conversion
facets" which I do not yet understand :).  Consistent naming, error
reporting, and coding conventions would be a superficial but nice
added bonus.

I would hope that any such library would make some stabs at
performance enhancements such as ICU's UnicodeString's ability to
alias other strings to avoid copies, or store very small strings
inline.  Given that ICU has since disabled some of those enhancements:

  http://icu.sourceforge.net/userguide/strings.html#unistr_performance

perhaps that would provide the Boost library an opportunity
to beat ICU's performance!

Thanks for all updates,

     - Chris Pirazzi
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: status of Boost Unicode library/enhancements ?

Jeff Garland
On Wed, 29 Mar 2006 02:08:24 +0700, Chris Pirazzi wrote
> Hello,
>
> I just scanned about 300 boost-devel messages with the word "Unicode"
> and am very excited about the occasional mentions I see of a Boost
> Unicode library.
>
> Is that project still alive?  Is there a prototype or beta of any
> sort, or even a simple statement of goals I can look at for the
> proposed boost project?

I believe all of these projects are dead and I don't recall seeing code
posted.  So unless someone is out there toiling quietly I'm afraid we are
still looking to recruit someone to take this area on.

Jeff

Re: status of Boost Unicode library/enhancements ?

Keith MacDonald-2
I'd be grateful for an elegant C++ wrapper to the ICU library
(http://www-306.ibm.com/software/globalization/icu/index.jsp).  A lot of
resources go into developing and maintaining ICU, and it has an unrestricted
license, so why try to compete with it?

And to preempt the question, no, I don't do elegant, but I know it when I
see it - which is why I use Boost.

"Jeff Garland" <[hidden email]> wrote in message
news:[hidden email]...

>
> I believe all of these projects are dead and I don't recall seeing code
> posted.  So unless someone is out there toiling quietly I'm afraid we are
> still looking to recruit someone to take this area on.
>
> Jeff




Re: status of Boost Unicode library/enhancements ?

Eric Niebler
Keith MacDonald wrote:
> I'd be grateful for an elegant C++ wrapper to the ICU library
> (http://www-306.ibm.com/software/globalization/icu/index.jsp).  A lot of
> resources go into developing and maintaining ICU, and it has an unrestricted
> license, so why try to compete with it?

http://www.firebirdnews.org/?p=243

A code analysis tool recently run on the Firebird code base turned up
lots of bugs -- in ICU. Doesn't mean a wrapper wouldn't have value, but
it also might not be practical. I don't know.

FWIW, I have the interest and the ability to write Boost.Unicode. What I
lack is time. Anybody with a vested interest in C++ and Unicode should
consider hiring Boost Consulting. *nudge, nudge* :-)

--
Eric Niebler
Boost Consulting
www.boost-consulting.com

Re: status of Boost Unicode library/enhancements ?

Rogier van Dalen
In reply to this post by Chris Pirazzi
On 3/28/06, Chris Pirazzi <[hidden email]> wrote:
>
> Hello,
>
> I just scanned about 300 boost-devel messages with the word "Unicode"
> and am very excited about the occasional mentions I see of a Boost
> Unicode library.
>
> ...



> The last mentions I found were from late 2005, where Graham Barnett
> mentioned a Unicode library was under development:
>
>   http://thread.gmane.org/gmane.comp.lib.boost.devel/128403
>   http://thread.gmane.org/gmane.comp.lib.boost.devel/129807
>

Graham and I started on it, but I'm afraid the project stalled for lack
of time (as always). I'm sorry. If memory serves correctly, all we had
that's reasonably finished is the codecvt facets. Still - some day I'd like
to have a good Boost.Unicode library.

Regards,
Rogier

Re: status of Boost Unicode library/enhancements ?

Mathias Gaunard
In reply to this post by Chris Pirazzi
Chris Pirazzi wrote :

> And of course it is possible today to store UTF-16 data in a
> std::wstring

Not really.
std::wstring can only be used for UCS-2 or UCS-4/UTF-32.

(UCS-2 is UTF-16 without surrogate pairs, limiting the range of
representable Unicode characters to 0-65535)
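
To illustrate the surrogate-pair point, here is a minimal sketch of how a
code point above 65535 (U+FFFF) is split into two UTF-16 code units. The
helper name append_utf16 is hypothetical, and char16_t / std::u16string
assume C++11, which postdates this thread. A string class that treats every
16-bit unit as one character sees such a pair as two characters, which is
one way to read the UCS-2 limitation described above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: append one Unicode code point to a UTF-16 string.
inline void append_utf16(std::u16string& out, std::uint32_t cp)
{
    // Surrogates and values above U+10FFFF are not valid scalar values.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        // Fits in a single 16-bit code unit (this is all UCS-2 can express).
        out += static_cast<char16_t>(cp);
    } else {
        // Subtract 0x10000, leaving 20 bits, then split them 10/10.
        std::uint32_t v = cp - 0x10000;
        out += static_cast<char16_t>(0xD800 + (v >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
    }
}
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) becomes the two code units
0xD834 0xDD1E, so a size() that counts code units reports 2 for what a user
would consider one character.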



> ICU is probably sufficient, but I thought it might be nice to use
> something that fits in with the rest of boost and STL more nicely.

Have you tried Glib::ustring from glibmm?
It is a UTF-8 implementation with the same interface as std::string.
It should work with STL algorithms and the like.
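
As a rough sketch of why a UTF-8-aware string type differs from a plain
std::string holding UTF-8 bytes, the function below counts code points
rather than bytes. It does not use glibmm; utf8_length is a hypothetical
name, and the code assumes its input is already well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a well-formed UTF-8 string
// by skipping continuation bytes (those of the form 10xxxxxx).
inline std::size_t utf8_length(const std::string& s)
{
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80)   // lead byte or ASCII byte starts a code point
            ++n;
    return n;
}
```

With a three-byte Thai letter followed by an ASCII 'a', std::string::size()
reports 4 bytes while utf8_length reports 2 code points. Note that a code
point is still not necessarily what a reader perceives as one character,
for instance when combining marks are involved.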



[boost.money] was: status of Boost Unicode library/enhancements ?

thorsten.ottosen
In reply to this post by Eric Niebler
Eric Niebler wrote:

> FWIW, I have the interest and the ability to write Boost.Unicode. What I
> lack is time. Anybody with a vested interest in C++ and Unicode should
> consider hiring Boost Consulting. *nudge, nudge* :-)

This is a big problem that we have to do something about somehow. There are a
lot of rather big libraries that take so much time to develop that
it is unrealistic for people to do them in their spare time.
(Unicode, XML, and database seem to be the most needed right now.)

OTOH, we have lots of gifted people who could take on development
if given money.

For the benefit of the whole C++ community, we should try to organize
some kind of public money-gathering where companies can sign up
to support the development of these very important libraries.

I imagine that many companies would be willing to pay, say, 100 USD
to support e.g. a Unicode library. That is sufficiently low for me
to be able to persuade my boss, for example.

If we have some kind of estimate
of how expensive it would be to develop the library, it might turn out
that 100-200 willing companies would be enough to fully fund the initial
development.

The website could then show a bar indicating how close to
funding we were.

Any thoughts?

-Thorsten

Re: [boost.money] was: status of Boost Unicode library/enhancements ?

David Abrahams
Thorsten Ottosen <[hidden email]> writes:

> If we have some kind of estimate
> of how expensive it would be to develop the library, it might turn out
> that 100-200 willing companies would be enough to fully fund the initial
> development.
>
> The website could then show a bar indicating how close to
> funding we were.
>
> Any thoughts?

Boost.org is not going to get into this area, at least not without
undergoing a total transformation of the way we operate.  There are
just too many problems here, such as how to manage the funds and how
to choose who they're given to, not to mention the fact that Boost
then would have to become an organization with some legal standing.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com


Re: status of Boost Unicode library/enhancements ?

Anthony Williams-3
In reply to this post by Eric Niebler
"Eric Niebler" <[hidden email]> writes:

> FWIW, I have the interest and the ability to write Boost.Unicode. What I
> lack is time.

I expect there are quite a few of us in that boat.

> Anybody with a vested interest in C++ and Unicode should
> consider hiring <snip>

.... someone to develop the library.

Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk


Re: [boost.money] was: status of Boost Unicode library/enhancements ?

Anthony Williams-3
In reply to this post by David Abrahams
David Abrahams <[hidden email]> writes:

> Thorsten Ottosen <[hidden email]> writes:
>
>> If we have some kind of estimate
>> of how expensive it would be to develop the library, it might turn out
>> that 100-200 willing companies would be enough to fully fund the initial
>> development.
>>
>> The website could then show a bar indicating how close to
>> funding we were.
>>
>> Any thoughts?
>
> Boost.org is not going to get into this area, at least not without
> undergoing a total transformation of the way we operate.  There are
> just too many problems here, such as how to manage the funds and how
> to choose who they're given to, not to mention the fact that Boost
> then would have to become an organization with some legal standing.

At first look, I like Thorsten's idea. If we could find some way to allow
companies to spend just a little amount, in support of a specific library, and
we could find enough companies willing to make such a contribution, then we
could make it work.

As you say, the problem is deciding who does the work, and how much they get
for it. Your rate might be double mine, but your work might be ten times the
quality, or you might be done in a quarter of the time (or both!).

Once Boost.org starts accepting payment, and paying people to do work, then it
has to become a proper legal entity, with stricter guidelines on which of us
are members, rather than just the random assortment of developers we are at
the moment.

Particular individuals from the Boost community could run such a scheme on
their own, or a group could form a partnership to do so, but it couldn't be an
"official" Boost thing.

That said, if anyone wants to pay me to develop a library for Boost, or to
discuss setting up such a partnership, I'm listening ;-)

Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk


Re: [boost.money] was: status of Boost Unicode library/enhancements ?

John Maddock
In reply to this post by thorsten.ottosen
> This is a big problem that we have to do something about somehow. There are a
> lot of rather big libraries that take so much time to develop that
> it is unrealistic for people to do them in their spare time.
> (Unicode, XML, and database seem to be the most needed right now.)

Right, and some of those, certainly Unicode, are going to be very
time-intensive and require ongoing support as new Unicode versions are
produced, etc.

> I imagine that many companies would be willing to pay, say, 100 USD
> to support e.g. a Unicode library. That is sufficiently low for me
> to be able to persuade my boss, for example.
>
> If we have some kind of estimate
> of how expensive it would be to develop the library, it might turn out
> that 100-200 willing companies would be enough to fully fund the initial
> development.

As Dave A. says, it creates problems if Boost.org becomes a legal entity
accepting money etc.

However, I note that OSDL have just started a fellowship fund for FOSS
projects, although they're very tied to Linux-related projects.  See
http://www.osdl.org/lab_activities/fellowship_fund/

I also note that Sourceforge has a project-donation facility that we've
never turned on.  I guess one solution would be for individual users to
start their own SF project, turn on the donation option and then request
funds.... but it requires a fair amount of trust on all sides.

John.


Re: [boost.money] was: status of Boost Unicode library/enhancements ?

thorsten.ottosen
In reply to this post by Anthony Williams-3
Anthony Williams wrote:

> David Abrahams <[hidden email]> writes:
>
>
>>Thorsten Ottosen <[hidden email]> writes:
>>
>>
>>>If we have some kind of estimate
>>>of how expensive it would be to develop the library, it might turn out
>>>that 100-200 willing companies would be enough to fully fund the initial
>>>development.
>>>
>>>The website could then show a bar indicating how close to
>>>funding we were.
>>>
>>>Any thoughts?
>>
>>Boost.org is not going to get into this area, at least not without
>>undergoing a total transformation of the way we operate.  There are
>>just too many problems here, such as how to manage the funds and how
>>to choose who they're given to, not to mention the fact that Boost
>>then would have to become an organization with some legal standing.
>
>
> At first look, I like Thorsten's idea. If we could find some way to allow
> companies to spend just a little amount, in support of a specific library, and
> we could find enough companies willing to make such a contribution, then we
> could make it work.

I vividly remember that many Amiga games were developed after a similar
model. After presenting some demo and/or screenshot of the game in
progress, the team would wait until they had confirmation that, say, 500
people would buy the game.

I personally think, however, that that model was too insecure for the
developers.

> As you say, the problem is deciding who does the work, and how much they get
> for it. Your rate might be double mine, but your work might be ten times the
> quality, or you might be done in a quarter of the time (or both!).

The work should be done by whoever is willing to write a contract for
the work. Boost would be a mediator giving trust to those paying and
support to those developing.

Those developing should be willing to spend some extra time on the
effort, some of their spare time, just like anybody else not getting
paid should.

> Once Boost.org starts accepting payment, and paying people to do work, then it
> has to become a proper legal entity, with stricter guidelines on which of us
> are members, rather than just the random assortment of developers we are at
> the moment.

Right. I kinda imagined that Boost would be a mediator, ensuring
quality, support and trust in the process.

> That said, if anyone wants to pay me to develop a library for Boost, or to
> discuss setting up such a partnership, I'm listening ;-)

That's the thing: hardly any normal company would sponsor free software
for other companies, so we would need to keep the donation small.

-Thorsten

Re: [boost.money] was: status of Boost Unicode library/enhancements ?

thorsten.ottosen
In reply to this post by John Maddock
John Maddock wrote:
>>This is a big problem that we have to do something about somehow. There are a
>>lot of rather big libraries that take so much time to develop that
>>it is unrealistic for people to do them in their spare time.
>>(Unicode, XML, and database seem to be the most needed right now.)
>
>
> Right, and some of those, certainly Unicode, are going to be very
> time-intensive and require ongoing support as new Unicode versions are
> produced, etc.

New versions can be separate projects or they may be easy enough
to handle as normal maintenance.

>>I imagine that many companies would be willing to pay, say, 100 USD
>>to support e.g. a Unicode library. That is sufficiently low for me
>>to be able to persuade my boss, for example.
>>
>>If we have some kind of estimate
>>of how expensive it would be to develop the library, it might turn out
>>that 100-200 willing companies would be enough to fully fund the initial
>>development.
>
>
> As Dave A. says, it creates problems if Boost.org becomes a legal entity
> accepting money etc.

Ok.

So the money doesn't go to Boost, but is kept by the one doing the work,
or paid back if enough funds could not be raised.

I would not mind if Boost Consulting handled the money issues as
a community service (and perhaps as a principal developer).

> However, I note that OSDL have just started a fellowship fund for FOSS
> projects, although they're very tied to Linux-related projects.  See
> http://www.osdl.org/lab_activities/fellowship_fund/
>
> I also note that Sourceforge has a project-donation facility that we've
> never turned on.  I guess one solution would be for individual users to
> start their own SF project, turn on the donation option and then request
> funds.... but it requires a fair amount of trust on all sides.

Right.

For those paying, the money should be repaid if the library is not
accepted into boost.

For those developing, continuous discussions on the dev list
should ensure high quality and thus a good chance of acceptance.

-Thorsten



Re: [boost.money] was: status of Boost Unicode library/enhancements ?

Eugene Talagrand
In reply to this post by thorsten.ottosen

>> At first look, I like Thorsten's idea. If we could find some way to allow
>> companies to spend just a little amount, in support of a specific library, and
>> we could find enough companies willing to make such a contribution, then we
>> could make it work.
>
> I vividly remember that many Amiga games were developed after a similar
> model. After presenting some demo and/or screenshot of the game in
> progress, the team would wait until they had confirmation that, say, 500
> people would buy the game.
>
> I personally think, however, that that model was too insecure for the
> developers.
>

I remember seeing some online donation systems where, if the donation
target was not met, everyone got their money back. So there'd be no risk
on either side. I can't seem to find the reference now, though.
