[review] [text] Text review begins June 11

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

[review] [text] Text review begins June 11

Boost - Users mailing list
The Boost formal review of Zach Laine's Unicode library, Text,
commences on June 11th and concludes June 20th. We welcome the
participation of Boost developers and users, and the C++ community.

Quote:

    This library includes three layers:
    - The string layer, a set of types that constitute "a better std::string"
    - The Unicode layer, consisting of the Unicode algorithms and data
    - The text layer, a set of types like the string layer types, but
providing transparent Unicode support

Code: https://github.com/tzlaine/text
Docs: https://tzlaine.github.io/text

Text requires C++14 or higher, and supports GCC 5+, Clang 3.6+, and MSVC 2017.

Zach initially presented the library at C++Now in 2018. He also
intends to standardize a library solution for Unicode based on this
work.

    Talk: https://www.youtube.com/watch?v=944GjKxwMBo
    Slides: https://tinyurl.com/boost-text-slides

Try the library today, and please consider submitting a review, or
even contributing to the discussion next week.

Glen
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: [review] [text] Text review begins June 11

Boost - Users mailing list
On Tue, Jun 2, 2020 at 7:44 PM Glen Fernandes via Boost-users <[hidden email]> wrote:
>
> The Boost formal review of Zach Laine's Unicode library, Text,
> commences on June 11th and concludes June 20th. We welcome the
> participation of Boost developers and users, and the C++ community.

Here is my review of Text in the form of a list of notes/questions, in no particular order.

Reading through the documentation, everything makes sense. The motivation for each type and operation is easy to understand.

I welcome the use of signed int as index/size type, but I wonder if this would interact in some negative way when mixed with similar types from the standard library. I don't think so, but who knows.

I don't like the multi-page format of the documentation. I'd prefer a single HTML page for easy searching.


I searched and couldn't find any place in the documentation that explains how ErrorHandler is used. Looking at the source code, it's just a function, but its interface should be documented.

Generally I dislike interfaces with configurable error handling but it appears that in the case of the conversion iterators this is warranted: both replacement character and throwing are useful. I wonder though if it would be better to delete this template parameter and always output 0xfffd, and in addition provide an iterator adaptor type which can be used to wrap e.g. an output iterator, and which throws if it sees 0xfffd being output.

Both .c_str() and .data() should be added to string for compatibility. Also, it is not true that s.begin() is equivalent to s.c_str(), because in debug iterators should not be pointers. Perhaps &*s.begin() is equivalent, but this should not be valid on an empty string (assert), null termination notwithstanding.

The note about passing ropes by value, I think, is redundant. Logically, copying a rope gets you an independent copy, the fact that internally the copies share state is an implementation detail. Therefore, the documentation can just specify that a rope is as thread-safe as an int.

Speaking of thread-safety, where are the thread-safety guarantees of rope documented? Where are the relevant tests?

Speaking of tests, I suggest separating the generated test sources in a subdirectory so things like test/string.cpp are easy to find. Not to mention github.com truncates the test directory to 1000 files.

The grammar seems wrong here; 'I don't know if you've ever written an undo system that can do, undo, or redo any command you can think of, in less than 40 lines of code, but there one is."

"Every text editor you've ever used that handles large files gracefully has done something to break the file into chunks" What about the buffer gap algorithm? I think Borland editors used this way back in the day. :)

In conclusion:

I spent a couple of hours studying the documentation and parts of the source code.

I vote to ACCEPT Text into Boost.

_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: [review] [text] Text review begins June 11

Boost - Users mailing list
On 3/06/2020 18:22, Emil Dotchevski wrote:
> Therefore, the documentation can just specify that a rope is as thread-safe as an int.

I strongly dislike the phrase "as thread-safe as an int" because it's
often applied to structures containing more than one pointer-size member
(eg. shared_ptr), and there are architectures (notably x86) where an int
is considerably more thread-safe than this could be, because they have a
(architecture, not language) guarantee of coherence lacking in larger
values (or in multi-member structs regardless of size).  (Granted,
portable code cannot assume pointer-size values are coherent, but often
a given architecture can be assumed.)

I don't recall where intrusive_ptr (the core of rope) falls on this
spectrum; it is using an atomic refcount but there's still the separate
operations of copying the pointer and updating the refcount.  I think
it's safe if done in the correct order, however; but not otherwise.

> The grammar seems wrong here; 'I don't know if you've ever written an
> undo system that can do, undo, or redo any command you can think of, in
> less than 40 lines of code, but there one is."

It sounds correct to me; however it probably would be slightly better to
put "in less than 40 lines of code" in parentheses or dashes rather than
comma clauses.
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users
Reply | Threaded
Open this post in threaded view
|

Re: [review] [text] Text review begins June 11

Boost - Users mailing list
In reply to this post by Boost - Users mailing list
On Wed, Jun 3, 2020 at 1:22 AM Emil Dotchevski via Boost-users
<[hidden email]> wrote:
>
> On Tue, Jun 2, 2020 at 7:44 PM Glen Fernandes via Boost-users <[hidden email]> wrote:
> >
> > The Boost formal review of Zach Laine's Unicode library, Text,
> > commences on June 11th and concludes June 20th. We welcome the
> > participation of Boost developers and users, and the C++ community.
>
> Here is my review of Text in the form of a list of notes/questions, in no particular order.

Thanks for the review, Emil.

> Reading through the documentation, everything makes sense. The motivation for each type and operation is easy to understand.
>
> I welcome the use of signed int as index/size type, but I wonder if this would interact in some negative way when mixed with similar types from the standard library. I don't think so, but who knows.

In some cases, that is sure to be the case.  text::string needs to
exist only so that I can experiment with integrating something like
text::text into it, to show how we might add to std::string if
std::text ever becomes a thing.  So, I don't expect the
signed/unsigned interop issue to come up much, if at all -- I don't
expect people to use text::string much directly.

> I don't like the multi-page format of the documentation. I'd prefer a single HTML page for easy searching.

There's a lot of it!  In previous Boost reviews, I've gotten the
opposite of this feedback, FWIW.

> What is the purpose of the throw_on_encoding_error type here? https://github.com/tzlaine/text/blob/b8bb1a1101e7cf53b460671a3ead19516e7636fe/include/boost/text/transcode_iterator.hpp#L35
>
> I searched and couldn't find any place in the documentation that explains how ErrorHandler is used. Looking at the source code, it's just a function, but its interface should be documented.

Right!  That's an oversight.  I'll add documentation on its use and interface.

> Generally I dislike interfaces with configurable error handling but it appears that in the case of the conversion iterators this is warranted: both replacement character and throwing are useful. I wonder though if it would be better to delete this template parameter and always output 0xfffd, and in addition provide an iterator adaptor type which can be used to wrap e.g. an output iterator, and which throws if it sees 0xfffd being output.

This is possible, and if others agree I can certainly make this change.

> Both .c_str() and .data() should be added to string for compatibility. Also, it is not true that s.begin() is equivalent to s.c_str(), because in debug iterators should not be pointers. Perhaps &*s.begin() is equivalent, but this should not be valid on an empty string (assert), null termination notwithstanding.

Well, I don't have any non-pointer debug iterators to consider, and an
empty text::string is null-terminated.  .c_str() .data() and .begin()
truly are synonyms in this case.  I see no need to have more than one.

> The note about passing ropes by value, I think, is redundant. Logically, copying a rope gets you an independent copy, the fact that internally the copies share state is an implementation detail. Therefore, the documentation can just specify that a rope is as thread-safe as an int.

I don't know about that.  That is, I don't know if it's actually as
threadsafe as an int.  I'm simply trying to remind folks that if you
pass a rope via const& you're not going to get the thread-safety you
want.  I've seen a lot of shared_ptrs passed that way, and I'm trying
to short-circuit the internal decision making that might lead to the
same choice when passing a rope.

> Speaking of thread-safety, where are the thread-safety guarantees of rope documented? Where are the relevant tests?

The documentation of threadsafety is here:

https://tzlaine.github.io/text/doc/html/boost_text__proposed_/the_string_layer.html#boost_text__proposed_.the_string_layer.the__code__phrase_role__identifier__unencoded_rope__phrase___code__type

You have to read down a bit.  The tests are in rope_threadsafety.cpp,
which I have run with and without TSan.

> Speaking of tests, I suggest separating the generated test sources in a subdirectory so things like test/string.cpp are easy to find. Not to mention github.com truncates the test directory to 1000 files.

This seems like a good idea -- it would certainly make things easier
for me too!  I'll do this.

> The grammar seems wrong here; 'I don't know if you've ever written an undo system that can do, undo, or redo any command you can think of, in less than 40 lines of code, but there one is."

It parses for me, and apparently Gavin, but I agree it is a bit
awkward.  I'll reword it.

> "Every text editor you've ever used that handles large files gracefully has done something to break the file into chunks" What about the buffer gap algorithm? I think Borland editors used this way back in the day. :)

I have no idea.  I *think* this confirms what I was trying to say,
though -- that your editor is dealing with sections of your large file
(and not the whole thing) for efficiency.  The rope approach I use in
Boost.Text is just one possible implementation of this high-level
approach.

Zach
_______________________________________________
Boost-users mailing list
[hidden email]
https://lists.boost.org/mailman/listinfo.cgi/boost-users