Fixed String review

classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Fixed String review

Boost - Dev mailing list
Dear All,

Here is my review of the proposed Fixed String library.

In summary I think this would be a useful thing to have in Boost,
but there are a few issues with the code as it currently stands.

This is based on studying the code and docs and the previous
mailing list discussions in September; I haven't actually tried
to use the library.

Firstly regarding the name, I would much prefer static_string
for consistency with boost::container::static_vector and with
P0843.  Note that static_vector was the preferred name in
an LEWG poll, according to P0843.  It's true that "static"
is an ambiguous word that does not really convey what is
different about this string, but then neither does "fixed"
in my opinion; "fixed capacity" would, but just "fixed" sounds
more like "immutable" to me.

On the subject of P0843, it it worth reading as it contains
some useful links and rationale:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0843r2.html
Also a more recent r3 update and reference implementation:
https://github.com/gnzlbg/static_vector

Here are the features that I think are missing, in no particular
order; apologies if some of this does exist but I've missed it:

- Much more of it should be constexpr.  See P0980R1.  That mentions
that some of char_traits needs to be constexpr as a prerequisite,
but even without that things like size() and empty() can be
constexpr.  There is also a question of whether all of the
elements must be initialised (to 0) by the ctor to be constexpr,
which could be costly for a large but sparse container, though it
does mean that subsequent push_back()s don't need to store
terminators; I don't understand all the issues involved here.
(I note a comment in the docs suggesting the lack of constexpr may
be temporary.)

- In the same vein, more of it should be noexcept - why aren't
empty() and size() ?

- You can add a specialisation for size 0 that turns it into an
empty struct.

- You can add a string literal operator"".

- You can add std::hash overloads.

There are various possibilities for what should happen if the
capacity is exceeded, including throwing an exception, asserting,
and allowing undefined behaviour.  I'm not sure what I would
really prefer though I think I'd be more likely to assert() than
to throw.  See P0843r2 again which discusses this.

Regarding your deviations from std::string:
operator+ has been omitted because you consider it's not possible
to determine a reasonable return type; isn't it just size N+M ?
Do you reject that just because it could end up rather large after
a sequence of additions?  I'd agree if it were N*M but not really for
N+M.  Have I missed something?
You've chosen to return a string_view from substr().  I think that's
a bit dangerous; code that does "auto a = s.substr(...)" could end
up with a dangling view when s is a fixed string; this makes it
more difficult than it should be to migrate from one string type
to another, or to write generic code.

Regarding the docs, it would be great if the reference could have
some highlighting (e.g. bright red?) for the few places where it
differs from std::string.

My own use for this sort of string has often been to store very
large numbers of small strings when total memory usage has been a
concern.  (A good example would be an textual ID code of some sort,
like a postal code or an airport code etc.)  For this case, the
8-byte overhead for the size_t length field is a concern; I'd like
a static_string<6> to need only 8 bytes total.  I suggested changing
this to the minimum size needed to store values 0 to N in the
previous email discussion, but that hasn't been implemented.

My initial thought was that static_string should be implemented
on top of boost::container::static_vector<char>.  My rationale
for that was that static_vector is well-established and doesn't
need to be re-implemented, and that a string_facade that
adapts a vector<char>-like interface into a string-like interface
would be a useful component in its own right.  (I've tried to
write something like this myself, but it gets tedious very quickly
writing out all of those special string methods.)  But then I
discovered that actually boost::container::static_vector is
not implemented as a simple size,data[] pair; apparently it
contains pointers to its own storage.  So it isn't even trivially-
copyable.  Now that I know that, I wonder if what we really need
is a low-storage-overhead static_vector "done properly" (like
the P0843 reference implementation), with a string_facade to
make static_string from static_vector<char>.

On balance, my view is that this proposal is promising but not
quite ready yet.  I suggest that the authors should be asked to
come back in a few months with a revised version.  My preferred
design would be an improved static_vector and a string_facade
that converts it into a static_string, but if they don't want
to do that then simply tidying up the constexpr, noexcept and
hash issues would be minimally sufficient.


Regards, Phil.





_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
On Tue, Nov 26, 2019 at 8:31 AM Phil Endecott via Boost
<[hidden email]> wrote:
> I would much prefer static_string

Same here... I was foolish to rename it.

> - Much more of it should be constexpr.

I agree that it could be constexpr (at least some of it) but this can
be added as an improvement later.

Taking the opposite position, why should it be constexpr? The purpose
of this container is to serve as a performant substitute for
std::string when the maximum size is known ahead of time or can be
reasonably estimated. In which use-cases do you anticipate the need
for a constexpr static_string? It would certainly waste space, unless
the caller sizes it perfectly. But then what's the difference between
that and a constexpr string literal or string view?

In my opinion people are too quick to say "constexpr it" without
evaluating the necessity. I'm not saying we shouldn't do it (but see
below), however lets not pretend that there are legions of users eager
for a constexpr fixed capacity mutable string.

> There is also a question of whether all of the
> elements must be initialised (to 0) by the ctor to be constexpr,
> which could be costly for a large but sparse container, though

I think initializing everything to 0 is a non-starter for the
non-constexpr case, due to the performance penalty. And initializing
to 0 only for constexpr requires new language features I believe?
std::is_constexpr_evaluated or some such.

> it does mean that subsequent push_back()s don't need to store
> terminators;

It would still need to null-terminate, for the case where characters
are removed from the string and subsequently reinserted.

> - You can add a specialisation for size 0 that turns it into an
> empty struct.
>
> - You can add a string literal operator"".
>
> - You can add std::hash overloads.

I'm not seeing the value in these features. What's the use case? No
one will be using static_string as the key for a container, since it
wastes space. Why do we need a string literal operator, to produce a
string constant that takes up more space than necessary? And why are
we "optimizing" the case where size==0? Is this something that anyone
will notice or care about (aside from WG21 of course, which often
concerns itself with not-relevant things).

> There are various possibilities for what should happen if the
> capacity is exceeded, including throwing an exception, asserting,
> and allowing undefined behaviour.  I'm not sure what I would
> really prefer though I think I'd be more likely to assert() than
> to throw.  See P0843r2 again which discusses this.

Quoting some nonsense from p0843r2:

    1. Throw an exception.
    2. Make this a precondition violation.
    ...
   It could throw either std::out_of_bounds or std::logic_error but in any
    case the interface does not end up being equal to that of std::vector.
    ...
  The alternative is to make not exceeding the capacity a precondition on
  the static_vector's methods.

So basically Gonzalo (the paper's author) says that throwing an
exception of the wrong type makes the interface different from
std::vector, thus the solution is to make the interface even more
different from vector by requiring callers to check sizes. Gonzalo
lists existing practice of Boost.Container's static_vector, with
throws on overflow, as existing practice. Only in the bizarro world of
WG21 could someone write a paper which quotes existing practice, then
proceed to deviate from existing practice for illogical reasons.

We can return to sanity by recognizing what the purpose of static_string is for:

    "...a performant substitute for std::string when the maximum size is known
    ahead of time or can be reasonably estimated."

Throwing an exception is the obvious rational choice. It lets
static_string be mostly a drop-in replacement, without requiring call
sites to add extensive code to check for preconditions. It guarantees
that any successful modifications to the static_string will be exact
(no truncation). And most importantly it matches existing practice
(e.g. boost::container::static_vector and others like it).

> Regarding your deviations from std::string:
> operator+ has been omitted because you consider it's not possible
> to determine a reasonable return type; isn't it just size N+M ?
> Do you reject that just because it could end up rather large after
> a sequence of additions?  I'd agree if it were N*M but not really for
> N+M.  Have I missed something?

Yes. It is too easy for someone to change existing code that does
a+b+c+d where the operands are strings. Users could the get surprising
behavior of tens or hundreds of kilobytes of stack consumption. There
are other ways to implement operator+ but we prefer to the omit
interfaces that can have surprising behavior, or that have non-obvious
results (such as truncating).

> You've chosen to return a string_view from substr().  I think that's
> a bit dangerous; code that does "auto a = s.substr(...)" could end
> up with a dangling view when s is a fixed string; this makes it
> more difficult than it should be to migrate from one string type
> to another, or to write generic code.

Again we have to refer to the purpose of the container. People are
using this for performance, the very last thing they want from
substr() is to receive a copy. Users can opt-in to making a copy if
they want, by constructing a new static_string from the string view
returned by substr. If we go with your suggestion, no one can opt out
of copies.

> My initial thought was that static_string should be implemented
> on top of boost::container::static_vector<char>. My rationale...

No normal user is going at static_string and think "I would use this
if it required Boost.Container." Fewer dependencies is better. And,
this library is designed to work without the rest of Boost (the
"standalone" mode). I want people to be able to do "vcpkg install
static_string" and then be ready to boogie. They will still have to
type `boost::` to access it.

Regards

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
Vinnie Falco wrote:

> In which use-cases do you anticipate the need for a constexpr
> static_string?

When manipulating strings at compile time? Such as s1 += s2?


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 27/11/2019 06:33, Vinnie Falco wrote:
>> I would much prefer static_string
>
> Same here... I was foolish to rename it.

Neither one is a great choice, as both "static" and "fixed" can be
synonyms of "immutable", which this is not.

"fixed_capacity_string" is the most accurate name -- while you might
think that's too long, I think it's better to be properly descriptive
and allow consumers to typedef it to something easier to type if they
wish to.

>> Regarding your deviations from std::string:
>> operator+ has been omitted because you consider it's not possible
>> to determine a reasonable return type; isn't it just size N+M ?
>> Do you reject that just because it could end up rather large after
>> a sequence of additions?  I'd agree if it were N*M but not really for
>> N+M.  Have I missed something?
>
> Yes. It is too easy for someone to change existing code that does
> a+b+c+d where the operands are strings. Users could the get surprising
> behavior of tens or hundreds of kilobytes of stack consumption. There
> are other ways to implement operator+ but we prefer to the omit
> interfaces that can have surprising behavior, or that have non-obvious
> results (such as truncating).

The most logical capacity for the result of operator+ would probably be
std::max(a.capacity(), b.capacity()).  Bear in mind that we are assuming
that both actual string values are significantly shorter than their
capacities throughout most of their lifetime.

(This is a reasonably safe assumption especially when you *don't* have
an operator"" that produces exactly-right-sized strings.)

In an ideal world, you'd be able to factor in the current lengths, but
of course that isn't available at compile time so can't be used to
determine the type.


Having said that, operator+ is still going to cause an allocation, so it
does still make sense to omit it from the interface for that reason, so
that people think about it more (usually by switching to zero-allocation
append or +=).

>> You've chosen to return a string_view from substr().  I think that's
>> a bit dangerous; code that does "auto a = s.substr(...)" could end
>> up with a dangling view when s is a fixed string; this makes it
>> more difficult than it should be to migrate from one string type
>> to another, or to write generic code.
>
> Again we have to refer to the purpose of the container. People are
> using this for performance, the very last thing they want from
> substr() is to receive a copy. Users can opt-in to making a copy if
> they want, by constructing a new static_string from the string view
> returned by substr. If we go with your suggestion, no one can opt out
> of copies.

Perhaps as a compromise (due to the difference from std::string
interface anyway) this method should be named substr_view?  This would
then alert someone converting from one to the other to look at each call
site and verify that it isn't going to produce a dangling view.

>> My initial thought was that static_string should be implemented
>> on top of boost::container::static_vector<char>. My rationale...
>
> No normal user is going at static_string and think "I would use this
> if it required Boost.Container." Fewer dependencies is better. And,
> this library is designed to work without the rest of Boost (the
> "standalone" mode). I want people to be able to do "vcpkg install
> static_string" and then be ready to boogie. They will still have to
> type `boost::` to access it.

That just means that standalone mode would need to depend on a
standalone implementation of static_vector, which vcpkg would pull in as
well.  Or you should give up on a standalone mode and have your vcpkg
depend on Boost.

Dependencies can be annoying, but constantly reinventing existing
vocabulary types is also annoying.  Waiting until they get adopted by
the standard before using them in other libraries is a bit silly.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 27/11/2019 06:46, Peter Dimov wrote:
> Vinnie Falco wrote:
>
>> In which use-cases do you anticipate the need for a constexpr
>> static_string?
>
> When manipulating strings at compile time? Such as s1 += s2?

Wouldn't that use case be better served with an immutable string and
operator+?  At compile time, you can maintain the invariant length() ==
capacity(), which this type cannot do (and is not intended for).

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
Gavin Lambert wrote:

> On 27/11/2019 06:46, Peter Dimov wrote:
> > Vinnie Falco wrote:
> >
> >> In which use-cases do you anticipate the need for a constexpr
> >> static_string?
> >
> > When manipulating strings at compile time? Such as s1 += s2?
>
> Wouldn't that use case be better served with an immutable string and
> operator+?

Not in general.

constexpr string_view f( int x );

constexpr auto g( int n )
{
    fixed_string<512> s;

    for( int i = 0; i < n; ++i )
    {
        s += f( i );
    }

    return s;
}

You can rewrite this specific example using recursion but in general, it'd
be painful and unreadable.


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
>   In the same vein, more of it should be noexcept - why aren't
empty() and size() ?

empty() and size() not being noexcept is more of an oversight on our part.
As for why more of the member functions aren't noexcept, they have the same
exception specifications as those of their counterparts in std::string.
That being said, although they do call "dangerous" operations like
Traits::length that are not noexcept, such functions don't throw (at least
for std::char_traits), so its difficult to specify as noexcept since Traits
could also be a user defined type.

>  You can add a specialisation for size 0 that turns it into an empty
struct.

Certainly, it would be fairly trivial to do so.

> I suggested changing this to the minimum size needed to store values 0 to
N in the previous email discussion, but that hasn't been implemented.

Sure, that's extremely trivial to do, and seems like a great idea.

On Tue, Nov 26, 2019 at 7:25 PM Peter Dimov via Boost <[hidden email]>
wrote:

> Gavin Lambert wrote:
> > On 27/11/2019 06:46, Peter Dimov wrote:
> > > Vinnie Falco wrote:
> > >
> > >> In which use-cases do you anticipate the need for a constexpr
> > >> static_string?
> > >
> > > When manipulating strings at compile time? Such as s1 += s2?
> >
> > Wouldn't that use case be better served with an immutable string and
> > operator+?
>
> Not in general.
>
> constexpr string_view f( int x );
>
> constexpr auto g( int n )
> {
>     fixed_string<512> s;
>
>     for( int i = 0; i < n; ++i )
>     {
>         s += f( i );
>     }
>
>     return s;
> }
>
> You can rewrite this specific example using recursion but in general, it'd
> be painful and unreadable.
>
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Vinnie Falco wrote:
> On Tue, Nov 26, 2019 at 8:31 AM Phil Endecott via Boost
> <[hidden email]> wrote:

>> - Much more of it should be constexpr.
> Taking the opposite position, why should it be constexpr?

I think best practice is to declare everything constexpr unless
it can't be.  It's a bit like const was 20 years ago - does anyone
else remember dealing with old code that didn't declare anything
const?

>> operator+ has been omitted because you consider it's not possible
>> to determine a reasonable return type; isn't it just size N+M ?
>> Do you reject that just because it could end up rather large after
>> a sequence of additions?  I'd agree if it were N*M but not really for
>> N+M.  Have I missed something?
>
> Yes. It is too easy for someone to change existing code that does
> a+b+c+d where the operands are strings. Users could the get surprising
> behavior of tens or hundreds of kilobytes of stack consumption. There
> are other ways to implement operator+ but we prefer to the omit
> interfaces that can have surprising behavior, or that have non-obvious
> results (such as truncating).

Your application seems to be for largish buffers; mine is for
smallish strings.  For the smallest strings a static_string
can be smaller than a std::string, and for slightly larger
strings the space overhead if the actual string is shorter
than the capacity may be an acceptable trade-off for the
improved performance.  For example:

struct NameAndAddress {
  static_string<14> firstname;
  static_string<14> surname;
  static_string<30> address_line_1;
  static_string<22> city;
  static_string<10> postcode;
};

(Yes, yes, you probably wouldn't want to make those fields fixed-size
in real code, but it's an example we can all understand.)  That struct
is trivially-copyable.  Copying a version that used std::string would
involve multiple memory allocations and could be an order of magnitude
slower.  Now:

std::string a = s.firstname + " " + s.surname + ",\n"
              + s.address_line_1 + ",\n"
              + s.city + ",\n"
              + s.postal_code;

That will create a temporary static_string<97> and copy from that
into the std::string.  But it needs a series of temporaries and probably
uses a few hundred bytes of stack.  Yes this isn't as good as we could
get from a concat(...) function (or expression templates) that computed
the size of the required buffer up-front and copied all of the arguments
once, but it's still going to be an improvement over using std::string's
operator+.

>> - You can add std::hash overloads.

> I'm not seeing the value in these features. What's the use case? No
> one will be using static_string as the key for a container, since it
> wastes space.

using customer_code = static_string<11>;
std::unordered_map<customer_code,NameAndAddress> customers;

Vinnie, are you saying that this (i.e. small strings that are trivially-
copyable) is not a supported use for the library?  If you are, please say
so and we can just reject it!


>> You've chosen to return a string_view from substr().  I think that's
>> a bit dangerous; code that does "auto a = s.substr(...)" could end
>> up with a dangling view when s is a fixed string; this makes it
>> more difficult than it should be to migrate from one string type
>> to another, or to write generic code.
>
> Again we have to refer to the purpose of the container.

Well again I'll contend that your purpose for the container is not the
only one.

> People are
> using this for performance, the very last thing they want from
> substr() is to receive a copy. Users can opt-in to making a copy if
> they want, by constructing a new static_string from the string view
> returned by substr. If we go with your suggestion, no one can opt out
> of copies.

If you want to get a string_view of a substring of a std::string, I think
you can write something like

std::string s = .....;
std::string_view sv(s);  // view of all of s
std::string_view sub_sv = sv.substr(pos,count);  // view of a subview of s

Or in one line:

auto sub_sv = std::string_view(s).substr(pos,count);

Maybe std::string should have a method to do that more succinctly?

Anyway, someone wanting to get a string_view of a substring of a
fixed_string can do exactly the same thing.


>> My initial thought was that static_string should be implemented
>> on top of boost::container::static_vector<char>. My rationale...
>
> No normal user is going at static_string and think "I would use this
> if it required Boost.Container."

Vinnie, I have literally just said that I'd prefer this to be an
improved static_vector and then an adaptor that adds all the string
methods on top of static_vector<char>.  I think that would be better
because it gives us three useful things "for the price of one", i.e.
a better static_vector, a static_string, and a "stringification"
adaptor that can be used independently.  So you're calling me
"not normal" for suggesting that.  I think that's rude.  You've
also been rude to Gonzalo Brito Gadeschi and the whole of WG21 in
parts of your message that I've not quoted.  Please don't do that.

Unfortunately I don't have the time to respond to everything else.


Regards, Phil.



_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
On Wed, Nov 27, 2019 at 8:36 AM Phil Endecott via Boost
<[hidden email]> wrote:
> struct NameAndAddress {
>   static_string<14> firstname;
>   static_string<14> surname;
>   static_string<30> address_line_1;
>   static_string<22> city;
>   static_string<10> postcode;
> };

Ahh...I can't argue with this.  `sizeof(std::string)` is

* 32 bytes on gcc 8.3.0
* 24 bytes on clang 9.0.0

This is a pretty compelling argument, I am convinced. Therefore, this
library should provide specializations for optimizing small strings.
This could include storing the current size as (C-N) in the last
character position where C is the capacity and N is the size.

> (Yes, yes, you probably wouldn't want to make those fields fixed-size
> in real code, but it's an example we can all understand.)

Actually.... small fixed size fields come up in database applications
all the time. And with sizeof(std::string) being 32 bytes on clang, a
specialization will allow a generous 31 characters which is actually
quite nice.

> ...it's still going to be an improvement over using std::string's
> operator+.

Okay, I agree that we need to do _something_. I'm not sure exactly
what that something is. I'm hesitant to endorse operator+ because of
the unpredictable behavior. How about a free function `concat(...)`
which allows the caller to optionally specify the maximum capacity of
the resulting string,

> Vinnie, are you saying that this (i.e. small strings that are trivially-
> copyable) is not a supported use for the library?'

\sigh. Yes, it needs hash and probably the other stuff.

> auto sub_sv = std::string_view(s).substr(pos,count);

I can support this if there is consensus that substr() should return a
fixed capacity string instead of a view.

> Maybe std::string should have a method to do that more succinctly?

Umm... I think std::string has enough members :) it needs a rest.

> >> My initial thought was that static_string should be implemented
> >> on top of boost::container::static_vector<char>. My rationale...
> > ...
>
> Vinnie, I have literally just said that I'd prefer this to be an
> improved static_vector

You said it should use boost::container::static_vector. That's a
non-starter for me. A goal for the library is to not use any external
dependencies. I am also not a fan of Boost.Container's idiom of
Options template parameters. It confuses tooling and it is hard to
teach. I don't mind it in Boost.Intrusive because intrusive containers
are already a tool for experts. But I'd like the fixed capacity
strings to be more accessible. There are also string-specific
optimizations which wouldn't work if the implementation was tied to
static_vector. No I don't think that coupling the two is a good idea
at all.

Regards

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
On Wed, Nov 27, 2019 at 11:23 AM Vinnie Falco via Boost <
[hidden email]> wrote:

> On Wed, Nov 27, 2019 at 8:36 AM Phil Endecott via Boost
> > ...it's still going to be an improvement over using std::string's
> > operator+.
>
> Okay, I agree that we need to do _something_. I'm not sure exactly
> what that something is. I'm hesitant to endorse operator+ because of
> the unpredictable behavior. How about a free function `concat(...)`
> which allows the caller to optionally specify the maximum capacity of
> the resulting string,
>

I was thinking the same thing.

Zach

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Em qua., 27 de nov. de 2019 às 13:36, Phil Endecott via Boost <
[hidden email]> escreveu:

> struct NameAndAddress {
>   static_string<14> firstname;
>   static_string<14> surname;
>   static_string<30> address_line_1;
>   static_string<22> city;
>   static_string<10> postcode;
> };
>

I was going to write a review, but I think I can borrow your example to
show the single point I'm concerned about.

char[N] has alignment of char. std::size_t has not and structs like this
will consume much more memory because this single member:
https://github.com/18/fixed_string/blob/9730545e0fa37b43299559fca8253c434c67a491/include/boost/fixed_string/fixed_string.hpp#L51

I'm concerned about this alignment and I'd rather have the size being
computed by searching for the NULL-terminator (or also stored in a byte).
Maybe a policy template parameter could be used to select the
implementation. I don't really care about which solution is chosen as long
as I can opt-in for a struct which has the same alignment as char (and it
is trivially copyable).


--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
On Wed, Nov 27, 2019, 10:17 AM Vinícius dos Santos Oliveira via Boost <
[hidden email]> wrote:

> I'm concerned about this alignment and I'd rather have the size being
> computed by searching for the NULL-terminator (or also stored in a byte).
>

I agree with char alignment, and we have techniques for doing this
transparently in constant time, without policies, which will be impmemented.

>

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 28/11/2019 06:23, Vinnie Falco wrote:
> On Wed, Nov 27, 2019 at 8:36 AM Phil Endecott wrote:
>> auto sub_sv = std::string_view(s).substr(pos,count);
>
> I can support this if there is consensus that substr() should return a
> fixed capacity string instead of a view.

To clarify, I think the argument was that one can already obtain a
string_view substr via the above, therefore your own member substr
should probably just be removed (as it deviates from std::string
semantics anyway).

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Tue, Nov 26, 2019 at 11:34 AM Vinnie Falco via Boost <
[hidden email]> wrote:

> > - You can add std::hash overloads.
>
> I'm not seeing the value in these features. What's the use case? No
> one will be using static_string as the key for a container, since it
> wastes space.


I've used fixed sized strings not only because they are space efficient,
but also because they are non-allocating.  And because those types were
being used in other places, it was also useful to have them as keys in hash
maps.


> Why do we need a string literal operator, to produce a
> string constant that takes up more space than necessary? And why are
> we "optimizing" the case where size==0? Is this something that anyone
> will notice or care about (aside from WG21 of course, which often
> concerns itself with not-relevant things).


I'm sure people said the same thing about array<T, 0>, yet it turns out to
be useful in generic code.  As for people noticing, they do tend to notice
sub-optimal implementations.

 Given that "'optimizing' the case where size==0" is about implementation
and not specification, this looks like something that WG21 would:

   - consider not-relevant
   - not concern itself with it

Could you elaborate on why you think WG21 would be concerned with it,
should it someday make it into a proposal?

The more interesting argument is whether or not to include reserve() and
shrink_to_fit().  If it is for generic replacement of std::string, then
reserve() should throw when n > size() (it is currently underspecified).
Since it does no allocation I'd leave them out.

--
 Nevin ":-)" Liber  <mailto:[hidden email] <[hidden email]>>
+1-847-691-1404

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Vin?cius dos Santos Oliveira <[hidden email]> wrote:

> Em qua., 27 de nov. de 2019 ?s 13:36, Phil Endecott via Boost <
> [hidden email]> escreveu:
>
>> struct NameAndAddress {
>>   static_string<14> firstname;
>>   static_string<14> surname;
>>   static_string<30> address_line_1;
>>   static_string<22> city;
>>   static_string<10> postcode;
>> };
>>
>
> I was going to write a review, but I think I can borrow your example to
> show the single point I'm concerned about.
>
> char[N] has alignment of char. std::size_t has not and structs like this
> will consume much more memory because this single member

This is avoided if you use an appropriate smaller type for
the size, i.e. uint8_t when capacity <= 255.

> I'd rather have the size being
> computed by searching for the NULL-terminator

Yes a string that does this can also be useful, particularly for the
very smallest strings, but I would not suggest trying to combine it
with this implementation.


Regards, Phil.





_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Vinnie Falco wrote:
> How about a free function `concat(...)`
> which allows the caller to optionally specify the maximum capacity of
> the resulting string

See also http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1228r1.html
IMO, that paper unnecessarily combines string concatenation
with number-formatting, but it is still interesting to look at.
It converts all of its arguments to string_views, sums the
lengths (at run time), allocates a std::string of that length,
and copies the arguments into it.  You could extend that to
(a) permit the return type to be specified as an additional
template parameter, and (b) default that return type to a
static_string if the capacities of all the arguments are finite.


Regards, Phil.





_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Vinnie Falco wrote:
> Okay, I agree that we need to do _something_. I'm not sure exactly what
> that something is. I'm hesitant to endorse operator+ because of the
> unpredictable behavior. How about a free function `concat(...)` which
> allows the caller to optionally specify the maximum capacity of the
> resulting string,

The function that allows you to specify the capacity is spelled

    fixed_string<N> result = s1 + s2;

This is not particularly efficient on paper, it's one extra memcpy, but
that's unavoidable when using op+.

And, while I agree that op+ giving 1024 for two 512 inputs isn't very
useful, op+ for inputs 12 and 8 giving 20 seems not that useless.


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
wt., 26 lis 2019 o 18:34 Vinnie Falco via Boost <[hidden email]>
napisał(a):

> On Tue, Nov 26, 2019 at 8:31 AM Phil Endecott via Boost
> <[hidden email]> wrote:
> > I would much prefer static_string
>
> Same here... I was foolish to rename it.
>
> > - Much more of it should be constexpr.
>
> I agree that it could be constexpr (at least some of it) but this can
> be added as an improvement later.
>
> Taking the opposite position, why should it be constexpr? The purpose
> of this container is to serve as a performant substitute for
> std::string when the maximum size is known ahead of time or can be
> reasonably estimated. In which use-cases do you anticipate the need
> for a constexpr static_string? It would certainly waste space, unless
> the caller sizes it perfectly. But then what's the difference between
> that and a constexpr string literal or string view?
>
> In my opinion people are too quick to say "constexpr it" without
> evaluating the necessity. I'm not saying we shouldn't do it (but see
> below), however lets not pretend that there are legions of users eager
> for a constexpr fixed capacity mutable string.
>
> > There is also a question of whether all of the
> > elements must be initialised (to 0) by the ctor to be constexpr,
> > which could be costly for a large but sparse container, though
>
> I think initializing everything to 0 is a non-starter for the
> non-constexpr case, due to the performance penalty. And initializing
> to 0 only for constexpr requires new language features I believe?
> std::is_constexpr_evaluated or some such.
>
> > it does mean that subsequent push_back()s don't need to store
> > terminators;
>
> It would still need to null-terminate, for the case where characters
> are removed from the string and subsequently reinserted.
>
> > - You can add a specialisation for size 0 that turns it into an
> > empty struct.
> >
> > - You can add a string literal operator"".
> >
> > - You can add std::hash overloads.
>
> I'm not seeing the value in these features. What's the use case? No
> one will be using static_string as the key for a container, since it
> wastes space. Why do we need a string literal operator, to produce a
> string constant that takes up more space than necessary? And why are
> we "optimizing" the case where size==0? Is this something that anyone
> will notice or care about (aside from WG21 of course, which often
> concerns itself with not-relevant things).
>
> > There are various possibilities for what should happen if the
> > capacity is exceeded, including throwing an exception, asserting,
> > and allowing undefined behaviour.  I'm not sure what I would
> > really prefer though I think I'd be more likely to assert() than
> > to throw.  See P0843r2 again which discusses this.
>
> Quoting some nonsense from p0843r2:
>
>     1. Throw an exception.
>     2. Make this a precondition violation.
>     ...
>    It could throw either std::out_of_bounds or std::logic_error but in any
>     case the interface does not end up being equal to that of std::vector.
>     ...
>   The alternative is to make not exceeding the capacity a precondition on
>   the static_vector's methods.
>
> So basically Gonzalo (the paper's author) says that throwing an
> exception of the wrong type makes the interface different from
> std::vector, thus the solution is to make the interface even more
> different from vector by requiring callers to check sizes. Gonzalo
> lists existing practice of Boost.Container's static_vector, with
> throws on overflow, as existing practice. Only in the bizarro world of
> WG21 could someone write a paper which quotes existing practice, then
> proceed to deviate from existing practice for illogical reasons.
>
> We can return to sanity by recognizing what the purpose of static_string
> is for:
>
>     "...a performant substitute for std::string when the maximum size is
> known
>     ahead of time or can be reasonably estimated."
>
> Throwing an exception is the obvious rational choice. It lets
> static_string be mostly a drop-in replacement, without requiring call
> sites to add extensive code to check for preconditions. It guarantees
> that any successful modifications to the static_string will be exact
> (no truncation). And most importantly it matches existing practice
> (e.g. boost::container::static_vector and others like it).
>

Actually, boost::container::static_vector defines the container beyond its
capacity as a precondition:
https://www.boost.org/doc/libs/1_71_0/doc/html/boost/container/static_vector.html#idm45836565507248-bb

And this does not require of a user any extensive "checking" of
preconditions. Te user simply knows her program's flow, and already knows
the preconditions are not exceeded. For instance, the logic of the
application dictates that strings are never resized, or for every name
initially created only zero or one push_back()s are performed with a single
character.

Regards,
&rzej;


> > Regarding your deviations from std::string:
> > operator+ has been omitted because you consider it's not possible
> > to determine a reasonable return type; isn't it just size N+M ?
> > Do you reject that just because it could end up rather large after
> > a sequence of additions?  I'd agree if it were N*M but not really for
> > N+M.  Have I missed something?
>
> Yes. It is too easy for someone to change existing code that does
> a+b+c+d where the operands are strings. Users could the get surprising
> behavior of tens or hundreds of kilobytes of stack consumption. There
> are other ways to implement operator+ but we prefer to the omit
> interfaces that can have surprising behavior, or that have non-obvious
> results (such as truncating).
>
> > You've chosen to return a string_view from substr().  I think that's
> > a bit dangerous; code that does "auto a = s.substr(...)" could end
> > up with a dangling view when s is a fixed string; this makes it
> > more difficult than it should be to migrate from one string type
> > to another, or to write generic code.
>
> Again we have to refer to the purpose of the container. People are
> using this for performance, the very last thing they want from
> substr() is to receive a copy. Users can opt-in to making a copy if
> they want, by constructing a new static_string from the string view
> returned by substr. If we go with your suggestion, no one can opt out
> of copies.
>
> > My initial thought was that static_string should be implemented
> > on top of boost::container::static_vector<char>. My rationale...
>
> No normal user is going at static_string and think "I would use this
> if it required Boost.Container." Fewer dependencies is better. And,
> this library is designed to work without the rest of Boost (the
> "standalone" mode). I want people to be able to do "vcpkg install
> static_string" and then be ready to boogie. They will still have to
> type `boost::` to access it.
>
> Regards
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
wt., 26 lis 2019 o 18:34 Vinnie Falco via Boost <[hidden email]>
napisał(a):

> On Tue, Nov 26, 2019 at 8:31 AM Phil Endecott via Boost
> <[hidden email]> wrote:
>
> > You've chosen to return a string_view from substr().  I think that's
> > a bit dangerous; code that does "auto a = s.substr(...)" could end
> > up with a dangling view when s is a fixed string; this makes it
> > more difficult than it should be to migrate from one string type
> > to another, or to write generic code.
>
> Again we have to refer to the purpose of the container. People are
> using this for performance, the very last thing they want from
> substr() is to receive a copy. Users can opt-in to making a copy if
> they want, by constructing a new static_string from the string view
> returned by substr. If we go with your suggestion, no one can opt out
> of copies.
>

Even if this is a valid design choice when considered in separation, it is
clearly in conflict with the goal of being a drop-in replacement for
std::string. It looks like the library is aiming at being a drop-in
replacement for some subset of usages of std::string. It would be
beneficial to outline this subset clearly in the initial sections of the
documentation.

Regards,
&rzej;

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: Fixed String review

Boost - Dev mailing list
On Mon, Dec 2, 2019 at 3:07 AM Andrzej Krzemienski via Boost <
[hidden email]> wrote:

> wt., 26 lis 2019 o 18:34 Vinnie Falco via Boost <[hidden email]>
> napisał(a):
>
> > On Tue, Nov 26, 2019 at 8:31 AM Phil Endecott via Boost
> > <[hidden email]> wrote:
> >
> > > You've chosen to return a string_view from substr().  I think that's
> > > a bit dangerous; code that does "auto a = s.substr(...)" could end
> > > up with a dangling view when s is a fixed string; this makes it
> > > more difficult than it should be to migrate from one string type
> > > to another, or to write generic code.
> >
> > Again we have to refer to the purpose of the container. People are
> > using this for performance, the very last thing they want from
> > substr() is to receive a copy. Users can opt-in to making a copy if
> > they want, by constructing a new static_string from the string view
> > returned by substr. If we go with your suggestion, no one can opt out
> > of copies.
> >
>
> Even if this is a valid design choice when considered in separation, it is
> clearly in conflict with the goal of being a drop-in replacement for
> std::string. It looks like the library is aiming at being a drop-in
> replacement for some subset of usages of std::string. It would be
> beneficial to outline this subset clearly in the initial sections of the
> documentation.
>

+1.  FWIW, I find "this is a drop-in replacement" to be a typically
unattainable goal for anything that is not based on some kind of
polymorphism (compile- or runtime).  That is, the polymorphic drop-in
mechanism only cares about a subset of API and/or behavior.

Zach

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
12