[review][fixed_string] Andrzej's feedback

classic Classic list List threaded Threaded
28 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
Hi Everyone,
This can hardly be called a review, as I didn't look thoroughly at the
implementation. (I only checked what happens on the resize() over the
capacity(), as this was not clear from the docs.)

I think that ultimately the library would be a useful addition to Boost,
however I would not like to rush it before its design is fully baked, and
my impression is that it isn't yet. Therefore my recommendation would be to
REJECT the library at this point, and encourage the authors to resubmit it
for the second review once the design goals and design decisions have been
clearly settled and communicated.

My vision of a review is that we:
1. First evaluate the design goals, whether they are worth pursuing
2. Then we evaluate the design *against* those goals.
3. Then we evaluate the documentation and implementation *against* the
design and goals.

I cannot even go to #3, as to me #1 is missing some parts.

There are different possible goals:
1. Treat this new string as many other STL sequence containers with
variable size. In particular, users use it as if its max_size() were
infinite, and are prepared that at some point an exception might be thrown.
2. Treat it as a "drop-in replacement" for std::string with no-heap-alloc
guarantee.
3. Make something suitable for embedded systems where the user provides the
guarantee: "I will never make the size() of this string go above this
value. If I did, it would be plain clear that the program is not doing what
it was supposed to."

These goal 3 is in clear conflict with either 1 or 2. This becomes clear
when we consider what should be the response to doing a resize() over
capacity(): a precondition violation or a normal course of action signaled
by an exception.

The authors should make a choice which of these goals is not supported and
communicate it clearly from the outset in the docs. I think this decision
has already been made: the library was extracted from Boost.Beast and there
the purpose must have been clear. It is just that it is not communicated in
fixed_string library.

Any decision is good. In C++ performance trade-offs are part of the
contract (interface), so there is room in Boost for many strings: small
strings, short codes, fixed size strings, fixed capacity strings. But for
each library it should be clear what trade-offs were chosen and why.

We have a situation that there exists static_vactor in Boost, which defines
the resize() over capacity() to be a precondition violation. And
fixed_string needs to provide some response to this fact: why did you
choose a different design?

I consider the choice of name to be part of the design: you cannot assign
the name before you have decided on the goals. "static_string" might be a
good name if you chose the same trade-offs as static_vector. But if you
make different trade-offs, it might be better to use a different name.

Making a resize() over capacity() a precondition violation is a *feature*
useful for bug detection and I do not consider it a valid argument that
"library will throw exceptions and if you never resize() over capacity()
you will never see exceptions or std::abort()". If this is a precondition,
then I expect of a library to put some BOOST_ASSERT() or
_builtin_unreachable() in those paths to enable better bug detection.

I find the goal "a drop-in replacement for std::string" invalid. You surely
mean only a drop in replacement for some subset of usages. I guess this
subset comes from the usage in Beast, and I would expect that it is clearly
outlined what this subset is. One of the operations on strings that I use
most often is operator+, so when I hear that this operation is not
implemented in a class that claims to be a drop-in replacement for string,
I see a contradiction. There might be valid reasons not to provide
operator+, but the documentation cannot state that this is a drop-in
replacement for std::string.

I think that only after the above issues have been sorted out can we have a
review of the implementation and the documentation.

One remark about documentation is that I was not able to find what
`resize()` over `capacity()` does. This is the most important information
that I was after. anything else is more less intuitively clear from the
analogy to std::string.

I hope that this feedback doesn't sound discouraging. I appreciate the work
Krystian and Vinnie have put into making this submission, and I would like
to see this library ultimately included in Boost.

Regards,
&rzej;

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
On Wed, Dec 4, 2019 at 6:14 AM Andrzej Krzemienski via Boost
<[hidden email]> wrote:
> I hope that this feedback doesn't sound discouraging. I appreciate the work
> Krystian and Vinnie have put into making this submission, and I would like
> to see this library ultimately included in Boost.

Not discouraging at all, and I agree with everything.

Thanks

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Wed, Dec 4, 2019 at 6:14 AM Andrzej Krzemienski via Boost <
[hidden email]> wrote:

> These goal 3 is in clear conflict with either 1 or 2. This becomes clear
> when we consider what should be the response to doing a resize() over
> capacity(): a precondition violation or a normal course of action signaled
> by an exception.
>

The severity of buffer overrun bugs should be considered in this case, as
well as the overhead of checking and calling boost::throw_exception. Under
-fno-exceptions, this is literally a single cmp instruction. I can't
imagine a C++ program where this would matter, but if one exists, I can't
imagine it would be using a universal fixed_string type. The correct design
is to check and call boost::throw_exception, rather than assert.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 5/12/2019 03:14, Andrzej Krzemienski wrote:
> Making a resize() over capacity() a precondition violation is a *feature*
> useful for bug detection and I do not consider it a valid argument that
> "library will throw exceptions and if you never resize() over capacity()
> you will never see exceptions or std::abort()". If this is a precondition,
> then I expect of a library to put some BOOST_ASSERT() or
> _builtin_unreachable() in those paths to enable better bug detection.

I see this argument a lot, and it confuses me.

Perhaps this is my Windows dev background talking (since the analysis
tools seem more lacking on Windows, despite having a better debugger),
but in my experience it is vastly easier to find a thrown exception than
to find "deliberate" UB (including asserts).  And vastly easier to log
that it unexpectedly occurred in production code in the field, so that
you can detect and fix it without a debugger attached to the process.

Asserts and unreachables both disappear in release builds, so the
process ends up continuing to run in some subtly corrupted way -- if
you're lucky it crashes soon after in an unrelated location that takes
you weeks to track down the true cause.  If you're unlucky, it runs
longer, and corrupted some customer data along the way.

Please enlighten me.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
Gavin Lambert wrote:

> Asserts and unreachables both disappear in release builds, ...

Unreachables don't just disappear, they take parts of the code with them.
Misguided ideas like "let's use __builtin_unreachable for the preconditions"
are exactly why I prefer the behavior on critical precondition violations
specified, rather than undefined, even though undefined might theoretically
be better.


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
śr., 4 gru 2019 o 23:13 Emil Dotchevski via Boost <[hidden email]>
napisał(a):

> On Wed, Dec 4, 2019 at 6:14 AM Andrzej Krzemienski via Boost <
> [hidden email]> wrote:
>
> > These goal 3 is in clear conflict with either 1 or 2. This becomes clear
> > when we consider what should be the response to doing a resize() over
> > capacity(): a precondition violation or a normal course of action
> signaled
> > by an exception.
> >
>
> The severity of buffer overrun bugs should be considered in this case, as
> well as the overhead of checking and calling boost::throw_exception. Under
> -fno-exceptions, this is literally a single cmp instruction. I can't
> imagine a C++ program where this would matter, but if one exists, I can't
> imagine it would be using a universal fixed_string type. The correct design
> is to check and call boost::throw_exception, rather than assert.
>

I tend to agree; especially that the library originated from Beast where it
may be even facing the outside world directly. What I would expect of
documentation to state it clearly in the design goals section:
1. Overrunning capacity() is not a bug but a normal usage of the library.
In this case we guarantee that an exception is thrown, much like
std::string does.
2. This design decision is opposite of what `static_vector` chose, so do
not be misled by an apparent similarity.

Note that the motivation -- in the alternate design (which neither of us
wants for fixed_string) -- for having a precondition violation rather than
guaranteed throw is safety: not performance.

Regards,
&rzej;


> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
śr., 4 gru 2019 o 23:47 Gavin Lambert via Boost <[hidden email]>
napisał(a):

> On 5/12/2019 03:14, Andrzej Krzemienski wrote:
> > Making a resize() over capacity() a precondition violation is a *feature*
> > useful for bug detection and I do not consider it a valid argument that
> > "library will throw exceptions and if you never resize() over capacity()
> > you will never see exceptions or std::abort()". If this is a
> precondition,
> > then I expect of a library to put some BOOST_ASSERT() or
> > _builtin_unreachable() in those paths to enable better bug detection.
>
> I see this argument a lot, and it confuses me.
>
> Perhaps this is my Windows dev background talking (since the analysis
> tools seem more lacking on Windows, despite having a better debugger),
> but in my experience it is vastly easier to find a thrown exception than
> to find "deliberate" UB (including asserts).  And vastly easier to log
> that it unexpectedly occurred in production code in the field, so that
> you can detect and fix it without a debugger attached to the process.
>
> Asserts and unreachables both disappear in release builds, so the
> process ends up continuing to run in some subtly corrupted way -- if
> you're lucky it crashes soon after in an unrelated location that takes
> you weeks to track down the true cause.  If you're unlucky, it runs
> longer, and corrupted some customer data along the way.
>
> Please enlighten me.
>

If the library design were to treat overrunning the capacity as a
programmer bug (which I am not advocating for for fixed_string) then I
imagine the library would use a dedicated macro to indicate the
precondition:

```
void fixed_string<N>::resize(size_type s)
{
  BOOST_FIXED_STRING_PRECONDITION(s <= this->capacity());
  // then do the job
}
```

And you leave the decision what this macro expands to to the user. It will
be a bunch of #if defines. Te library would choose a safe default, like
checking the condition and calling `std::abort()` on failure, but the user
could change it by defining some macros, like
BOOST_FIXED_STRING_THROW_EXCEPTION_ON_PRECONDITION_VIOLATION.

Thus, if the user doesn't have any better idea, she just defines the macro,
and fixed_string will be throwing exceptions on precondition violations.
Everything is "safe", at least under some definition of safety. If on my
platform I cannot use exceptions, I can configure the precondition macro to
call std::abort().

The important point is that I control the behavior only of the
preconditions and only in this library. It does not affect other libraries.
It also does not affect functions that are always guaranteed to throw, such
as fixed_string::at().

Now, if I am doing a test build (or if I can afford doing this in
production), I can compile my program with UB-sanitizer, where the compiler
treats most of the language level UB as a checkable condition, checks it,
and upon a failed check logs a message and aborts. Because compilers are
now required to check most of the UB  when constexpr functions when
evaluated at compile-time, offers the same checks for runtime evaluation
comes practically for free; and compilers like GCC and clang already
provide them. At least one of them also inserts a runtime check and abort
on every occurrence of `__builtin_unreachable()`. So if I define macro
BOOST_FIXED_STRING_PRECONDITION to do `if(!condition)
__builtin_unreachable()` I will get a uniform error message in my test
build if the precondition is ever violated.

Additionally, if I have a static analyzer like Coverity (and I think
Microsoft also offers one), I can make the macro expand to an annotation
that the static analyzer understands and looks for, and then paths that
lead to precondition violation can be exposed statically without even
running the program. This would not be possible if the library only
guaranteed to throw exceptions. You can put any instrumentation code you
want under this macro. But it only works because if the library doesn't
guarantee any particular behavior on precondition violation: it only says
"I leave myself the liberty to do what I choose."

And of course, people in applications where the trade-off between security
and and performance is different (like video games, in internal parts
remote from doing networking) may choose -- after extensive testing has
been done -- to expand the macro to `__builtin_assume() `to check if this
gives them additional performance gain. But this is never done by default,
has to be explicitly opted in, and is really a side bonus to the main
feature which is about safety and controlling how bugs are detected.

Regards,
&rzej;

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list

> ```
> void fixed_string<N>::resize(size_type s)
> {
>    BOOST_FIXED_STRING_PRECONDITION(s <= this->capacity());
>    // then do the job
> }
> ```

+1 on that. I'm always advocating for safe-by-default and found it a
huge mistake to make operator[] the unchecked one instead of at()

So using BOOST_FIXED_STRING_PRECONDITION which throws by default is the
right choice IMO.




_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

smime.p7s (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
czw., 5 gru 2019 o 09:15 Alexander Grund via Boost <[hidden email]>
napisał(a):

>
> > ```
> > void fixed_string<N>::resize(size_type s)
> > {
> >    BOOST_FIXED_STRING_PRECONDITION(s <= this->capacity());
> >    // then do the job
> > }
> > ```
>
> +1 on that. I'm always advocating for safe-by-default and found it a
> huge mistake to make operator[] the unchecked one instead of at()
>
> So using BOOST_FIXED_STRING_PRECONDITION which throws by default is the
> right choice IMO.
>

However, in order for this to be implemented in `fixed_string`, the library
authors would have to make (and document) a design decision that contract
for using this library is that users write the code so that resizing never
exceeds capacity, and if such condition is nonetheless ever detected it is
treated as programmer bug.

But my impression is that the library has taken a different route: it is
absolutely fine to resize over capacity, and in this case the program will
simply jump to a different place in the execution flow (throwing an
exception does this), and user deliberately triggers this event because she
wants to get to this specific place in program execution.

But the library never documented which design was chosen, so I do not even
know if it makes sense to suggest BOOST_FIXED_STRING_PRECONDITION().
Regards,
&rzej;



>
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 2019-12-05 11:15, Alexander Grund via Boost wrote:

>
>> ```
>> void fixed_string<N>::resize(size_type s)
>> {
>>    BOOST_FIXED_STRING_PRECONDITION(s <= this->capacity());
>>    // then do the job
>> }
>> ```
>
> +1 on that. I'm always advocating for safe-by-default and found it a
> huge mistake to make operator[] the unchecked one instead of at()
>
> So using BOOST_FIXED_STRING_PRECONDITION which throws by default is the
> right choice IMO.

I'm strongly opposed. Make it a BOOST_ASSERT if you like but no checks
in release mode, please.

What's the point of this check when your index is guaranteed to not
exceed size()-1?

In my whole programming practice, not once did I need at(). Not only
because I didn't need the check at this point, but also because even if
I did need a check at some point before operator[] call, I also was not
satisfied with the exception at() would throw.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
czw., 5 gru 2019 o 09:59 Andrey Semashev via Boost <[hidden email]>
napisał(a):

> On 2019-12-05 11:15, Alexander Grund via Boost wrote:
> >
> >> ```
> >> void fixed_string<N>::resize(size_type s)
> >> {
> >>    BOOST_FIXED_STRING_PRECONDITION(s <= this->capacity());
> >>    // then do the job
> >> }
> >> ```
> >
> > +1 on that. I'm always advocating for safe-by-default and found it a
> > huge mistake to make operator[] the unchecked one instead of at()
> >
> > So using BOOST_FIXED_STRING_PRECONDITION which throws by default is the
> > right choice IMO.
>
> I'm strongly opposed. Make it a BOOST_ASSERT if you like but no checks
> in release mode, please.
>
> What's the point of this check when your index is guaranteed to not
> exceed size()-1?
>
> In my whole programming practice, not once did I need at(). Not only
> because I didn't need the check at this point, but also because even if
> I did need a check at some point before operator[] call, I also was not
> satisfied with the exception at() would throw.
>

Are you opposing against the idea of user-controlled
BOOST_FIXED_STRING_PRECONDITION() in general, or to throwing by default of
to performing runtime-checks in release builds regardless of what action is
taken later?

BOOST_ASSERT() does perform checks in release builds unless you go and
define NDEBUG, which does not correspond 1-to-1 to release builds.

Regards,
&rzej;

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
>> I'm strongly opposed. Make it a BOOST_ASSERT if you like but no checks
>> in release mode, please.
That is exactly the point: You defined BOOST_FIXED_STRING_PRECONDITION
to BOOST_ASSERT and your use-case is covered.
>> What's the point of this check when your index is guaranteed to not
>> exceed size()-1?
>>
>> In my whole programming practice, not once did I need at(). Not only
>> because I didn't need the check at this point, but also because even if
>> I did need a check at some point before operator[] call, I also was not
>> satisfied with the exception at() would throw.
Don't want to start that argument again. It's always the same. Reality
is: Especially beginners fail to check the size and even advanced
programmers accidently use sizes/index out of bounds leading to many of
the existing CVEs due to buffer overrun.
Again: Idea is: Safe by default. Throwing an exception satisfies this.
std::abort would too but cannot be recovered from. If you don't want the
default, you can change it.
> BOOST_ASSERT() does perform checks in release builds unless you go and
> define NDEBUG, which does not correspond 1-to-1 to release builds.

Build systems (e.g. CMake) usually defined NDEBUG for release builds. So
one cannot rely on BOOST_ASSERT to be there for release builds.





_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

smime.p7s (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 2019-12-05 12:05, Andrzej Krzemienski wrote:

>
>
> czw., 5 gru 2019 o 09:59 Andrey Semashev via Boost
> <[hidden email] <mailto:[hidden email]>> napisał(a):
>
>     On 2019-12-05 11:15, Alexander Grund via Boost wrote:
>      >
>      >> ```
>      >> void fixed_string<N>::resize(size_type s)
>      >> {
>      >>    BOOST_FIXED_STRING_PRECONDITION(s <= this->capacity());
>      >>    // then do the job
>      >> }
>      >> ```
>      >
>      > +1 on that. I'm always advocating for safe-by-default and found it a
>      > huge mistake to make operator[] the unchecked one instead of at()
>      >
>      > So using BOOST_FIXED_STRING_PRECONDITION which throws by default
>     is the
>      > right choice IMO.
>
>     I'm strongly opposed. Make it a BOOST_ASSERT if you like but no checks
>     in release mode, please.
>
>     What's the point of this check when your index is guaranteed to not
>     exceed size()-1?
>
>     In my whole programming practice, not once did I need at(). Not only
>     because I didn't need the check at this point, but also because even if
>     I did need a check at some point before operator[] call, I also was not
>     satisfied with the exception at() would throw.
>
>
> Are you opposing against the idea of user-controlled
> BOOST_FIXED_STRING_PRECONDITION() in general, or to throwing by default
> of to performing runtime-checks in release builds regardless of what
> action is taken later?

I'm strongly opposed to runtime checks regardless of the error outcome
in release mode (ok, when NDEBUG is defined) because most of the time
you already know the index is within bounds. When you don't know, you
most likely should have check that earlier (e.g. before the parsing loop
or before your higher level operation starts).

I'm opposed to throwing an exception in case of failure because the
exception and its error description is likely meaningless to high level
code and nearly useless when debugging. The higher level code may not
want an exception to begin with, in which case it would perform the
check itself, and the check in operator[] becomes useless.

I'm mildly opposed to using custom macros (other than NDEBUG) to modify
behavior at compile time, because it opens up the possibility of
configuration inconsistencies and ODR issues. I'm fine with it as long
as by default (when nothing custom is defined) the behavior is
equivalent to BOOST_ASSERT.

> BOOST_ASSERT() does perform checks in release builds unless you go and
> define NDEBUG, which does not correspond 1-to-1 to release builds.

NDEBUG gets defined by default in release mode by many IDEs and build
systems, it is the standard macro to disable asserts, including the
standard assert(). In practice, NDEBUG and release mode nearly always go
hand in hand and NDEBUG is perceived as an indication of a release build.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list

> I'm strongly opposed to runtime checks regardless of the error outcome
> in release mode (ok, when NDEBUG is defined) because most of the time
> you already know the index is within bounds. When you don't know, you
> most likely should have check that earlier (e.g. before the parsing
> loop or before your higher level operation starts).

That was always my counter argument: If you can prove it to a reviewer
looking at your code that the index is in-bounds, then the compiler can
do the same and remove the check and exception.

I never bought "most of the time you already know the index is within
bounds" because it is equivalent to "most of the time your code is
correct, let's not use checks/unit tests/hardening/..."
If what you say is true, then buffer overflows would be non-existant.
Are they?

> I'm opposed to throwing an exception in case of failure because the
> exception and its error description is likely meaningless to high
> level code and nearly useless when debugging. The higher level code
> may not want an exception to begin with, in which case it would
> perform the check itself, and the check in operator[] becomes useless.
No, the check gets optimized away if you do everything right. If you
don't (do it right and have no check/exception/abort) you'll get UB,
buffer overflows and security risks. Especially for a stack entity this
will pretty much always be a SERIOUS security issue.
So to protect users one would need at least std::abort, but using an
exception allows to recover from it.
> I'm mildly opposed to using custom macros (other than NDEBUG) to
> modify behavior at compile time, because it opens up the possibility
> of configuration inconsistencies and ODR issues. I'm fine with it as
> long as by default (when nothing custom is defined) the behavior is
> equivalent to BOOST_ASSERT.
ODR is a real issue, true that because you may end up using a compiled
library having it defined differently.



_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

smime.p7s (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 2019-12-05 12:15, Alexander Grund via Boost wrote:
>>> I'm strongly opposed. Make it a BOOST_ASSERT if you like but no checks
>>> in release mode, please.
> That is exactly the point: You defined BOOST_FIXED_STRING_PRECONDITION
> to BOOST_ASSERT and your use-case is covered.

My point is that I shouldn't have to define anything. Unchecked
operator[] in release mode is expected by default.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 2019-12-05 12:45, Alexander Grund via Boost wrote:

>
>> I'm strongly opposed to runtime checks regardless of the error outcome
>> in release mode (ok, when NDEBUG is defined) because most of the time
>> you already know the index is within bounds. When you don't know, you
>> most likely should have check that earlier (e.g. before the parsing
>> loop or before your higher level operation starts).
>
> That was always my counter argument: If you can prove it to a reviewer
> looking at your code that the index is in-bounds, then the compiler can
> do the same and remove the check and exception.

No, you're giving too much credit to compilers. Compilers cannot analyze
at the same level as humans do. For example, if the valid index is
stored in a data member in one method of a class and used in another,
and the first method is required to be called first (which might also be
enforced by a runtime check via another member variable), the compiler
is not able to know that the index is always valid. At least, I haven't
seen such cleverness.

> I never bought "most of the time you already know the index is within
> bounds" because it is equivalent to "most of the time your code is
> correct, let's not use checks/unit tests/hardening/..."
> If what you say is true, then buffer overflows would be non-existant.
> Are they?

Mistakes happen, I'm not arguing with that. But defensive programming is
not the solution because it doesn't save you from these mistakes. Code
that passes invalid index to at() is just as incorrect as the one that
does so with operator[].

>> I'm opposed to throwing an exception in case of failure because the
>> exception and its error description is likely meaningless to high
>> level code and nearly useless when debugging. The higher level code
>> may not want an exception to begin with, in which case it would
>> perform the check itself, and the check in operator[] becomes useless.
>
> No, the check gets optimized away if you do everything right.

As I said above, you can't count on that.

> If you
> don't (do it right and have no check/exception/abort) you'll get UB,
> buffer overflows and security risks. Especially for a stack entity this
> will pretty much always be a SERIOUS security issue.
> So to protect users one would need at least std::abort, but using an
> exception allows to recover from it.

You can't recover from an unexpected exception. And it is unexpected by
account that we're talking about a human mistake.

The correct way of tackling mistakes is human reviewing, testing and
debugging. For the debugging part, a crash with a saved backtrace is
more useful than an exception with a meaningless error message like
"fixed_string::at", as many standard libraries like to do.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
> No, you're giving too much credit to compilers. Compilers cannot
> analyze at the same level as humans do. For example, if the valid
> index is stored in a data member in one method of a class and used in
> another, and the first method is required to be called first (which
> might also be enforced by a runtime check via another member
> variable), the compiler is not able to know that the index is always
> valid. At least, I haven't seen such cleverness.

That's why there is usually a checked and an unchecked method. Your
usual accesses though are: `for(auto c: string) ...`, for(int i=0;
i<string.size();i++) do(string[i]);`, `if(i+1<string.size())
bar(string[i], string[i+1])` and the like. Easy on optimizers

The scenario you describe is hard to check for a human to: Are you sure
this invariant always holds if it is enforced in some other method which
need to have been called before? For all iterations of code evolving? If
so then better use the string.unsafe_get_i_have_checked_the_index(i)
method (made-up). This is a clear signal to any reviewer/reader that the
index was indeed checked and the unsafe call is deliberately used.

>> I never bought "most of the time you already know the index is within
>> bounds" because it is equivalent to "most of the time your code is
>> correct, let's not use checks/unit tests/hardening/..."
>> If what you say is true, then buffer overflows would be non-existant.
>> Are they?
>
> Mistakes happen, I'm not arguing with that. But defensive programming
> is not the solution because it doesn't save you from these mistakes.
> Code that passes invalid index to at() is just as incorrect as the one
> that does so with operator[].
No. Defensive programming is a safety net. If you always use at() and
only in specific scenarios [] then at least your application will crash
with a possibly good stacktrace or you can catch the exception in
main(), prepare a meaningful error, log it, send it to your server and
play jingle bells.
>> No, the check gets optimized away if you do everything right.
>
> As I said above, you can't count on that.
So what? Marginally lower performance if even measurable but safe from
buffer overflows.
> You can't recover from an unexpected exception. And it is unexpected
> by account that we're talking about a human mistake.
You can: For e.g. a connection/parser/... you terminate the connection
and log the issue. At the least you can catch it in main or some
exception hook to log a backtrace and exit gracefully.
> The correct way of tackling mistakes is human reviewing, testing and
> debugging. For the debugging part, a crash with a saved backtrace is
> more useful than an exception with a meaningless error message like
> "fixed_string::at", as many standard libraries like to do.

You don't get a crash, at least not with a "meaningful backtrace" if you
don't use exceptions. You get UB, a security issue and possibly a crash
in some other part later instead of where the error was detected. And
why would "human reviewing, testing and debugging" be more reliable than
"human codewriting"? It is still a human and humans make mistakes.
Better use a safety net.





_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

smime.p7s (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
czw., 5 gru 2019 o 12:23 Alexander Grund via Boost <[hidden email]>
napisał(a):

> > No, you're giving too much credit to compilers. Compilers cannot
> > analyze at the same level as humans do. For example, if the valid
> > index is stored in a data member in one method of a class and used in
> > another, and the first method is required to be called first (which
> > might also be enforced by a runtime check via another member
> > variable), the compiler is not able to know that the index is always
> > valid. At least, I haven't seen such cleverness.
>
> That's why there is usually a checked and an unchecked method. Your
> usual accesses though are: `for(auto c: string) ...`, for(int i=0;
> i<string.size();i++) do(string[i]);`, `if(i+1<string.size())
> bar(string[i], string[i+1])` and the like. Easy on optimizers
>
> The scenario you describe is hard to check for a human to: Are you sure
> this invariant always holds if it is enforced in some other method which
> need to have been called before? For all iterations of code evolving? If
> so then better use the string.unsafe_get_i_have_checked_the_index(i)
> method (made-up). This is a clear signal to any reviewer/reader that the
> index was indeed checked and the unsafe call is deliberately used.
>
> >> I never bought "most of the time you already know the index is within
> >> bounds" because it is equivalent to "most of the time your code is
> >> correct, let's not use checks/unit tests/hardening/..."
> >> If what you say is true, then buffer overflows would be non-existant.
> >> Are they?
> >
> > Mistakes happen, I'm not arguing with that. But defensive programming
> > is not the solution because it doesn't save you from these mistakes.
> > Code that passes invalid index to at() is just as incorrect as the one
> > that does so with operator[].
> No. Defensive programming is a safety net. If you always use at() and
> only in specific scenarios [] then at least your application will crash
> with a possibly good stacktrace or you can catch the exception in
> main(), prepare a meaningful error, log it, send it to your server and
> play jingle bells.
> >> No, the check gets optimized away if you do everything right.
> >
> > As I said above, you can't count on that.
> So what? Marginally lower performance if even measurable but safe from
> buffer overflows.
> > You can't recover from an unexpected exception. And it is unexpected
> > by account that we're talking about a human mistake.
> You can: For e.g. a connection/parser/... you terminate the connection
> and log the issue. At the least you can catch it in main or some
> exception hook to log a backtrace and exit gracefully.
> > The correct way of tackling mistakes is human reviewing, testing and
> > debugging. For the debugging part, a crash with a saved backtrace is
> > more useful than an exception with a meaningless error message like
> > "fixed_string::at", as many standard libraries like to do.
>
> You don't get a crash, at least not with a "meaningful backtrace" if you
> don't use exceptions. You get UB, a security issue and possibly a crash
> in some other part later instead of where the error was detected. And
> why would "human reviewing, testing and debugging" be more reliable than
> "human codewriting"? It is still a human and humans make mistakes.
> Better use a safety net.
>

I think that this discussion diverges from the point of this thread. It was
not about operator[] (for which it is clear that passing a too big index is
a bug) but about resizing over fixed_string's capacity, in which case the
library doesn't make it clear if it is even a bug, or something that the
programmer may want to do.

For instance, in std::sting it is not a bug to resize over the capacity,
and fixed_string is supposed to be a drop-in replacement.

I would like to make this question sorted out first; and only then can we
decide what to put inside the function.

Regards,
&rzej;

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 2019-12-05 14:22, Alexander Grund via Boost wrote:

>> No, you're giving too much credit to compilers. Compilers cannot
>> analyze at the same level as humans do. For example, if the valid
>> index is stored in a data member in one method of a class and used in
>> another, and the first method is required to be called first (which
>> might also be enforced by a runtime check via another member
>> variable), the compiler is not able to know that the index is always
>> valid. At least, I haven't seen such cleverness.
>
> That's why there is usually a checked and an unchecked method. Your
> usual accesses though are: `for(auto c: string) ...`, for(int i=0;
> i<string.size();i++) do(string[i]);`, `if(i+1<string.size())
> bar(string[i], string[i+1])` and the like. Easy on optimizers
>
> The scenario you describe is hard to check for a human to: Are you sure
> this invariant always holds if it is enforced in some other method which
> need to have been called before? For all iterations of code evolving?

Yes, and such use cases appear more often than you might think. Call
that first method a constructor or init(), and it becomes obvious to the
human reviewer that that method must be called before further using the
object. You may argue what if that init() method is not called and the
answer depends on how probable that is. Maybe that method is only called
from a few places of your program, like factory methods, and you can
easily ensure the precondition is not violated. If it is a real
possibility that you really want to protect from, add a flag or use some
other criteria for checking that it was called. More often, though, I
would just make it a documented precondition, and allow UB if users
don't follow it.

> If
> so then better use the string.unsafe_get_i_have_checked_the_index(i)
> method (made-up). This is a clear signal to any reviewer/reader that the
> index was indeed checked and the unsafe call is deliberately used.

Such method is called operator[]. I understand you want that name for
the checked version, but that is not the case for historical reasons
(for which I'm glad).

And again, where check is needed, I will have it written myself and the
way I want it (which may not involve throwing an exception). Where I
don't have it written, I don't want it, at all, and I'm responsible for
the consequences. I would prefer if such use pattern didn't require me
jumping through hoops.

>> Mistakes happen, I'm not arguing with that. But defensive programming
>> is not the solution because it doesn't save you from these mistakes.
>> Code that passes invalid index to at() is just as incorrect as the one
>> that does so with operator[].
> No. Defensive programming is a safety net.

It is not. It doesn't make your code correct, it doesn't make it work,
it doesn't help you fixing the issue. It only prevents your code from
crashing, which is a bad thing (because a crash - with a backtrace - is
better).

> If you always use at() and
> only in specific scenarios [] then at least your application will crash
> with a possibly good stacktrace or you can catch the exception in
> main(), prepare a meaningful error, log it, send it to your server and
> play jingle bells.

Thing is, you can't prepare a meaningful error in main(). You don't have
a backtrace at that point and you don't know which of the thousands of
at() calls threw. Sure, your application won't crash via signal, but, as
I said, that is a bad thing.

>>> No, the check gets optimized away if you do everything right.
>>
>> As I said above, you can't count on that.
> So what? Marginally lower performance if even measurable but safe from
> buffer overflows.

I don't want that marginal performance drop, especially given that I get
nothing in return.

>> You can't recover from an unexpected exception. And it is unexpected
>> by account that we're talking about a human mistake.
> You can: For e.g. a connection/parser/... you terminate the connection
> and log the issue. At the least you can catch it in main or some
> exception hook to log a backtrace and exit gracefully.

You could terminate the connection and things like that, but ultimately
that exception doesn't help you fixing the problem. When connections are
dropping like flies on your production server and you have no idea where
to fix the problem, I'd prefer to get just one backtrace of a crash
instead and then fix the bug in a matter of minutes.

>> The correct way of tackling mistakes is human reviewing, testing and
>> debugging. For the debugging part, a crash with a saved backtrace is
>> more useful than an exception with a meaningless error message like
>> "fixed_string::at", as many standard libraries like to do.
>
> You don't get a crash, at least not with a "meaningful backtrace" if you
> don't use exceptions. You get UB, a security issue and possibly a crash
> in some other part later instead of where the error was detected. And
> why would "human reviewing, testing and debugging" be more reliable than
> "human codewriting"? It is still a human and humans make mistakes.
> Better use a safety net.

With asserts enabled you'll get a crash at the point of error.
Hopefully, you'll discover the bug soon enough during testing. But even
in release builds, when asserts are disabled, chances are high the crash
will still point you to the problematic code. Of course, memory access
bugs are nasty and painful to deal with, but I honestly didn't have to
deal with one for quite a long time now, even though I've never used
at(). You're more likely to mess up memory when working with raw pointers.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [review][fixed_string] Andrzej's feedback

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 2019-12-05 15:26, Andrzej Krzemienski via Boost wrote:

>
> I think that this discussion diverges from the point of this thread. It was
> not about operator[] (for which it is clear that passing a too big index is
> a bug) but about resizing over fixed_string's capacity, in which case the
> library doesn't make it clear if it is even a bug, or something that the
> programmer may want to do.
>
> For instance, in std::sting it is not a bug to resize over the capacity,
> and fixed_string is supposed to be a drop-in replacement.
>
> I would like to make this question sorted out first; and only then can we
> decide what to put inside the function.

It is not so much about exceeding the capacity() but about exceeding
max_size(). In fixed_string, capacity() happens to be equal to
max_size() upon construction.

Exceeding max_size() in std::string and containers is diagnosed with an
exception, and IMO fixed_string should follow.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
12