A solution to return dynamic strings without heap allocation. Any interest?


A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
Hi all,

I haven't been successful at attracting interest in a formatting
library [1] I've been working on lately. But recently I realized that
part of it could be isolated as a small standalone library that
could solve an old, common and troublesome situation in C++:

Suppose you need to create a function that returns/provides a string
whose content and size are unknown at compile time. The first
approach is to make it return a `std::string`. But if it needs to be
usable in environments like bare-metal real-time systems,
then one usually makes it take a raw string as an output argument,
more or less like this:

  struct result{ char* it; bool truncated; };
  result get_message(char* dest, std::size_t dest_len);

But this is clearly not a perfect solution, since there's nothing
really effective the caller can do when get_message fails because
the destination string is too small.
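
To make the limitation concrete, here is a minimal sketch of that
raw-buffer interface. The body of `get_message`, the message text and
`demo_truncation` are hypothetical, invented only for illustration;
note that this sketch does not write a terminating zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Result of the classic "caller supplies the buffer" interface.
struct result {
    char* it;        // one past the last character written
    bool truncated;  // true when the message did not fit
};

// Hypothetical body: copies a fixed message, truncating if needed.
inline result get_message(char* dest, std::size_t dest_len) {
    static const char msg[] = "sensor 7: temperature out of range";
    const std::size_t msg_len = sizeof(msg) - 1;
    const std::size_t n = msg_len < dest_len ? msg_len : dest_len;
    std::memcpy(dest, msg, n);
    return {dest + n, n < msg_len};
}

// The caller can detect truncation, but has no good way to recover
// other than guessing a bigger buffer and calling again.
inline void demo_truncation() {
    char small[8];
    const result r = get_message(small, sizeof(small));
    assert(r.truncated);
    assert(r.it - small == 8);
}
```
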

So I present the `outbuf` abstract class. It somewhat resembles
`std::streambuf`, but with a simpler and lower level design,
which is the result of many attempts looking for the best
performance [2] and usability in my formatting library.
Afaics, it does not require a hosted C++ implementation,
though I would like someone else to confirm that.

Now the caller of `get_message` has to choose or create a
suitable class type deriving from outbuf, that dictates where
the message is written to. For example, if the user wants to
get a `std::string`, then `string_maker` will do the job:

  #include <boost/outbuf/string.hpp>

  // ...
      boost::outbuf::string_maker<false> msg;
      get_message(msg);
      std::string str = msg.finish();

Or, if one wants it to write into a raw string,
then use `cstr_writer`

    char buff[buff_size];
    boost::outbuf::cstr_writer csw(buff, buff_size);
    get_message(csw);
    auto result = csw.finish();
    if (result.truncated) {
      // ...
    }

Those `finish` functions above do not belong to `outbuf`.
They are defined in the concrete derived types only.
It's solely by convention that they share the same name.

Yes, using `string_maker` still leads to heap allocation
and `cstr_writer` to string truncation. The problem isn't
solved by these. However, a string object is never the final
destination. So the user could rather use another class
that writes the message directly into the final destination
( output console, a log file, an LCD display or whatever ).
It is not difficult to implement concrete subtypes of `outbuf`.

`outbuf` is actually a type alias:

    template <bool NoExcept, typename CharT>
    class basic_outbuf;

    using outbuf = basic_outbuf<false, char>;
    using outbuf_noexcept = basic_outbuf<true, char>;

That `NoExcept` template parameter is perhaps the controversial part.
It was not originally present in my formatting library.

Besides the destructor, `basic_outbuf` has only one virtual
function: `recycle()`, and it is declared as `noexcept(NoExcept)`.
This is the only effect the `NoExcept` template parameter has.
All other functions are guaranteed not to throw.
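
The single-virtual-function design can be sketched roughly as below.
This is not the library's actual interface: `mini_outbuf`, `pos`,
`end`, `space`, `advance`, `set_buffer`, `write` and `forwarding_sink`
are all names assumed for this illustration. The writer fills the
buffer directly and calls `recycle()` only when it runs out of space;
the derived class decides where each full chunk goes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical, simplified stand-in for basic_outbuf<false, char>.
class mini_outbuf {
public:
    virtual ~mini_outbuf() = default;

    // Consumes what was written so far and provides fresh space.
    virtual void recycle() = 0;

    char* pos() const noexcept { return pos_; }
    char* end() const noexcept { return end_; }
    std::size_t space() const noexcept {
        return static_cast<std::size_t>(end_ - pos_);
    }
    void advance(std::size_t n) noexcept { pos_ += n; }

protected:
    void set_buffer(char* begin, char* end) noexcept {
        pos_ = begin;
        end_ = end;
    }

private:
    char* pos_ = nullptr;
    char* end_ = nullptr;
};

// Writer side: copies into the buffer, recycling whenever it fills up.
inline void write(mini_outbuf& ob, const char* str, std::size_t len) {
    while (len != 0) {
        if (ob.space() == 0) ob.recycle();
        const std::size_t n = len < ob.space() ? len : ob.space();
        std::memcpy(ob.pos(), str, n);
        ob.advance(n);
        str += n;
        len -= n;
    }
}

// Destination side: forwards each chunk to its final destination
// (a std::string here, standing in for a console, log file or display).
class forwarding_sink final : public mini_outbuf {
public:
    explicit forwarding_sink(std::string& dest) : dest_(dest) {
        set_buffer(chunk_, chunk_ + sizeof(chunk_));
    }
    void recycle() override {
        dest_.append(chunk_, static_cast<std::size_t>(pos() - chunk_));
        set_buffer(chunk_, chunk_ + sizeof(chunk_));
    }
    void finish() { recycle(); }  // flush the last partial chunk

private:
    std::string& dest_;
    char chunk_[16];
};

inline void demo_forwarding() {
    std::string out;
    forwarding_sink sink(out);
    write(sink, "hello, ", 7);
    write(sink, "a message longer than one chunk", 31);
    sink.finish();
    assert(out == "hello, a message longer than one chunk");
}
```
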

Hence, by taking an `outbuf_noexcept&` parameter, a function
states that the destination must not throw. That might be
particularly good if such an object comes from another module
and we must avoid exceptions crossing module boundaries.

On the other hand, if a function takes an `outbuf&`, then it also
accepts `outbuf_noexcept&`, because `basic_outbuf<true, CharT>`
derives from `basic_outbuf<false, CharT>`.
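
One way such a relationship can be expressed is with two partial
specializations, relying on the rule that an overriding function may
have a stricter exception specification than the function it
overrides. The sketch below assumes this approach; the names
`basic_outbuf_sketch`, `takes_any`, `takes_noexcept_only` and
`counting_sink` are hypothetical and the real library may be organized
differently.

```cpp
#include <cassert>
#include <type_traits>

template <bool NoExcept, typename CharT>
class basic_outbuf_sketch;

// Throwing variant: the root of the hierarchy.
template <typename CharT>
class basic_outbuf_sketch<false, CharT> {
public:
    virtual ~basic_outbuf_sketch() = default;
    virtual void recycle() noexcept(false) = 0;
};

// Non-throwing variant: derives from the throwing one. An override may
// tighten the exception specification, so recycle() becomes noexcept.
template <typename CharT>
class basic_outbuf_sketch<true, CharT>
    : public basic_outbuf_sketch<false, CharT> {
public:
    void recycle() noexcept(true) override = 0;
};

static_assert(std::is_base_of<basic_outbuf_sketch<false, char>,
                              basic_outbuf_sketch<true, char>>::value,
              "the noexcept variant is usable wherever the throwing one is");

// Accepts both kinds of destination.
inline void takes_any(basic_outbuf_sketch<false, char>& ob) { ob.recycle(); }

// Accepts only destinations that promise not to throw.
inline void takes_noexcept_only(basic_outbuf_sketch<true, char>& ob) noexcept {
    ob.recycle();
}

// Hypothetical destination that discards everything and counts calls.
struct counting_sink final : basic_outbuf_sketch<true, char> {
    int recycled = 0;
    void recycle() noexcept override { ++recycled; }
};

inline void demo_conversion() {
    counting_sink sink;
    takes_any(sink);  // implicit derived-to-base conversion
    takes_noexcept_only(sink);
    assert(sink.recycled == 2);
}
```
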

When using `string_maker` you can choose between the two kinds.
`string_maker<true>` derives from `basic_outbuf<true, char>`.
So if any exception arises from its internal `std::string`,
then it is caught by a try/catch(...) block, stored as
an `exception_ptr` and rethrown by `finish()`. This has the
undesirable effect of delaying its proper handling ( after all,
we would rather stop what's being done as soon as possible when
an error appears ). So I think the recommendation would be to use
`string_maker<false>` if possible, and `string_maker<true>`
only if necessary.
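
The capture-and-rethrow behaviour described for `string_maker<true>`
might look roughly like the following. This is a sketch, not the
library's code: `deferred_string_sink` and its `append` function are
hypothetical, and the artificial `max_size` limit exists only to make
a failure easy to trigger.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical sink whose write path never throws: the first failure
// is captured and only rethrown later, by finish().
class deferred_string_sink {
public:
    explicit deferred_string_sink(std::size_t max_size)
        : max_size_(max_size) {}

    void append(const char* s, std::size_t n) noexcept {
        if (error_) return;  // already failed: ignore further input
        try {
            if (str_.size() + n > max_size_)
                throw std::length_error("deferred_string_sink: limit exceeded");
            str_.append(s, n);  // may also throw std::bad_alloc
        } catch (...) {
            error_ = std::current_exception();
        }
    }

    // Rethrows the stored exception, if any; otherwise yields the string.
    std::string finish() {
        if (error_) std::rethrow_exception(error_);
        return std::move(str_);
    }

private:
    std::size_t max_size_;
    std::string str_;
    std::exception_ptr error_;
};

inline void demo_deferred_error() {
    deferred_string_sink sink(4);
    sink.append("hello", 5);  // fails silently at this point
    bool thrown = false;
    try {
        sink.finish();
    } catch (const std::length_error&) {
        thrown = true;  // the error only surfaces here
    }
    assert(thrown);
}
```

The gap between the failing `append` and the throwing `finish` is the
delayed handling the paragraph above warns about.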

The reason why I think it's controversial is that it makes
`recycle()` violate the Lakos Rule. And although the Lakos Rule
is not a requirement in Boost (afaik), I was wondering whether
this library could interest LEWG in the future as well.
Anyway, this whole noexcept idea is not a central part of this
library and it can be removed.

Now, the other topic is how to implement that `get_message` function,
i.e., how to write into an `outbuf` object.
One can use the `puts` and `putc` functions to insert strings and chars.
One can also use `fmtlib` through an output iterator adapter.
Or one can write directly in the buffer.
But I will ask you to read the doc [3] for that. It's a quick read,
quicker than this email.

The repository is:

  https://github.com/robhz786/outbuf

I would prefer it to be part of Boost.Core instead of being
a standalone library, and also to remove the `outbuf`
namespace. But that is up to you.

Best regards,
Roberto

[1] The Stringify library:
    https://github.com/robhz786/stringify

[2] The great performance of Stringify is mainly thanks to
    the design of outbuf ( named there as `output_buffer` ):

https://robhz786.github.io/stringify/doc/html/benchmarks/benchmarks.html#benchmarks.benchmarks.run_time_performance

[3] "Writing into an outbuf object"

https://robhz786.github.io/outbuf/doc/html/index.html#boost_outbuf.overview.writting_into_an_outbuf_object

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On 8/29/19 3:24 PM, Roberto Hinz via Boost wrote:

>      auto result = csw.finish();
>      if (result.truncated) {
>        // ...

I suggest that you change the return type of finish() to contain
the number of elements written instead of the truncated boolean.
This would make outbuf extensible to binary data as well where you
cannot rely on a terminating zero to tell you how much data has
been written.

The return type could also have begin()/end() member functions, which
makes it directly usable with STL algorithms. In that case it would
also make sense to have data()/size().
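
A result type along these lines could look like the sketch below.
`write_result` and `write_bytes` are hypothetical names, not part of
outbuf; the payload with embedded zeros only shows why a count is
needed for binary data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical result type: records where the data starts and how many
// elements were written, so it works for binary data as well.
struct write_result {
    const char* first;
    std::size_t count;

    const char* data() const noexcept { return first; }
    std::size_t size() const noexcept { return count; }
    const char* begin() const noexcept { return first; }
    const char* end() const noexcept { return first + count; }
};

// Hypothetical writer producing bytes that contain embedded zeros.
inline write_result write_bytes(char* dest, std::size_t dest_len) {
    static const char payload[] = {'a', '\0', 'b', '\0', 'c'};
    const std::size_t n =
        sizeof(payload) < dest_len ? sizeof(payload) : dest_len;
    std::memcpy(dest, payload, n);
    return {dest, n};
}

inline void demo_range_result() {
    char buf[8];
    const write_result r = write_bytes(buf, sizeof(buf));
    assert(r.size() == 5);
    // begin()/end() make the result usable with standard algorithms.
    assert(std::count(r.begin(), r.end(), '\0') == 2);
}
```
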


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Sun, Sep 1, 2019 at 7:45 AM Bjorn Reese via Boost <[hidden email]>
wrote:

> On 8/29/19 3:24 PM, Roberto Hinz via Boost wrote:
>
> >      auto result = csw.finish();
> >      if (result.truncated) {
> >        // ...
>
> I suggest that you change the return type of finish() to contain
> the number of elements written instead of the truncated boolean.
> This would make outbuf extensible to binary data as well where you
> cannot rely on a terminating zero to tell you how much data has
> been written.
>
> The return type could also have begin()/end() member functions, which
> makes it directly usable with STL algorithms. In that case it would
> also make sense to have data()/size().
>

outbuf can be used for binary data, but the basic_cstr_writer class
in particular may not be suitable for that, since its finish() function
always writes a terminating zero, requiring an extra space in the
destination string. Perhaps we could add another class template,
`basic_bin_writer`, that would never write a terminating character.

Anyway, the returned result contains a `ptr` member that points to
the end of the string. In order to add begin()/data()/size() functions
basic_cstr_writer would also need to store the initial position,
which would increase its size a little bit. And that's the only reason
why I did not add it, since I think it would not be used most of
the time, and the caller already knows the beginning anyway.
It's a convenience vs tiny cost decision. I will go for what the
majority prefers.


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Thu, Aug 29, 2019 at 9:24 AM Roberto Hinz via Boost
<[hidden email]> wrote:
> I would prefer it to be part of Boost.Core instead of being
> a standalone library, and also to remove the `outbuf`
> namespace. But that is up to you.

Boost.Core is for Boost facilities used by other Boost libraries
for simpler tasks.

You could propose it for Boost.Utility. However, it seems more worthy
of its own library.

In either choice (Utility, or your own library) the process for a
Boost formal review is at:
https://www.boost.org/community/reviews.html

Glen


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 9/1/19 3:10 PM, Roberto Hinz via Boost wrote:

> outbuf can be used for binary data, but the basic_cstr_writer class
> in particular may not be suitable for that, since its finish() function
> always writes a terminating zero, requiring an extra space in the
> destination string. Perhaps we could add another class template,
> `basic_bin_writer`, that would never write a terminating character.

I was thinking more broadly than basic_cstr_writer, which I happened to
quote. I may want my template classes to operate on the return type
of any writer in a consistent manner. This is possible if they adhere
to ContiguousRange and SizedRange as suggested.

> Anyway, the returned result contains a `ptr` member that points to
> the end of the string. In order to add begin()/data()/size() functions
> basic_cstr_writer would also need to store the initial position,
> which would increase its size a little bit. And that's the only reason

Assuming the buffer is contiguous, begin == end - size, so there would
be no need to store an extra pointer.


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Sun, Sep 1, 2019 at 11:59 AM Bjorn Reese via Boost <[hidden email]>
wrote:

> On 9/1/19 3:10 PM, Roberto Hinz via Boost wrote:
>
> > outbuf can be used for binary data, but the basic_cstr_writer class
> > in particular may not be suitable for that, since its finish() function
> > always writes a terminating zero, requiring an extra space in the
> > destination string. Perhaps we could add another class template,
> > `basic_bin_writer`, that would never write a terminating character.
>
> I was thinking more broadly than basic_cstr_writer, which I happened to
> quote. I may want my template classes to operate on the return type
> of any writer in a consistent manner. This is possible if they adhere
> to ContiguousRange and SizedRange as suggested.
>

It might be quite a challenge to keep such consistency among writers.
Some of them write to a file. There is one of them that doesn't actually
write anything, but just ignores all content ( the `discarded_outbuf`,
which is documented but which I forgot to implement ). What should
finish() return in those cases? And, of course, there are the user-defined
writers. We want the library to work with all sorts of destination types.



> > Anyway, the returned result contains a `ptr` member that points to
> > the end of the string. In order to add begin()/data()/size() functions
> > basic_cstr_writer would also need to store the initial position,
> > which would increase its size a little bit. And that's the only reason
>
> Assuming the buffer is contiguous, begin == end - size, so there would
> be no need to store an extra pointer.
>

But then we need to store the size. See basic_cstr_writer implementation:
https://github.com/robhz786/outbuf/blob/59a6c8c3159eee94c0fcefed3dd52591dba26ee6/include/boost/outbuf.hpp#L425-L481


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
No one interested?


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
Didn’t you see the reply from Glen Fernandez? Check the archive.

--
Janek Kozicki, PhD. DSc. Arch. Assoc. Prof.
Gdańsk University of Technology
Faculty of Applied Physics and Mathematics
Department of Theoretical Physics and Quantum Information
--
pg.edu.pl/jkozicki (click English flag on top right)
On 3 Sep 2019, 14:21 +0200, Roberto Hinz via Boost <[hidden email]>, wrote:
> No one interested?

Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Tue, Sep 3, 2019 at 11:11 AM Janek Kozicki <[hidden email]>
wrote:

> Didn’t you see the reply from Glen Fernandez? Check the archive.
>

Well, yes, but I didn't see his message as showing interest,
but just as guidance.
And I presume I first need to check whether there is any interest,
otherwise what's the point of going any further.
Am I misunderstanding something?


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Tue, Sep 3, 2019 at 5:21 AM Roberto Hinz via Boost
<[hidden email]> wrote:
> No one interested?

`std::basic_ostream` is actually quite usable once you figure out how
it works (which is admittedly more difficult than it should be). It
can be set up to not perform any memory allocations, depending on the
implementation of the derived class. It might not be perfect but it is
part of the standard library and thus has a natural advantage that
would require extraordinary functionality from an external component
to overcome. And I'm not seeing that in the proposed `outbuf`.

Regards


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Tue, Sep 10, 2019 at 11:05 PM Vinnie Falco <[hidden email]>
wrote:

> `std::basic_ostream` is actually quite usable once you figure out how
> it works
>

Let me illustrate why I disagree with that. Suppose you want to
implement a base64 encoder. You want it to be fast, agnostic,
and simple to use. Now suppose you adopt `std::ostream` as
the destination type:

  void to_base64( std::ostream& dest, const std::byte* src,
                  std::size_t count );

You will face two issues:

1) It doesn't matter how well you (as the library author) understand
   basic_ostream. The *user* needs to implement classes derived from
   basic_ostream to customize the destination types.

2) It's impossible to achieve a decent performance. If you used
   `outbuf` you could write directly into the buffer. But with
   `std::ostream` you have to call member functions like `put`
   or `write` for each little piece of the content, or use an
   additional intermediate buffer.

And this is far from being a specific use case. The same issues apply
for any kind of encoding, binary or text.
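
For reference, the `std::ostream` variant discussed above could be
sketched as follows. This is a minimal illustration, not code from the
thread: it takes `const unsigned char*` instead of `const std::byte*`
so that it also compiles before C++17, and `demo_base64` is a
hypothetical helper. Every output character goes through a call to
`put`.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>

// Standard base64 alphabet with '=' padding.
inline void to_base64(std::ostream& dest, const unsigned char* src,
                      std::size_t count) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::size_t i = 0;
    // Full groups: 3 input bytes become 4 output characters.
    for (; i + 3 <= count; i += 3) {
        const unsigned v = (src[i] << 16) | (src[i + 1] << 8) | src[i + 2];
        dest.put(alphabet[(v >> 18) & 63]);
        dest.put(alphabet[(v >> 12) & 63]);
        dest.put(alphabet[(v >> 6) & 63]);
        dest.put(alphabet[v & 63]);
    }
    // Trailing 1 or 2 bytes are padded with '='.
    const std::size_t rest = count - i;
    if (rest == 1) {
        const unsigned v = src[i] << 16;
        dest.put(alphabet[(v >> 18) & 63]);
        dest.put(alphabet[(v >> 12) & 63]);
        dest.put('=');
        dest.put('=');
    } else if (rest == 2) {
        const unsigned v = (src[i] << 16) | (src[i + 1] << 8);
        dest.put(alphabet[(v >> 18) & 63]);
        dest.put(alphabet[(v >> 12) & 63]);
        dest.put(alphabet[(v >> 6) & 63]);
        dest.put('=');
    }
}

inline void demo_base64() {
    std::ostringstream os;
    to_base64(os, reinterpret_cast<const unsigned char*>("Man"), 3);
    assert(os.str() == "TWFu");
}
```
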


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Thu, Sep 12, 2019 at 10:21 AM Roberto Hinz <[hidden email]> wrote:
> Let me illustrate why I disagree with that...

Okay, these are two fair points, but then the title of the original
post is not accurate. What you're really proposing is "a better
std::ostream" which is an entirely different conversation.

Regards


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Fri, Sep 13, 2019 at 9:46 AM Vinnie Falco <[hidden email]> wrote:

> On Thu, Sep 12, 2019 at 10:21 AM Roberto Hinz <[hidden email]> wrote:
> > Let me illustrate why I disagree with that...
>
> Okay, these are two fair points, but then the title of the original
> post is not accurate. What you're really proposing is "a better
> std::ostream" which is an entirely different conversation.
>
> Regards
>

You are right. Thanks for the comments


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Fri, Sep 13, 2019 at 6:50 AM Roberto Hinz <[hidden email]> wrote:
> You are right. Thanks for the comments

Please don't take any of these comments as discouragement. Quite the
opposite: they are well intended with a goal of progress in mind. There
has been a chorus of voices clamoring for "a better std::ostream" and
it has been the subject of a few papers. Offering users better
versions or replacements for standard library types such as
std::ostream lands squarely within the purview of the Boost Libraries.
There is already precedent for this, such as
boost::system::error_category and boost::shared_ptr, both of which are
superior to their standard library equivalents.

However any proposed replacement needs to address the body of work
that has already been done in this area. What makes it better or more
usable?

Thanks


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Fri, Sep 13, 2019 at 11:42 AM Vinnie Falco <[hidden email]>
wrote:

> On Fri, Sep 13, 2019 at 6:50 AM Roberto Hinz <[hidden email]> wrote:
> > You are right. Thanks for the comments
>
> Please don't take any of these comments as discouragement. Quite the
> opposite: they are well intended with a goal of progress in mind. There
> has been a chorus of voices clamoring for "a better std::ostream" and
> it has been the subject of a few papers. Offering users better
> versions or replacements for standard library types such as
> std::ostream lands squarely within the purview of the Boost Libraries.
> There is already precedent for this, such as
> boost::system::error_category and boost::shared_ptr, both of which are
> superior to their standard library equivalents.
>
> However any proposed replacement needs to address the body of work
> that has already been done in this area. What makes it better or more
> usable?
>
> Thanks
>

Not discouraged at all. I will enhance the documentation based on
your feedback and come back later.

Thank you


An alternative to std::ostream

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Hi all, this is a continuation of the thread
"A solution to return dynamic strings without heap allocation. Any
interest?"
Just telling you that I rewrote the docs, especially the rationale:
    https://robhz786.github.io/outbuf/doc/outbuf.html
Best regards
Robhz


Re: An alternative to std::ostream

Boost - Dev mailing list
Hi Roberto,

> On 3. Oct 2019, at 14:22, Roberto Hinz via Boost <[hidden email]> wrote:
>
> Hi all, this is a continuation of the thread
> "A solution to return dynamic strings without heap allocation. Any
> interest?"
> Just telling you that I rewrote the docs, especially the rationale:
>    https://robhz786.github.io/outbuf/doc/outbuf.html
> Best regards
> Robhz

Quoted from the rationale:

"Your function is complex to use. The user needs to implement a class that derives from ostream to customize the destination. It’s a complex task for most C++ programmers."

Agreed, although boost.iostreams makes that easier.

"It’s impossible to achieve a good perfomance. std::ostream does not provide direct access to the buffer. to_base64 needs to call member functions like write or put for every little piece of the content, or to use an itermediate buffer."

It is not impossible to achieve good performance; page 68 of http://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf lists problems, which are solvable.

In practice, increasing the buffer size and turning off synchronisation with stdio help:
https://stackoverflow.com/questions/5166263/how-to-get-iostream-to-perform-better
The SO answer lists several examples where C++'s iostreams beat C's stdio in performance.

Your argument is also not convincing. Just calling member functions doesn't make something slow if you compile with optimisations, which is a must with C++. I think it is quite natural that the stream makes it hard for you to touch the buffer. The stream objects hide buffer management under an interface. The ostream object handles the buffer for you, you don't have to know when you hit the boundary and things need to be flushed to the device. You can't hide something and expose it at the same time, this is breaking the encapsulation, so naturally, the streams make it difficult to touch the buffer directly. Although you can, if you really want to, and it is pretty simple to set up:

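char Buffer[N];
std::ofstream file;
// Install the buffer before opening: whether pubsetbuf has any effect
// on an already-open file is implementation-defined.
file.rdbuf()->pubsetbuf(Buffer, N);
file.open("file.txt");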

Now you can mess around with the stack-allocated buffer. It is not clear to me what the advantage of outbuf is over this.

I think the real problem with iostreams is that it lacks good documentation and tutorials on how to do the more complicated things.

Best regards,
Hans


Re: An alternative to std::ostream

Boost - Dev mailing list
Hi Hans,

On Mon, Oct 7, 2019 at 5:14 AM Hans Dembinski <[hidden email]>
wrote:

> "It’s impossible to achieve a good perfomance. std::ostream does not
> provide direct access to the buffer. to_base64 needs to call member
> functions like write or put for every little piece of the content, or to
> use an itermediate buffer."
>
> It is not impossible to achieve good performance; page 68 of
> http://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf lists problems,
> which are solvable.
>
> In practice, increasing the buffer size and turning off
> synchronisation with stdio help:
>
> https://stackoverflow.com/questions/5166263/how-to-get-iostream-to-perform-better
> The SO answer lists several examples where C++'s iostreams beat C's stdio
> in performance.
>
> Your argument is also not convincing. Just calling member functions
> doesn't make something slow if you compile with optimisations, which is a
> must with C++. (...)
>

Thanks for the feedback. I removed that part from the docs.

I did some benchmarks. First I implemented a base64 encoder
using outbuf and std::streambuf, and I couldn't find any
conclusive evidence that either one is faster than the other
( I get different results from seemingly irrelevant
code changes ). Then I implemented a simple json writer.
In this case the streambuf version was about 30% slower than
the outbuf version. Not a tremendous difference.

I chose to write directly into streambuf instead of ostream
so that we can disregard many of the possible QoI issues
related to std::ostream. That article you referenced seems to
only address optimizations of facet usage and formatting,
which I think should not have any effect on these benchmarks.
That SO discussion does not seem to apply either, since
the streambuf I used does not write into a file but
solely to a char array.
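
A streambuf of that kind can be sketched in a few lines with the
standard `setp` facility. This is an illustration only, not the code
used in the benchmarks; `array_streambuf`, `written` and
`demo_array_streambuf` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <streambuf>

// A std::streambuf whose put area is a caller-supplied char array.
// overflow() is not overridden, so once the array is full any further
// output fails and sputn() reports fewer characters than requested.
class array_streambuf : public std::streambuf {
public:
    array_streambuf(char* dest, std::size_t size) {
        setp(dest, dest + size);
    }
    // Number of characters written so far.
    std::size_t written() const {
        return static_cast<std::size_t>(pptr() - pbase());
    }
};

inline void demo_array_streambuf() {
    char buf[4];
    array_streambuf sb(buf, sizeof(buf));
    assert(sb.sputn("ab", 2) == 2);
    assert(sb.sputn("cdef", 4) == 2);  // only two characters still fit
    assert(sb.written() == 4);
    assert(std::memcmp(buf, "abcd", 4) == 0);
}
```
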

The benchmark implementations are available at
https://github.com/robhz786/outbuf/tree/master/performance

Anyway, it's clear now that my statement is a bit reckless.

Best Regards
