A solution to return dynamic strings without heap allocation. Any interest?


Boost - Dev mailing list
Hi all,

I haven't been successful at attracting interest in a formatting
library [1] I've been working on lately. But recently I realized that
part of it could be isolated as a small standalone library that
could solve an old, common, and troublesome situation in C++:

Suppose you need to create a function that returns/provides a string
whose content and size are unknown at compile time. The first
approach is to make it return a `std::string`. But if it needs to be
usable in environments like bare-metal real-time systems,
then one usually makes it take a raw string as an output argument,
more or less like this:

  struct result{ char* it; bool truncated; };
  result get_message(char* dest, std::size_t dest_len);

But this is clearly not a perfect solution, since there's nothing
really effective the caller can do when `get_message` fails because
the destination string is too small.
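
To make the failure mode concrete, here is a minimal sketch of such a
function. Only the `result` struct and the signature come from the post;
the body and the message text are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct result { char* it; bool truncated; };

// Hypothetical body: copies a fixed message into dest, always leaving
// room for the terminating zero, and reports whether it had to cut.
inline result get_message(char* dest, std::size_t dest_len) {
    assert(dest != nullptr);
    static const char msg[] = "sensor 3: temperature out of range";
    const std::size_t msg_len = sizeof(msg) - 1;
    if (dest_len == 0) {
        return {dest, true};               // not even room for '\0'
    }
    const std::size_t n = msg_len < dest_len - 1 ? msg_len : dest_len - 1;
    std::memcpy(dest, msg, n);
    dest[n] = '\0';
    return {dest + n, n < msg_len};        // `it` points at the '\0'
}
```

When `truncated` comes back true, the caller only knows the buffer was
too small, not how large it should have been, so the usual choices are
to retry with a bigger guess or to accept the clipped text.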

So I present the `outbuf` abstract class. It somewhat resembles
`std::streambuf`, but with a simpler and lower-level design,
which is the result of many attempts at finding the best
performance [2] and usability in my formatting library.
Afaics, it does not require a hosted C++ implementation,
though I would like someone else to confirm that.

Now the caller of `get_message` has to choose or create a
suitable class type deriving from outbuf, that dictates where
the message is written to. For example, if the user wants to
get a `std::string`, then `string_maker` will do the job:

  #include <boost/outbuf/string.hpp>

  // ...
      boost::outbuf::string_maker<false> msg;
      get_message(msg);
      std::string str = msg.finish();

Or, if one wants it to write into a raw string,
then use `cstr_writer`

    char buff[buff_size];
    boost::outbuf::cstr_writer csw(buff, buff_size);
    get_message(csw);
    auto result = csw.finish();
    if (result.truncated) {
      // ...

Those `finish` functions above do not belong to `outbuf`.
They are defined in the concrete derived types only.
It's solely by convention that they share the same name.

Yes, using `string_maker` still leads to heap allocation
and `cstr_writer` to string truncation. The problem isn't
solved by these. However, a string object is never the final
destination. So the user could instead use another class
that writes the message directly into the final destination
(an output console, a log file, an LCD display, or whatever).
It is not difficult to implement concrete subtypes of `outbuf`.
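
As a rough illustration of the idea, the sketch below shows a small
abstract buffer with a single virtual `recycle()` and a subtype that
forwards everything to a `FILE*` through a 16-byte stack buffer. The
names `sketch_outbuf`, `file_writer` and `write` are hypothetical and
are not the library's actual interface; see the linked documentation for
the real one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for the abstract class (not the real interface).
class sketch_outbuf {
public:
    virtual ~sketch_outbuf() = default;

    // Consume what has been written so far and make room for more.
    virtual void recycle() = 0;

    // Copy len chars, recycling whenever the buffer fills up.
    void write(const char* str, std::size_t len) {
        while (len > 0) {
            if (pos_ == end_) recycle();
            const std::size_t room = static_cast<std::size_t>(end_ - pos_);
            const std::size_t n = len < room ? len : room;
            std::memcpy(pos_, str, n);
            pos_ += n; str += n; len -= n;
        }
    }

protected:
    sketch_outbuf(char* begin, char* end) : pos_(begin), end_(end) {}
    char* pos_;
    char* end_;
};

// Writes straight to a FILE*; no std::string, no heap allocation here.
class file_writer final : public sketch_outbuf {
public:
    explicit file_writer(std::FILE* f)
        : sketch_outbuf(buf_, buf_ + sizeof buf_), file_(f) {
        assert(f != nullptr);
    }
    void recycle() override {
        const std::size_t n = static_cast<std::size_t>(pos_ - buf_);
        count_ += std::fwrite(buf_, 1, n, file_);
        pos_ = buf_;
    }
    // Flush the tail and return the number of chars handed to the file.
    std::size_t finish() {
        recycle();
        std::fflush(file_);
        return count_;
    }
private:
    std::FILE* file_;
    std::size_t count_ = 0;
    char buf_[16];
};
```

A producer written against `sketch_outbuf&` never learns where the
characters end up, so the same function can feed a console, a log file,
or a display driver by swapping the subtype.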

`outbuf` is actually a type alias:

    template <bool NoExcept, typename CharT>
    class basic_outbuf;

    using outbuf = basic_outbuf<false, char>;
    using outbuf_noexcept = basic_outbuf<true, char>;

That `NoExcept` template parameter is perhaps the controversial part.
It was not originally present in my formatting library.

Besides the destructor, `basic_outbuf` has only one virtual
function: `recycle()`, and it is declared as `noexcept(NoExcept)`.
This is the only effect the `NoExcept` template parameter has.
All other functions are guaranteed not to throw.

Hence, by taking an `outbuf_noexcept&` parameter, a function
states that the destination must not throw. That might be
particularly good if such an object comes from another module
and we must avoid exceptions crossing module boundaries.

On the other hand, if a function takes an `outbuf&`, then it also
accepts `outbuf_noexcept&`, because `basic_outbuf<true, CharT>`
derives from `basic_outbuf<false, CharT>`.
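
One way such a hierarchy could be arranged is sketched below, relying on
the rule that an override may be `noexcept` even when the function it
overrides is not. The `toy_` names, `discarding_sink` and the two
`flush_` helpers are hypothetical; the library's actual definition may
differ.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <bool NoExcept, typename CharT>
class toy_basic_outbuf;

// Throwing flavour: recycle() is allowed to throw.
template <typename CharT>
class toy_basic_outbuf<false, CharT> {
public:
    virtual ~toy_basic_outbuf() = default;
    virtual void recycle() = 0;
};

// Non-throwing flavour: tightens recycle() to noexcept and derives from
// the throwing one, so it converts implicitly to the base reference.
template <typename CharT>
class toy_basic_outbuf<true, CharT> : public toy_basic_outbuf<false, CharT> {
public:
    void recycle() noexcept override = 0;
};

using toy_outbuf = toy_basic_outbuf<false, char>;
using toy_outbuf_noexcept = toy_basic_outbuf<true, char>;

static_assert(std::is_base_of<toy_outbuf, toy_outbuf_noexcept>::value,
              "the noexcept flavour is usable wherever the other is");
static_assert(noexcept(std::declval<toy_outbuf_noexcept&>().recycle()),
              "recycle() is noexcept through the noexcept flavour");
static_assert(!noexcept(std::declval<toy_outbuf&>().recycle()),
              "recycle() may throw through the base flavour");

// A destination that can never fail: it just drops the content.
class discarding_sink final : public toy_outbuf_noexcept {
public:
    void recycle() noexcept override { ++recycle_calls; }
    int recycle_calls = 0;
};

// Accepts both flavours.
inline void flush_any(toy_outbuf& dest) { dest.recycle(); }

// Accepts only destinations that promise not to throw.
inline void flush_nothrow(toy_outbuf_noexcept& dest) noexcept { dest.recycle(); }
```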

When using `string_maker` you can choose between the two kinds.
`string_maker<true>` derives from `basic_outbuf<true, char>`.
So if any exception is thrown by its internal `std::string`,
then it is caught by a try/catch(...) block, stored as
an `exception_ptr` and rethrown by `finish()`. This has the
undesirable effect of delaying its proper handling (after all,
we would rather stop what's being done as soon as possible when
an error appears). So I think the recommendation would be to use
`string_maker<false>` if possible, and `string_maker<true>`
only if necessary.
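
The capture-and-rethrow pattern described above can be sketched with
`std::exception_ptr` as follows. The class `deferred_string_sink`, its
`append` function and the artificial `max_len` limit (used only to make
a failure easy to trigger) are assumptions for illustration, not the
library's `string_maker`.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

class deferred_string_sink {
public:
    explicit deferred_string_sink(std::size_t max_len) : max_len_(max_len) {}

    // Never throws: the first failure is stored and later input is ignored.
    void append(const char* s, std::size_t n) noexcept {
        if (error_) return;
        try {
            if (str_.size() + n > max_len_) {
                throw std::length_error("message too long");
            }
            str_.append(s, n);             // may throw std::bad_alloc
        } catch (...) {
            error_ = std::current_exception();
        }
    }

    // The stored exception, if any, only surfaces here.
    std::string finish() {
        if (error_) std::rethrow_exception(error_);
        return std::move(str_);
    }

private:
    std::string str_;
    std::size_t max_len_;
    std::exception_ptr error_;
};
```

The cost mentioned in the post is visible here: every `append` after the
failing one is wasted work, because the caller learns about the error
only when it calls `finish()`.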

The reason why I think it's controversial is that it makes
`recycle()` violate the Lakos Rule. And although the Lakos Rule
is not a requirement in Boost (afaik), I was wondering whether
this library could interest LEWG in the future as well.
Anyway, this whole noexcept idea is not a central part of this
library and it can be removed.

Now, the other topic is how to implement that `get_message` function,
i.e., how to write into an `outbuf` object.
One can use the `puts` and `putc` functions to insert strings and chars.
One can also use `fmtlib` through an output iterator adapter.
Or one can write directly in the buffer.
But I will ask you to read the doc [3] for that. It's a quick read,
quicker than this email.

The repository is:

  https://github.com/robhz786/outbuf

I would prefer it to be part of Boost.Core instead of being
a standalone library, and also to remove the `outbuf`
namespace. But that is up to you.

Best regards,
Roberto

[1] The Stringify library:
    https://github.com/robhz786/stringify

[2] The great performance of Stringify is mainly thanks to
    the design of outbuf ( named there as `output_buffer` ):

https://robhz786.github.io/stringify/doc/html/benchmarks/benchmarks.html#benchmarks.benchmarks.run_time_performance

[3] "Writing into an outbuf object"

https://robhz786.github.io/outbuf/doc/html/index.html#boost_outbuf.overview.writting_into_an_outbuf_object

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On 8/29/19 3:24 PM, Roberto Hinz via Boost wrote:

>      auto result = csw.finish();
>      if (result.truncated) {
>        // ...

I suggest that you change the return type of finish() to contain
the number of elements written instead of the truncated boolean.
This would make outbuf extensible to binary data as well, where you
cannot rely on a terminating zero to tell you how much data has
been written.

The return type could also have begin()/end() member functions, which
makes it directly usable with STL algorithms. In that case it would
also make sense to have data()/size().
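
A result type along those lines might look like the sketch below. The
names `written_range` and `copy_bytes` are hypothetical and only
illustrate the suggestion; nothing here is taken from the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Carries an explicit element count, so embedded zeros are not a problem,
// and exposes the usual range accessors.
template <typename CharT>
struct written_range {
    CharT* first;
    std::size_t count;
    bool truncated;

    CharT* data() const noexcept { return first; }
    std::size_t size() const noexcept { return count; }
    CharT* begin() const noexcept { return first; }
    CharT* end() const noexcept { return first + count; }
};

// Hypothetical producer: copies up to dest_len bytes and reports how
// many were written. No terminating zero is added.
inline written_range<char> copy_bytes(const char* src, std::size_t n,
                                      char* dest, std::size_t dest_len) {
    assert(src != nullptr && dest != nullptr);
    const std::size_t m = std::min(n, dest_len);
    std::copy(src, src + m, dest);
    return {dest, m, m < n};
}
```

With `begin()`/`end()` in place the result can be handed directly to
algorithms such as `std::count` or `std::find`.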


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Sun, Sep 1, 2019 at 7:45 AM Bjorn Reese via Boost <[hidden email]>
wrote:

> On 8/29/19 3:24 PM, Roberto Hinz via Boost wrote:
>
> >      auto result = csw.finish();
> >      if (result.truncated) {
> >        // ...
>
> I suggest that you change the return type of finish() to contain
> the number of elements written instead of the truncated boolean.
> This would make outbuf extensible to binary data as well, where you
> cannot rely on a terminating zero to tell you how much data has
> been written.
>
> The return type could also have begin()/end() member functions, which
> makes it directly usable with STL algorithms. In that case it would
> also make sense to have data()/size().
>

outbuf can be used for binary data, but the basic_cstr_writer class
in particular may not be suitable for that, since its finish() function
always writes a terminating zero, requiring an extra space in the
destination string. Perhaps we could add another class template,
`basic_bin_writer`, that would never write a terminating character.

Anyway, the returned result contains a `ptr` member that points to
the end of the string. In order to add begin()/data()/size() functions
basic_cstr_writer would also need to store the initial position,
which would increase its size a little bit. And that's the only reason
why I did not add it, since I think it would not be used most of
the time, and the caller already knows the beginning anyway.
It's a convenience vs. tiny cost decision. I will go for what the
majority prefers.


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Thu, Aug 29, 2019 at 9:24 AM Roberto Hinz via Boost
<[hidden email]> wrote:
> I would prefer it to be part of Boost.Core instead of being
> a standalone library, and also to remove the `outbuf`
> namespace. But that is up to you.

Boost.Core is for Boost facilities used by other Boost libraries
for simpler tasks.

You could propose it for Boost.Utility. However, it seems more worthy
of its own library.

In either choice (Utility, or your own library) the process for a
Boost formal review is at:
https://www.boost.org/community/reviews.html

Glen


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On 9/1/19 3:10 PM, Roberto Hinz via Boost wrote:

> outbuf can be used for binary data, but the basic_cstr_writer class
> in particular may not be suitable for that, since its finish() function
> always writes a terminating zero, requiring an extra space in the
> destination string. Perhaps we could add another class template,
> `basic_bin_writer`, that would never write a terminating character.

I was thinking more broadly than basic_cstr_writer, which I happened to
quote. I may want my template classes to operate on the return type
of any writer in a consistent manner. This is possible if they adhere
to ContiguousRange and SizedRange as suggested.

> Anyway, the returned result contains a `ptr` member that points to
> the end of the string. In order to add begin()/data()/size() functions
> basic_cstr_writer would also need to store the initial position,
> which would increase its size a little bit. And that's the only reason

Assuming the buffer is contiguous, begin == end - size, so there would
be no need to store an extra pointer.


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Sun, Sep 1, 2019 at 11:59 AM Bjorn Reese via Boost <[hidden email]>
wrote:

> On 9/1/19 3:10 PM, Roberto Hinz via Boost wrote:
>
> > outbuf can be used for binary data, but the basic_cstr_writer class
> > in particular may not be suitable for that, since its finish() function
> > always writes a terminating zero, requiring an extra space in the
> > destination string. Perhaps we could add another class template,
> > `basic_bin_writer`, that would never write a terminating character.
>
> I was thinking more broadly than basic_cstr_writer, which I happened to
> quote. I may want my template classes to operate on the return type
> of any writer in a consistent manner. This is possible if they adhere
> to ContiguousRange and SizedRange as suggested.
>

It might be quite a challenge to keep such consistency among writers.
Some of them write to a file. One of them doesn't actually
write anything, but just ignores all content (the `discarded_outbuf`,
which is documented but which I forgot to implement). What should
finish() return in those cases? And, of course, there are the
user-defined writers. We want the library to work with all sorts of
destination types.



> > Anyway, the returned result contains a `ptr` member that points to
> > the end of the string. In order to add begin()/data()/size() functions
> > basic_cstr_writer would also need to store the initial position,
> > which would increase its size a little bit. And that's the only reason
>
> Assuming the buffer is contiguous, begin == end - size, so there would
> be no need to store an extra pointer.
>

But then we need to store the size. See basic_cstr_writer implementation:
https://github.com/robhz786/outbuf/blob/59a6c8c3159eee94c0fcefed3dd52591dba26ee6/include/boost/outbuf.hpp#L425-L481


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
No one interested?


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
Didn’t you see reply from Glen Fernandez? Check archive.

--
Janek Kozicki, PhD. DSc. Arch. Assoc. Prof.
Gdańsk University of Technology
Faculty of Applied Physics and Mathematics
Department of Theoretical Physics and Quantum Information
--
pg.edu.pl/jkozicki (click English flag on top right)
On 3 Sep 2019, 14:21 +0200, Roberto Hinz via Boost <[hidden email]>, wrote:
> No one interested?


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Tue, Sep 3, 2019 at 11:11 AM Janek Kozicki <[hidden email]>
wrote:

> Didn’t you see reply from Glen Fernandez? Check archive.
>

Well, yes, but I didn't see his message as showing interest,
but just as guidance.
And I presume I first need to check whether there is any interest,
otherwise what's the point of going any further?
Am I misunderstanding something?


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Tue, Sep 3, 2019 at 5:21 AM Roberto Hinz via Boost
<[hidden email]> wrote:
> No one interested?

`std::basic_ostream` is actually quite usable once you figure out how
it works (which is admittedly more difficult than it should be). It
can be set up to not perform any memory allocations, depending on the
implementation of the derived class. It might not be perfect but it is
part of the standard library and thus has a natural advantage that
would require extraordinary functionality from an external component
to overcome. And I'm not seeing that in the proposed `outbuf`.
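
For reference, a stream buffer that writes into caller-supplied storage
can be sketched with nothing but standard facilities, as below. The
class name `fixed_streambuf` and its `written()` helper are made up for
this example. The buffer itself never allocates; whether constructing
the `std::ostream` or its locale machinery allocates depends on the
standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>

class fixed_streambuf final : public std::streambuf {
public:
    fixed_streambuf(char* dest, std::size_t len) {
        assert(dest != nullptr);
        setp(dest, dest + len);            // put area = caller's storage
    }

    // Number of chars written so far.
    std::size_t written() const {
        return static_cast<std::size_t>(pptr() - pbase());
    }

protected:
    // Called when the put area is full: refuse further output, which
    // makes the owning stream set badbit.
    int_type overflow(int_type) override { return traits_type::eof(); }
};
```

Attached to a stream with `std::ostream os(&sb);`, the usual `<<`
formatting works until the storage is exhausted, after which the stream
goes into a failed state.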

Regards


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Tue, Sep 10, 2019 at 11:05 PM Vinnie Falco <[hidden email]>
wrote:

> `std::basic_ostream` is actually quite usable once you figure out how
> it works
>

Let me illustrate why I disagree with that. Suppose you want to
implement a base64 encoder. You want it to be fast, agnostic,
and simple to use. Now suppose you adopt `std::ostream` as
the destination type:

  void to_base64( std::ostream& dest, const std::byte* src,
                  std::size_t count );

You will face two issues:

1) It doesn't matter how well you (as the library author) understand
   basic_ostream. The *user* needs to implement classes derived from
   basic_ostream to customize the destination types.

2) It's impossible to achieve decent performance. If you used
   `outbuf` you could write directly into the buffer. But with
   `std::ostream` you have to call member functions like `put`
   or `write` for each little piece of the content, or use an
   additional intermediate buffer.

And this is far from being a specific use case. The same issues apply
to any kind of encoding, binary or text.
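
The "write directly into the buffer" style can be sketched as follows.
The buffer interface (`direct_outbuf` with `pos()`, `end()`,
`advance_to()` and `recycle()`) and the `string_sink` destination are
hypothetical stand-ins rather than the library's API, and
`unsigned char` replaces `std::byte` so the sketch does not require
C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical buffer interface exposing its free space to the producer.
class direct_outbuf {
public:
    virtual ~direct_outbuf() = default;
    virtual void recycle() = 0;            // must leave >= 4 free chars
    char* pos() const { return pos_; }
    char* end() const { return end_; }
    void advance_to(char* p) { assert(pos_ <= p && p <= end_); pos_ = p; }
protected:
    direct_outbuf(char* b, char* e) : pos_(b), end_(e) {}
    char* pos_;
    char* end_;
};

// Test destination: collects everything in a std::string.
class string_sink final : public direct_outbuf {
public:
    string_sink() : direct_outbuf(buf_, buf_ + sizeof buf_) {}
    void recycle() override {
        str_.append(buf_, static_cast<std::size_t>(pos_ - buf_));
        pos_ = buf_;
    }
    std::string finish() { recycle(); return str_; }
private:
    std::string str_;
    char buf_[8];
};

// Encodes 3 input bytes into 4 output chars per step, storing them
// straight into the destination's buffer; one virtual call per refill.
inline void to_base64(direct_outbuf& dest, const unsigned char* src,
                      std::size_t count) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    while (count > 0) {
        if (dest.end() - dest.pos() < 4) dest.recycle();
        char* p = dest.pos();
        const std::size_t n = count < 3 ? count : 3;
        const unsigned b0 = src[0];
        const unsigned b1 = n > 1 ? src[1] : 0u;
        const unsigned b2 = n > 2 ? src[2] : 0u;
        p[0] = alphabet[b0 >> 2];
        p[1] = alphabet[((b0 & 0x03u) << 4) | (b1 >> 4)];
        p[2] = n > 1 ? alphabet[((b1 & 0x0Fu) << 2) | (b2 >> 6)] : '=';
        p[3] = n > 2 ? alphabet[b2 & 0x3Fu] : '=';
        dest.advance_to(p + 4);
        src += n;
        count -= n;
    }
}
```

The inner loop touches only raw pointers; the virtual `recycle()` is
reached once per buffer refill instead of once per output character.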


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Thu, Sep 12, 2019 at 10:21 AM Roberto Hinz <[hidden email]> wrote:
> Let me illustrate why I disagree with that...

Okay, these are two fair points, but then the title of the original
post is not accurate. What you're really proposing is "a better
std::ostream" which is an entirely different conversation.

Regards


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Fri, Sep 13, 2019 at 9:46 AM Vinnie Falco <[hidden email]> wrote:

> On Thu, Sep 12, 2019 at 10:21 AM Roberto Hinz <[hidden email]> wrote:
> > Let me illustrate why I disagree with that...
>
> Okay, these are two fair points, but then the title of the original
> post is not accurate. What you're really proposing is "a better
> std::ostream" which is an entirely different conversation.
>
> Regards
>

You are right. Thanks for the comments


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Fri, Sep 13, 2019 at 6:50 AM Roberto Hinz <[hidden email]> wrote:
> You are right. Thanks for the comments

Please don't take any of these comments as discouragement. Quite the
opposite: they are well intended, with a goal of progress in mind. There
has been a chorus of voices clamoring for "a better std::ostream" and
it has been the subject of a few papers. Offering users better
versions or replacements for standard library types such as
std::ostream lands squarely within the purview of the Boost Libraries.
There is already precedent for this, such as
boost::system::error_category and boost::shared_ptr, both of which are
superior to their standard library equivalents.

However any proposed replacement needs to address the body of work
that has already been done in this area. What makes it better or more
usable?

Thanks


Re: A solution to return dynamic strings without heap allocation. Any interest?

Boost - Dev mailing list
On Fri, Sep 13, 2019 at 11:42 AM Vinnie Falco <[hidden email]>
wrote:

> On Fri, Sep 13, 2019 at 6:50 AM Roberto Hinz <[hidden email]> wrote:
> > You are right. Thanks for the comments
>
> Please don't take any of these comments as discouragement. Quite the
> opposite: they are well intended, with a goal of progress in mind. There
> has been a chorus of voices clamoring for "a better std::ostream" and
> it has been the subject of a few papers. Offering users better
> versions or replacements for standard library types such as
> std::ostream lands squarely within the purview of the Boost Libraries.
> There is already precedent for this, such as
> boost::system::error_category and boost::shared_ptr, both of which are
> superior to their standard library equivalents.
>
> However any proposed replacement needs to address the body of work
> that has already been done in this area. What makes it better or more
> usable?
>
> Thanks
>

Not discouraged at all. I will enhance the documentation based on
your feedback and come back later.

Thank you
