C++ Deflate (zlib-like) library

Boost - Dev mailing list
Hello everyone,

I have recently been working on a C++ compression library very similar
to zlib after trying to implement some HTTP compression support over
Boost.Beast and realizing after some discussion with sir Falco that
while it would be a nice builtin feature for Beast, it would possibly be
a better idea to have zlib-like compression be a separate library in
order to be properly maintainable and likely more useful.
The current working version can be viewed at
https://github.com/ryanjanson/Deflate

However, the API could still be improved in terms of modern C++ usability,
and what I currently have in mind can be found, annotated, here:
https://gist.github.com/AeroStun/687ec9ca69404e26f8e02e5084926036

Do you think that this could be useful to have as its own entity in the
Boost environment? Any kind of feedback on the idea and the library is
warmly welcome.


Regards,
Janson

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: C++ Deflate (zlib-like) library

On Fri, 6 Mar 2020 at 05:51, Janson R. via Boost <[hidden email]>
wrote:

> Hello everyone,
>
> I have recently been working on a C++ compression library very similar
> to zlib after trying to implement some HTTP compression support over
> Boost.Beast and realizing after some discussion with sir Falco that
> while it would be a nice builtin feature for Beast, it would possibly be
> a better idea to have zlib-like compression be a separate library in
> order to be properly maintainable and likely more useful.
>

Why not use lzma(2)? (Wasn't there already (wrapped) support for this in
Boost?) If you just need in-memory, on-the-fly (de-)compression, e.g. for
streaming, lz4 is in **very** active development. Rolling your own seems
like a waste of dev time, but it will keep you off the street.

It is the wrong direction for Boost to start offering all those (basic)
things next to the 'real thing'. On Windows this won't be an enormous
problem, as the fact that it comes in the (Boost) package possibly
outweighs any resistance to adoption. On Linux (and BSD, and probably
OS X), however, bypassing the normal (distro-supplied) packages used for
these purposes increases complexity rather than reducing it.
Additionally, a lot of devs/users, really a lot, have their eye on the
ball with those packages: a known bug won't last long, and a fix doesn't
have to wait for Boost to go through another release cycle. That cycle
wouldn't help much anyway, because corporations will not move to a new
Boost release immediately, so a fix would take even longer to reach them.

degski
--
@systemdeg
"We value your privacy, click here!" Sod off! - degski
"Anyone who believes that exponential growth can go on forever in a finite
world is either a madman or an economist" - Kenneth E. Boulding
"Growth for the sake of growth is the ideology of the cancer cell" - Edward
P. Abbey

Re: C++ Deflate (zlib-like) library

On Fri, Mar 6, 2020 at 12:50 PM Janson R. via Boost
<[hidden email]> wrote:
> I have recently been working on a C++ compression library very similar
> to zlib after trying to implement some HTTP compression support over
> Boost.Beast and realizing after some discussion with sir Falco that
> while it would be a nice builtin feature for Beast, it would possibly be
> a better idea to have zlib-like compression be a separate library in
> order to be properly maintainable and likely more useful.

> Do you think that this could be useful to have as its own entity in the
> Boost environment? Any kind of feedback on the idea and the library is
> warmly welcome.

(As someone who had to recently peek into the zlib C source code...)

Hi. I'd very much welcome a clean pure C++ implementation of basic
deflate compression, because the C code I saw did not give me a warm
and buzzy feeling, honestly.

I need access to ZIP files, and since minizip and zlib are pretty much
intertwined, as far as I saw, I couldn't easily use this new library if
it lacked ZIP support. And if that support allowed efficient *and*
multi-threaded access to the ZIP entries, all the better. The raw file
IO can still be serial, but at the very least the compression /
decompression should be able to run in parallel on multiple threads,
via ASIO.

I'm dealing with ZIP files which reach into the 100,000s to millions of
entries, and having to serially read+uncompress or compress+write per
entry is slow! All those cores could be put to good use.

Lastly, a quick glance at the code showed plain enums vs enum classes,
and capitalized vs all-lowercase enums; such naming inconsistencies
were surprising. Perhaps it's for "consistency" with zlib? Not sure
it's a good idea. Pick a style and stick to it IMHO.

Thanks, --DD

Re: C++ Deflate (zlib-like) library

First of all, thanks for the feedback.

On 06/03/2020 15:32, degski via Boost wrote:
>
> Why not use lzma(2)? (wasn't there already (wrapped) support for this in
> Boost?) If you need just in-memory on the fly (de-)compression, for
> streaming f.e.: lz4 is in **very** active development. R-y-o seems like a
> waste of dev-time, but it will keep you off the street.
>

The point is to facilitate the use of all the formats which use Deflate,
such as ZIP, PNG, or PDF, for pure C++ projects and the Boost
environment. As for current wrapped support, there is some in
Boost.Iostreams, but if I read the docs correctly it does not support
raw Deflate (RFC 1951), which is problematic for formats or protocols
that rely on it directly, such as WebSockets' permessage-deflate, or
decompressing data from Microsoft HTTP servers (which are non-compliant
and send raw RFC 1951 data instead of the RFC 1950 zlib format).
Furthermore, it doesn't allow much room for customization of the engine
parameters (no predefined dictionary, for example).
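For the Microsoft-server case, a client has to decide whether a body is zlib-wrapped (RFC 1950) or raw deflate (RFC 1951). A minimal sketch, assuming the common approach of sniffing the two-byte RFC 1950 header; `inspect_zlib_header` and `zlib_header_info` are hypothetical names, not part of the proposed library. It also exposes the FDICT bit, which is how a zlib stream signals that a preset dictionary is required.

```cpp
#include <cstddef>
#include <cstdint>

// What the first two bytes of a buffer say about its framing.
struct zlib_header_info {
    bool is_zlib = false;           // bytes form a valid RFC 1950 header
    bool needs_dictionary = false;  // FDICT set: a 4-byte DICTID follows the header
    unsigned window_bits = 0;       // log2 of the LZ77 window (8..15) when is_zlib
};

// Hypothetical helper: classify a buffer as zlib-wrapped (RFC 1950) or,
// by elimination, raw deflate (RFC 1951).
inline zlib_header_info inspect_zlib_header(const std::uint8_t* data,
                                            std::size_t size) {
    zlib_header_info info;
    if (size < 2) return info;
    const unsigned cmf = data[0];
    const unsigned flg = data[1];
    const unsigned cm = cmf & 0x0Fu;  // compression method: must be 8 (deflate)
    const unsigned cinfo = cmf >> 4;  // window size exponent minus 8: must be <= 7
    if (cm != 8 || cinfo > 7) return info;
    // FCHECK is chosen so that CMF * 256 + FLG is a multiple of 31.
    if ((cmf * 256 + flg) % 31 != 0) return info;
    info.is_zlib = true;
    info.needs_dictionary = (flg & 0x20u) != 0;
    info.window_bits = cinfo + 8;
    return info;
}
```

If `is_zlib` is false, a lenient client would fall back to decoding the whole buffer as raw deflate. This is a heuristic: a raw stream could in principle start with two bytes that happen to pass the check.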


Janson

Re: C++ Deflate (zlib-like) library

On Fri, Mar 6, 2020 at 5:15 PM Janson R. via Boost
<[hidden email]> wrote:
> [...] customization of the engine parameters (no predefined dictionary for example).

You mean something similar to https://facebook.github.io/zstd/#small-data ?

Re: C++ Deflate (zlib-like) library

On Fri, Mar 6, 2020 at 6:33 AM degski via Boost <[hidden email]> wrote:
> Why not use lzma(2)?

This is not a question of implementation but a question of what a
clean, Boost-quality, modern C++ API to a codec that operates on
memory buffers looks like. The stock ZLib API is very C-oriented
(obviously). Can we do better?

Thanks

Re: C++ Deflate (zlib-like) library

On 06/03/2020 18:26, Dominique Devienne via Boost wrote:
> You mean something similar to https://facebook.github.io/zstd/#small-data ?

Yes indeed.

Re: C++ Deflate (zlib-like) library

I would really appreciate it if the library could be used header-only, and
contained some variation of these convenience functions:

    auto compress(const std::vector<uint8_t>& uncompressed)
        -> std::vector<uint8_t>;
    auto decompress(const std::vector<uint8_t>& compressed)
        -> std::optional<std::vector<uint8_t>>;
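A minimal sketch of that convenience API, assuming C++17 for `std::optional`. The real engine is not shown: as a hypothetical stand-in, these two functions only emit and parse deflate *stored* blocks (BTYPE 00 in RFC 1951, at most 65535 bytes each, guarded by a LEN/NLEN pair), so the output is valid raw deflate but no smaller than the input.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in engine: emits raw deflate (RFC 1951) made only of stored
// (BTYPE = 00) blocks. The output is valid deflate but not compressed.
auto compress(const std::vector<std::uint8_t>& uncompressed)
    -> std::vector<std::uint8_t> {
    std::vector<std::uint8_t> out;
    std::size_t pos = 0;
    do {  // runs at least once: empty input still needs one final block
        const std::size_t len =
            std::min<std::size_t>(uncompressed.size() - pos, 65535);
        const bool last = pos + len == uncompressed.size();
        out.push_back(last ? 0x01 : 0x00);  // BFINAL bit, BTYPE = 00, zero padding
        out.push_back(static_cast<std::uint8_t>(len & 0xFF));  // LEN, little-endian
        out.push_back(static_cast<std::uint8_t>(len >> 8));
        out.push_back(static_cast<std::uint8_t>(~len & 0xFF));  // NLEN = ~LEN
        out.push_back(static_cast<std::uint8_t>((~len >> 8) & 0xFF));
        const auto first = uncompressed.begin() + static_cast<std::ptrdiff_t>(pos);
        out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(len));
        pos += len;
    } while (pos < uncompressed.size());
    return out;
}

// Accepts only streams made of stored blocks; Huffman-coded blocks, as well
// as truncated or corrupt input, are reported as failure.
auto decompress(const std::vector<std::uint8_t>& compressed)
    -> std::optional<std::vector<std::uint8_t>> {
    std::vector<std::uint8_t> out;
    std::size_t pos = 0;
    for (;;) {
        // Stored blocks end on a byte boundary, so each header here starts on one.
        if (compressed.size() - pos < 5) return std::nullopt;
        const std::uint8_t header = compressed[pos];
        if (((header >> 1) & 0x03) != 0) return std::nullopt;  // not a stored block
        const std::size_t len = static_cast<std::size_t>(
            compressed[pos + 1] | (compressed[pos + 2] << 8));
        const std::size_t nlen = static_cast<std::size_t>(
            compressed[pos + 3] | (compressed[pos + 4] << 8));
        if ((len ^ nlen) != 0xFFFF) return std::nullopt;  // LEN/NLEN mismatch
        pos += 5;
        if (compressed.size() - pos < len) return std::nullopt;  // truncated payload
        const auto first = compressed.begin() + static_cast<std::ptrdiff_t>(pos);
        out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(len));
        pos += len;
        if (header & 0x01) return out;  // BFINAL: this was the last block
    }
}
```

Returning `std::optional` only says *that* decoding failed; whether the final API should also report *why* (for instance through an error code) is a separate design question.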

/Viktor



Re: C++ Deflate (zlib-like) library

On Fri, Mar 6, 2020 at 10:03 AM Viktor Sehr via Boost
<[hidden email]> wrote:
> I really would appreciate if the library could be used as header only

I'll take it one step farther. The library should:

* Default to compilation into a static or dynamic lib, e.g. libboost_deflate
* Compile header-only, by defining BOOST_DEFLATE_HEADER_ONLY
* Require only C++11
* Compile without the rest of Boost (i.e. no dependencies), by defining
  BOOST_DEFLATE_STANDALONE. In this configuration, C++17 or later
  will be required. The boost:: namespace will remain.
* Configurably support the C++ standard equivalents of Boost types such
  as string_view and optional.

All of my new libraries follow this pattern. To assist in building
such libraries, I have created a repository, "library_template", with
a trivial function, that serves as a template which anyone can
clone to form the starting point of a Boost library meeting these
requirements. It has Bjam and Boost-compatible CMake support, tests,
examples, coverage, sanitizers, CI (Travis, AppVeyor, Azure), and
working badges:

<https://github.com/vinniefalco/library_template>
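The header-only switch in the list above is commonly implemented with a declaration macro. A minimal sketch, assuming hypothetical names (`BOOST_DEFLATE_DECL`, `stored_bound`) and collapsing the config header, public header and implementation file into one listing so that it compiles on its own:

```cpp
#include <cstddef>

// ---- hypothetical <boost/deflate/detail/config.hpp> ----
// Header-only mode marks out-of-line definitions `inline`, so they can be
// included in many translation units without violating the ODR.
#ifdef BOOST_DEFLATE_HEADER_ONLY
#  define BOOST_DEFLATE_DECL inline
#else
#  define BOOST_DEFLATE_DECL  // a real build would add DLL export/import here
#endif

// ---- hypothetical public header ----
namespace boost {
namespace deflate {

// Upper bound on output size when only stored blocks are emitted.
BOOST_DEFLATE_DECL std::size_t stored_bound(std::size_t n);

} // namespace deflate
} // namespace boost

// ---- hypothetical implementation file (.ipp) ----
// Header-only: the public header includes this file directly.
// Compiled library: exactly one source file of the library includes it.
namespace boost {
namespace deflate {

BOOST_DEFLATE_DECL std::size_t stored_bound(std::size_t n) {
    // Each stored block holds at most 65535 bytes plus a 5-byte header,
    // and even empty input needs one (final) block.
    return n + 5 * (n / 65535 + 1);
}

} // namespace deflate
} // namespace boost
```

A `BOOST_DEFLATE_STANDALONE` switch would follow the same pattern, selecting `std::string_view` and `std::optional` instead of the Boost types; it is left out here so the sketch builds without Boost.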

> some sort of abbreviation of these convenience functions:
> auto compress(const std::vector<uint8_t>& uncompressed) ->
> std::vector<uint8_t>;
> auto decompress(const std::vector<uint8_t>& compressed) ->
> std::optional<std::vector<uint8_t>>;

Yes, thank you, this is precisely what the OP was asking for. I agree
having convenience functions is great; something like this too:

    std::string compress( string_view s );

Regards

Re: C++ Deflate (zlib-like) library

On Fri, Mar 6, 2020 at 6:33 AM degski via Boost <[hidden email]> wrote:
> Why not use lzma(2)?

Anyway, I looked at the native API for lzma and typical usage looks like this:

    rc = elzma_compress_config(hand, ELZMA_LC_DEFAULT,
                               ELZMA_LP_DEFAULT, ELZMA_PB_DEFAULT,
                               5, (1 << 20) /* 1mb */,
                               format, inLen);
    ...
    rc = elzma_compress_run(hand, inputCallback, (void *) &ds,
        outputCallback, (void *) &ds);

In other words, the same type of shitty C API found in ZLib - no thanks.
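Even when the engine underneath keeps a callback-plus-`void*` interface like this, the plumbing can be hidden from C++ callers. A sketch, assuming a hypothetical `c_style_run` that only mimics the *shape* of `elzma_compress_run` (it is not the easylzma API and just copies bytes); the template `run` turns arbitrary lambdas into the function-pointer/context pairs such an API expects.

```cpp
#include <cstddef>
#include <cstdint>

// Function-pointer types in the style of a C API: each callback receives the
// opaque context pointer that was registered alongside it.
using read_fn = std::size_t (*)(void* ctx, std::uint8_t* buf, std::size_t cap);
using write_fn = void (*)(void* ctx, const std::uint8_t* buf, std::size_t len);

// Hypothetical stand-in for a C entry point: it only pumps bytes from the
// reader to the writer, 4 at a time, and returns 0 for success, C-style.
int c_style_run(read_fn read, void* read_ctx, write_fn write, void* write_ctx) {
    std::uint8_t buf[4];
    for (;;) {
        const std::size_t n = read(read_ctx, buf, sizeof buf);
        if (n == 0) return 0;  // reader exhausted
        write(write_ctx, buf, n);
    }
}

// C++ adapter: takes arbitrary callables by value and generates captureless
// "trampoline" lambdas that cast the void* back to the right callable type.
template <class Read, class Write>
int run(Read read, Write write) {
    read_fn r = [](void* ctx, std::uint8_t* buf, std::size_t cap) -> std::size_t {
        return (*static_cast<Read*>(ctx))(buf, cap);
    };
    write_fn w = [](void* ctx, const std::uint8_t* buf, std::size_t len) {
        (*static_cast<Write*>(ctx))(buf, len);
    };
    return c_style_run(r, &read, w, &write);
}
```

The caller then writes ordinary lambdas with captures; no casts or `void*` appear at the call site.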

Regards

Re: C++ Deflate (zlib-like) library

On Fri, Mar 6, 2020 at 7:48 AM Dominique Devienne via Boost
<[hidden email]> wrote:
> Hi. I'd very much welcome a clean pure C++ implementation of basic
> deflate compression, because the C code I saw did not give me a warm
> and buzzy feeling, honestly.

Yep, that's the goal. And we also want robust tests which cover corner
cases and known bugs/fixes, along with 100% coverage.

> I couldn't easily use this new library, if it lacked ZIP support.

Yes, ZIP, gzip, and other flavors of deflate (which really only differ
in the additional material prepended or appended to the compressed
data) should be supported, with a clean API.
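To make the "only the wrapper differs" point concrete, here is a sketch of the framing for two of those flavors, zlib (RFC 1950) and gzip (RFC 1952), around an already-produced raw deflate payload. The function names are hypothetical, and the checksums are the simple bitwise forms rather than the table-driven ones a real library would use. ZIP is not shown; its local and central directory records also carry the CRC-32 of each entry.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// CRC-32 (reflected polynomial 0xEDB88320), as used by gzip and ZIP.
std::uint32_t crc32(const std::uint8_t* p, std::size_t n) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < n; ++i) {
        crc ^= p[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Adler-32, as used by the zlib (RFC 1950) trailer.
std::uint32_t adler32(const std::uint8_t* p, std::size_t n) {
    std::uint32_t a = 1, b = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a = (a + p[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

// zlib: 2-byte header, raw deflate, Adler-32 of the original data (big-endian).
bytes zlib_wrap(const bytes& raw_deflate, const bytes& original) {
    bytes out = {0x78, 0x9C};  // CM=8, 32K window, default FLEVEL, valid FCHECK
    out.insert(out.end(), raw_deflate.begin(), raw_deflate.end());
    const std::uint32_t a = adler32(original.data(), original.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(a >> shift));
    return out;
}

// gzip: 10-byte header, raw deflate, CRC-32 and size mod 2^32 (little-endian).
bytes gzip_wrap(const bytes& raw_deflate, const bytes& original) {
    // ID1 ID2 CM=8 FLG=0 MTIME=0 (4 bytes) XFL=0 OS=255 (unknown)
    bytes out = {0x1F, 0x8B, 8, 0, 0, 0, 0, 0, 0, 0xFF};
    out.insert(out.end(), raw_deflate.begin(), raw_deflate.end());
    auto put_le32 = [&out](std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    };
    put_le32(crc32(original.data(), original.size()));
    put_le32(static_cast<std::uint32_t>(original.size()));
    return out;
}
```

The deflate payload itself is identical in both cases; only the header and the choice of checksum change.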

> And if that support allowed efficient *and* multi-threaded access to the
> ZIP entries, all the better. The raw file IO can still be serial, but at
> the very least the compression / decompression should be able to run in
> parallel on multiple threads, via ASIO.

Now this is a bridge too far :) I don't think we need to get Asio
involved here. However, we should ensure that the interface we settle
on does not present an obstacle to a third party implementing the
parallel algorithm you describe on top of the deflate algorithm.
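As an illustration of that requirement (a sketch, not part of the proposal): if a codec is an ordinary object with no global state, a third party can run one instance per ZIP entry in parallel and still write the archive serially. `compress_entries` and `make_codec` are hypothetical names, and `std::async` stands in for whatever executor a real implementation would use.

```cpp
#include <cstdint>
#include <future>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Hypothetical helper: run one codec instance per entry on its own task and
// return the results in entry order, so the archive can be written serially.
// `make_codec` must return a callable `bytes(const bytes&)`; the only thing
// asked of the codec is that separate instances share no mutable state.
template <class MakeCodec>
std::vector<bytes> compress_entries(const std::vector<bytes>& entries,
                                    MakeCodec make_codec) {
    std::vector<std::future<bytes>> pending;
    pending.reserve(entries.size());
    for (const bytes& entry : entries) {
        pending.push_back(std::async(std::launch::async, [&entry, make_codec] {
            auto codec = make_codec();  // fresh, independent codec per task
            return codec(entry);
        }));
    }
    std::vector<bytes> results;
    results.reserve(entries.size());
    for (auto& f : pending) results.push_back(f.get());  // waits in entry order
    return results;
}
```

One task per entry would not scale to the million-entry archives mentioned above; a real implementation would bound concurrency, for example with a fixed-size thread pool.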

> Lastly, a quick glance at the code showed plain enums vs enum classes,
> capitalized vs all-lowercased enums; such naming inconsistencies were surprising.
> Perhaps it's from "consistency" with zlib? Not sure it's a good idea.
> Pick a style and stick to it IMHO.

The ZLib in Beast (upon which this new project is based) is
unfinished. It does work though, and is used for the
permessage-deflate extension of Beast websocket. Beast users don't
have to deal with the hassle of having a separate zlib dependency, so
it has achieved its goal in that sense. However I did not put all of
the polish and design work into it that it needs as I am only one
person. I did port it to header-only C++, though; if you have a look
you can see that it is considerably different from the original ZLib,
which took no small effort. It can be further improved.

Regards

Re: C++ Deflate (zlib-like) library

On Fri, Mar 6, 2020 at 9:23 AM Dominique Devienne via Boost
<[hidden email]> wrote:
> https://facebook.github.io/zstd/#small-data ?

Here I will quote some of the function declarations from this library:

    size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param,
                                  int value);
    size_t ZSTD_CCtx_setPledgedSrcSize(ZSTD_CCtx* cctx,
                                       unsigned long long pledgedSrcSize);
    size_t ZSTD_compress2(ZSTD_CCtx* cctx,
                          void* dst, size_t dstCapacity,
                          const void* src, size_t srcSize);

As with lzma2, this is just another variation of the same shitty C API
used by ZLib. Is that the best we can do for C++? I sure hope not...
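Even if an engine keeps a C-style context internally, the public C++ surface can own it with `std::unique_ptr` and a custom deleter, and turn status codes into exceptions. A sketch under that assumption; the `c_ctx_*` functions are hypothetical stand-ins that mimic only the shape of `ZSTD_CCtx` and `ZSTD_CCtx_setParameter` (they are not the zstd API), and the 1 to 9 level range is invented for the example.

```cpp
#include <memory>
#include <new>
#include <stdexcept>

// Hypothetical C-style context API: opaque handle, int status codes.
struct c_ctx { int level = 3; };
inline c_ctx* c_ctx_create() { return new (std::nothrow) c_ctx; }  // null on failure
inline void c_ctx_free(c_ctx* ctx) { delete ctx; }
inline int c_ctx_set_level(c_ctx* ctx, int level) {
    if (level < 1 || level > 9) return -1;  // error reported as a status code
    ctx->level = level;
    return 0;
}

// C++ wrapper: ownership through unique_ptr with a custom deleter, errors
// as exceptions, and no raw pointers in the public signatures.
class compressor {
public:
    compressor() : ctx_(c_ctx_create()) {
        if (!ctx_) throw std::bad_alloc();
    }
    compressor& level(int lvl) {
        if (c_ctx_set_level(ctx_.get(), lvl) != 0)
            throw std::invalid_argument("compression level out of range");
        return *this;
    }
    int level() const noexcept { return ctx_->level; }

private:
    struct deleter {
        void operator()(c_ctx* p) const noexcept { c_ctx_free(p); }
    };
    std::unique_ptr<c_ctx, deleter> ctx_;  // freed automatically, move-only
};
```

The wrapper is move-only because `std::unique_ptr` is, which matches the single-owner semantics of the underlying context.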

Thanks

Re: C++ Deflate (zlib-like) library

Vinnie Falco wrote:

> * Default to compilation into a static or dynamic lib, e.g. libboost_deflate.o
> * Compile header-only, by defining BOOST_DEFLATE_HEADER_ONLY

Practice shows that this isn't that convenient for a low-level dependency.
Inevitably, header-only library A wants to use it header-only, and library B
wants to use it as a compiled library, and things go south pretty quickly.

We already hit this scenario in
https://github.com/boostorg/timer/commit/10bf0e3d6d79e53a79f8d9e56991f855af862f45.
(See also
https://github.com/boostorg/timer/commit/05ae7c47e99038c5f777c9682980d6d7f5d2b768.)


Re: C++ Deflate (zlib-like) library

On Fri, Mar 6, 2020 at 10:32 AM Peter Dimov via Boost
<[hidden email]> wrote:
> Practice shows that this isn't that convenient for a low-level dependency.
> Inevitably, header-only library A wants to use it header-only, and library B
> wants to use it as a compiled library, and things go south pretty quickly.
>
> We already hit this scenario in
> https://github.com/boostorg/timer/commit/10bf0e3d6d79e53a79f8d9e56991f855af862f45.
> (See also
> https://github.com/boostorg/timer/commit/05ae7c47e99038c5f777c9682980d6d7f5d2b768.)

That's not applicable to the deflate use-case, because the deflate
library has no dependencies.

If I did write a library which had a dependency on another library
which offered a header-only option, then yes it would be a mistake (if
not downright rude and presumptuous) to dictate to the user how the
downstream dependency must be consumed (header-only or linked
library).

Thanks

Re: C++ Deflate (zlib-like) library

Vinnie Falco wrote:

> That's not applicable to the deflate use-case, because the deflate library
> has no dependencies.

It's about libraries that depend on it.


Re: C++ Deflate (zlib-like) library

On 06/03/2020 19:02, Viktor Sehr via Boost wrote:
> I really would appreciate if the library could be used as header only, and
> contained some sort of abbreviation of these convenience functions:
> auto compress(const std::vector<uint8_t>& uncompressed) ->
> std::vector<uint8_t>;
> auto decompress(const std::vector<uint8_t>& compressed) ->
> std::optional<std::vector<uint8_t>>;


Good thing the library is already header-only! And thanks for the
suggestion; I will definitely include something closely resembling it,
because it makes total sense.

Janson

Re: C++ Deflate (zlib-like) library

On 06/03/2020 20:02, Peter Dimov via Boost wrote:
> It's about libraries that depend on it.

IMO libraries that depend on it would rather have the choice of using
it header-only or compiled, and deal with the consequences of their
choice themselves, rather than being forced into one model.

Janson

Re: C++ Deflate (zlib-like) library

On Friday, 6 March 2020 at 16:51 +0100, Dominique Devienne via Boost
wrote:

> On Fri, Mar 6, 2020 at 12:50 PM Janson R. via Boost
> <[hidden email]> wrote:
> > I have recently been working on a C++ compression library very similar
> > to zlib after trying to implement some HTTP compression support over
> > Boost.Beast and realizing after some discussion with sir Falco that
> > while it would be a nice builtin feature for Beast, it would possibly be
> > a better idea to have zlib-like compression be a separate library in
> > order to be properly maintainable and likely more useful.
> > Do you think that this could be useful to have as its own entity in the
> > Boost environment? Any kind of feedback on the idea and the library is
> > warmly welcome.
>
> (As someone who had to recently peek into the zlib C source code...)
>
> Hi. I'd very much welcome a clean pure C++ implementation of basic
> deflate compression, because the C code I saw did not give me a warm
> and buzzy feeling, honestly.

+1. Same feeling here. A decent C++ API for dealing with zlib
compression, and on top of that the ZIP format, would indeed be very
useful. The current options suck. About 15 years ago, I had to develop
(closed) code to read and write some ZIP files; I ran into the same need
a few months ago, only to see that nothing had improved in this regard.

Regards,

Julien


Re: C++ Deflate (zlib-like) library

On Fri, Mar 6, 2020 at 1:54 PM Janson R. via Boost
<[hidden email]> wrote:
> Good thing the library is already header only;

Yeah, note though that the default should be a linkable library (i.e.
you have to opt-in to header only by defining a macro,
BOOST_DEFLATE_HEADER_ONLY in this case).

Thanks

Re: C++ Deflate (zlib-like) library

On Fri, Mar 6, 2020 at 2:11 PM Janson R. via Boost
<[hidden email]> wrote:
>
> On 06/03/2020 20:02, Peter Dimov via Boost wrote:
> > It's about libraries that depend on it.
>
> IMO libraries that depend on it would rather have the choice of having
> it as header-only or compiled and deal with the consequences of their
> choice themselves rather than being forced onto a model.

I think Peter is right in this case. Library A which uses library B
should not have an opinion on whether library B is consumed as
header-only or as a linked library. The decision on how each library
in a linked executable is configured should be up to the top-level
build target (i.e. the program) and not any of the individual
components. This is why the default for libraries which have a
header-only configuration, should be a linkable library target, since
it creates the least headache.

A header-only configuration option is only provided to capture that
subset of users who don't want the hassle of integrating another
linked library into their build, for whatever reason. It should not be
the default, and it should not be encouraged as the status-quo.

Regards
