Serialization: huge file size, binary archives


Serialization: huge file size, binary archives

Sascha Ochsenknecht
Hello,

I'm using the Boost Serialization library to store my data structure.
I want to use the binary archive type by default:
boost::archive::binary_oarchive(ostream &s) // saving
boost::archive::binary_iarchive(istream &s) // loading
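
To illustrate, a minimal sketch of how I use them; the 'Data' type here
is only a placeholder for my real structure:

#include <fstream>
#include <vector>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/serialization/vector.hpp>

struct Data {
    std::vector<double> values;

    template <class Archive>
    void serialize(Archive &ar, const unsigned int /*version*/) {
        ar & values;
    }
};

int main() {
    Data d;
    d.values.assign(3, 1.5);

    {   // saving
        std::ofstream ofs("data.bin", std::ios::binary);
        boost::archive::binary_oarchive oa(ofs);
        oa << d;
    }
    {   // loading
        Data restored;
        std::ifstream ifs("data.bin", std::ios::binary);
        boost::archive::binary_iarchive ia(ifs);
        ia >> restored;
    }
    return 0;
}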

But I noticed that these files can be very large compared to the stored
data. I have a binary archive of around 1.5 GB, but when I compress it
only about 200 MB remain (!).
It seems there is a lot of 'overhead' or 'redundant' data (I see a lot
of zero bytes when I look at the file in a hex editor).

I tried the gzip filter of the Iostreams library, but I want to avoid
it in production because of the increased runtime.
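
Roughly what I tried with the filter (again, 'Data' is only a
placeholder; the archive's stream is simply wrapped in a Boost.Iostreams
filter chain):

#include <fstream>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/archive/binary_oarchive.hpp>

// 'Data' as in the sketch above
void save_compressed(const Data &d)
{
    std::ofstream ofs("data.bin.gz", std::ios::binary);
    boost::iostreams::filtering_ostream out;
    out.push(boost::iostreams::gzip_compressor());  // compress on the fly
    out.push(ofs);                                  // ... into the file
    boost::archive::binary_oarchive oa(out);
    oa << d;
}   // archive and filter chain flush when they go out of scope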

Some information about my data structure (maybe helpful):
- it uses a lot of pointers
- it uses a lot of std::vector

Has anybody run into the same problem?
Is there a way to decrease the archive size while storing the same
amount of data?
What could be a solution? Writing my own archive class, optimized for
my data structure?

thanks in advance
Sascha


Re: Serialization: huge file size, binary archives

Robert Ramey
I'm not aware of serialization causing such a problem.
You might investigate std::vector resize(), etc., to see whether
the vectors really contain a lot of null data.
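
To make the hint concrete (the numbers are invented), an over-sized
resize() produces exactly this symptom: a large archive consisting
mostly of zero bytes that compresses very well.

#include <vector>
#include <iostream>

int main()
{
    std::vector<double> v;
    v.resize(1000000);   // one million default-constructed zeros
    v[0] = 42.0;         // only one element actually carries data

    // A binary archive writes all one million doubles (~8 MB of
    // almost entirely zero bytes), even though only one is meaningful.
    std::cout << v.size() * sizeof(double) << " bytes will be written\n";
    return 0;
}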

Robert Ramey


Re: Serialization: huge file size, binary archives

Sascha Ochsenknecht
Hello,

First of all, thanks for the quick reply.

I tried std::vector resize(), but the problem still exists. I also
removed some redundant data from my data structure (data which can be
regenerated by a post-process after reading the archive). I still have
about 1.0 GB.
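
The 'regenerate after load' idea can be expressed, for example, by
splitting save/load so the derived member is never written; the
'Measurement' type below is invented, purely to show the pattern:

#include <vector>
#include <boost/serialization/split_member.hpp>
#include <boost/serialization/vector.hpp>

struct Measurement {
    std::vector<double> samples;     // the real data
    std::vector<int>    histogram;   // derived, can be rebuilt after loading

    template <class Archive>
    void save(Archive &ar, const unsigned int /*version*/) const {
        ar & samples;                // histogram deliberately not written
    }

    template <class Archive>
    void load(Archive &ar, const unsigned int /*version*/) {
        ar & samples;
        rebuild_histogram();         // the post-process step
    }

    BOOST_SERIALIZATION_SPLIT_MEMBER()

    void rebuild_histogram() { /* recompute from samples */ }
};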

Is there documentation somewhere describing what 'overhead' data is
written?
Would it be helpful if I sent you a generated archive (with other data
I can generate uncompressed archives of around 1 MB)? Would somebody
take a look at it?

Another thing ... I get a lot of compile warnings about unused
variables within the serialization library. It would be nice if these
could be fixed in the next release.

Thanks in advance.

Sascha


Re: Serialization: huge file size, binary archives

Robert Ramey
There is no information written into a binary archive which is not
absolutely necessary; that is, there is no redundant information.
If your archives are "too" big there must be some mistake. I would
suggest that you output (part of) the archive using the text or XML
format so you can see what is actually being written and how it
differs from what you expect.
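
For example (reusing the placeholder 'Data' type from earlier in the
thread), dumping to a text archive makes the contents readable:

#include <fstream>
#include <boost/archive/text_oarchive.hpp>

// 'Data' as in the earlier sketch
void dump_for_inspection(const Data &d)
{
    std::ofstream ofs("data.txt");
    boost::archive::text_oarchive oa(ofs);
    oa << d;   // same serialize() code, human-readable output
    // An xml_oarchive works too, but requires every serialized member
    // to be wrapped in a name-value pair (BOOST_SERIALIZATION_NVP).
}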

Robert Ramey


Re: Serialization: huge file size, binary archives

Sascha Ochsenknecht
Thanks for the support; now I see the root cause of my huge archive
files. I have reached very good performance now (both runtime and
file size).

Best regards,
Sascha


Re: Serialization: huge file size, binary archives

Bruno Martínez Aguerre
On Mon, 10 Apr 2006 17:11:33 -0300, Sascha Ochsenknecht  
<[hidden email]> wrote:

> Thanks for the support; now I see the root cause of my huge archive
> files. I have reached very good performance now (both runtime and
> file size).

I'm curious, what was it?

Bruno




Re: Serialization: huge file size, binary archives

Sascha Ochsenknecht
Hi Bruno,

I made a more detailed investigation of my data structure and the
examples I used, and came to the conclusion that the size was actually
correct.

Sascha

