Serialization and distinction between eof and end of archive

Maciej Sobczak
Hi,

The boost.serialization library seems to use end-of-stream in the
underlying istream object to denote the end-of-archive.

This equivalence might make sense with files where the stream is open
for a short time and really associated with a single archive, but seems
to be cumbersome when used with streams that are supposed to be
long-lived (network sessions?) and used for transmission of many
separate archives.

The problem arises between two applications that want to use the
serialization library for data exchange "on the fly", using network
sockets. Small tests have shown that it's not enough for the sender to
flush its output streams (although it does result in the archive's data
arriving at the destination side). For the archive to be read correctly,
the sender needs to entirely close the connection. This indicates that
the end-of-stream condition is used to denote end-of-archive in the
serialization sense.
Taking into account the interface of the serialization library (where
the readers are created from streams), where the stream object is
syntactically supposed to live longer than the archive, treating eof as
eoa is counterintuitive.
I really expect this to work for the receiver:

std::istream &is = ...; // some input stream, possibly long-lived

while (...)
{
     boost::archive::text_iarchive ar(is);
     // ...
}

with similar structure on the sender-side.

Any thoughts?

--
Maciej Sobczak : http://www.msobczak.com/
Programming    : http://www.msobczak.com/prog/
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: Serialization and distinction between eof and end of archive

Oliver.Kowalke-2
Hello Maciej,

I also faced this problem today.

>For the archive to be read correctly,
>the sender needs to entirely close the connection. This indicates that
>the end-of-stream condition is used to denote end-of-archive in the
>serialization sense.

I was able to solve this by appending an EOF to the archive, to
indicate the end boundary of the message to the peer.

my_socket_stream os(...);
ar::text_oarchive oa( os);
oa << msg;
oa << EOF;

I hope this is correct.
Oliver

Re: Serialization and distinction between eof and end of archive

Robert Ramey
In reply to this post by Maciej Sobczak
Maciej Sobczak wrote:
> Hi,
>
> The boost.serialization library seems to use end-of-stream in the
> underlying istream object to denote the end-of-archive.

Why does it seem that way? It would certainly be contrary to my
intention.

> This equivalence might make sense with files where the stream is open
> for a short time and really associated with a single archive,


> but seems to be cumbersome when used with streams that are supposed to be
> long-lived (network sessions?) and used for transmission of many
> separate archives.

I don't see that this would be a problem.  What is the matter with
the following?

?ostream os("pipename or whatever");

// first archive
{
    ?_oarchive oa(os);
    oa << ...;
} // archive is destroyed here - stream remains open and available
// second archive
{
    ?_oarchive oa(os);
    oa << ...;
} // archive is destroyed here - stream remains open and available

os.close();

>
> The problem arises between two applications that want to use
> the serialization library for data exchange "on the fly", using network
> sockets. Small tests have shown that it's not enough for the sender to
> flush its output streams (although it does result in the archive's
> data arriving at the destination side). For the archive to be read
> correctly, the sender needs to entirely close the connection. This
> indicates that the end-of-stream condition is used to denote
> end-of-archive in the serialization sense.

I don't believe the conclusion follows.  The archive has to be
constructed and destroyed - but the stream doesn't have to be.

> Taking into account the interface of the serialization library (where
> the readers are created from streams), where the stream object is
> syntactically supposed to live longer than the archive, treating eof
> as eoa is counterintuitive.
> I really expect this to work for the receiver:
>
> std::istream &is = ...; // some input stream, possibly long-lived
>
> while (...)
> {
>     boost::archive::text_iarchive ar(is);
>     // ...
> }
>
> with similar structure on the sender-side.
>
> Any thoughts?

I also expect this to work

Robert Ramey



Re: Serialization and distinction between eof and end of archive

Robert Ramey
In reply to this post by Oliver.Kowalke-2
This shouldn't be necessary.

Robert Ramey

[hidden email] wrote:

> Hello Maciej,
>
> I was also faced this problem today.
>
>> For the archive to be read correctly,
>> the sender needs to entirely close the connection. This indicates
>> that the end-of-stream condition is used to denote end-of-archive in
>> the serialization sense.
>
> I could solve this by appending an EOF to the archive in order to
> indicate the peer that end-boundary of this message.
>
> my_socket_stream os(...);
> ar::text_oarchive oa( os);
> oa << msg;
> oa << EOF;
>
> I hope this is correct.
> Oliver

Re: Serialization and distinction between eof and end of archive

Maciej Sobczak
In reply to this post by Robert Ramey
Hi,

Robert Ramey wrote:

>>The boost.serialization library seems to use end-of-stream in the
>>underlying istream object to denote the end-of-archive.
>
> Why does it seem that way? It would certainly be contrary to my
> intention.

It seems that way, because this is how it's shown by the example program
provided by Seweryn on the "users" list:

http://lists.boost.org/boost-users/2006/01/16646.php

I have experimented a bit with this program and I've found that when the
sender (the server in this case) flushes the stream, it's enough for the
data to arrive at the destination, and it can be retrieved by regular
stream read (the data is then identical as if the same archive was
written to cout in the first place). But when the text_iarchive object
is used to read it, it blocks. The only way to make it continue is to
close the stream on the server side. So it looks like the text_iarchive
is really waiting for eof (or for something else in the stream).


> I don't see that this would be a problem.  What is the matter with
> the following?
>
> ?ostream os("pipename or whatever");
>
> // first archive
> {
>     ?_oarchive oa(os);
>     oa << ...;
> } // archive is destroyed here - stream remains open and available
> // second archive
> {
>     ?_oarchive oa(os);
>     oa << ...;
> } // archive is destroyed here - stream remains open and available
>
> os.close();

The matter is that there might be hours of pause between these two
blocks above and the receiver might not want to wait that long. The
archive (the first one) should be successfully read on the other end of
the wire as soon as the bytes make their way to the receiver.
That does not seem to be the case.


> I don't believe the conclusion follows.  The archive has to be
> constructed and destroyed - but the stream doesn't have to be.

Yes, but what about the reading part? Is it possible for the reader to
successfully read the first archive *before* the next archive arrives
(which can happen hours later)?

I hope to be mistaken, but my initial experiments with the OP's code led
me to the above considerations.

Regards,

--
Maciej Sobczak : http://www.msobczak.com/
Programming    : http://www.msobczak.com/prog/

Re: Serialization and distinction between eof and end of archive

Robert Ramey
Maciej Sobczak wrote:

> Hi,
>
> Robert Ramey wrote:
>
>>> The boost.serialization library seems to use end-of-stream in the
>>> underlying istream object to denote the end-of-archive.
>>
>> Why does it seem that way? It would certainly be contrary to my
>> intention.
>
> It seems that way, because this is how it's shown by the example
> program provided by Seweryn on the "users" list:
>
> http://lists.boost.org/boost-users/2006/01/16646.php

That example shows something entirely different.  It does not show
that the serialization code relies on eof to denote end of archive. A
cursory examination of the library source should also convince
anyone that serialization does not depend on end of stream in
any way.


> I have experimented a bit with this program and I've found that when
> the sender (the server in this case) flushes the stream, it's enough
> for the data to arrive at the destination, and it can be retrieved by
> regular stream read (the data is then identical as if the same
> archive was written to cout in the first place). But when the
> text_iarchive object is used to read it, it blocks. The only way to
> make it continue is to close the stream on the server side. So it
> looks like the text_iarchive is really waiting for eof (or for
> something else in the stream).

It may look that way, but that's not what's happening.  I recommend
you investigate the management (or lack thereof) of flushing of the
underlying stream.

>
>
>> I don't see that this would be a problem.  What is the matter with
>> the following?
>>
>> ?ostream os("pipename or whatever");
>>
>> // first archive
>> {
>>     ?_oarchive oa(os);
>>     oa << ...;
>> } // archive is destroyed here - stream remains open and available
>> // second archive
>> {
>>     ?_oarchive oa(os);
>>     oa << ...;
>> } // archive is destroyed here - stream remains open and available
>>
>> os.close();
>
> The matter is that there might be hours of pause between these two
> blocks above and the receiver might not want to wait that long.

If that's the case, then the ?ostream streambuf implementation needs
enhancement.  It is outside the scope of the serialization library.

> The archive (the first one) should be successfully read on the other
> end of the wire as soon as the bytes make their way to the receiver.
> It does not seem to be the case.

It may not be - but it is not something that can be fixed from
within the serialization library.

>> I don't believe the conclusion follows.  The archive has to be
>> constructed and destroyed - but the stream doesn't have to be.

> Yes, but what about the reading part? Is it possible for the reader to
> successfully read the first archive *before* the next archive arrives
> (which can happen hours later)?

This would depend on the streambuf implementation used by the
underlying stream.  The serialization library requests
all characters required - no more, no less.

> I hope to be mistaken, but my initial experiments with the OP's code
> led me to the above considerations.

I think you are mistaken.

Robert Ramey

>
> Regards,




Re: Serialization and distinction between eof and end of archive

Oliver.Kowalke-2
In reply to this post by Robert Ramey
Hello Robert,
I have the same problem as Maciej.

You believe that this problem lies in the streambuf implementation - I
think it doesn't.
It doesn't matter whether you use a buffered streambuf with flushing or an
unbuffered streambuf - the archive blocks in basic_text_iprimitive.hpp,
line 80, 'is >> t'.

I don't know what you mean by 'streambuf implementation needs
enhancement'. The streambuf reads n bytes from the socket or writes n
bytes to the socket as requested through its interface.

So it reads n bytes from the socket and writes n bytes to the socket as
passed in by the stream that uses the socket_streambuf. It doesn't
make assumptions about the outer context in which it is used.

In the read action it returns 0 if the peer has closed the socket and no
more data will be available - so you have to return std::char_traits<
char >::eof() in order to indicate the EOF of the stream.
If the socket (blocking mode) is still open and you try to read n bytes
from a socket you will be blocked until n bytes can be read from the
socket (peer has written at least n bytes to the socket).

I could verify that boost::archive blocks because it tries to read more
bytes than available.
The client writes an archive with 29 bytes (printed out from the
streambuf - value was '22 serialization::archive 3 0') to the socket in
one operation.
On the peer socket the archive tries to read 4096 bytes from the socket
(also printed out from streambuf operation) in one call.
So this must block, because only 29 bytes can be read from the
socket!

As you can see, this problem is caused by boost::archive.
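
For illustration, here is a minimal sketch of a streambuf whose underflow() hands over whatever a single read returned, rather than waiting until a full 4096-byte buffer can be filled. The chunk_source class is a hypothetical in-memory stand-in for a socket, and short_read_streambuf is likewise an invented name; neither comes from the code discussed in this thread, and the thread does not say how the real socket_streambuf was written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical stand-in for a socket: each call returns at most one pending
// chunk, the way a blocking read returns whatever has already arrived.
class chunk_source {
public:
    void push(std::string chunk) { chunks_.push_back(std::move(chunk)); }

    // Copies up to `max` bytes into `out`. Returns 0 when nothing is left,
    // which stands in for "the peer closed the connection".
    std::size_t read_some(char* out, std::size_t max) {
        if (chunks_.empty()) return 0;
        std::string& front = chunks_.front();
        const std::size_t n = std::min(max, front.size());
        std::copy(front.begin(), front.begin() + n, out);
        front.erase(0, n);
        if (front.empty()) chunks_.pop_front();
        return n;
    }

private:
    std::deque<std::string> chunks_;
};

// Input-only streambuf that refills its buffer with a single short read.
class short_read_streambuf : public std::streambuf {
public:
    explicit short_read_streambuf(chunk_source& src) : src_(src) {}

protected:
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        // Ask for up to the buffer size, but accept fewer bytes.
        const std::size_t n = src_.read_some(buf_, sizeof buf_);
        if (n == 0) return traits_type::eof();
        setg(buf_, buf_, buf_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    chunk_source& src_;
    char buf_[4096];
};
```

With a streambuf of this shape, a request for "up to 4096 bytes" is satisfied by the 29 bytes that are available. What happens next depends on whether the extractor then asks for one more character.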

Oliver

Re: Serialization and distinction between eof and end of archive

Maciej Sobczak
In reply to this post by Robert Ramey
Robert Ramey wrote:

>>I have experimented a bit with this program and I've found that when
>>the sender (the server in this case) flushes the stream, it's enough
>>for the data to arrive at the destination, and it can be retrieved by
>>regular stream read (the data is then identical as if the same
>>archive was written to cout in the first place). But when the
>>text_iarchive object is used to read it, it blocks. The only way to
>>make it continue is to close the stream on the server side. So it
>>looks like the text_iarchive is really waiting for eof (or for
>>something else in the stream).
>
> It may look that way, but that's not what's happening.  I recommend
> you investigate the management (or lack thereof) of flushing of the
> underlying stream.

OK, after further investigation it appears that the archive reader is in
fact sensitive to one of these two:

- end of line
- end of stream

It is possible to reuse a long-lived connection for sending many archives
(and receiving them without unnecessary waits), provided one of these
two happens. So, the sender might look like this:

?ostream outstream(...);

while (...)
{
     {
         boost::archive::text_oarchive ar(outstream);

         ar << myObject;
     }

     // this:
     outstream << std::endl;  // or '\n' followed by .flush()
}

With this additional newline+flush the receiver has no problems with
de-serializing the data as soon as they arrive.
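
The same idea can be tried without Boost. The sketch below is a stand-in under stated assumptions, not the library's archive format: send_record and receive_record are hypothetical helpers that write space-separated integers much as a text archive writes its tokens, and close each record with std::endl.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for writing one text archive: a count followed by
// the values, each preceded by a space, then a newline and a flush.
void send_record(std::ostream& os, const std::vector<int>& values) {
    os << values.size();
    for (int v : values) os << ' ' << v;
    os << std::endl;  // terminating delimiter plus flush
}

// Reads one record. Returns false if the stream fails before the record
// is complete.
bool receive_record(std::istream& is, std::vector<int>& values) {
    std::size_t count = 0;
    if (!(is >> count)) return false;
    values.assign(count, 0);
    for (int& v : values) {
        if (!(is >> v)) return false;
    }
    return true;
}
```

Because every record ends with a delimiter, the extraction of its last value stops at the newline and never has to look past the bytes already sent, so the stream stays usable for the next record.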

Thank you for helping to solve this.
Regards,

--
Maciej Sobczak : http://www.msobczak.com/
Programming    : http://www.msobczak.com/prog/

Re: Serialization and distinction between eof and end of archive

Oliver.Kowalke-2
In reply to this post by Maciej Sobczak
>OK, after further investigation it appears that the archive reader is in
>fact sensitive to one of these two:
>
>- end of line
>- end of stream

Hmmm - I didn't notice the end-of-line issue. Now it works. Please
ignore my previous email.

Regards,
Oliver


Re: Serialization and distinction between eof and end of archive

Robert Ramey
In reply to this post by Maciej Sobczak
Actually this does reveal the true issue - and one that can and should
be addressed from within the serialization library.

Text archives output each value preceded by a space to
separate the tokens. The space is used to delimit the data value.
The last 'is >> t' is waiting for a delimiter before it returns - it
doesn't find one until the next archive arrives.

So text archives should be terminated with a space or newline
to prevent this from happening.

I will add this to the text_oarchive destructor.  This should
address the problem.
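
The look-ahead described here is ordinary operator>> behaviour and can be observed with plain iostreams. In this sketch (read_ints and read_result are hypothetical names used only for illustration), extracting the final integer runs into end-of-stream when nothing follows it, and stops cleanly when a newline does. On a blocking socket, the first case is where the reader would wait for more data.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Outcome of extracting a fixed number of ints from a piece of text.
struct read_result {
    int last_value;  // last integer successfully extracted
    bool hit_eof;    // true if the extractor ran into end-of-stream
};

// Extracts `count` ints with operator>>, as a text reader would.
read_result read_ints(const std::string& text, int count) {
    std::istringstream is(text);
    int value = 0;
    for (int i = 0; i < count; ++i) {
        is >> value;
        assert(!is.fail());  // every value is expected to parse
    }
    read_result result = {value, is.eof()};
    return result;
}
```

Calling read_ints(" 22 3 0", 3) reports hit_eof as true, because the extractor had to look past the final digit to know the number was complete. Calling read_ints(" 22 3 0\n", 3) reports false, since the trailing newline ends the token.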

Good work gentlemen,

Robert Ramey

Maciej Sobczak wrote:

> Robert Ramey wrote:
>
>>> I have experimented a bit with this program and I've found that when
>>> the sender (the server in this case) flushes the stream, it's enough
>>> for the data to arrive at the destination, and it can be retrieved
>>> by regular stream read (the data is then identical as if the same
>>> archive was written to cout in the first place). But when the
>>> text_iarchive object is used to read it, it blocks. The only way to
>>> make it continue is to close the stream on the server side. So it
>>> looks like the text_iarchive is really waiting for eof (or for
>>> something else in the stream).
>>
>> It may look that way, but that's not what's happening.  I recommend
>> you investigate the management (or lack thereof) of flushing of the
>> underlying stream.
>
> OK, after further investigation it appears that the archive reader is
> in fact sensitive to one of these two:
>
> - end of line
> - end of stream
>
> It is possible to reuse long-lived connection for sending many
> archives (and receiving them without unnecessary waits), provided one
> of these two happens. So, the sender might look like this:
>
> ?ostream outstream(...);
>
> while (...)
> {
>     {
>         boost::archive::text_oarchive ar(outstream);
>
>         ar << myObject;
>     }
>
>     // this:
>     outstream << std::endl;  // or '\n' followed by .flush()
> }
>
> With this additional newline+flush the receiver has no problems with
> de-serializing the data as soon as they arrive.
>
> Thank you for helping solving this.
> Regards,


