Using serialization for replication

Using serialization for replication

Preston A. Elder-2
Hey,

I want to use serialization for some kind of active replication,
however the biggest barrier to this is the fact that serialization
does not allow me to put the same object into the stream twice.  To
be more precise, I want to be able to serialize both to a network and
to disk, and I very much like the elegance of the serialization
approach (plus the fact it can re-create pointer references, arrays,
and such).

So my questions are:
a) How difficult would it be to allow an object to be
serialized twice, where the second copy would more or less do an
operator= (instead of creating a new object) on the first copy on
deserialization?
b) How difficult would it be to have more or less an 'appending'
serialization stream - ie. I deserialize what I have previously stored
on disk, and then continue to append to the serialization stream (which
appends to the file from then on), making my serialization more or less
unbounded?
c) Is there currently an unbounded serialization stream?  I mean,
obviously the XML stream will not be 'complete' until you close the
serialization stream and it can append any close tags it needs to,
however if I wanted to continually write to a stream throughout the
time my program is running, and if it crashes, be able to use that to
get back to the state I was in, is it possible?
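
To make question (c) concrete, here is a minimal sketch of an append-only
log that can be replayed after a crash.  It uses only the standard library,
not Boost.Serialization, and the record layout and the names append_record,
read_exact, replay and demo_crash_recovery are assumptions made up for
illustration.  Each record carries its own lengths, so nothing has to be
"closed", and a record cut short by a crash is simply ignored on replay.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical record layout: "<key-length> <value-length>\n<key><value>\n".
// Every record is self-delimiting, so the log never needs a closing tag.
void append_record(std::ostream& log, const std::string& key,
                   const std::string& value) {
    log << key.size() << ' ' << value.size() << '\n' << key << value << '\n';
    log.flush();  // hand each change to the underlying buffer right away
}

// Reads exactly n bytes into s; returns false if the stream ends early.
bool read_exact(std::istream& in, std::string& s, std::size_t n) {
    s.assign(n, '\0');
    if (n == 0) return true;
    in.read(&s[0], static_cast<std::streamsize>(n));
    return static_cast<std::size_t>(in.gcount()) == n;
}

// Rebuilds state by replaying records in order; later records for the same
// key overwrite earlier ones.  Replay stops at the first incomplete record.
std::map<std::string, std::string> replay(std::istream& log) {
    std::map<std::string, std::string> state;
    std::size_t key_len = 0, value_len = 0;
    std::string key, value;
    while (log >> key_len >> value_len) {
        if (log.get() != '\n') break;                   // damaged header
        if (!read_exact(log, key, key_len)) break;      // cut off inside the key
        if (!read_exact(log, value, value_len)) break;  // cut off inside the value
        if (log.get() != '\n') break;                   // missing terminator
        state[key] = value;
    }
    return state;
}

// Writes two complete records, then simulates a crash in the middle of a third.
std::map<std::string, std::string> demo_crash_recovery() {
    std::ostringstream log;
    append_record(log, "node1", "created");
    append_record(log, "node1", "updated");
    const std::string bytes = log.str() + "5 7\nnode2upd";  // truncated record
    std::istringstream in(bytes);
    std::map<std::string, std::string> state = replay(in);
    assert(state.count("node2") == 0);  // the partial record was not applied
    return state;
}
```

In a real program the std::ostringstream would be a file opened in append
mode, and the values would be serialized objects rather than short strings.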

I realize that in some respects this is kind of hammering a square peg
into a round hole, since serialization seems to have been designed for
'one-shot writes', but serialization and replication
functionality are very similar in nature.

--
PreZ :)

Death is life's way of telling you you've been fired.
                -- R. Geis

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: Using serialization for replication

Robert Ramey
I'm not really sure what you want to do but I'll attempt to answer anyhow.


[hidden email] wrote:
> Hey,
>
> I want to use serialization for some kind of active replication,

It would seem to me that the easiest way to do this would be
to make a TEE type streambuf using the stream buffer library.
This would duplicate each write to an additional stream.

> however the biggest barrier to this is the fact that serialization
> does not allow me to put the same object into the stream twice.  To
> be more precise, I want to be able to serialize both to a network and
> to disk, and I very much like the elegance of the serialization
> approach (plus the fact it can re-create pointer references, arrays,
> and such).

I believe the above would cover this.

> b) How difficult would it be to have more or less an 'appending'
> serialization stream - ie. I deserialize what I have previously stored
> on disk, and then continue to append to the serialization stream
> (which appends to the file from then on), making my serialization
> more or less unbounded.

I think something like that can be done now and is in fact already
being done.  I know that some people are "embedding" serialization
data inside of other data by just passing the stream buffer around
without closing it.

> c) Is there currently an unbounded serialization stream?  I mean,
> obviously the XML stream will not be 'complete' until you close the
> serialization stream and it can append any close tags it needs to,
> however if I wanted to continually write to a stream throughout the
> time my program is running, and if it crashes, be able to use that to
> get back to the state I was in, is it possible?

I believe that using "no_header" on opening might get you what you
want.

Robert Ramey




Re: Using serialization for replication

Preston A. Elder-2
Robert Ramey <[hidden email]> wrote:
> I'm not really sure what you want to do but I'll attempt to answer anyhow.

What I'm trying to do is take live objects and send them to
disk and/or another application every time they change (including when
they are created).  The idea is both that another application is kept in
sync with the first, and that if the application goes down, I can replay
the disk version and, when the replay is done, the application will be
in the same state it was in before it went down.

>> I want to use serialization for some kind of active replication,
>
> It would seem to me that the easiest way to do this would be
> to make a TEE type streambuf using the stream buffer library.
> This would duplicate each write to an additional stream.
Do you mean with boost::iostream? or as a part of boost::serialization?

>> however the biggest barrier to this is the fact that serialization
>> does not allow me to put the same object into the stream twice.  To
>> be more precise, I want to be able to serialize both to a network and
>> to disk, and I very much like the elegance of the serialization
>> approach (plus the fact it can re-create pointer references, arrays,
>> and such).
>
> I believe the above would cover this.
TEE would handle going to both network and disk, but it would not
obviate the 'single object only once' problem.  According to the
serialization documentation (Reference -> Special Considerations ->
Object Tracking
(http://www.boost.org/libs/serialization/doc/special.html#objecttracking)),
an object may only be put on the stream once.  I cannot put an object
that has been changed on the stream again to be re-serialized (either
by replacing the previously serialized entry, or by adding it to be
serialized again, but without allocating a new object).

This is why I was asking in the first place how difficult the
modifications would be to allow an object to be serialized twice, and
serialization to understand this and not create a separate instance,
but just update the existing instance.
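
The problem described above can be shown with a toy model.  The sketch
below is not Boost code: TrackingWriter, its "new"/"ref" wire format and
demo_tracking are hypothetical names, and the class only imitates the idea
of address-based tracking.  Once an address has been seen, a second save
writes a back-reference only, so a change made in between never reaches
the stream.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

struct Node { int value; };

// Toy stand-in for an output archive with object tracking.
// The wire format ("new <id> <value>" / "ref <id>") is made up.
class TrackingWriter {
public:
    explicit TrackingWriter(std::ostream& out) : out_(out) {}

    void save(const Node* n) {
        auto it = ids_.find(n);
        if (it != ids_.end()) {
            // Address already seen: emit only a back-reference, no data.
            out_ << "ref " << it->second << '\n';
            return;
        }
        const int id = static_cast<int>(ids_.size());
        ids_[n] = id;
        out_ << "new " << id << ' ' << n->value << '\n';
    }

private:
    std::ostream& out_;
    std::map<const Node*, int> ids_;  // address -> assigned object ID
};

std::string demo_tracking() {
    std::ostringstream out;
    TrackingWriter writer(out);
    Node n{1};
    writer.save(&n);
    n.value = 2;      // the object changes after it was first written
    writer.save(&n);  // only "ref 0" is written; the new value is lost
    assert(out.str().find('2') == std::string::npos);
    return out.str();
}
```

The stream ends up holding "new 0 1" followed by "ref 0", which is why a
reader could rebuild the pointer but never learn about the update.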

Thanks for your help :)

--
PreZ :)

Death is life's way of telling you you've been fired.
                -- R. Geis


Re: Using serialization for replication

Robert Ramey
Preston A. Elder wrote:
>> It would seem to me that the easiest way to do this would be
>> to make a TEE type streambuf using the stream buffer library.
>> This would duplicate each write to an additional stream.
> Do you mean with boost::iostream? or as a part of
> boost::serialization?

Remember that serialization uses streambuf for doing the actual I/O.
Hence anything that streambuf (boost iostreams) implements, such as
compression, duplication, etc., is "inherited" by boost serialization.

>>> however the biggest barrier to this is the fact that serialization
>>> does not allow me to put the same object into the stream twice.  To
>>> be more precise, I want to be able to serialize both to a network
>>> and to disk, and I very much like the elegance of the serialization
>>> approach (plus the fact it can re-create pointer references, arrays,
>>> and such).
>>
>> I believe the above would cover this.
> TEE would handle going to both network and disk, but it would not
> obviate the 'single object only once' problem.  According to the
> serialization documentation (Reference -> Special Considerations ->
> Object Tracking

This would be done by setting the serialization trait "tracking" to
track_never.  This would inhibit the checking for duplicates.  This
would occur before it gets to the streambuf implementation.

> (http://www.boost.org/libs/serialization/doc/special.html#objecttracking)),
> an object may only be put on the stream once, I cannot put an object
> that has been changed on the stream again to be re-serialized (either
> by replacing the previously serialized entry, or adding it to be
> serialized again, but without allocating a new object).

That's what "track_never" is for.

> This is why I was asking in the first place how difficult the
> modifications would be to allow an object to be serialized twice, and
> serialization to understand this and not create a separate instance,
> but just update the existing instance.


So track_never permits the object to be written multiple times.

Using a custom streambuf would place the serialized output into multiple
streams.

Robert Ramey




Re: Using serialization for replication

Preston A. Elder-2
Robert Ramey <[hidden email]> wrote:
> This would be done by setting the serialization trait "tracking" to
> track_never.  This would inhibit the checking for duplicates.  This
> would occur before it gets to the stream buf implementation
This has one other side-effect though.  It also means I cannot have
pointer references.

Consider, say, a tree, where nodes have pointers to the next entry,
parent entry, and first child entry.  I want to be able to pass a node
and have those pointers re-established, just as they would be with
tracking on.  However, I also want to be able to re-pass a node if it gets
updated (eg. if it's a 'node with data', or even if it now has a new
first child or a new next sibling).  Assume for this case that a node
is serialized on creation, and that its pointers will either be NULL or
refer to previously serialized objects.

Turning off tracking means it does not know how to re-establish those
links, and I'd end up with duplicate copies of the same node.  As
previously mentioned, being able to serialize the same object over and
over is only part of replication.  The real core is to be able to
UPDATE a node that has been previously serialized, without having to
a) deserialize a new object, look up the existing object and then
operator= it, and b) forgo being able to serialize a pointer to that
class-type and have it realize it has already seen that class and thus
just make it point to the same thing.

Right now, I'm going to hack my way around it by having tracking turned
on, so my pointers get re-established, and create a derived class that
exists just to create a new type that I can disable tracking for (since
using a typedef will not work).  The idea is that if the object has
already been serialized once, I'll follow up the object with a derived
version of the object.  The deserializing procedure will then similarly
check to see if it was previously deserialized and, if so, expect a
follow-up object and then just operator= the original object (a pointer
to which I will have thanks to deserialization's tracking) with the
follow-up object.

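Or, pseudocode would be:

Serialize:

Object *myobj = ...;
/* ... */
ar & myobj;  // tracked: full data the first time, a reference afterwards
if (myobj->previously_serialized())
  ar & static_cast<NonTrackingObject *>(myobj);  // untracked follow-up data

Deserialize:

Object *myobj;
ar & myobj;  // tracked: new object, or a pointer to the existing one
if (myobj->previously_deserialized())
   ar & *myobj;  // read the follow-up data into the existing object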

If my understanding is correct, if a tracking object has previously
been serialized (by pointer), any further attempt to deserialize a
pointer to that object will merely set the pointer to the previously
deserialized version.  Thus when a previous deserialization has
happened, the first ar & myobj will only set myobj's pointer, and the
second, because the object in this case is non-tracking, will have
the actual data I need, and since I deserialize to the dereferenced
pointer, it will deserialize into that object.

Of course, any other place I deserialize the same pointer, I would not
do this check, since I want only a reference.
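
The update-in-place idea can be sketched without Boost.  In the toy
protocol below (UpdatingWriter, UpdatingReader, the "new"/"upd" records and
demo_update_in_place are all hypothetical names, not part of any library),
the first sight of an object creates it on the receiving side and every
later record for the same ID assigns into the object that already exists,
so pointers held by the receiver stay valid.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

struct Node { int value = 0; };

// Sender: the first sight of an address emits "new", later sights emit "upd".
class UpdatingWriter {
public:
    explicit UpdatingWriter(std::ostream& out) : out_(out) {}

    void save(const Node& n) {
        auto it = ids_.find(&n);
        if (it == ids_.end()) {
            const int id = static_cast<int>(ids_.size());
            ids_[&n] = id;
            out_ << "new " << id << ' ' << n.value << '\n';
        } else {
            out_ << "upd " << it->second << ' ' << n.value << '\n';
        }
    }

private:
    std::ostream& out_;
    std::map<const Node*, int> ids_;  // address -> assigned object ID
};

// Receiver: "new" allocates, "upd" writes into the object it already owns.
class UpdatingReader {
public:
    void apply(std::istream& in) {
        std::string tag;
        int id = 0, value = 0;
        while (in >> tag >> id >> value) {
            if (tag == "new") {
                objects_[id].reset(new Node);
            } else if (tag != "upd" || objects_.count(id) == 0) {
                break;  // unknown record type, or an update for an unseen ID
            }
            objects_[id]->value = value;  // for "upd": same object, same address
        }
    }

    Node* find(int id) const {
        auto it = objects_.find(id);
        return it == objects_.end() ? nullptr : it->second.get();
    }

private:
    std::map<int, std::unique_ptr<Node>> objects_;
};

// Sends an object, changes it, sends it again, and checks that the replica
// was updated in place rather than duplicated.
bool demo_update_in_place() {
    std::ostringstream wire;
    UpdatingWriter writer(wire);
    UpdatingReader reader;
    Node n;

    n.value = 1;
    writer.save(n);
    std::istringstream first(wire.str());
    reader.apply(first);
    Node* replica = reader.find(0);
    assert(replica != nullptr && replica->value == 1);

    wire.str("");  // start a fresh batch for the next record
    n.value = 2;
    writer.save(n);
    std::istringstream second(wire.str());
    reader.apply(second);

    return reader.find(0) == replica && replica->value == 2;
}
```

This mirrors the intent of the pseudocode above, where a tracked reference
is followed by untracked data that overwrites the existing object.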

--
PreZ :)

Death is life's way of telling you you've been fired.
                -- R. Geis


Re: Using serialization for replication

Preston A. Elder-2
I also have another follow-up question, regarding the same thing.

I'm pretty sure this is possible from reading the docs, but it's not a
documented feature (I'm not surprised, really).

How do I do the equivalent of an ia.reset_object_address(v, u) for an
object that has NOT been serialized with that archiver?

The situation is this: I will obviously have to have an iarchive and
oarchive class - to accommodate failing over between instances of my
application (remember, my [io]archive instances will remain active
throughout the application).  Therefore, when I send something to an
oarchive (ie. the primary replicating 'out'), I also need to add that
reference to the pointer tree maintained by iarchive.  Similarly, when
I receive the item via an iarchive (ie. the secondary replicating 'in')
I need to add that reference to the pointer tree maintained by oarchive.

If I could make an iarchive and oarchive use a common pointer reference
tree, that would be ideal (I'm not very worried about thread safety
since only one would use it at a time), however if not, I need to keep
the complementing object references up to date in case of fail over.

Why?  Because if my primary goes down, and I fail to my secondary, it
now becomes the 'master' and starts replicating out itself.  Because of
this, it needs to be able to pick up where the old primary left off
(both in case of any tertiary instance listening, and because if the
primary comes back up it will want to be replicated to again, becoming
the secondary).

The more I think about this, the more it seems I'm trying to hammer a
round peg into a square hole - especially considering I know that each
tracked object is assigned an ID, and that ID would have to be
maintained between the iarchive and oarchive instances to be able to
change from a consumer into a producer like that and have third party
consumers not notice the difference.

Any ideas, etc. would be welcome.  Also, might I make a request to the
maintainer of serialization to hopefully turn boost::serialization into
something that can be more suitable for replication purposes in a
future release of boost?  The interface of boost::serialization is
fantastic, but the implementation I think needs a few more knobs and
switches to enable a wider variety of purpose.

--
PreZ :)

Death is life's way of telling you you've been fired.
                -- R. Geis


Re: Using serialization for replication

Robert Ramey
Preston A. Elder wrote:

> Any ideas, etc. would be welcome.

I'm not exactly sure what you're trying to do, but no matter -
here's my idea anyway.

Create a TEE type streambuf.  This would model the std::streambuf
that the standard library uses.  It would most likely be built with
the i/o streams library.  All data written to the streambuf would in fact
be written to multiple stream implementations.  This would get you
replication for free.  In fact, what would be more useful would be
an i/o stream adapter which would take any number of streambufs
and compose them into one TEE type streambuf.  This would permit
one to leverage all the streambufs already created.  It would mean
that the streambufs would all have to be the same type.  Some
could be binary, others could be file based, others could be network
connections, etc.  This is something that could/should be added to
the i/ostreams library - if it isn't already there.
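
A minimal sketch of such a TEE streambuf, written against the standard
library only.  TeeBuf and demo_tee are hypothetical names, and the two
std::ostringstream objects merely stand in for a file stream and a network
stream.  Any archive or stream that writes through this buffer has its
bytes copied to every sink.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <utility>
#include <vector>

// Forwards everything written to it to any number of other stream buffers.
// The sinks are borrowed, not owned; they must outlive the TeeBuf.
class TeeBuf : public std::streambuf {
public:
    explicit TeeBuf(std::vector<std::streambuf*> sinks)
        : sinks_(std::move(sinks)) {}

protected:
    // Called for single characters, since this buffer has no put area.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        bool ok = true;
        for (std::streambuf* sink : sinks_) {
            const int_type r = sink->sputc(traits_type::to_char_type(ch));
            if (traits_type::eq_int_type(r, traits_type::eof())) ok = false;
        }
        return ok ? ch : traits_type::eof();
    }

    // Bulk writes are passed to each sink in one call.
    std::streamsize xsputn(const char* data, std::streamsize n) override {
        bool ok = true;
        for (std::streambuf* sink : sinks_)
            if (sink->sputn(data, n) != n) ok = false;
        return ok ? n : 0;
    }

    // Flushing the tee flushes every sink.
    int sync() override {
        int result = 0;
        for (std::streambuf* sink : sinks_)
            if (sink->pubsync() != 0) result = -1;
        return result;
    }

private:
    std::vector<std::streambuf*> sinks_;
};

bool demo_tee() {
    std::ostringstream disk, network;  // stand-ins for a file and a socket
    std::vector<std::streambuf*> sinks{disk.rdbuf(), network.rdbuf()};
    TeeBuf tee(sinks);
    std::ostream out(&tee);
    out << "node " << 42 << '\n';
    out.flush();
    return disk.str() == "node 42\n" && network.str() == disk.str();
}
```

This sketch reports failure if any sink fails; a real replication setup
would need to decide whether one failed sink should stop the others.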

The counterpart of this - reading back one of the archives in the same
application - would read one of the streams in the TEE.  Remember that
all information concerning the state of the archive (addresses of created
pointers, class IDs, etc.) is local to the archive.  So there would be
no conflict.

> Also, might I make a request to the
> maintainer of serialization to hopefully turn boost::serialization
> into something that can be more suitable for replication purposes in a
> future release of boost?

So I don't see serialization as the right place to implement such
functionality.

>The interface of boost::serialization is
> fantastic, but the implementation I think needs a few more knobs and
> switches to enable a wider variety of purpose.

LOL - The reason the interface is "fantastic" is mainly due to my
single-minded dedication to keeping it that way.  The way I've done this
is to keep everything out of it that can possibly be put somewhere else.
I realize that this sometimes might seem limiting - but in fact it's
liberating.  It has kept serialization from turning into the C++
equivalent of Microsoft Word - where it would do everything everyone
wanted if anyone could ever figure out how to make it do what it is they
want.  In spite of this, the serialization library implementation is
still quite complicated.

I have toyed with experiments to make the serialization library more useful
for things like logging, rollback and recovery.  But the experiments have
been unsuccessful so far in that they end up either making the library
harder to use or less efficient.  If I had nothing else to do (or someone
was paying me to do this) I might spend more time at it.  But for the near
term I don't see any functionality being added to the serialization
library.  I spend the time I have on incremental efficiency improvements
and keeping it buildable in a changing infrastructure (bjam v2, new test
library, new compilers - borland, etc.).

I'm pleased you seem to like the library and have found it useful.

Robert Ramey




Re: Using serialization for replication

Preston A. Elder-2
Sorry if you see this twice, but I don't think the original reply was sent.

Robert Ramey <[hidden email]> wrote:

> Create a TEE type streambuf.  This would model the std::streambuf
> that the standard library uses.  It would most likely be built with
> the i/o streams library.  All data written to the streambuf would in fact
> be written to multiple stream implementations.  This would get you
> replication for free.  In fact, what would be more useful would be
> an i/o stream adapter which would take any number of streambufs
> and compose them into one TEE type streambuf.  This would permit
> one to leverage all the streambufs already created.  It would mean
> that the streambufs would all have to be the same type.  Some
> could be binary, others could be file based, others could be network
> connections, etc.  This is something that could/should be added to
> the i/ostreams library - if it isn't already there.
>
> The counterpart of this - reading back one of the archives in the same
> application - would read one of the streams in the TEE.  Remember that
> all information concerning the state of the archive (addresses of created
> pointers, class IDs, etc.) is local to the archive.  So there would be
> no conflict.
If I used this method I would end up with objects being duplicated!

If I had a TEE style object and had:
 - 1 endpoint going to a local input stream
 - 1 endpoint going to disk
 - 1 endpoint going to a remote system (via X transport method)

I would end up with multiple objects because of the first endpoint!
Every time the first endpoint saw a new object, it would allocate that
object and deserialize it, just like the remote one would (and should)
do.  This would mean every object would be there twice!

If, however, I could share the tracking map (eg. create a tracking map,
then pass it to the constructor of both the input and output
serializer, or alternatively, set it later or whatever), then this
would not be an issue.

>> Also, might I make a request to the
>> maintainer of serialization to hopefully turn boost::serialization
>> into something that can be more suitable for replication purposes in a
>> future release of boost?
> So I don't see serialization as the right place to implement such
> functionality.
Perhaps you're correct, perhaps CORBA is more appropriate.

However AFAIK, CORBA doesn't work so well when loading from a disk.
Plus CORBA doesn't solve one of my requirements.  My requirements are
simple:
 - Be able to restore the application to the same state it was when it
 died from a file on disk (persistence).
 - Be able to keep another instance of the application up to date
 real-time and be able to fail over to that system if necessary.
 - Be able to fail BACK to the original instance if necessary (eg. it
 is re-started and ready to once again be 'primary').

Serialization can handle all of these for me with some trickery to make
it handle 'updating' objects instead of just creating them (previously
mentioned in this thread).  However there is one thing that is
dangerous, and that is the fact that each instance of the application
will have to have an iarchive and an oarchive at all times.  And they
will need to have their object tracking in sync.

> LOL - The reason the interface is "fantastic" is mainly due to my
> single-minded dedication to keeping it that way.  The way I've done this
> is to keep everything out of it that can possibly be put somewhere else.
> I realize that this sometimes might seem limiting - but in fact it's
> liberating.  It has kept serialization from turning into the C++
> equivalent of Microsoft Word - where it would do everything everyone
> wanted if anyone could ever figure out how to make it do what it is they
> want.  In spite of this, the serialization library implementation is
> still quite complicated.
I know, I've looked at the code:)

However I'm not actually asking you to change much - just have the
ability to have two archives have the same object tracking backend.

A simple ability to just do CreateObjectTracker() and then pass the
result to the constructor of any archive I create after that would be
sufficient, even if the return value is completely opaque.  And of
course, if implemented as an optional argument to a constructor, the
default action could be to call that same function anyway.
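
What such a shared tracker might look like, as a toy sketch only.  Nothing
here is Boost API: CreateObjectTracker is the name proposed above, while
ObjectTracker, Reader, Writer, the "<id> <value>" record format and
demo_failover are assumptions for illustration.  Because both directions
use one ID table, an instance that received an object as a secondary can
later send it as a primary under the same ID.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Node { int value = 0; };

// One ID space shared by the inbound and outbound sides.
class ObjectTracker {
public:
    void bind(int id, Node* n) {
        by_ptr_[n] = id;
        by_id_[id] = n;
        if (id >= next_id_) next_id_ = id + 1;  // never reuse a known ID
    }
    int id_for(Node* n) {
        auto it = by_ptr_.find(n);
        if (it != by_ptr_.end()) return it->second;
        const int id = next_id_;
        bind(id, n);
        return id;
    }
    Node* find(int id) const {
        auto it = by_id_.find(id);
        return it == by_id_.end() ? nullptr : it->second;
    }

private:
    std::map<Node*, int> by_ptr_;
    std::map<int, Node*> by_id_;
    int next_id_ = 0;
};

std::shared_ptr<ObjectTracker> CreateObjectTracker() {
    return std::make_shared<ObjectTracker>();
}

class Reader {
public:
    explicit Reader(std::shared_ptr<ObjectTracker> t) : tracker_(std::move(t)) {}

    // Reads one "<id> <value>" record; creates the object only if the ID is new.
    Node* load(std::istream& in) {
        int id = 0, value = 0;
        if (!(in >> id >> value)) return nullptr;
        Node* n = tracker_->find(id);
        if (n == nullptr) {
            owned_.push_back(std::unique_ptr<Node>(new Node));
            n = owned_.back().get();
            tracker_->bind(id, n);
        }
        n->value = value;
        return n;
    }

private:
    std::shared_ptr<ObjectTracker> tracker_;
    std::vector<std::unique_ptr<Node>> owned_;
};

class Writer {
public:
    explicit Writer(std::shared_ptr<ObjectTracker> t) : tracker_(std::move(t)) {}
    void save(std::ostream& out, Node* n) {
        out << tracker_->id_for(n) << ' ' << n->value << '\n';
    }

private:
    std::shared_ptr<ObjectTracker> tracker_;
};

std::string demo_failover() {
    std::shared_ptr<ObjectTracker> tracker = CreateObjectTracker();
    Reader in(tracker);
    Writer out(tracker);

    std::istringstream from_primary("7 10\n");
    Node* n = in.load(from_primary);  // acting as secondary
    assert(n != nullptr);

    n->value = 11;                    // promoted to primary, object changes
    std::ostringstream to_replicas;
    out.save(to_replicas, n);         // sent under the ID the old primary used
    return to_replicas.str();
}
```

With separate, unshared tables the writer would have assigned the object a
fresh ID, and downstream consumers would have seen it as a different object.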

I could still use boost's serialization without this functionality,
however I lose the biggest advantage (and strength) of serialization.
Namely the restoration of pointers automatically.

In other words, I could easily enough just create a new archive
instance each time I want to serialize an object (or simply turn
off tracking for all objects).  However, this would mean all the
bookkeeping serialization does for me with previously seen objects and
restoring pointers and such would now have to be done by me, and more
importantly, done manually - increasing the possibility of missing
something.

The serialization library is VERY close to the functionality required
for replication (which is more or less a specialized form of
serialization anyway), it just has a few specific requirements that I
don't believe would change the way serialization works or complicate it
much more than it is now.

> I'm pleased you seem to like the library and have found it useful.
I just like the interface, it's very clean.

--
PreZ :)

Death is life's way of telling you you've been fired.
                -- R. Geis


Re: Using serialization for replication

Robert Ramey
Preston A. Elder wrote:
> Sorry if you see this twice, but I don't think the original reply was
> sent.

> If I used this method I would end up with objects being duplicated!
>
> If I had a TEE style object and had:
> - 1 endpoint going to a local input stream
> - 1 endpoint going to disk
> - 1 endpoint going to a remote system (via X transport method)
>
> I would end up with multiple objects because of the first endpoint!
> Every time the first endpoint saw a new object, it would allocate that
> object and deserialize it, just like the remote one would (and should)
> do.  This would mean every object would be there twice!

> If, however, I could share the tracking map (eg. create a tracking
> map, then pass it to the constructor of both the input and output
> serializer, or alternatively, set it later or whatever), then this
> would not be an issue.

> The serialization library is VERY close to the functionality required
> for replication (which is more or less a specialized form of
> serialization anyway), it just has a few specific requirements that I
> don't believe would change the way serialization works or complicate
> it much more than it is now.

Perhaps you might want to experiment with this idea by fiddling
with the serialization source.  Note that the "tracking map"
is part of the implementation of basic_iarchive.  This is not
exposed as you would like.  But there's no reason you can't
tweak the source to make it visible.  Then you could implement
what it seems you want.  Maybe that's a good solution for you.

Note that lots can be done by deriving from the existing archives
or by making an "Archive Adaptor" in a vein similar to
the polymorphic_iarchive.

Robert Ramey


