14. Log Shipping - Slony-I with Files

One of the new features in 1.1 is the ability to serialize updates into log files that can be kept in a spool directory.

The spool files could then be transferred via whatever means was desired to a "slave system," whether that be via FTP, rsync, or perhaps even by pushing them onto a 1GB "USB key" to be sent to the destination by clipping it to the ankle of some sort of "avian transport" system.

There are plenty of neat things you can do with a data stream in this form; the questions below cover how the spool files are generated and what they can and cannot do.

14.1. Where are the "spool files" for a subscription set generated?
14.2. What happens when FAILOVER/MOVE SET takes place?
14.3. What if we run out of "spool space"?
14.4. How do we set up a subscription?
14.5. What are the limitations of log shipping?

14.1. Where are the "spool files" for a subscription set generated?

The slon for any subscriber node can generate them; add the -a option to its command line.

Note: This implies that in order to use log shipping, you must have at least one subscriber node.
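For example, here is a minimal sketch of starting such a slon; the archive directory, cluster name, and connection string are placeholders for your own values:

    # Start the subscriber's slon with -a pointing at a spool directory.
    # Each SYNC applied to this subscriber is then also written out there
    # as an archive file suitable for shipping elsewhere.
    mkdir -p /var/spool/slony1/archive_logs
    slon -a /var/spool/slony1/archive_logs mycluster \
         "dbname=subscriberdb host=subscriberhost user=slony"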

14.2. What happens when FAILOVER/MOVE SET takes place?

Nothing special. So long as the archiving node remains a subscriber, it will continue to generate logs.

14.3. What if we run out of "spool space"?

The node will stop accepting SYNCs until this problem is alleviated. The database being subscribed to will also fall behind.
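Since a full spool partition brings replication to the archiving node to a halt, it is worth watching the space there. A rough sketch of such a check (the path, threshold, and mail address are arbitrary examples, not anything Slony-I provides):

    # Warn when the filesystem holding the spool directory is more than 90% full.
    SPOOL=/var/spool/slony1/archive_logs
    USED=$(df -P "$SPOOL" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    if [ "$USED" -gt 90 ]; then
        echo "Spool directory $SPOOL is ${USED}% full" \
            | mail -s "Slony-I spool space warning" dba@example.com
    fi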

14.4. How do we set up a subscription?

The script slony1_dump.sh, found in the tools directory, is a shell script that dumps the "present" state of the subscriber node.

You need to start the slon for the subscriber node with log archiving turned on (the -a option described above). At any point after that, you can run slony1_dump.sh, which will pull the state of that subscriber as of some SYNC event. Once the dump completes, all the SYNC logs generated from the time that dump started may be added to the dump in order to get a "log shipping subscriber."
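A rough outline of that procedure follows; the host names, database names, spool path, and the exact slony1_dump.sh arguments shown here are assumptions for illustration, so check the script itself and your slon -a configuration for the particulars:

    # 1. Dump the present state of a regular subscriber node as of some SYNC.
    #    slony1_dump.sh is assumed here to take the cluster name and to read
    #    the subscriber connection from the usual PG* environment variables.
    PGHOST=subscriberhost PGDATABASE=subscriberdb \
        slony1_dump.sh mycluster > subscriber_state.sql

    # 2. Load that dump into the log shipping target database.
    psql -h offsitehost -d shippeddb -f subscriber_state.sql

    # 3. Apply, in order, every SYNC log the subscriber's slon archived from
    #    the time the dump started onward (archive file naming varies by version).
    for f in $(ls /var/spool/slony1/archive_logs/*.sql | sort); do
        psql -h offsitehost -d shippeddb -f "$f"
    done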

14.5. What are the limitations of log shipping?

In the initial release there are quite a few limitations. As releases progress, some of these should be alleviated or eliminated.

The log shipping functionality amounts to "sniffing" the data applied at a particular subscriber node. As a result, you must have at least one "regular" node; you cannot have a cluster that consists solely of an origin and a set of "log shipping nodes."

The "log shipping node" tracks the entirety of the traffic going to a subscriber. You cannot separate things out if there are multiple replication sets.

The "log shipping node" presently only fully tracks SYNC events. This should be sufficient to cope with some changes in cluster configuration, but not others.

A number of event types are handled in such a way that log shipping copes with them:

  • SYNC events are, of course, handled.

  • DDL_SCRIPT is handled.

  • UNSUBSCRIBE_SET

    This event, much like SUBSCRIBE_SET, is not handled by the log shipping code. But its effect is; namely, SYNC events on the subscriber node will no longer contain updates to the set.

    Similarly, SET_DROP_TABLE, SET_DROP_SEQUENCE, SET_MOVE_TABLE, SET_MOVE_SEQUENCE, DROP_SET, and MERGE_SET will be handled "appropriately".

  • SUBSCRIBE_SET

    Unfortunately, there is some "strangeness" in the handling of this: when SUBSCRIBE_SET occurs, it leads to an event called ENABLE_SUBSCRIPTION being raised and processed purely on the subscriber.

    SUBSCRIBE_SET is really quite a simple event; it merely declares that a node is subscribing to a particular set via a particular provider. It doesn't copy data!

    The meat of the subscription work is done by ENABLE_SUBSCRIPTION, which is an event that is raised on the local node, not in the same sequence as the other events coming from other nodes (notably the data provider).

    Unfortunately, the upshot of this is that when a node newly subscribes to a set, the log that actually contains the data is in a separate sequence from that of the normal SYNC logs. Blindly loading these logs will throw things off :-(.

  • The various events involved in node configuration are irrelevant to log shipping: STORE_NODE, ENABLE_NODE, DROP_NODE, STORE_PATH, DROP_PATH, STORE_LISTEN, DROP_LISTEN

  • Events involved in describing how particular sets are to be initially configured are similarly irrelevant: STORE_SET, SET_ADD_TABLE, SET_ADD_SEQUENCE, STORE_TRIGGER, DROP_TRIGGER, TABLE_ADD_KEY

It would be nice to be able to turn a "log shipped" node into a fully communicating Slony-I node that you could failover to. This would be quite useful if you were trying to construct a cluster of (say) 6 nodes; you could start by creating one subscriber, and then use log shipping to populate the other 4 in parallel.

This usage is not supported, but presumably one could add the Slony-I configuration to the node, and promote it into being a new node. Again, a Simple Matter Of Programming (that might not necessarily be all that simple)...

14.6. Usage Hints

Note: Here are some more-or-less disorganized notes about how you might want to use log shipping...