1. Introduction to Slony-I

1.1. What Slony-I is

Slony-I is a "master to multiple slaves" replication system supporting cascading and slave promotion. The big picture for the development of Slony-I is as a master-slave system that includes the sorts of capabilities needed to replicate large databases to a reasonably limited number of slave systems. "Reasonable," in this context, is on the order of a dozen servers. If the number of servers grows beyond that, the cost of communications increases prohibitively, and the incremental benefits of having multiple servers will be falling off at that point.

See also Section 2 for a further analysis of costs associated with having many nodes.

Slony-I is a system intended for data centers and backup sites, where the normal mode of operation is that all nodes are available all the time, and where all nodes can be secured. If you have nodes that are likely to regularly drop onto and off of the network, or have nodes that cannot be kept secure, Slony-I is probably not the ideal replication solution for you.

Thus, examples of cases where Slony-I probably won't work out well would include:

  • Sites where connectivity is really "flakey"

  • Replication to nodes that are unpredictably connected.

  • Replicating a pricing database from a central server to sales staff who connect periodically to grab updates.

  • Sites where configuration changes are made in a haphazard way.

  • A "web hosting" situation where customers can independently make arbitrary changes to database schemas is not a good candidate for Slony-I usage.

There is also a file-based log shipping extension, where updates are serialized into files. Those log files can then be distributed by any means desired, with no need for feedback between the provider node and the nodes subscribing via "log shipping." "Log shipped" nodes do not add to the costs of communicating events between Slony-I nodes.

But Slony-I, by having only a single origin for each set, is unsuitable for asynchronous multiway replication. If you need some sort of "asynchronous multimaster replication with conflict resolution," akin to what is provided by Lotus Notes or the "syncing" protocols found on PalmOS systems, you will need to look elsewhere.

These other sorts of replication models are not without merit, but they represent different replication scenarios that Slony-I does not attempt to address.

1.2. Why yet another replication system?

Slony-I was born from an idea to create a replication system that was not tied to a specific version of PostgreSQL, and that could be started and stopped on an existing database without the need for a dump/reload cycle.

1.3. What Slony-I is not

  • Slony-I is not a network management system.

  • Slony-I does not have any functionality within it to detect a node failure, nor to automatically promote a node to a master or other data origin.

    It is quite possible that you will need such functionality; providing it requires combining network tools that determine, to your satisfaction, which nodes you consider "live" and which you consider "dead," with some local policy that decides what to do under those circumstances. Slony-I does not dictate any of that policy to you.

  • Slony-I is not a multi-master replication system; it is not a connection broker, and it won't make you coffee and toast in the morning.

All that being said, there are tools available to help with some of these things, and there is a plan under way for a subsequent system, Slony-II, to provide "multimaster" capabilities. But that represents a separate project, implemented in a rather different fashion than Slony-I, and expectations for Slony-I should not be based on hopes for future projects.

1.4. Why doesn't Slony-I do automatic fail-over/promotion?

Determining whether a node has "failed" is properly the responsibility of network management software, not Slony-I. The configuration, fail-over paths, and preferred policies will be different for each site. For example, keep-alive monitoring with redundant NICs, intelligent HA switches that guarantee race-condition-free takeover of a network address, and disconnection of the "failed" node all vary based on network configuration, vendor choices, and the combination of hardware and software in use. This is clearly in the realm of network management, not Slony-I.

Furthermore, choosing what to do based on the "shape" of the cluster represents local business policy, particularly in view of the fact that FAIL OVER requires discarding the failed node. If Slony-I imposed failover policy on you, that might conflict with business requirements, thereby making Slony-I an unacceptable choice.
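
When such a policy decision has been made, the actual promotion is carried out manually via slonik. The following is a minimal sketch only, showing the kind of steps your own monitoring or an administrator would invoke once node 1 is judged to be gone; the cluster name, node numbers, and conninfo strings are hypothetical, and the exact commands should be checked against the slonik command reference for your version:

    cluster name = my_cluster;
    node 1 admin conninfo = 'dbname=mydb host=db1 user=slony';
    node 2 admin conninfo = 'dbname=mydb host=db2 user=slony';

    # Promote node 2 to be the origin of the sets that the failed
    # node 1 originated; node 1 is treated as lost.
    failover (id = 1, backup node = 2);

    # Once node 1 is permanently retired, remove it from the
    # cluster configuration entirely.
    drop node (id = 1, event node = 2);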

As a result, let Slony-I do what it does best: provide database replication services.

1.5. Current Limitations

Slony-I does not automatically propagate schema changes, nor does it have any ability to replicate large objects. There is a single common reason for these limitations, namely that Slony-I collects updates using triggers, and neither schema changes, large object operations, nor TRUNCATE requests are able to have triggers suitable to inform Slony-I when those sorts of changes take place. As a result, the only database objects where Slony-I can replicate updates are tables and sequences.
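
To make that concrete, the objects to replicate are enumerated explicitly, table by table and sequence by sequence, in a slonik script. This is a minimal sketch; the cluster name, conninfo string, and object names (public.accounts, public.accounts_id_seq) are hypothetical placeholders:

    cluster name = my_cluster;
    node 1 admin conninfo = 'dbname=mydb host=db1 user=slony';

    # A replication set groups the objects replicated from one origin.
    create set (id = 1, origin = 1, comment = 'application tables');

    # Only tables and sequences may be added to a set; schema changes,
    # large objects, and TRUNCATE are not captured by the triggers.
    set add table (set id = 1, origin = 1, id = 1,
                   fully qualified name = 'public.accounts');
    set add sequence (set id = 1, origin = 1, id = 1,
                      fully qualified name = 'public.accounts_id_seq');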

Note that the use of triggers involves some additional fiddling with the replicated tables. On the "origin" for each replicated table, an additional trigger is added which runs the stored procedure schemadoc.logtrigger( ). On each subscriber, tables are instead augmented with a trigger that runs the schemadoc.denyaccess( ) function; this function prevents anything other than the slon process from updating data in replicated tables. In addition, any other triggers and rules on replicated tables are suppressed on the subscribers: this is done by pointing them, in the system catalog, at the primary key index instead of at the table itself. This represents something of a "corruption" of the data dictionary, and is why you should not directly use pg_dump to dump schemas on subscribers.

There is a capability for Slony-I to propagate other kinds of database modifications, notably DDL changes, if you submit them as scripts via the slonik SLONIK EXECUTE SCRIPT operation. That is not handled "automatically"; you, as a database administrator, must construct an SQL DDL script and submit it via SLONIK EXECUTE SCRIPT, and there are a number of further caveats.
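
For illustration, a DDL change might be submitted as follows. This is a minimal sketch: the cluster name, set id, node number, conninfo string, and file path are hypothetical, and the exact parameters should be verified against the SLONIK EXECUTE SCRIPT reference for your version:

    cluster name = my_cluster;
    node 1 admin conninfo = 'dbname=mydb host=db1 user=slony';

    # Apply the DDL file on the origin and propagate it, in the
    # correct order relative to replicated data, to the subscribers
    # of set 1.
    execute script (
        set id = 1,
        filename = '/tmp/add_column.sql',
        event node = 1
    );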

If you have those sorts of requirements, it may be worth exploring the use of PostgreSQL 8.X PITR (Point In Time Recovery), where WAL logs are replicated to remote nodes. Unfortunately, that has two attendant limitations:

  • PITR replicates all changes in all databases; you cannot exclude data that isn't relevant;

  • A PITR replica remains dormant until you apply logs and start up the database. You cannot use the database and apply updates simultaneously. It is like having a "standby server" which cannot be used without it ceasing to be "standby."

1.6. Replication Models

There are a number of distinct models for database replication; it is impossible for one replication system to be all things to all people.

Slony-I implements a particular model, namely that of asynchronous replication, using triggers to collect table updates, where a single "origin" may be replicated to multiple "subscribers" including cascaded subscribers.
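
As a minimal sketch of that model (the cluster name, node numbers, and conninfo strings are hypothetical), a three-node cascade might be subscribed like this, with node 1 as the origin, node 2 as a forwarding subscriber, and node 3 fed from node 2 rather than from the origin:

    cluster name = my_cluster;
    node 1 admin conninfo = 'dbname=mydb host=db1 user=slony';
    node 2 admin conninfo = 'dbname=mydb host=db2 user=slony';
    node 3 admin conninfo = 'dbname=mydb host=db3 user=slony';

    # Node 2 subscribes directly to the origin and keeps log data
    # (forward = yes) so that it can serve as a provider for others.
    subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);

    # Node 3 is a cascaded subscriber: its provider is node 2,
    # not the origin.
    subscribe set (id = 1, provider = 2, receiver = 3, forward = no);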

There are a number of other replication models that differ from this; it is worth pointing out the other approaches that exist. Slony-I is certainly not the only approach, and for some applications it is not the optimal one.

  • Synchronous single-origin multi-subscriber replication

    In a synchronous system, updates cannot be committed at the origin until they have also been accepted by subscriber nodes. This enhances the security property of nonrepudiation as updates will not be committed until they can be confirmed elsewhere. Unfortunately, the requirement that changes be applied in multiple places introduces a performance bottleneck.

    This approach is similar to the two-phase commit processing model of the XA transaction processing protocol.

  • Synchronous multi-origin multi-subscriber replication

    This is the model being used by the possibly-forthcoming Slony-II system. Synchronous replication systems all "suffer" from the performance bottleneck that updates must be accepted on all nodes before they can be committed anywhere.

    That generally makes it impractical to run synchronous replication across wide area networks.

  • Asynchronous multimaster replication with conflict avoidance/resolution

    Perhaps the most widely used replication system of this sort is the PalmOS HotSync system. Lotus Notes™ also provides a replication system that functions in much this manner.

    The characteristic "troublesome problem" with this style of replication is that it is possible for conflicts to arise because users update the same record in different ways on different nodes.

    In the case of HotSync, if conflicts arise due to records being updated on multiple nodes, the "resolution" is to simply create a duplicate record to reflect the two changes, and have the user resolve the conflict manually.

    Some async multimaster systems try to resolve conflicts by finding ways to apply partial record updates. For instance, with an address update, one user, on one node, might update the phone number for an address, and another user might update the street address, and the conflict resolution system might try to apply these updates in a non-conflicting order. This can also be considered a form of "table partitioning" where a database table is treated as consisting of several "sub-tables."

    Conflict resolution systems almost always require some domain knowledge of the application being used, which means that they can only deal automatically with those conflicts where you have assigned a policy. If they run into conflicts for which no policy is available, replication stops until someone applies some manual intervention.