Slony-I is a "master to multiple slaves" replication system supporting cascading (e.g., a node can feed another node, which in turn feeds another node, and so on) and failover.

The big picture for the development of Slony-I is a master-slave replication system that includes all the features and capabilities needed to replicate large databases to a reasonably limited number of slave systems.

Slony-I is a system designed for use at data centers and backup sites, where the normal mode of operation is that all nodes are available.

A fairly extensive "admin guide" comprising material in the CVS tree may be found here. There is also a local copy.

The original design document is available here.

See the "news" area for more details, including a copy of the release notes.

Version 2.0 is a major new release of Slony-I. It uses some features introduced in PostgreSQL 8.3, and hence is not compatible with PostgreSQL versions older than 8.3.

We consider this a good tradeoff, as some of the new functionality would not be possible with earlier versions of PostgreSQL:

  • The system catalogues are no longer "hacked with," so with the new version you can run pg_dump against subscribers and expect a complete and consistent dump.
  • Trigger handling is enormously cleaner.
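Version 2.0 relies on PostgreSQL 8.3's session_replication_role setting and per-trigger enable modes instead of catalogue changes. The small Python model below shows the firing rules PostgreSQL documents for these settings. It is a sketch of PostgreSQL behaviour, not Slony-I code. The enum and function names are hypothetical. The one-letter codes match what PostgreSQL stores in pg_trigger.tgenabled.

```python
from enum import Enum


class Role(Enum):
    """Values of the session_replication_role setting."""
    ORIGIN = "origin"    # PostgreSQL's default
    REPLICA = "replica"  # assumed: the mode slon uses when applying replicated changes
    LOCAL = "local"


class TriggerMode(Enum):
    """How a trigger was enabled; values mirror pg_trigger.tgenabled."""
    ENABLE = "O"          # ALTER TABLE ... ENABLE TRIGGER (the default)
    ENABLE_REPLICA = "R"  # ALTER TABLE ... ENABLE REPLICA TRIGGER
    ENABLE_ALWAYS = "A"   # ALTER TABLE ... ENABLE ALWAYS TRIGGER
    DISABLE = "D"         # ALTER TABLE ... DISABLE TRIGGER


def trigger_fires(mode: TriggerMode, role: Role) -> bool:
    """Return True if a trigger in `mode` fires for a session in `role`."""
    if mode is TriggerMode.DISABLE:
        return False
    if mode is TriggerMode.ENABLE_ALWAYS:
        return True
    if mode is TriggerMode.ENABLE_REPLICA:
        return role is Role.REPLICA
    # Plain ENABLE fires in "origin" and "local" sessions only.
    return role in (Role.ORIGIN, Role.LOCAL)


if __name__ == "__main__":
    for mode in TriggerMode:
        fires = [r.value for r in Role if trigger_fires(mode, r)]
        print(f"{mode.name:15} fires in: {', '.join(fires) or 'none'}")
```

Under these rules, an ordinary (plain ENABLE) user trigger stays quiet in a session running as "replica." ENABLE ALWAYS opts a trigger back in for every mode.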

Unfortunately, there is not yet a procedure for upgrading an installation of Slony-I 1.2.x to 2.0. At present, upgrading to 2.0 essentially requires dropping replication and recreating it.

We hope to have a solution for this in the new year.

Follow Bug #69 for more details on this.

Version 1.2.15 is now available. See the "news" area for more details, including a copy of the release notes.

People frequently ask for help figuring out what might be wrong with their cluster. If you think there might be a problem (or even if you don't), the first thing you should do is run the test state scripts. Their output may point you to where the problem is, and it will also help others who are trying to help you.

If you're not running these scripts hourly against your cluster(s), you really should be...

Version 2.0.0 RC1 is now available. See the "news" area for more details, including a copy of the release notes.

Slony-1 2.0.0 engine documentation
Slony-1 1.2.15 engine documentation
Slony-1 1.1.9 engine
Chris Browne 2008-09-12

Unfortunately, there is not yet a procedure for upgrading an installation of Slony-I 1.2.x to 2.0. At present, upgrading to 2.0 essentially requires dropping replication and recreating it.

We hope to have a solution for this in the new year.

Chris Browne 2008-12-19
Differences from 1.2 stream
  • Removal of TABLE ADD KEY
  • Dropped all support for Postgres versions prior to 8.3.

    This is required because we now use new functionality in Postgres, namely the trigger and rule support for session_replication_role. As a result, every node (origin, subscriber, or mixed) can now be dumped with pg_dump, producing a consistent snapshot of the database.

  • alterTableRestore() is still needed for the upgrade from 1.2.x to 2.0. upgradeSchema() will restore the system catalog to a consistent state and will define and configure the new versions of the log and deny_access triggers.
  • Fixed EXECUTE SCRIPT so that it records the ev_seqno for WAIT FOR EVENT. All DDL is now executed with session_replication_role set to "local", on the origin as well as on all subscribers. This causes the Slony-I triggers to ignore all DML statements, while user triggers follow their regular ENABLE [REPLICA|ALWAYS] or DISABLE configuration.
  • Log shipping files now also switch to session_replication_role = "replica" (or "local", for DDL).
  • Sequence tracking is much less expensive. Previously, *ALL* sequence values were polled for each and every SYNC. Now the slon stores the last value of each sequence and only records an entry in sl_seqlog when the value differs from that last value. If most sequences are relatively inactive, they will rarely require entries in sl_seqlog.
  • Changed tools/slony1_dump.sh (used to generate the log shipping dump): the quoting of "\\\backslashes\\\" was changed to get rid of a warning.
  • The cleanup thread was revised so that most of the logic for deciding which tables to vacuum now lives in a pair of stored functions.

    This greatly simplifies the C code.

  • Revised logging levels so that most of the interesting messages are emitted at the SLON_CONFIG and SLON_INFO levels. This allows users to turn off the higher DEBUG levels and still have useful logs.
  • Changed the log selection query to be less affected by long-running transactions. This should help in particular when subscribing to a set takes a very long time. In that situation, applying the later SYNCs became extremely costly because the query selecting log entries was forced into a Seq Scan rather than an index scan.
  • Removed all support for STORE/DROP TRIGGER commands. Users should use the ALTER TABLE [ENABLE|DISABLE] TRIGGER functionality available directly in Postgres from now on.
  • Improved the wiki page generation script with an option to add a set of [[Category:Foo]] tags, allowing automated categorization.
  • Documented how to fix tables that currently use primary key candidates generated by Slony-I's TABLE ADD KEY.
  • Added some specific timestamps from the 2007 "DST rule change ambiguous time" (i.e., the period which was not DST under the former rules, but now is, due to the recent rule change).

    Bill Moran ran into some problems with such dates; different PostgreSQL versions returned somewhat different results. This wasn't a Slony-I problem; the data was indeed being replicated correctly.

  • Made configure a bit smarter about automatically locating docbook2man-spec.pl on Debian, Fedora, and BSD.
  • Tests now generate |pipe|delimited|output| indicating a number of attributes of each test, including system/platform information, versions, and whether the test succeeded or failed.
  • Revised functions that generate listen paths
  • The tools/configure-replication.sh script now permits specifying a destination path for generated config files. This allows it to be used within automated processes, including generating Slonik scripts for tests in the "test bed." As a further benefit, tools/configure-replication.sh becomes a regularly regression-tested tool.
  • Fix to bug #15, where a long cluster name (>40 chars) broke things when an index was created whose name contains the cluster name.
    • Warn upon creating a long cluster name.
    • Give a useful exception that explains the cause rather than merely watching index creation fail.
    See Bug #15

  • Fix for bug #19: added a script to help the administrator search for any triggers in the database that is the source of the schema used to initialize a log shipping node.

    The problem is that some, most, or sometimes all of the triggers and rules will likely need to be dropped from the log shipping node lest they interfere with replication.

  • Elimination of custom "xxid" functions

    PostgreSQL 8.3 introduces a set of "txid" functions and a "txid_snapshot" type, which eliminates the need for Slony-I to have its own C functions for doing XID comparisons.

    Note that this affects the structure of sl_event, and leads to some changes in the coding of the regression tests.

    This eliminates the src/xxid directory and its contents.

  • All of the interesting cleanup work is now done in the stored function, cleanupEvent(interval, boolean).

    Interesting side-effect: You can now induce a cleanup manually, which will be useful for testing.

  • cleanupEvent now has two parameters, passed in from slon config parameters:

    interval - cleanup_interval (default '10 minutes')

    This controls how quickly old events are trimmed out. It used to be a hard-coded value.

    Old events are trimmed out once their confirmations are older than cleanup_interval.

    This, in turn, controls when the data in sl_log_1/sl_log_2 can be dropped.

    Data in *those* tables is deleted when it is older than the earliest XID still captured in sl_event.

    boolean - cleanup_deletelogs (default 'false')

    This controls whether or not we DELETE data from sl_log_1/sl_log_2.

    By default, we now NEVER delete data from the log tables; we instead use TRUNCATE.

  • We now consider initiating a log switch every time cleanupEvent() runs.

    If the call to logswitch_finish() indicates that there was no log switch in progress, we initiate one.

    This means that log switches will be initiated almost as often as possible. That's a policy well worth debating :-).

  • logswitch_finish() has changed a fair bit.

    It uses the same logic as in cleanupEvent() to determine if there are any *relevant* tuples left in sl_log_[whatever], rather than (potentially) scanning the table to see if there are any undeleted tuples left.

  • At startup, slon logs all of its parameter values at the SLON_CONFIG level, per Bugzilla entry #21.
  • New slonik commands "CLONE PREPARE" and "CLONE FINISH" assist in creating duplicate nodes by taking a copy of an existing subscriber node.
  • We no longer use LISTEN/NOTIFY for events and confirmations, which eliminates the usage that has caused pg_listener bloat. We instead poll against the event table.
  • Various places where slonik would use a default node ID of 1 have been changed so that it no longer does so.

    As a result, slonik scripts may need to be changed to specify an EVENT NODE (or similar) after migration to v2.0.

    The slonik commands involved:

    • STORE NODE - EVENT NODE
    • DROP NODE - EVENT NODE
    • WAIT FOR EVENT - WAIT ON
    • FAILOVER - BACKUP NODE
    • EXECUTE SCRIPT - EVENT NODE
  • Fixed a problem where ACCEPT_SET would wait for the corresponding MOVE_SET or FAILOVER_SET to arrive while holding an exclusive lock on sl_config_lock, preventing the other remote worker from processing that event.
  • Bug #54 - quite a few Bash-isms in various scripts have been addressed so as to make the shell scripts more portable.
  • Bug #18 - the function parameter for the logtrigger functions no longer requires any trailing v's

    Add a test to "test1" to make sure this logic gets exercised.

  • Created "start_slon.sh", an rc.d-style script for starting, stopping, and checking status of slon processes.

    Integrated it into the regression tests, replacing the previous logic for starting and stopping slons, so that this script can be considered carefully tested.

  • Bug #46 - incompatibility with PostgreSQL 8.4 addressed
  • Use dollar quoting in stored functions
  • Additional logging of the time spent running queries, broken out on a by-database basis
  • Fixes to documentation of WAIT FOR EVENT
  • Fix to bug #63 - cleanup thread had an imperative SELECT that needed to become part of an IF statement
  • Enhancement - bug #61 - logshipper process should rescan the queue when it empties
  • Note about the "duct tape" tests: many of the tests in src/ducttape reference features removed in v2.0.

    We will eventually replace these with a more proper "test suite," so we are not fixing all of the ducttape tests.
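The cheaper sequence tracking described above comes down to one rule: remember the last value, and write only on change. The Python sketch below models that idea. An in-memory list stands in for sl_seqlog. The class, its method, and the sequence names are hypothetical. This is not how slon actually implements it.

```python
from typing import Dict, List, Tuple


class SequenceTracker:
    """Hypothetical model of change-only sequence logging."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, int] = {}
        # Stand-in for sl_seqlog rows: (sync_id, sequence_name, value).
        self.seqlog: List[Tuple[int, str, int]] = []

    def record_sync(self, sync_id: int, current: Dict[str, int]) -> int:
        """Log only sequences whose value changed since the previous SYNC.

        Returns the number of rows written for this SYNC.
        """
        written = 0
        for name, value in current.items():
            if self.last_seen.get(name) != value:
                self.seqlog.append((sync_id, name, value))
                self.last_seen[name] = value
                written += 1
        return written


tracker = SequenceTracker()
print(tracker.record_sync(1, {"orders_id_seq": 10, "audit_id_seq": 5}))  # prints 2 (both new)
print(tracker.record_sync(2, {"orders_id_seq": 12, "audit_id_seq": 5}))  # prints 1 (one changed)
print(tracker.record_sync(3, {"orders_id_seq": 12, "audit_id_seq": 5}))  # prints 0 (idle)
```

With mostly idle sequences, the number of log rows grows with the number of changes rather than with (sequences × SYNCs).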

Chris Browne 2008-11-24
A second release candidate of Slony-I version 2.0 is now available, fixing a number of issues found since the first release candidate.
Differences from 2.0.0 RC1
  • Bug #54 - fixed various bash-isms
  • Bug #18 - function parameter for logtrigger no longer requires trailing "v"'s
  • Created start_slon.sh, an rc.d-style script to start, stop, and check status of a slon process, integrating its usage into regression tests
  • Bug #46 - incompatibility with PostgreSQL 8.4
  • Added an extended example of upgrading from one version of PostgreSQL to another
  • Use dollar quoting in stored functions
  • Added logging code that lists time spent running queries against different nodes to help when analyzing performance issues
  • A little more fixing-up of logging levels
  • Test suite uses non-zero lag_interval
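The new logging of time spent running queries against different nodes works like an accumulator keyed by node ID. The Python sketch below models that idea. The class, its method names, and the report format are hypothetical. They are not slon's code or its log format.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator


class QueryTimer:
    """Hypothetical per-node accumulator of query time and query count."""

    def __init__(self) -> None:
        self.total: DefaultDict[int, float] = defaultdict(float)
        self.count: DefaultDict[int, int] = defaultdict(int)

    @contextmanager
    def timed(self, node_id: int) -> Iterator[None]:
        """Time the enclosed block and charge it to `node_id`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.total[node_id] += time.perf_counter() - start
            self.count[node_id] += 1

    def report(self) -> str:
        """One line per node, ordered by node ID."""
        return "\n".join(
            f"node {n}: {self.count[n]} queries, {self.total[n]:.3f}s"
            for n in sorted(self.total)
        )


timer = QueryTimer()
with timer.timed(node_id=2):
    time.sleep(0.01)  # stand-in for a query against node 2
print(timer.report())
```

Breaking the totals out by node makes it easier to see whether a slow SYNC was dominated by the provider or by the local database.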
Chris Browne 2008-09-24
Version 1.2.15 has been released. Changes include:
  • Fix to STORE TRIGGER

    STORE TRIGGER was running against all nodes upon subscription.

    See Bug #56.

  • Portability changes to some tools/ scripts, fixing some "bash-isms"
  • Fix to switch statement in slonik.c; unknown how it broke
  • Fix Bug #52 - memory leak
Chris Browne 2008-09-12
The first release candidate of Slony-I version 2.0 is now available, incorporating a large number of enhancements made over the last year or so.
Differences from 1.2 stream
  • Removal of TABLE ADD KEY
  • Dropped all support for Postgres versions prior to 8.3.

    This is required because we now use new functionality in Postgres, namely the trigger and rule support for session_replication_role. As a result, every node (origin, subscriber, or mixed) can now be dumped with pg_dump, producing a consistent snapshot of the database.

  • alterTableRestore() is still needed for the upgrade from 1.2.x to 2.0. upgradeSchema() will restore the system catalog to a consistent state and will define and configure the new versions of the log and deny_access triggers.
  • Fixed EXECUTE SCRIPT so that it records the ev_seqno for WAIT FOR EVENT. All DDL is now executed with session_replication_role set to "local", on the origin as well as on all subscribers. This causes the Slony-I triggers to ignore all DML statements, while user triggers follow their regular ENABLE [REPLICA|ALWAYS] or DISABLE configuration.
  • Log shipping files now also switch to session_replication_role = "replica" (or "local", for DDL).
  • Sequence tracking is much less expensive. Previously, *ALL* sequence values were polled for each and every SYNC. Now the slon stores the last value of each sequence and only records an entry in sl_seqlog when the value differs from that last value. If most sequences are relatively inactive, they will rarely require entries in sl_seqlog.
  • Changed tools/slony1_dump.sh (used to generate the log shipping dump): the quoting of "\\\backslashes\\\" was changed to get rid of a warning.
  • The cleanup thread was revised so that most of the logic for deciding which tables to vacuum now lives in a pair of stored functions.

    This greatly simplifies the C code.

  • Revised logging levels so that most of the interesting messages are emitted at the SLON_CONFIG and SLON_INFO levels. This allows users to turn off the higher DEBUG levels and still have useful logs.
  • Changed the log selection query to be less affected by long-running transactions. This should help in particular when subscribing to a set takes a very long time. In that situation, applying the later SYNCs became extremely costly because the query selecting log entries was forced into a Seq Scan rather than an index scan.
  • Removed all support for STORE/DROP TRIGGER commands. Users should use the ALTER TABLE [ENABLE|DISABLE] TRIGGER functionality available directly in Postgres from now on.
  • Improved the wiki page generation script with an option to add a set of [[Category:Foo]] tags, allowing automated categorization.
  • Documented how to fix tables that currently use primary key candidates generated by Slony-I's TABLE ADD KEY.
  • Added some specific timestamps from the 2007 "DST rule change ambiguous time" (i.e., the period which was not DST under the former rules, but now is, due to the recent rule change).

    Bill Moran ran into some problems with such dates; different PostgreSQL versions returned somewhat different results. This wasn't a Slony-I problem; the data was indeed being replicated correctly.

  • Made configure a bit smarter about automatically locating docbook2man-spec.pl on Debian, Fedora, and BSD.
  • Tests now generate |pipe|delimited|output| indicating a number of attributes of each test, including system/platform information, versions, and whether the test succeeded or failed.
  • Revised functions that generate listen paths
  • The tools/configure-replication.sh script now permits specifying a destination path for generated config files. This allows it to be used within automated processes, including generating Slonik scripts for tests in the "test bed." As a further benefit, tools/configure-replication.sh becomes a regularly regression-tested tool.
  • Fix to bug #15, where a long cluster name (>40 chars) broke things when an index was created whose name contains the cluster name.
    • Warn upon creating a long cluster name.
    • Give a useful exception that explains the cause rather than merely watching index creation fail.
    See Bug #15
  • Fix for bug #19: added a script to help the administrator search for any triggers in the database that is the source of the schema used to initialize a log shipping node.

    The problem is that some, most, or sometimes all of the triggers and rules will likely need to be dropped from the log shipping node lest they interfere with replication.

  • Elimination of custom "xxid" functions

    PostgreSQL 8.3 introduces a set of "txid" functions and a "txid_snapshot" type, which eliminates the need for Slony-I to have its own C functions for doing XID comparisons.

    Note that this affects the structure of sl_event, and leads to some changes in the coding of the regression tests.

    This eliminates the src/xxid directory and its contents.

  • All of the interesting cleanup work is now done in the stored function, cleanupEvent(interval, boolean).

    Interesting side-effect: You can now induce a cleanup manually, which will be useful for testing.

  • cleanupEvent now has two parameters, passed in from slon config parameters:

    interval - cleanup_interval (default '10 minutes')

    This controls how quickly old events are trimmed out. It used to be a hard-coded value.

    Old events are trimmed out once their confirmations are older than cleanup_interval.

    This, in turn, controls when the data in sl_log_1/sl_log_2 can be dropped.

    Data in those tables is deleted when it is older than the earliest XID still captured in sl_event.

    boolean - cleanup_deletelogs (default 'false')

    This controls whether or not we DELETE data from sl_log_1/sl_log_2.

    By default, we now NEVER delete data from the log tables; we instead use TRUNCATE.

  • We now consider initiating a log switch every time cleanupEvent() runs.

    If the call to logswitch_finish() indicates that there was no log switch in progress, we initiate one.

    This means that log switches will be initiated almost as often as possible. That's a policy well worth debating :-).

  • logswitch_finish() has changed a fair bit.

    It uses the same logic as in cleanupEvent() to determine if there are any *relevant* tuples left in sl_log_[whatever], rather than (potentially) scanning the table to see if there are any undeleted tuples left.

  • At startup, slon logs all of its parameter values at the SLON_CONFIG level, per Bugzilla entry #21.
  • New slonik commands "CLONE PREPARE" and "CLONE FINISH" assist in creating duplicate nodes by taking a copy of an existing subscriber node.
  • We no longer use LISTEN/NOTIFY for events and confirmations, which eliminates the usage that has caused pg_listener bloat. We instead poll against the event table.
  • Various places where slonik would use a default node ID of 1 have been changed so that it no longer does so.

    As a result, slonik scripts may need to be changed to specify an EVENT NODE (or similar) after migration to v2.0.

    The slonik commands involved:

    • STORE NODE - EVENT NODE
    • DROP NODE - EVENT NODE
    • WAIT FOR EVENT - WAIT ON
    • FAILOVER - BACKUP NODE
    • EXECUTE SCRIPT - EVENT NODE
  • Fixed a problem where ACCEPT_SET would wait for the corresponding MOVE_SET or FAILOVER_SET to arrive while holding an exclusive lock on sl_config_lock, preventing the other remote worker from processing that event.
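The cleanupEvent(interval, boolean) rules listed above can be summarised in a few lines. The Python sketch below is a simplified, hypothetical model with three simplifications:
- each event carries a plain integer XID instead of a txid_snapshot, and XID wraparound is ignored;
- confirmation time is collapsed to one timestamp per event;
- the log switch and TRUNCATE path is not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Event:
    seqno: int
    min_xid: int            # simplification: stands in for the XID snapshot the event captured
    confirmed_at: datetime  # simplification: when the last confirmation arrived


@dataclass
class LogRow:
    xid: int  # transaction that wrote this sl_log_1/sl_log_2 row


def cleanup_event(
    events: List[Event],
    log_rows: List[LogRow],
    now: datetime,
    cleanup_interval: timedelta = timedelta(minutes=10),
    cleanup_deletelogs: bool = False,
) -> Tuple[List[Event], List[LogRow], Optional[int]]:
    """Trim aged events, then optionally DELETE log rows below the XID horizon."""
    cutoff = now - cleanup_interval
    # 1. Drop events whose confirmations are older than cleanup_interval.
    kept = [e for e in events if e.confirmed_at > cutoff]
    # 2. The earliest XID still captured by a surviving event is the horizon.
    horizon = min((e.min_xid for e in kept), default=None)
    # 3. Only DELETE when asked to; by default the rows are left in place and
    #    space is reclaimed by a log switch followed by TRUNCATE (not modelled).
    if cleanup_deletelogs and horizon is not None:
        log_rows = [r for r in log_rows if r.xid >= horizon]
    return kept, log_rows, horizon


now = datetime(2008, 12, 19, 12, 0)
events = [
    Event(1, 100, now - timedelta(minutes=30)),
    Event(2, 150, now - timedelta(minutes=5)),
    Event(3, 170, now - timedelta(minutes=1)),
]
rows = [LogRow(x) for x in (120, 149, 150, 160)]
kept, remaining, horizon = cleanup_event(events, rows, now, cleanup_deletelogs=True)
print([e.seqno for e in kept], horizon, [r.xid for r in remaining])
# prints [2, 3] 150 [150, 160]
```

Note how cleanup_interval only indirectly controls log retention: it decides which events survive, and the surviving events decide the XID horizon.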
Chris Browne 2008-06-27
Version 1.2.14 has been released. Changes include:
  • Fix typo in configure-replication.sh (missing CR)
  • Per bug #35, search the Slony share dir for scripts before falling back to the PG share dir on 8.0+

    This has resulted in quite a lot of discussion on bug #35; we need agreement that this change is the appropriate way to go.

  • Change test framework to write out the test name into $TEMPDIR/TestName
  • Patch that seems to resolve a race condition with ACCEPT_SET
  • Fix bug #49 - mishandling by slony_logshipper of quotes and backslashes.
  • Fix bug #50: slony_logshipper accessed a variable after its memory was freed.
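The bug #35 change boils down to a "first match wins" search order: Slony-I's own share directory first, then PostgreSQL's. The Python sketch below illustrates that lookup. The function name, the example script name, and the directory paths are hypothetical. This is not the code Slony-I uses.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_script(name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing `name` found in `search_dirs`, or None.

    Order matters: list the Slony-I share directory before the PostgreSQL
    share directory so that Slony-I's copy wins when both exist.
    """
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


# Hypothetical locations; real installations vary.
SLONY_SHARE = Path("/usr/local/slony1/share")
PG_SHARE = Path("/usr/local/pgsql/share")
print(find_script("slony1_funcs.sql", [SLONY_SHARE, PG_SHARE]))
```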
Chris Browne 2008-05-16
Vivek Khera observed that the documentation tarball did not include .png files or man pages; these have been added and the tarball rebuilt.

There is no change to the source code tarball.

Chris Browne 2008-02-29
  • Fixed a compatibility problem with PostgreSQL 8.3: as of 8.3, the function typenameTypeId() takes 3 arguments.

    This now allows Slony-I to work with the PostgreSQL 8.3.0 release.

  • Added logic to ensure that the maximum number of SYNCs grouped together is actually constrained by the config parameter sync_group_maxsize.
  • Fix to show_slony_configuration - point to proper directory where slon/slonik are actually installed.
  • Fix to the slonik Makefile and slonik.c: changed the slonik build so that slonik queries Postgres for the share directory at runtime, per Dave Page.
  • Removed spurious NOTIFY on "_%s_Confirm"; this is no longer needed in the 1.2 branch, as there is no LISTEN on this notification. Noted in bug #32
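The sync_group_maxsize fix enforces a simple invariant: no group of SYNCs may exceed the configured size. The hypothetical Python model below shows that invariant using fixed-size batches. It does not attempt to reproduce how slon chooses group sizes below the cap.

```python
from typing import List, Sequence


def group_syncs(pending: Sequence[int], sync_group_maxsize: int) -> List[List[int]]:
    """Split pending SYNC event numbers into groups of at most sync_group_maxsize."""
    if sync_group_maxsize < 1:
        raise ValueError("sync_group_maxsize must be at least 1")
    return [
        list(pending[i:i + sync_group_maxsize])
        for i in range(0, len(pending), sync_group_maxsize)
    ]


print(group_syncs(range(101, 108), 3))
# prints [[101, 102, 103], [104, 105, 106], [107]]
```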
Chris Browne 2008-02-09

Many thanks to Command Prompt staff for installing and configuring a Bugzilla instance for use in tracking Slony-I bugs and feature requests.

Feature and bug lists will be migrating into this over the next little while...

Christopher Browne 2007-11-13

Version 1.2.12 is now released

  • Fixed a problem with the DDL SCRIPT parser where C-style comments were not being processed properly.
  • Added stored functions and documentation for adding empty tables (notably *partitions*) to replication. Note that these functions do nothing unless specifically requested. CAVEAT: This functionality may not work as expected on versions of PostgreSQL earlier than 8.1. Mind you, partitioning tends to work pretty poorly in those earlier versions anyway, as it was substantially enhanced in 8.1 and later versions.
  • Added a fairly substantial partitioning test to exercise the new stored functions above.
  • Backport "listen path" generator function from CVS HEAD (2.0) to 1.2 branch.
  • Fixed a problem with "EXECUTE SCRIPT" (introduced in remote_worker.c version 1.124.2.13) where moving the relevant code into a subroutine at the end led to the loss of the "BEGIN; SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;" query, which needs to be the first thing run.
  • Fixed archive sequence generation (in log shipping). All non-SYNC events must also start the local transaction before creating the archive, so that the lock on the archive counter table serializes archive creation.
  • Fixed logging in local_listener.c: in various places there was no '\n', which caused log entries to run together.
  • Fixed launch_slons.sh, which was not stripping quotes from the PID file name.
  • Error handling for "ERROR: could not serialize access due to concurrent update"

    If this error is encountered when starting to process sl_archive_counter, then two threads are contending for this counter, and at least one has just failed.

    Rather than waiting, we ask to restart the node immediately.

  • Fixes to the slonik_build_env script: it wasn't properly handling cases with just 1 table or 1 sequence, and had a problem with the -schema option. Thanks, Bernd Helmle.
  • Don't bother building slony_logshipper on Win32 as it doesn't work there at this point.
  • If slonik connects as a non-superuser, it now generates an error message telling the user so.
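The C-style comment fix in the DDL SCRIPT parser concerns a common pitfall: a statement splitter must not treat a ';' inside a comment or string literal as a statement terminator. The Python sketch below illustrates that technique. It is not Slony-I's parser. It supports PostgreSQL-style nested /* */ comments, but deliberately ignores dollar quoting, E'...' escape strings, and double-quoted identifiers.

```python
from typing import List


def split_sql_statements(script: str) -> List[str]:
    """Split on top-level semicolons, ignoring those inside quotes or comments."""
    statements: List[str] = []
    buf: List[str] = []
    in_quote = False         # inside '...'
    in_line_comment = False  # inside -- ... up to end of line
    comment_depth = 0        # nesting depth of /* ... */ (PostgreSQL allows nesting)
    i, n = 0, len(script)
    while i < n:
        c = script[i]
        nxt = script[i + 1] if i + 1 < n else ""
        if in_line_comment:
            buf.append(c)
            in_line_comment = c != "\n"
            i += 1
        elif comment_depth:
            if c == "/" and nxt == "*":
                comment_depth += 1
                buf.append("/*")
                i += 2
            elif c == "*" and nxt == "/":
                comment_depth -= 1
                buf.append("*/")
                i += 2
            else:
                buf.append(c)
                i += 1
        elif in_quote:
            buf.append(c)
            if c == "'" and nxt == "'":  # doubled quote is an escaped quote
                buf.append(nxt)
                i += 2
                continue
            if c == "'":
                in_quote = False
            i += 1
        elif c == ";":
            stmt = "".join(buf).strip()
            if stmt:
                statements.append(stmt)
            buf = []
            i += 1
        elif c == "/" and nxt == "*":
            comment_depth = 1
            buf.append("/*")
            i += 2
        else:
            if c == "'":
                in_quote = True
            elif c == "-" and nxt == "-":
                in_line_comment = True
            buf.append(c)
            i += 1
    tail = "".join(buf).strip()
    if tail:
        statements.append(tail)
    return statements


script = "CREATE TABLE t (a int); /* not; a; split */ ALTER TABLE t ADD b text;"
for stmt in split_sql_statements(script):
    print(stmt)
# prints:
# CREATE TABLE t (a int)
# /* not; a; split */ ALTER TABLE t ADD b text
```

Comments are kept attached to the statement that follows them, so the text of each statement is preserved as written.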
Christopher Browne 2007-11-12