Version 2.0.0 RC2 is now available. See the "news" area for more details, including a copy of the release notes.
Slony-I is a "master to multiple slaves" replication system supporting cascading (e.g., a node can feed another node, which feeds another node...) and failover.
The big picture for the development of Slony-I is that it is a master-slave replication system that includes all features and capabilities needed to replicate large databases to a reasonably limited number of slave systems.
Slony-I is a system designed for use at data centers and backup sites, where the normal mode of operation is that all nodes are available.
A fairly extensive "admin guide" comprising material in the CVS tree may be found here. There is also a local copy.
The original design document is available here.
Version 1.2.15 is now available. See the "news" area for more details, including a copy of the release notes.
People frequently ask for assistance in figuring out what might be wrong with their cluster. The first thing that you should do if you think there might be a problem (or even if you don't) is to run the test state scripts. That may help point you to where the problem is; it may also help point other would-be helpers to where the problem is.
If you're not running these scripts hourly against your cluster(s), you really should be...
Version 2.0.0 RC1 is now available. See the "news" area for more details, including a copy of the release notes.
Differences from 2.0.0 RC1
- Bug #54 - fixed various bash-isms
- Bug #18 - function parameter for logtrigger no longer requires trailing "v"'s
- Created start_slon.sh, an rc.d-style script to start, stop, and check status of a slon process, integrating its usage into regression tests
- Bug #46 - incompatibility with PostgreSQL 8.4
- Added an extended example of upgrading from one version of PostgreSQL to another
- Use dollar quoting in stored functions
- Added logging code that lists time spent running queries against different nodes to help when analyzing performance issues
- A little more fixing-up of logging levels
- Test suite uses non-zero lag_interval
- Fix to STORE TRIGGER
- store trigger was running against all nodes upon subscription
- Portability changes to some tools/ scripts, fixing some "bash-isms"
- Fix to switch statement in slonik.c; unknown how it broke
- Fix Bug #52 - memory leak
Differences from 1.2 stream
- Removal of TABLE ADD KEY
- Drops all support for databases prior to Postgres version 8.3.
This is required because we now make use of new functionality in Postgres, namely the trigger and rule support for session replication role. As of now, every node (origin/subscriber/mixed) can be dumped with pg_dump, and the dump is a consistent snapshot of the database.
- Still need alterTableRestore() for the upgrade from 1.2.x to 2.0. upgradeSchema() will restore the system catalog to a consistent state and define+configure the new versions of the log and deny_access triggers.
- Fix EXECUTE SCRIPT so that it records the ev_seqno for WAIT FOR EVENT and make sure all DDL is executed in session_replication_role "local" on the origin as well as all subscribers. This will cause the slony triggers to ignore all DML statements while user triggers follow the regular configuration options for ENABLE [REPLICA/ALWAYS] or DISABLE.
- Let the logshipping files also switch to session_replication_role = "replica" or "local" (for DDL).
- Sequence tracking becomes enormously less expensive; rather than polling *ALL* sequence values for each and every SYNC, the slon stores the last value, and only records entries in sl_seqlog when the value changes from that last value. If most sequences are relatively inactive, they won't require entries in sl_seqlog very often.
- Change to tools/slony1_dump.sh (used to generate log shipping dump); change quoting of "\\\backslashes\\\" to get rid of warning
- Cleanup thread revised to push most of the logic to evaluate which tables are to be vacuumed into a pair of stored functions.
This fairly massively simplifies the C code.
- Revised logging levels so that most of the interesting messages are spit out at SLON_CONFIG and SLON_INFO levels. This can allow users to drop out the higher DEBUG levels and still have useful logs.
- Changed log selection query to be less affected by long-running transactions. This should help, in particular, the scenario where it takes a very long time to subscribe to a set. In that situation, we have had the problem where applying the later SYNCs gets extremely costly because the query selecting logs was forced into a Seq Scan rather than an index scan.
- Removed all support for STORE/DROP TRIGGER commands. Users should use the ALTER TABLE [ENABLE|DISABLE] TRIGGER functionality available directly in Postgres from now on.
- Improve Wiki page generation script so that it has an option to add in a set of [[Category:Foo]] tags to allow automated categorization.
- Documented how to fix tables that presently use Slony-I-generated primary key candidates generated by TABLE ADD KEY
- Add some specific timestamps during the 2007 "DST rule change ambiguous time" (e.g., during the period which, under former rules, was not DST, but which now is, due to the recent rule change).
Bill Moran ran into some problems with such dates; varying PostgreSQL versions returned somewhat varying results. This wasn't a Slony-I problem; the data was indeed being replicated correctly.
- Made configure a bit smarter about automatically locating docbook2man-spec.pl on Debian, Fedora, BSD.
- Tests now generate |pipe|delimited|output| indicating a number of attributes of each test, including system/platform information, versions, and whether or not the test succeeded or failed.
- Revised functions that generate listen paths
- tools/configure-replication.sh script permits specifying a destination path for generated config files. This enables using it within automated processes, and makes it possible to use it to generate Slonik scripts for tests in the "test bed," which has the further merit of making tools/configure-replication.sh a regularly-regression-tested tool.
- Fix to bug #15 - where a long cluster name (>40 chars) leads to things breaking when an index name is created that contains the cluster name.
- Warn upon creating a long cluster name.
- Give a useful exception that explains the cause rather than merely watching index creation fail.
- Fix for bug #19 - added a script to help the administrator search for any triggers on the database that is the source for a schema that is to be used to initialize a log shipping node.
The problem is that some/most/sometimes all triggers and rules are likely to need to be dropped from the log shipping node lest they interfere with replication.
- Elimination of custom "xxid" functions
PostgreSQL 8.3 introduces a set of "txid" functions and a "txid_snapshot" type, which eliminates the need for Slony-I to have its own C functions for doing XID comparisons.
Note that this affects the structure of sl_event, and leads to some changes in the coding of the regression tests.
This eliminates the src/xxid directory and its contents.
- All of the interesting cleanup work is now done in the stored function, cleanupEvent(interval, boolean).
Interesting side-effect: You can now induce a cleanup manually, which will be useful for testing.
- cleanupEvent now has two parameters, passed in from slon config parameters:
interval - cleanup_interval (default '10 minutes')
This controls how quickly old events are trimmed out. It used to be a hard-coded value.
Old events are trimmed out once the confirmations are aged by (cleanup_interval).
This then controls when the data in sl_log_1/sl_log_2 can be dropped.
Data in those tables is deleted when it is older than the earliest XID still captured in sl_event.
boolean - cleanup_deletelogs (default 'false')
This controls whether or not we DELETE data from sl_log_1/sl_log_2.
By default, we now NEVER delete data from the log tables; we instead use TRUNCATE.
- We now consider initiating a log switch every time cleanupEvent() runs.
If the call to logswitch_finish() indicates that there was no log switch in progress, we initiate one.
This means that log switches will be initiated almost as often as possible. That's a policy well worth debating :-).
- logswitch_finish() changes a fair bit...
It uses the same logic as in cleanupEvent() to determine if there are any *relevant* tuples left in sl_log_[whatever], rather than (potentially) scanning the table to see if there are any undeleted tuples left.
- At slon startup time, it logs (at SLON_CONFIG level) all of the parameter values. Per Bugzilla entry #21.
- New slonik "CLONE PREPARE" and "CLONE FINISH" commands to assist in creating duplicate nodes based on taking a copy of some existing subscriber node.
- We no longer use LISTEN/NOTIFY for events and confirmations, which eliminates the usage that has caused pg_listener bloat. We instead poll against the event table.
- Various instances where slonik would use a default node ID of 1 have been changed to remove this.
Slonik scripts may need to be changed to indicate an EVENT NODE (or similar) after migration to v2.0 as a result.
The slonik commands involved:
- STORE NODE - EVENT NODE
- DROP NODE - EVENT NODE
- WAIT FOR EVENT - WAIT ON
- FAILOVER - BACKUP NODE
- EXECUTE SCRIPT - EVENT NODE
- Fixed a problem where ACCEPT_SET would wait for the corresponding MOVE_SET or FAILOVER_SET to arrive while holding an exclusive lock on sl_config_lock, preventing the other remote worker from processing that event.
- Fix typo in configure-replication.sh (missing CR)
- Per bug #35, search the Slony share dir for scripts before falling back to the PG share dir on 8.0+.
This has resulted in quite a lot of discussion on bug #35; we need agreement that this change is an apropos way to go...
- Change test framework to write out the test name into $TEMPDIR/TestName
- Patch that seems to resolve a race condition with ACCEPT_SET
- Fix bug #49 - mishandling by slony_logshipper of quotes and backslashes.
- Fix bug #50 - slony_logshipper had a variable access after memory was freed
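The sequence-tracking change in the list above (an entry goes into sl_seqlog only when a sequence's value differs from the last value stored) can be illustrated with a minimal Python sketch. This is not Slony-I's implementation, which lives in slon and its stored functions; the class `SeqTracker`, its method names, and the in-memory dictionaries standing in for stored state are all hypothetical.

```python
from typing import Dict, List, Tuple


class SeqTracker:
    """Hypothetical model of change-only sequence logging (not Slony-I code)."""

    def __init__(self) -> None:
        # Last value seen for each sequence id.
        self.last_seen: Dict[int, int] = {}
        # Stand-in for rows written to sl_seqlog: (sync_no, seq_id, value).
        self.seqlog: List[Tuple[int, int, int]] = []

    def on_sync(self, sync_no: int, current: Dict[int, int]) -> int:
        """Record only sequences whose value changed; return rows written."""
        written = 0
        for seq_id, value in sorted(current.items()):
            if self.last_seen.get(seq_id) != value:
                self.seqlog.append((sync_no, seq_id, value))
                self.last_seen[seq_id] = value
                written += 1
        return written


tracker = SeqTracker()
first = tracker.on_sync(1, {10: 100, 11: 5})   # both sequences are new
second = tracker.on_sync(2, {10: 100, 11: 5})  # nothing changed
third = tracker.on_sync(3, {10: 101, 11: 5})   # only sequence 10 moved
print(first, second, third)
```

With mostly idle sequences, the number of rows written per SYNC tracks the number of sequences that actually moved rather than the total number of replicated sequences.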
There is no change to the source code tarball.
- Fixed problem with compatibility with PostgreSQL 8.3; function typenameTypeId() has 3 arguments as of 8.3.
This now allows Slony-I to work with the PostgreSQL 8.3.0 release.
- Added in logic to ensure that max # of SYNCs grouped together is actually constrained by config parameter sync_group_maxsize.
- Fix to show_slony_configuration - point to proper directory where slon/slonik are actually installed.
- Fix to slonik Makefile + slonik.c - Change slonik build to query Postgres for the share directory at runtime - per Dave Page
- Removed spurious NOTIFY on "_%s_Confirm"; this is no longer needed in the 1.2 branch, as there is no LISTEN on this notification. Noted in bug #32
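The sync_group_maxsize fix in the list above amounts to clamping the group size to the configured cap. A minimal sketch of that idea follows; it is not slon's actual grouping code, and the function name and the `proposed` and `pending` arguments are hypothetical.

```python
def clamp_group_size(proposed: int, pending: int, sync_group_maxsize: int) -> int:
    """Return how many SYNCs to apply together, honoring the configured cap."""
    if sync_group_maxsize < 1:
        raise ValueError("sync_group_maxsize must be at least 1")
    if pending < 1:
        return 0  # nothing to apply
    # Never fewer than one, never more than what is pending or the cap.
    return max(1, min(proposed, pending, sync_group_maxsize))


print(clamp_group_size(proposed=50, pending=200, sync_group_maxsize=20))
```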
Many thanks to Command Prompt staff for installing and configuring a Bugzilla instance for use in tracking Slony-I bugs and feature requests.
Feature and bug lists will be migrating into this over the next little while...
Version 1.2.12 is now released
- Fixed problem with DDL SCRIPT parser where C-style comments were not being processed properly
- Added stored functions and documentation for adding empty tables (notably *partitions*) to replication. Note these functions do no work when not specifically requested. CAVEAT: This functionality may not work as expected on versions of PostgreSQL earlier than 8.1. Mind you, partitioning tends to function pretty poorly in earlier versions of PostgreSQL as there were substantial enhancements in 8.1 and following versions.
- Added a fairly substantial partitioning test to exercise the new stored functions above.
- Backport "listen path" generator function from CVS HEAD (2.0) to 1.2 branch.
- Fixed a problem with "EXECUTE SCRIPT" (introduced in remote_worker.c version 1.124.2.13) where moving the relevant code into a subroutine at the end led to losing the "BEGIN; SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;" query that needs to be the first thing run...
- Fixing the archive sequence generations (in log shipping). All non-SYNC events must start the local transaction before creating the archive as well, so that the lock on the archive counter table serializes archive creation.
- Fixed logging done in local_listener.c - in various places there was no trailing '\n', which would lead to entries being folded together.
- Fix launch_slons.sh - was not stripping quotes from PID file name
- Error handling for "ERROR: could not serialize access due to concurrent update"
If this error is encountered when starting processing of sl_archive_counter, then two threads are fighting over access to this counter, and at least one has just failed.
Rather than waiting, we ask to restart the node immediately.
- Fixes to slonik_build_env script - it wasn't properly handling cases where there was just 1 table or 1 sequence, and had a problem with the -schema option - thanks, Bernd Helmle
- Don't bother building slony_logshipper on Win32 as it doesn't work there at this point.
- If slonik connects as other than a superuser, generate an error message indicating this to the user.
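The DDL SCRIPT parser fix in the list above concerns splitting a script into statements without being misled by C-style comments. The following Python sketch shows one way such a splitter can work; it is an illustration only, not the parser Slony-I ships (which is written in C). The function name is hypothetical, and the sketch deliberately ignores nested comments, `--` line comments, double-quoted identifiers, and dollar quoting. It also tracks parentheses and keeps a final statement that lacks a trailing semicolon, two cases a splitter of this kind commonly gets wrong.

```python
from typing import List


def split_statements(script: str) -> List[str]:
    """Split SQL text on top-level semicolons (illustrative only)."""
    statements: List[str] = []
    buf: List[str] = []
    depth = 0          # parenthesis nesting level
    in_quote = False   # inside a '...' string literal
    i, n = 0, len(script)
    while i < n:
        ch = script[i]
        if in_quote:
            buf.append(ch)
            if ch == "'":
                in_quote = False  # a doubled '' simply reopens the quote
        elif script.startswith("/*", i):
            end = script.find("*/", i + 2)
            end = n if end == -1 else end + 2
            buf.append(script[i:end])  # keep the comment text intact
            i = end
            continue
        elif ch == "'":
            in_quote = True
            buf.append(ch)
        elif ch == "(":
            depth += 1
            buf.append(ch)
        elif ch == ")":
            depth = max(0, depth - 1)
            buf.append(ch)
        elif ch == ";" and depth == 0:
            stmt = "".join(buf).strip()
            if stmt:
                statements.append(stmt)
            buf = []
        else:
            buf.append(ch)
        i += 1
    tail = "".join(buf).strip()
    if tail:
        statements.append(tail)  # final statement without a trailing ';'
    return statements


script = (
    "CREATE TABLE t (a int); /* note; still a comment */ "
    "CREATE RULE r AS ON INSERT TO t DO "
    "(INSERT INTO a VALUES (1); INSERT INTO b VALUES (';')); "
    "SELECT 1"
)
parts = split_statements(script)
print(len(parts))
```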
Fixes are the following:
- Add in tools/mkservice scripts previously added to CVS HEAD
- During subscription, do UPDATE to pg_class.relhasindex *after* the TRUNCATE because, in 8.2+, TRUNCATE resets this attribute
- Fixed a problem with the setsync tracking with Log Shipping in cases where slon does an internal restart (thereby rereading the pset.ssy_seqno) and ignoring non-SYNC events because those don't change the sl_setsync table.
- More explicit type casting of text objects for compatibility with PostgreSQL 8.3
- Fixed problem with DDL SCRIPT statement parser: it wasn't 'quoting' semicolons inside parentheses (this notably occurs in CREATE RULE).
- Fixed problem with DDL SCRIPT statement submission; it was interpreting the statement as a format string, which would have ill effects in the presence of sequences that get interpreted, such as format specifiers (%d, %f, %s) and backslashed sequences like \\, \n.
- Further DDL Script issue: non-terminated statement at the end (e.g. - without trailing semicolon ";") would get omitted.
- Typo fix: when trying to disable a node, the logs would report "enableNode" rather than "disableNode". Fixed.
- Add usage/version options to help output in slon.
- Fix archive logging for replicated sequences.
- Fix to log shipping - added another table, sl_archive_counter, where the log writing slon simply tracks when it wrote the last offline archive file and maintains a counter. This counter is now tracked in the offline replica and must increment gap-free.
- Change the filenames of archive logs to be based on internal archive tracking number. This makes it easy for the mechanism applying archives to figure out what needs to be applied next - just look in sl_archive_tracking.
- Fix log shipping test to accommodate the new tracking scheme, and update documentation to describe this better.
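Because the archive counter described above must increment gap-free, a process applying archives can decide what comes next with simple arithmetic. A minimal sketch, assuming the archive numbers have already been parsed out of the file names (the naming format is not given here) and that `last_applied` stands for the value a replica would read from sl_archive_tracking; the function name is hypothetical.

```python
from typing import Iterable, List


def apply_in_order(last_applied: int, available: Iterable[int]) -> List[int]:
    """Return the archive numbers that can be applied now, in order.

    Application stops at the first missing number, since the counter
    is expected to be gap-free.
    """
    pool = set(available)
    applied: List[int] = []
    while last_applied + 1 in pool:
        last_applied += 1
        applied.append(last_applied)
    return applied


# Archive 7 has not arrived yet, so 8 and 9 must wait.
print(apply_in_order(4, [5, 6, 8, 9]))
```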
./configure
make rpm
For systems reasonably similar to Fedora and RHAS, this will generate .rpm files that will be compatible with your system.