23. Slony-I Upgrade

When upgrading Slony-I, the installation on all nodes in a cluster must be upgraded at once, using the slonik command SLONIK UPDATE FUNCTIONS.

While this requires temporarily stopping replication, it does not necessarily require an outage for applications that submit updates.

The proper upgrade procedure is thus:

  • Stop the slon processes on all nodes (that is, the old version of slon).

  • Install the new version of slon software on all nodes.

  • Execute a slonik script containing the command update functions (id = [whatever]); for each node in the cluster (see the sketch after this list).

    Note: Remember that your slonik upgrade script, like all other slonik scripts, must contain the proper preamble commands to function.

  • Start all slons.
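
As a minimal sketch, assuming a two-node cluster named mycluster and illustrative connection strings (all names here are hypothetical), the upgrade script might look like:

    # upgrade_functions.slonik -- hypothetical file name
    cluster name = mycluster;                          # assumed cluster name
    node 1 admin conninfo = 'dbname=mydb host=node1';  # assumed conninfo
    node 2 admin conninfo = 'dbname=mydb host=node2';

    update functions (id = 1);   # load the new functions into node 1
    update functions (id = 2);   # load the new functions into node 2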

The overall operation is relatively safe: if there is any mismatch between component versions, slon will refuse to start up, which provides protection against corruption.

The trickiest part of the upgrade is ensuring that the C library containing the SPI trigger functions has been copied into place in the PostgreSQL build. There are multiple possible approaches to this:

The easiest and safest way to handle this is to have two separate PostgreSQL builds, one for each Slony-I version, where the postmaster is shut down and then restarted against the "new" build; that approach requires a brief database outage on each node.

While that approach has been found to be easier and safer, nothing prevents one from carefully copying the new version's Slony-I components into place, overwriting the old version, as the "install" step. That might not work on Windows™ if it locks library files that are in use.

Run make install to install new Slony-I components on top of the old

If you build Slony-I from sources on the same system on which it is to be deployed, overwriting the old version with the new is as easy as make install. There is no need to restart a database backend; just stop the slon processes, run the UPDATE FUNCTIONS script, and start the new slon processes.

Unfortunately, this approach requires having a build environment on the same host as the deployment. That may not be consistent with efforts to use common PostgreSQL and Slony-I binaries across a set of nodes.

Create a new PostgreSQL and Slony-I build

With this approach, the old PostgreSQL build with the old Slony-I components persists after you switch to a new PostgreSQL build with the new Slony-I components. To switch to the new Slony-I build, you must restart the PostgreSQL postmaster so that it becomes aware of the location of the new components; this interrupts applications.

23.1. TABLE ADD KEY issue in Slony-I 2.0

Usually, upgrades between Slony-I versions have required no special attention to the condition of the existing replica. That is, you essentially just need to stop slons, put the new binaries in place, run SLONIK UPDATE FUNCTIONS against each node, and restart slons. Schema changes have been internal to the cluster schema, and SLONIK UPDATE FUNCTIONS has been capable of making all of the needed alterations. With version 2, this changes if there are tables that used SLONIK TABLE ADD KEY. Version 2 does not support the "extra" column, and "fixing" the schema to have a proper primary key is not within the scope of what SLONIK UPDATE FUNCTIONS can perform.

When upgrading from versions 1.0.x, 1.1.x, or 1.2.x to version 2, it will be necessary to have already eliminated any such Slony-I-managed primary keys.

One may identify the affected tables via the following SQL query:

    select n.nspname, c.relname
      from pg_class c, pg_namespace n
     where c.oid in (select attrelid
                       from pg_attribute
                      where attname like '_Slony-I_%rowID'
                        and not attisdropped)
       and reltype <> 0
       and n.oid = c.relnamespace
     order by n.nspname, c.relname;

The simplest approach that may be taken to rectify the "broken" state of such tables is as follows (a sketch of the whole procedure appears after this list):

  • Drop the table from replication using the slonik command SLONIK SET DROP TABLE.

    This does not drop out the Slony-I-generated column.

  • On each node, run an SQL script to alter the table, dropping the extra column.

    alter table whatever drop column "_Slony-I_cluster-rowID";

    This needs to be run individually against each node. Depending on your preferences, you might wish to use SLONIK EXECUTE SCRIPT to do this.

    If the table is a heavily updated one, it is worth observing that this alteration will require acquiring an exclusive lock on the table. It will not hold this lock for terribly long; dropping the column should be quite rapid, as all it does internally is mark the column as dropped. It does not require rewriting the entire contents of the table: tuples that have values in that column keep those values, new tuples leave it NULL, and queries ignore the column. Space for the column is reclaimed as tuples are updated.

    Note that at this point in the process, this table is not being replicated. If a failure takes place, replication is not, at this point, providing protection on this table. This is unfortunate but unavoidable.

  • Make sure the table has a legitimate candidate for a primary key: some set of NOT NULL, UNIQUE columns.

    The possible variations here are the reason that the developers have made no effort to automate this.

  • If the table is a small one, it may be perfectly reasonable to do alterations (note that they must be applied to every node!) to add a new column, assign it via a new sequence, and then declare it to be a primary key.

    If there are only a few tuples, this should take a fraction of a second, and, with luck, be unnoticeable to a running application.

    Even if the table is fairly large, if it is not frequently accessed by the application, the locking of the table that takes place when you run ALTER TABLE may not cause much inconvenience.

  • If the table is a large one, and is vital to and heavily accessed by the application, then it may be necessary to take an application outage in order to accomplish the alterations, leaving you necessarily somewhat vulnerable until the process is complete.

    If it is troublesome to take outages, then the upgrade to Slony-I version 2 may take some planning...
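
As a sketch of the procedure, assuming a cluster named mycluster, a table public.mytable whose replication id is 42, and illustrative connection strings (all names here are hypothetical), the slonik portion might look like:

    # fix_add_key.slonik -- hypothetical file name
    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=mydb host=node1';
    node 2 admin conninfo = 'dbname=mydb host=node2';
    set drop table (origin = 1, id = 42);  # 42 = the table's replication id

followed by SQL run against each node (or submitted via SLONIK EXECUTE SCRIPT):

    -- drop the Slony-I-generated column
    alter table public.mytable drop column "_Slony-I_mycluster_rowID";

    -- small-table variant: add a column, assign it via a new sequence,
    -- and declare it to be a primary key
    alter table public.mytable add column id bigint;
    create sequence mytable_id_seq;
    update public.mytable set id = nextval('mytable_id_seq');
    alter table public.mytable
          alter column id set default nextval('mytable_id_seq');
    alter table public.mytable add primary key (id);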

This approach should be fine for tables that are relatively small or infrequently used. If, on the other hand, the table is large and heavily used, another approach may prove necessary: create your own sequence, and "promote" the formerly Slony-I-generated column into a "real" column in your database schema. An outline of the steps is as follows (a sketch appears after the list):

  • Add a sequence that assigns values to the column.

    Setup steps will include CREATE SEQUENCE (to create the sequence), SELECT setval() (to set the value of the sequence high enough to reflect values used in the table), SLONIK CREATE SET (to create a set to assign the sequence to), SLONIK SET ADD SEQUENCE (to assign the sequence to the set), and SLONIK SUBSCRIBE SET (to set up subscriptions to the new set).

  • Attach the sequence to the column on the table.

    This involves ALTER TABLE ALTER COLUMN, which must be submitted via the Slonik command SLONIK EXECUTE SCRIPT.

  • Rename the column _Slony-I_@CLUSTERNAME@_rowID so that Slony-I won't consider it to be under its control.

    This involves ALTER TABLE RENAME COLUMN, which must be submitted via the Slonik command SLONIK EXECUTE SCRIPT.

    Note that these two alterations might be accomplished via the same SLONIK EXECUTE SCRIPT request.
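
As a sketch, assuming a cluster named mycluster, a table public.mytable, and a sequence mytable_rowid_seq that has already been created, setval()'d, and added to a replicated set (all names here are hypothetical), the SQL submitted via SLONIK EXECUTE SCRIPT might look like:

    -- attach the sequence to the column...
    alter table public.mytable
          alter column "_Slony-I_mycluster_rowID"
          set default nextval('mytable_rowid_seq');

    -- ...and rename the column so that Slony-I no longer considers it
    -- to be under its control
    alter table public.mytable
          rename column "_Slony-I_mycluster_rowID" to row_id;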

23.2. New Trigger Handling in Slony-I Version 2

One of the major changes to Slony-I is that enabling/disabling of triggers and rules now takes place as plain SQL, supported by PostgreSQL 8.3+, rather than via "hacking" on the system catalog.

As a result, Slony-I users should be aware of the PostgreSQL syntax for ALTER TABLE, as that is how they can accomplish what was formerly accomplished via SLONIK STORE TRIGGER and SLONIK DROP TRIGGER.
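
For instance, assuming a hypothetical table public.mytable with a trigger my_trigger, the plain-SQL equivalents look like:

    -- disable the trigger entirely
    alter table public.mytable disable trigger my_trigger;

    -- re-enable it as an ordinary trigger
    alter table public.mytable enable trigger my_trigger;

    -- have it fire even when the session is in replica mode,
    -- i.e. on a Slony-I subscriber
    alter table public.mytable enable replica trigger my_trigger;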