7. Defining Slony-I Replication Sets

Defining the nodes indicated the shape of the cluster of database servers; it is now time to determine what data is to be copied between them. The groups of data that are copied are defined as "replication sets."

A replication set consists of the tables and sequences that are to be replicated, together with the keys on those tables.

7.1. Primary Keys

Slony-I needs a primary key, or a candidate for one, on each table that is replicated. PK values are used as the primary identifier for each tuple that is modified in the source system. Note that they can be composite keys composed of multiple NOT NULL columns; they don't need to consist of single fields. There are three ways that you can get Slony-I to use a primary key: use the table's declared primary key, designate a candidate primary key, or have Slony-I add and populate a key column for you.

It is not terribly important whether you pick a "true" primary key or a mere "candidate primary key"; it is, however, strongly recommended that you have one of those instead of having Slony-I populate the PK column for you. If you don't have a suitable primary key, the table has no mechanism, from your application's standpoint, for keeping values unique. Having Slony-I add one may therefore introduce a new failure mode for your application, and the lack of a key also implies that there was already a way to enter confusing data into the database.
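The choice between the three key options can be sketched as a small decision function. This is only an illustration: the `Table` class and `choose_replication_key` are hypothetical names, not part of Slony-I, and "candidate key" is taken here to mean a unique index whose columns are all NOT NULL.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Table:
    """Hypothetical, simplified view of a table's key-related metadata."""
    name: str
    not_null_columns: set
    primary_key: Optional[tuple] = None
    # Unique indexes, given as index name -> tuple of indexed columns.
    unique_indexes: dict = field(default_factory=dict)


def choose_replication_key(table: Table):
    """Return (kind, index_name) describing which key option applies."""
    if table.primary_key:
        return ("primary key", None)
    for index_name, columns in sorted(table.unique_indexes.items()):
        # A candidate key must be unique and cover only NOT NULL columns.
        if all(col in table.not_null_columns for col in columns):
            return ("candidate key", index_name)
    # Nothing suitable: this is the case the text recommends fixing
    # in the schema rather than letting Slony-I add a key column.
    return ("no usable key", None)


orders = Table(
    "orders",
    not_null_columns={"order_id", "line_no"},
    unique_indexes={"orders_by_line": ("order_id", "line_no")},
)
print(choose_replication_key(orders))  # ('candidate key', 'orders_by_line')
```

A table that falls into the last branch is a sign that the schema, rather than the replication configuration, deserves attention first.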

7.2. Grouping tables into sets

It is vital to group tables together into a single set if those tables are related via foreign key constraints. If tables that are thus related are not replicated together, you'll find yourself in trouble if you switch the "master provider" from one node to another, and discover that the new "master" can't be updated properly because it is missing the contents of dependent tables.
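One way to find which tables must travel together is to treat foreign keys as edges in an undirected graph and collect its connected components. The sketch below assumes you have already extracted the list of tables and foreign-key pairs from your schema; the function name and the sample table names are hypothetical.

```python
from collections import defaultdict


def group_tables_by_foreign_keys(tables, foreign_keys):
    """Group tables so that any two linked by a foreign key share a group.

    tables: iterable of table names.
    foreign_keys: iterable of (referencing_table, referenced_table) pairs.
    Returns a list of sorted groups, ordered by each group's first table.
    """
    neighbours = defaultdict(set)
    for child, parent in foreign_keys:
        neighbours[child].add(parent)
        neighbours[parent].add(child)

    seen = set()
    groups = []
    for table in sorted(tables):
        if table in seen:
            continue
        # Depth-first walk over the undirected foreign-key graph.
        stack, group = [table], []
        seen.add(table)
        while stack:
            current = stack.pop()
            group.append(current)
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


tables = ["customers", "orders", "order_lines", "audit_log"]
fks = [("orders", "customers"), ("order_lines", "orders")]
print(group_tables_by_foreign_keys(tables, fks))
# [['audit_log'], ['customers', 'order_lines', 'orders']]
```

Each resulting group is a minimum unit: its tables belong in the same replication set, while separate groups may be combined or kept apart as other considerations dictate.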

There are also several reasons why you might not want to have all of the tables in one replication set:

7.3. The Pathology of Sequences

Each time a SYNC is processed, values are recorded for all of the sequences in the set. If there are a lot of sequences, this can cause sl_seqlog to grow rather large.
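A rough back-of-the-envelope estimate shows how quickly this adds up. The sketch assumes, as described above, one recorded value per sequence per SYNC, and it ignores any cleanup that later trims the table; the SYNC interval used in the example is an arbitrary illustrative figure, not a Slony-I default.

```python
def seqlog_rows(num_sequences: int, sync_interval_seconds: float,
                period_seconds: float) -> int:
    """Estimate rows recorded for sequences over a period.

    Assumes one row per replicated sequence per SYNC and no cleanup.
    """
    syncs = int(period_seconds // sync_interval_seconds)
    return num_sequences * syncs


# 300 sequences, a SYNC every 10 seconds (illustrative), over one hour:
print(seqlog_rows(300, 10, 3600))  # 108000
```

The count scales with the number of sequences regardless of whether any of them actually change.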

This points to an important difference between tables and sequences: if you add tables that see little or no activity, this does not add any material load to the work being done by replication. For a replicated sequence, by contrast, values must regularly be propagated to subscribers whether or not the sequence is in use. Consider the effects: