[Slony1-general] 1 cluster or 5?

Tue Aug 9 09:49:42 PDT 2011

On Tue, Aug 9, 2011 at 7:37 AM, Stuart Bishop <stuart at stuartbishop.net> wrote:
> I need to replicate some new databases.
>
> We have 5 shards, each with an identical schema. Each of these will be
> replicated to a slave (5 masters, 5 slaves).
>
> Do I need 5 separate clusters, or can I do this with a single cluster
> and 5 replication sets? I'll be using PG 8.4 and Slony-I 2.0.7.
>
> I suspect I need 5 separate clusters, and I'm less likely to blow my
> foot off with this setup, but I'm soliciting opinions before I proceed
> :-)

By "identical schema," does that mean you have 5 databases with
identical schemas:

for db in db1 db2 db3 db4 db5; do
createdb ${db}; psql -d ${db} -c "create table public.t1 (id serial
primary key);"
done

Or one database, with several namespaces, with identical schemas within that:
createdb db
for i in sc1 sc2 sc3 sc4 sc5; do
psql -d db -c "create namespace ${i};"
psql -d db -c "create table ${i}.t1 (id serial primary key);"
done

If it looks like the latter, where the schemas are distinguishable by
virtue of namespace, then it would be pretty reasonable to have 1
cluster with 5 replication sets.

if, on the other hand, it looks like the former, where it's only the
database name that tells things apart, I'd be not too keen to try to
have just one big cluster, as there's a pretty big "foot-gun."

Let me describe the shape of the "foot-gun"...

The problem that can occur here is, since the names of tables
otherwise match, you could accidentally alter subscriptions in such a
way as to have cross-talk between sets, and accidentally have "set #5"
pulling data from "set #2", thereby mixing things in a way that you'd
probably have hoped would be immiscible.

Mind you, it shouldn't be possible to have subscriptions to "node 1's
public.t1" and "node 3's public.t1" subscribed to anywhere
simultaneously, because:
 i) Postgres only allows one table to be called public.t1 in the
subscriber database (let's say it's db6), and this combines with...
 ii) Slony requires an sl_table entry for each time a table gets
subscribed, and has a unique index on sl_table.tab_reloid.
That should nicely prevent "crosstalk."  You can't have 2
subscriptions involving the same target table on a node at the same
time.  I think that means you're reasonably protected from corruption
of replication for this kind of case.

But it doesn't prevent an onlooker from getting confused, and that's
the factor I'd put higher on my list.

One of the "nice to have" features would be the ability to rename
tables on a subscriber.  That would mean that the schema could be, on
the "master" nodes, as described in that first "for" loop, but then,
on a subscriber, the schema might look like what's in the second "for"
loop.  In that case, it's quite possible that you'd have just 6 nodes
in the cluster:
a) Five "master" nodes for the 5 systems, and
b) One "subscriber" that consolidates the data from all 5 systems.

The trouble with this feature is less about doing it and more about
coming up with a parsimonious way to describe the configuration.