Slony-I Listen Paths

4.2. Slony-I Listen Paths

Note: If you are running Slony-I version 1.1 or later, it should be unnecessary to read this section, as that version introduces a way to manage this part of the configuration automatically. For earlier versions, however, this material is necessary.

If you have more than two or three nodes, and any degree of usage of cascaded subscribers (that is, subscribers that subscribe through another subscriber node), you will have to be fairly careful about the configuration of "listen paths" via the slonik STORE LISTEN and DROP LISTEN statements that control the contents of the table sl_listen.

The "listener" entries in this table control where each node expects to listen in order to get events propagated from other nodes. You might think that nodes only need to listen to the "parent" from whom they are getting updates, but in reality they need to be able to receive messages from all nodes in order to conclude that syncs have been received everywhere, and that entries in sl_log_1 and sl_log_2 have therefore been applied everywhere and can be purged. This extra communication is also needed so that Slony-I is able to shift origins to other locations.

4.2.1. How Listening Can Break

On one occasion, I had a need to drop a subscriber node (#2) and recreate it. That node was the data provider for another subscriber (#3) that was, in effect, a "cascaded slave." Dropping the subscriber node initially didn't work, as slonik informed me that there was a dependent node. I re-pointed the dependent node to the "master" node for the subscription set, which, for a while, replicated without difficulties.

I then dropped the subscription on "node 2" and started resubscribing it. That raised the Slony-I set_subscription event, which started copying tables. At that point, events stopped propagating to "node 3"; while that node was in perfectly OK shape, no events were reaching it.

The problem was that node #3 was expecting to receive events from node #2, which was busy processing the set_subscription event, and was not passing anything else on.

We dropped the listener rules that caused node #3 to listen to node 2, replacing them with rules where it expected its events to come from node #1 (the origin node for the replication set). At that moment, "as if by magic", node #3 started replicating again, as it discovered a place to get sync events.

4.2.2. How the Listen Configuration Should Look

The simple cases tend to be simple to cope with. We need to instead look at a more complex node configuration.

Consider a set of nodes, 1 through 6, where 1 is the origin, 2-4 subscribe directly to the origin, 5 subscribes to 2, and 6 subscribes to 5.

Here is a "listener network" that indicates where each node should listen for messages coming from each other node:

receiver |           origin
         |   1    2    3    4    5    6
---------+------------------------------
    1    |   0    2    3    4    2    2
    2    |   1    0    1    1    5    5
    3    |   1    1    0    1    1    1
    4    |   1    1    1    0    1    1
    5    |   2    2    2    2    0    6
    6    |   5    5    5    5    5    0

Row 2 indicates all of the listen rules for node 2; it gets events for nodes 1, 3, and 4 through node 1, and gets events for nodes 5 and 6 from node 5.

The row of 5's at the bottom, for node 6, indicates that node 6 listens to node 5 to get events from nodes 1-5.

The set of slonik STORE LISTEN statements that expresses this "listener network" is as follows:

store listen (origin = 1, receiver = 2, provider = 1);
store listen (origin = 1, receiver = 3, provider = 1);
store listen (origin = 1, receiver = 4, provider = 1);
store listen (origin = 1, receiver = 5, provider = 2);
store listen (origin = 1, receiver = 6, provider = 5);
store listen (origin = 2, receiver = 1, provider = 2);
store listen (origin = 2, receiver = 3, provider = 1);
store listen (origin = 2, receiver = 4, provider = 1);
store listen (origin = 2, receiver = 5, provider = 2);
store listen (origin = 2, receiver = 6, provider = 5);
store listen (origin = 3, receiver = 1, provider = 3);
store listen (origin = 3, receiver = 2, provider = 1);
store listen (origin = 3, receiver = 4, provider = 1);
store listen (origin = 3, receiver = 5, provider = 2);
store listen (origin = 3, receiver = 6, provider = 5);
store listen (origin = 4, receiver = 1, provider = 4);
store listen (origin = 4, receiver = 2, provider = 1);
store listen (origin = 4, receiver = 3, provider = 1);
store listen (origin = 4, receiver = 5, provider = 2);
store listen (origin = 4, receiver = 6, provider = 5);
store listen (origin = 5, receiver = 1, provider = 2);
store listen (origin = 5, receiver = 2, provider = 5);
store listen (origin = 5, receiver = 3, provider = 1);
store listen (origin = 5, receiver = 4, provider = 1);
store listen (origin = 5, receiver = 6, provider = 5);
store listen (origin = 6, receiver = 1, provider = 2);
store listen (origin = 6, receiver = 2, provider = 5);
store listen (origin = 6, receiver = 3, provider = 1);
store listen (origin = 6, receiver = 4, provider = 1);
store listen (origin = 6, receiver = 5, provider = 6);

These listen statements are read as follows:

When on the "receiver" node, look to the "provider" node to provide events coming from the "origin" node.
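As an illustration, the translation from the table to the statements can be sketched in Python. This is a minimal sketch and not part of Slony-I: the names LISTEN_MATRIX and store_listen_statements are hypothetical, and it assumes node IDs run contiguously from 1 so that a column position maps directly to an origin ID.

```python
# Hypothetical in-memory copy of the listener network shown above.
# Key = receiver node; list position = origin node (1-based);
# value = provider to listen to; 0 marks the node itself.
LISTEN_MATRIX = {
    1: [0, 2, 3, 4, 2, 2],
    2: [1, 0, 1, 1, 5, 5],
    3: [1, 1, 0, 1, 1, 1],
    4: [1, 1, 1, 0, 1, 1],
    5: [2, 2, 2, 2, 0, 6],
    6: [5, 5, 5, 5, 5, 0],
}


def store_listen_statements(matrix):
    """Return slonik 'store listen' statements, ordered by origin, then receiver."""
    statements = []
    node_ids = sorted(matrix)
    for origin in node_ids:
        for receiver in node_ids:
            # Assumes node IDs are 1..n, so origin N is column N - 1.
            provider = matrix[receiver][origin - 1]
            if provider == 0:
                continue  # a node does not listen to itself
            statements.append(
                f"store listen (origin = {origin}, "
                f"receiver = {receiver}, provider = {provider});"
            )
    return statements


if __name__ == "__main__":
    for line in store_listen_statements(LISTEN_MATRIX):
        print(line)
```

Run as a script, this prints one statement per non-zero table cell, in the same origin-major order used in the list above.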

The tool init_cluster in the altperl scripts produces optimized listener networks in both the tabular form shown above as well as in the form of slonik statements.

There are four "thorns" in this set of roses:

  • If you change the shape of the node set, so that the nodes subscribe differently to things, you need to drop sl_listen entries and create new ones to indicate the new preferred paths between nodes. Prior to Slony-I 1.1, there is no automated way to do this "reshaping".

  • If you don't change the sl_listen entries, events will likely continue to propagate so long as all of the nodes continue to run well. The problem will only be noticed when a node is taken down, "orphaning" any nodes that are listening through it.

  • You might have multiple replication sets that have different shapes for their respective trees of subscribers. There won't be a single "best" listener configuration in that case.

  • In order for there to be an sl_listen path, there must be a series of sl_path entries connecting the origin to the receiver. This means that if the contents of sl_path do not express a "connected" network of nodes, then some nodes will not be reachable. This would typically happen, in practice, when you have two sets of nodes, one in one subnet and another in another subnet, where there are only a couple of "firewall" nodes that can talk between the subnets. Cut out those nodes and the subnets stop communicating.
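The connectivity requirement in the last point can be checked with a simple graph search. The following Python sketch is illustrative only: it models sl_path as a set of hypothetical (server, client) pairs held in memory rather than reading the real table, and it assumes that events can flow from a server to any client that has a path to it. The function name unreachable_receivers and the two-subnet layout are made up for the example.

```python
from collections import deque


def unreachable_receivers(nodes, paths, origin):
    """Return the nodes that events from `origin` cannot reach.

    `paths` is a set of (server, client) pairs; this sketch assumes
    events can travel from the server to the client of each pair.
    """
    reachable = {origin}
    queue = deque([origin])
    while queue:
        server = queue.popleft()
        for path_server, path_client in paths:
            if path_server == server and path_client not in reachable:
                reachable.add(path_client)
                queue.append(path_client)
    return set(nodes) - reachable


# Two subnets, fully connected internally (hypothetical layout).
subnet_a = [1, 2, 3]
subnet_b = [4, 5, 6]
paths = {
    (s, c)
    for group in (subnet_a, subnet_b)
    for s in group
    for c in group
    if s != c
}
# Nodes 3 and 4 play the role of the "firewall" nodes between subnets.
bridge = {(3, 4), (4, 3)}

all_nodes = subnet_a + subnet_b
with_bridge = unreachable_receivers(all_nodes, paths | bridge, origin=1)
without_bridge = unreachable_receivers(all_nodes, paths, origin=1)
```

With the bridge paths present every node is reachable from node 1; without them, the whole second subnet is cut off.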

4.2.3. Automated Listen Path Generation

In Slony-I version 1.1, a heuristic scheme is introduced to automatically generate sl_listen entries. This happens, in order, based on three data sources:

  • sl_subscribe entries are the first, most vital control as to what listens to what; we know there must be a direct path between each subscriber node and its provider.

  • sl_path entries are the second indicator; if sl_subscribe has not already indicated "how to listen," then a node may listen directly to the event's origin if there is a suitable sl_path entry.

  • Lastly, if there has been no guidance thus far based on the above data sources, then nodes can listen indirectly via every node that is either a provider for the receiver, or that is using the receiver as a provider.

Any time sl_subscribe or sl_path is modified, RebuildListenEntries() will be called to revise the listener paths.
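The three-step ordering described above can be sketched roughly as follows. This is not the actual RebuildListenEntries() code, only a simplified Python illustration of the prose: the function name rebuild_listen_sketch is hypothetical, subscriptions are assumed to be (set origin, provider, receiver) tuples, and paths are assumed to be (server, client) pairs, all held in memory rather than read from sl_subscribe and sl_path.

```python
from itertools import permutations


def rebuild_listen_sketch(nodes, subscriptions, paths):
    """Return a set of (origin, provider, receiver) listen entries.

    subscriptions: (set_origin, provider, receiver) tuples (assumed shape).
    paths: (server, client) pairs (assumed shape).
    """
    listen = set()
    covered = set()  # (origin, receiver) pairs that already have guidance

    # Step 1: subscriptions decide first. A subscriber listens for the
    # set origin's events via its provider.
    for origin, provider, receiver in subscriptions:
        listen.add((origin, provider, receiver))
        covered.add((origin, receiver))

    # Step 2: otherwise listen directly to the event's origin when a
    # suitable path entry exists.
    for origin, receiver in permutations(nodes, 2):
        if (origin, receiver) not in covered and (origin, receiver) in paths:
            listen.add((origin, origin, receiver))
            covered.add((origin, receiver))

    # Step 3: otherwise listen indirectly via every node that provides
    # to the receiver or that uses the receiver as its provider.
    for origin, receiver in permutations(nodes, 2):
        if (origin, receiver) in covered:
            continue
        neighbours = {p for _, p, r in subscriptions if r == receiver}
        neighbours |= {r for _, p, r in subscriptions if p == receiver}
        for via in neighbours:
            listen.add((origin, via, receiver))

    return listen


# The six-node example from earlier: 1 is the origin, 2-4 subscribe to 1,
# 5 subscribes to 2, and 6 subscribes to 5.
nodes = [1, 2, 3, 4, 5, 6]
subscriptions = [(1, 1, 2), (1, 1, 3), (1, 1, 4), (1, 2, 5), (1, 5, 6)]
# Assumed: paths exist in both directions between each subscriber/provider pair.
paths = {(p, r) for _, p, r in subscriptions} | {(r, p) for _, p, r in subscriptions}

listen = rebuild_listen_sketch(nodes, subscriptions, paths)
```

Because step 3 uses every neighbouring node, the result may contain more entries than a hand-tuned listener network; for instance, node 1 ends up listening for node 6's events via nodes 2, 3, and 4 rather than via node 2 alone.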