Tue Jun 17 11:08:04 PDT 2008
- Previous message: [Slony1-general] Upgrading from postgres 8.2.3 to 8.3.1
- Next message: [Slony1-general] Making Slony lazy? (= encouraging it to sync less frequently in bigger blocks)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I just committed a small fix to the remote worker. The bug was actually
revealed after a change I made to ducttape test #2. I added "wait for
event" commands there so that the subscription of node 3, which cascades
from node 2, starts as soon as node 2 has finished its copy set.
The problem was that node 3, as a node "not subscribed to anything at
all", was listening on node 1 for events originating from node 1. That is
fine under normal circumstances. However, this specific setup attempts
to subscribe to a set originating on node 1, cascaded through node 2 as
data provider, which at this point is certainly lagging behind (it has
only just started to catch up after the copy set). What happens is that
the SUBSCRIBE_SET event originates on node 2 (data provider) and travels
to node 1 (origin). There it causes the ENABLE_SUBSCRIPTION event to be
generated. Node 3 receives this event "directly", which causes it to
wait and check at 5-second intervals whether node 2 has finally caught
up to at least that ENABLE_SUBSCRIPTION event.
In that wait loop, node 3 never processed any confirm forward messages,
which were added to the end of the internal message queue. I changed a
few things to make sure that confirm forward messages are kept at the
head of the remote worker's internal message queue.
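The idea of keeping confirm messages at the head of the queue can be
illustrated with a small Python sketch. This is only a toy model under
stated assumptions, not the actual slon code (which is written in C);
the class name WorkerQueue and the message kinds are hypothetical.

```python
from collections import deque


class WorkerQueue:
    """Toy queue: confirm messages are always served before other messages.

    Hypothetical illustration only; it does not mirror slon's real data
    structures.
    """

    def __init__(self):
        self._confirms = deque()  # confirm forward messages, FIFO among themselves
        self._others = deque()    # every other message kind, FIFO

    def put(self, kind, payload):
        # Confirms go to their own lane so they can never get stuck
        # behind an event the worker is still waiting on.
        if kind == "confirm":
            self._confirms.append(payload)
        else:
            self._others.append((kind, payload))

    def get(self):
        # Serve pending confirms first, then fall back to ordinary messages.
        if self._confirms:
            return ("confirm", self._confirms.popleft())
        if self._others:
            return self._others.popleft()
        return None


q = WorkerQueue()
q.put("event", "ENABLE_SUBSCRIPTION")
q.put("confirm", "node 2 confirmed an event")
print(q.get())  # the confirm comes out first although it was queued last
```

With a single FIFO queue, the confirm would sit behind the
ENABLE_SUBSCRIPTION event, which is the situation described above.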
There have been repeated comments that "wait for event" does not work in
connection with "subscribe set". This bug may have been one cause. The
other might be that people don't realize that subscribing to a set
internally creates two events, and both need to be waited for in the
right order.
The correct sequence of slonik commands to wait for a subscribe is:

    subscribe set (...);
    wait for event (origin = <data provider>, confirmed = <set origin>,
                    wait on = <set origin>, timeout = 0);
    sync (id = <set origin>);
    wait for event (origin = <set origin>, confirmed = <new subscriber>,
                    wait on = <new subscriber>, timeout = 0);

The first "wait for event" waits until the actual subscribe set command
has been processed by the origin of the data set. The following "sync"
command is necessary to update slonik's idea of what the last event
sequence on the set origin is. The second "wait for event" then waits
until that very sync has been confirmed by the new subscriber, which
means that the subscriber has finished not only the copy set, but also
the very first sync operation thereafter.
The "wait for event" command has a timeout. For subscribe set
operations, which are known to lead to hours or in some cases even days
of lag, such a timeout is certainly unwanted. It is disabled with
timeout = 0.
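
To make the placeholders concrete, here is a small Python sketch that
renders the sequence above for given node ids. The function name
subscribe_and_wait is hypothetical, and the set id and the forward
setting are assumptions. The node numbers in the usage line follow the
setup described above (origin 1, data provider 2, new subscriber 3). A
real slonik script would also need its usual preamble (cluster name and
admin conninfo lines), which is left out here.

```python
def subscribe_and_wait(set_id, origin, provider, receiver, forward=False):
    """Return slonik commands that subscribe `receiver` and wait for completion.

    Hypothetical helper for illustration; it only formats text and does
    not talk to slonik or to any database.
    """
    fwd = "yes" if forward else "no"
    lines = [
        # Step 1: the subscribe request itself (event originates on the provider).
        f"subscribe set (id = {set_id}, provider = {provider}, "
        f"receiver = {receiver}, forward = {fwd});",
        # Step 2: wait until the set origin has processed that event.
        f"wait for event (origin = {provider}, confirmed = {origin}, "
        f"wait on = {origin}, timeout = 0);",
        # Step 3: generate a fresh sync so slonik knows the origin's last event.
        f"sync (id = {origin});",
        # Step 4: wait until the new subscriber has confirmed that sync.
        f"wait for event (origin = {origin}, confirmed = {receiver}, "
        f"wait on = {receiver}, timeout = 0);",
    ]
    return "\n".join(lines)


# Assumed set id 1; nodes as in the cascaded setup described above.
print(subscribe_and_wait(set_id=1, origin=1, provider=2, receiver=3))
```

Both waits use timeout = 0, so the script blocks until the copy set and
the first sync have completed, however long that takes.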
Jan
--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin