Mon May 10 15:07:25 PDT 2010
Hi all,
I've been running into a problem with dropping a node from the Slony
cluster, in which the Slony system catalogs aren't getting fully
cleaned up when the node is dropped.
I have a three-node cluster, one master and two slaves. I have a
script that generates the slonik command to drop one of the slaves
(in this case node 3) from the Slony cluster, and it executes without
problem. However, after performing the drop node a few dozen times,
there have been several instances in which the data in
_slony.sl_status still refers to the third node, and the
st_lag_num_events value climbs and climbs (since there's no node to
sync with, it will never drop to 0).
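For reference, the check I keep running while watching the lag is along
these lines (just a sketch; the host, port, user, and database are from
my test setup):

#!/bin/bash
# Rough sketch of the lag check I run against the master while testing.
# Connection details (host/port/user/db) are from my test environment.
psql -h 172.16.44.111 -p 5432 -U postgres -d postgres \
    -c "select st_origin, st_received, st_lag_num_events, st_lag_time from _slony.sl_status;"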
So the problem is that after I drop a node, everything looks fine
except that the _slony.sl_status table, on any or all of the remaining
nodes, still refers to the node that was just dropped.
I did quite a few test runs of the drop node to try to reproduce the
problem and determine the cause. After the drop node, if I look in
sl_node, sl_path, sl_event, or any other sl_ table, I see no reference
to the third node. However, about half the time I would still get
references to the third node in sl_status. This can be on the master
node, on the (remaining) slave node, or on both. There was one test
scenario in which I monitored the sl_status table and noticed that
node 3 disappeared, reappeared a second later, and then remained.
Example queries done on node 2 (slave) after dropping node 3 (other slave):
postgres=# select * from _slony.sl_node;
 no_id | no_active | no_comment | no_spool
-------+-----------+------------+----------
     1 | t         | Server 1   | f
     2 | t         | Server 2   | f
(2 rows)
postgres=# select * from _slony.sl_path ;
 pa_server | pa_client |                        pa_conninfo                         | pa_connretry
-----------+-----------+------------------------------------------------------------+--------------
         1 |         2 | dbname=postgres host=172.16.44.111 port=5432 user=postgres |           10
         2 |         1 | dbname=postgres host=172.16.44.129 port=5432 user=postgres |           10
(2 rows)
postgres=# select * from _slony.sl_status;
 st_origin | st_received | st_last_event |      st_last_event_ts      | st_last_received |    st_last_received_ts     | st_last_received_event_ts  | st_lag_num_events |   st_lag_time
-----------+-------------+---------------+----------------------------+------------------+----------------------------+----------------------------+-------------------+-----------------
         2 |           1 |          1649 | 2010-05-10 15:53:16.245529 |             1649 | 2010-05-10 15:53:16.246212 | 2010-05-10 15:53:16.245529 |                 0 | 00:00:05.57205
         2 |           3 |          1656 | 2010-05-10 15:54:26.280131 |             1636 | 2010-05-10 15:51:05.341512 | 2010-05-10 15:51:05.343754 |                20 | 00:03:22.66664
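Since (as I understand it) sl_status is a view built on top of sl_event
and sl_confirm, I also tried poking at sl_confirm directly to see
whether any confirm rows still mention node 3. Roughly something like
this (a sketch, with the column names written from memory):

#!/bin/bash
# Sketch: look for sl_confirm rows that still reference the dropped node (3).
# Column names (con_origin, con_received, con_seqno) are written from memory.
psql -h 172.16.44.129 -p 5432 -U postgres -d postgres <<'SQL'
select con_origin, con_received, max(con_seqno) as last_confirmed
  from _slony.sl_confirm
 where con_origin = 3 or con_received = 3
 group by con_origin, con_received;
SQL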
Also, another problem that may be linked is that the slon daemon for
node 3 does not terminate itself after the node is dropped. Watching
the log output from that daemon, it shows that it receives the drop
node command for itself, and it drops the _slony schema as intended.
However, after that it reports "2010-05-10 15:57:56 MDT FATAL main:
Node is not initialized properly - sleep 10s" and keeps checking every
ten seconds. I'm not sure if somehow this daemon is making
post-drop-node entries in sl_event that cause the sl_status entry to
be recreated.
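To clean up during testing I've just been stopping that leftover slon
by hand, roughly like this (a sketch; the pattern matches node 3's host
in my setup, so adjust as needed):

#!/bin/bash
# Sketch: find the slon process that was watching node 3 and stop it.
# The pattern matches node 3's host (172.16.44.142) from my test setup.
ps -ef | grep '[s]lon' | grep '172.16.44.142'
# then, once the pid is confirmed:
# kill <pid_of_the_node_3_slon>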
In case it helps, here is a copy of the drop node script I'm running.
#!/bin/bash
slonik <<_EOF_
cluster name = slony;
node 1 admin conninfo = ' dbname=postgres host=172.16.44.111 port=5432 user=postgres';
node 2 admin conninfo = ' dbname=postgres host=172.16.44.129 port=5432 user=postgres';
node 3 admin conninfo = ' dbname=postgres host=172.16.44.142 port=5432 user=postgres';
DROP NODE ( ID = 3, EVENT NODE = 1 );
_EOF_
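And here is the rough check I run against the remaining nodes
afterwards to see whether node 3 still shows up in sl_status (a sketch;
the host list and connection details are specific to my setup):

#!/bin/bash
# Sketch: after the drop, check each remaining node for sl_status rows
# that still reference the dropped node (3). Hosts are from my test setup.
for host in 172.16.44.111 172.16.44.129; do
    echo "== $host =="
    psql -h "$host" -p 5432 -U postgres -d postgres -c \
        "select st_origin, st_received, st_lag_num_events
           from _slony.sl_status
          where st_origin = 3 or st_received = 3;"
done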
I am running CentOS 5, PostgreSQL 8.4.2, and Slony-I 1.2.20 on all
three nodes.
Thanks in advance,
Brian Fehrle