Tue Nov 9 13:27:49 PST 2010
- Previous message: [Slony1-general] Sl_log table is huge, over 100 million rows
- Next message: [Slony1-general] Sl_log table is huge, over 100 million rows
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
sharadov <sreddy at spark.net> writes:
> We have slony replication set up, and the replication on the slave has
> fallen behind by 10 days. On investigating I noticed that the sl_log_1
> table has 25K records, but the sl_log_2 table has over 100 million rows,
> and they keep going up. How do I go about troubleshooting this?
>
> I am a newbie to slony, and would appreciate all the help that I can get.

You should consider running the "test_slony_state" script, which pokes at
various parts of the configuration with a view to seeing what might be
wrong.

<http://slony.info/documentation/2.0/monitoring.html>

Some questions...

- Why didn't you notice for 10 days? Presumably monitoring hasn't been
  done right. I'd suggest running test_slony_state on an hourly basis; it
  complains only if something seems broken...

- Are the slon processes running? Usually /usr/bin/ps can help find
  them...

- Is the slon for the subscriber actually replicating data? You should
  search the slon logs for the subscriber for lines looking like:

    DEBUG2: remoteWorkerThread_%d: SYNC %d done in %.3f seconds
    DEBUG2: remoteWorkerThread_%d_d: inserts=%d updates=%d deletes=%d

  That should give you an idea as to whether replication work is actually
  taking place. If it's running into errors before doing real work, then
  there's some problem that needs to be rectified.

There's a somewhat "worst case" scenario where, if there are way too many
events to process, a timeout gets exceeded:

    ERROR: remoteListenThread_%d: timeout for event selection

This means that the listener thread (src/slon/remote_listener.c) timed out
when trying to determine what events were outstanding for it. This could
occur because network connections broke, in which case restarting the slon
might help.
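The checks above can be sketched in shell. The log path and log contents
below are purely illustrative (real subscriber logs live wherever your
slon's log output is directed); the grep patterns match the message
formats quoted above:

```shell
# Fabricated log excerpt for illustration only -- a real subscriber's
# slon log would contain lines of these shapes.
cat > /tmp/slon_node2.log <<'EOF'
DEBUG2: remoteWorkerThread_1: SYNC 5017381 done in 0.244 seconds
DEBUG2: remoteWorkerThread_1_1: inserts=12 updates=3 deletes=0
ERROR: remoteListenThread_1: timeout for event selection
EOF

# Is a slon process running at all?  ([s]lon keeps grep from matching itself.)
ps auxww | grep '[s]lon' || echo "no slon process found"

# "SYNC ... done" lines show replication work actually completing:
grep -E 'remoteWorkerThread_[0-9]+: SYNC [0-9]+ done' /tmp/slon_node2.log

# A nonzero count here points at the event-selection timeout case:
grep -c 'timeout for event selection' /tmp/slon_node2.log
```

If the SYNC grep comes back empty while sl_log keeps growing, the
subscriber is not making progress and the errors earlier in the log are
the place to look.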
Alternatively, this might occur because the slon for this node has been
broken for a long time, and there are an enormous number of entries in
sl_event on this or other nodes for the node to work through, so that the
query takes more than slon_conf_remote_listen_timeout seconds to run. In
older versions of Slony-I, that configuration parameter did not exist; the
timeout was fixed at 300 seconds. In newer versions, you can raise the
timeout in the slon config file to something rather larger than 300
seconds, so the query can run to completion and the slon can hopefully get
past the too-many-events problem.

And then investigate why nobody was monitoring things, such that
replication stayed broken for so long...

-- 
"cbbrowne","@","ca.afilias.info"
Christopher Browne
"Bother," said Pooh, "Eeyore, ready two photon torpedoes and lock phasers
on the Heffalump, Piglet, meet me in transporter room three"
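[For reference, the timeout discussed above is the remote_listen_timeout
option in the slon runtime configuration file; a minimal sketch, with an
arbitrary 600-second value chosen only for illustration:]

```
# slon.conf -- raise the remote listener's event-selection timeout
# (fixed at 300 seconds in older Slony-I versions; configurable in newer ones)
remote_listen_timeout=600
```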