Tue Nov 9 17:09:26 PST 2010
- Previous message: [Slony1-general] Sl_log table is huge, over 100 million rows
- Next message: [Slony1-general] Error when delete replication
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I'll start working through what you suggested.

Christopher Browne wrote:
>
> sharadov <sreddy at spark.net> writes:
>> We have slony replication set up, and the replication on the slave has
>> fallen behind by 10 days. On investigating I noticed that the sl_log_1
>> table has 25K records, but the sl_log_2 table has over 100 million rows,
>> and they keep going up. How do I go about troubleshooting this?
>>
>> I am a newbie to slony, and would appreciate all the help that I can get.
>
> You should consider running the "test_slony_state" script, which pokes at
> various parts of the configuration with a view to seeing what might be
> wrong.
>
> <http://slony.info/documentation/2.0/monitoring.html>
>
> Some questions...
>
> - Why didn't you notice for 10 days?
>
>   Presumably monitoring hasn't been done right. I'd suggest running
>   test_slony_state on an hourly basis; it complains only if something
>   seems broken...
>
> - Are the slon processes running?
>
>   Usually /usr/bin/ps can help find them...
>
> - Is the slon for the subscriber actually replicating data?
>
>   You should search the slon logs for the subscriber for lines
>   looking like:
>
>   DEBUG2: remoteWorkerThread_%d: SYNC %d done in %.3f seconds
>   DEBUG2: remoteWorkerThread_%d_d: inserts=%d updates=%d deletes=%d
>
>   That should give you an idea as to whether replication work is
>   actually taking place.
>
>   If it's running into errors before doing real work, then there's
>   some problem that needs to be rectified.
>
> There's a somewhat "worst case scenario" where there are way too
> many events to process, and a timeout gets exceeded:
>
>   ERROR: remoteListenThread_%d: timeout for event selection
>
> This means that the listener thread (src/slon/remote_listener.c)
> timed out when trying to determine what events were outstanding for
> it.
>
> This could occur because network connections broke, in which case
> restarting the slon might help.
>
> Alternatively, this might occur because the slon for this node has
> been broken for a long time, there are an enormous number of
> entries in sl_event on this or other nodes for the node to work
> through, and it is taking more than slon_conf_remote_listen_timeout
> seconds to run the query. In older versions of Slony-I, that
> configuration parameter did not exist; the timeout was fixed at 300
> seconds. In newer versions, you might increase that timeout in the
> slon config file to a larger value so that the query can run to
> completion. And then investigate why nobody was monitoring things
> such that replication stayed broken for so long...
>
> If this proves to be the problem, then you can change the listen
> timeout to something rather larger than 300 seconds, and hopefully
> the slon can get past the too-many-events problem.
> --
> "cbbrowne","@","ca.afilias.info"
> Christopher Browne
> "Bother," said Pooh, "Eeyore, ready two photon torpedoes and lock
> phasers on the Heffalump, Piglet, meet me in transporter room three"
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general

--
View this message in context: http://old.nabble.com/Sl_log-table-is-huge%2C-over-100-million-rows-tp30173901p30176891.html
Sent from the Slony-I -- General mailing list archive at Nabble.com.
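[Editor's note: for anyone working through the same checks, here is roughly what the advice above looks like from the shell. This is only a sketch, not part of Christopher's message; the log path, config file path, database name and cluster name (_mycluster) are placeholders and will differ on your installation. The sl_status columns and the remote_listen_timeout option are taken from the Slony-I 2.0 documentation; verify them against your version.]

    # 1. Is a slon process running for each node?
    ps auxww | grep '[s]lon'

    # 2. Is the subscriber's slon actually applying SYNC events?
    #    (log location is a guess; check how your slon is started)
    grep -E 'SYNC [0-9]+ done|inserts=[0-9]+' /var/log/slony1/slon-node2.log | tail -20

    # 3. How far behind is the subscriber, and how large is the event backlog?
    #    (column names per the Slony-I 2.0 sl_status view)
    psql -d mydb -c "SELECT st_origin, st_received, st_lag_num_events, st_lag_time FROM _mycluster.sl_status;"
    psql -d mydb -c "SELECT count(*) FROM _mycluster.sl_event;"

    # 4. If the remote listener is timing out (the parameter Christopher calls
    #    slon_conf_remote_listen_timeout; default 300 seconds), add a line like
    #    the following to the slon config file and restart the slon:
    #      remote_listen_timeout=1800

If step 2 shows steady SYNCs with large insert/update/delete counts, replication is working and the backlog should drain; if it shows repeated errors or the timeout from step 4, fix that first before worrying about the size of sl_log_2.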