Wed Dec 1 10:42:02 PST 2010
- Previous message: [Slony1-general] server crash on slave: slon-start to do catch-up VS re-subscribe
- Next message: [Slony1-general] Purging Slony
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Vick Khera <vivek at khera.org> writes:
> On Wed, Dec 1, 2010 at 9:33 AM, Mark Steben <msteben at autorevenue.com> wrote:
>> Should I:
>> 1. issue a ./slon_start and allow slony to catch-up two weeks worth of
>> updates
>> Or
>> 2. Punt, start from scratch, redefine everything and rerun the 16 hour
>> subscription process?
>
> It all depends on the rate of change in your database. Do you do a lot
> of insert/update/deletes to the DB? Take a look at the number of rows
> in the sl_log_1 and sl_log_2 tables in your replication schema on the
> master. If the number is very very high, say in the tens or hundreds
> of millions, then perhaps restarting from scratch may be helpful. If
> it is in the low tens of millions, I'd venture to say you could
> recover by just restarting slony. One optimization may be to drop all
> indexes (except the PK index) on the replica until it is caught up.
> This will reduce the I/O it needs to apply the changes.

It's uncertain what the bottleneck will be. That may well depend on local characteristics, so it may be a mistake to assume that saving on index writes is material.

I think something would be learned by simply letting Slony catch up. There are some interesting open questions about the pathologies that arise when there's truly a lot of data in sl_log_1/2.
I don't imagine it would take terribly long to figure out whether things are catching up, between:

 a) Watching the sl_status view to verify that the lag times are falling, and

 b) Grepping the subscriber's logs for the following log lines:

    slon_log(SLON_DEBUG1, "remoteHelperThread_%d_%d: inserts=%d updates=%d deletes=%d truncates=%d\n",
             node->no_id, provider->no_id,
             pm.num_inserts, pm.num_updates, pm.num_deletes, pm.num_truncates);

    slon_log(SLON_INFO, "remoteWorkerThread_%d: SYNC " INT64_FORMAT " done in %.3f seconds\n",
             node->no_id, event->ev_seqno,
             TIMEVAL_DIFF(&tv_start, &tv_now));

If, after a few hours, things aren't catching up, it should be easy enough to drop and resubscribe.

Something we want to do (and Jan has on his todo list) is to see what pathologies arise with the query that pulls sl_log_* data. That'll involve adding some extra logging, such as submitting an "EXPLAIN" against the relevant query to see how pricey it appears to be.

--
"cbbrowne","@","afilias.info"
Christopher Browne
"Bother," said Pooh, "Eeyore, ready two photon torpedoes and lock phasers on the Heffalump, Piglet, meet me in transporter room three"
More information about the Slony1-general mailing list