Tue Nov 23 08:53:33 PST 2010
Subject: [Slony1-general] Feature Idea: improve performance when a large sl_log backlog exists
From: Christopher Browne <cbbrowne at ca.afilias.info>
Steve Singer <ssinger at ca.afilias.info> writes:
> On 10-11-23 09:48 AM, Vick Khera wrote:
>> On Tue, Nov 23, 2010 at 9:31 AM, Steve Singer <ssinger at ca.afilias.info> wrote:
>>> Slony can get into a state where it can't keep up/catch up with
>>> replication because the sl_log table is so large.
>>>
>>> Does this problem bite people often enough in the real world for us
>>> to devote effort to fixing?
>>
>> It used to happen to me a lot when I had my origin running on
>> spinning media. Ever since I moved to an SSD, it doesn't really
>> happen. At worst, when I do a large delete, I fall behind by a few
>> minutes, but it catches up quickly. For me, it didn't even require
>> taking the DB down for any extended period; just running a large
>> update or delete that touched many, many rows (i.e., generated a lot
>> of events in sl_log) could send the system into a tailspin that would
>> take hours or possibly days (until we hit a weekend) to recover.
>>
>> I am not sure it was caused by the log being too big, because
>> sometimes reindexing the tables on the replica would clear up the
>> backlog quickly too. But I may be sniffing down the wrong trail.
>
> The other place this will hit busy systems is during the initial sync.
> If your database is very large (or very busy), a lot of log rows can
> accumulate while that initial sync is going on. OMIT_COPY doesn't help
> you, because it requires an outage to get the master and slave in sync
> (just the loading time on a 1TB database is a while).
>
> CLONE PREPARE/FINISH also aren't of help, because a) they only work if
> you already have at least one subscriber set up, and b) after you do
> the clone prepare, any later transactions still need to be kept in
> sl_log until the new slave is up and running.

I'm not sure that we gain much by splitting the logs into a bunch of
pieces for that case. It's still the same huge backlog, and until it
gets worked down, it's bloated, period.

A different suggestion, one that doesn't involve any changes to
Slony... Initially, it might be a good idea to set up the new
subscriber with FORWARD=no:

   subscribe set (id=1, provider=1, receiver=2, forward=no);

That means that log data won't get captured in sl_log_(1|2) on the
subscriber while the subscription is catching up. Once it's reasonably
caught up, you submit:

   subscribe set (id=1, provider=1, receiver=2, forward=yes);

which turns that logging on, so that the new node becomes a failover
target and a legitimate target to feed other subscriptions.

While node #2 is catching up, it's a crummy candidate for a failover
target, so while this strategy does lose the ability for it to serve as
a failover target during the catch-up, remember that it was moving from
18-ish hours behind towards caught up, which already made it a crummy
failover target. I don't think anything hugely useful is being lost
here.

--
select 'cbbrowne' || '@' || 'ca.afilias.info';
Christopher Browne
"Bother," said Pooh, "Eeyore, ready two photon torpedoes and lock
phasers on the Heffalump, Piglet, meet me in transporter room three"
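
The two SUBSCRIBE SET commands in the message above are fragments of a
larger slonik script. A minimal sketch of the first, non-forwarding
phase might look like the following; the cluster name, database names,
and conninfo strings here are illustrative assumptions, not taken from
the thread:

   # catchup-phase.slonik -- subscribe the new node without forwarding
   cluster name = mycluster;

   # Admin conninfo strings tell slonik how to reach each node
   # (hypothetical hosts and credentials).
   node 1 admin conninfo = 'dbname=appdb host=db-master user=slony';
   node 2 admin conninfo = 'dbname=appdb host=db-replica user=slony';

   # Phase 1: forward = no, so the new subscriber does not accumulate
   # rows in its own sl_log_(1|2) while it works through the backlog.
   subscribe set (id = 1, provider = 1, receiver = 2, forward = no);

Once the node has caught up, re-issuing the same command with
forward = yes (as the message describes) turns logging on at the
subscriber, so it can feed further subscribers or serve as a failover
target:

   # forward-phase.slonik -- same subscription, now with forwarding on
   cluster name = mycluster;
   node 1 admin conninfo = 'dbname=appdb host=db-master user=slony';
   node 2 admin conninfo = 'dbname=appdb host=db-replica user=slony';
   subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);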
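
Judging when the subscriber is "reasonably caught up" before switching
to forward = yes can be done by watching the lag figures Slony-I
maintains in its sl_status view. A sketch, run on the origin database
and assuming the same hypothetical cluster name (so the cluster schema
is _mycluster):

   -- Lag of receiver node 2, as seen from the origin (node 1).
   SELECT st_received, st_lag_num_events, st_lag_time
     FROM _mycluster.sl_status
    WHERE st_received = 2;

When st_lag_time has dropped to a few seconds and st_lag_num_events is
near zero, the node is a reasonable candidate for the forward = yes
resubscription.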