Niblett, David A niblettda
Mon Nov 21 18:15:56 PST 2005
I agree; I would have liked to keep the data and find out,
but at that point I was nearly out of disk space, and having
email down for my 5-6k customers is not an option.

That is also why VACUUM FULL isn't an option.  I tried it
on the slave and it took 2.5 days to complete; the whole
time, the database was unavailable.  The slave had no other
traffic on it at the time.

I had the same VACUUM schedule on both servers, and also had
pg_autovacuum running.

I have not given up on Slony, but my main concern is having a
near-instant backup of customer email, and I think PITR plus
a large storage device will work better in this case.
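For anyone curious, the minimal pieces of a PITR setup on PostgreSQL 8.x look roughly like this (the archive destination and paths are illustrative, not taken from this thread):

```
# postgresql.conf -- enable continuous WAL archiving.  On 8.0/8.1,
# setting archive_command is what turns archiving on.  The target
# directory here is only an example.
archive_command = 'cp "%p" /mnt/backup/wal/%f'
```

This is combined with a filesystem-level base backup taken between `SELECT pg_start_backup('label');` and `SELECT pg_stop_backup();`, after which the archived WAL segments let you restore to any point in time since that base backup.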

I just wondered whether others had seen this with a largish
database with large rows.

--
David A. Niblett               | email: niblettda at gru.net
Network Administrator          | Phone: (352) 334-3400
Gainesville Regional Utilities | Web: http://www.gru.net/



-----Original Message-----
From: Christopher Browne [mailto:cbbrowne at ca.afilias.info] 
Sent: Monday, November 21, 2005 12:24 PM
To: Niblett, David A
Cc: Slony-I Mailing List
Subject: Re: [Slony1-general] Re: Out of memory errors


"Niblett, David A" <niblettda at gru.com> writes:
> Just curious here, I emailed the list when I think it was down about a 
> problem I was having with Slony and replicating a large database with 
> large rows.
>
> We are using DBMail for our Email store and I was replicating it to 
> another server with Slony.  The database is ~17G and easily has 
> records in the multi-meg size.
>
> I started having problems where the CPU load was running around 5 
> consistently, and the database was almost 45G in size.  Since 
> everything was becoming unusable I stopped slony, and uninstalled the 
> nodes.  Once I did that, my process load went to <1 at almost all 
> times and my database shrunk back to a more normal 17G size.
>
> Likewise on the slave database it dropped some in size, but a VACUUM 
> FULL was required to get it back to normal.  Also the load on it 
> dropped to almost nothing and more importantly the RAM and Swap went 
> way down on both servers.
>
> So I'm wondering if this particular problem is what was causing me 
> issues with Slony, or if I have something else wrong.  It seems to me 
> that there must be some huge log space that was taking up a lot of 
> space, but when I uninstalled the nodes that went away, so I can't 
> look now.

Clearly _something_ bloated; we can't know what it was now unless you ran
VACUUM FULL VERBOSE, and the results are still sitting in the PG logs on your
disk...

If VACUUM FULL "resolved the issue," then something clearly wasn't being
vacuumed properly.  If you had some VACUUM VERBOSE output to peek at, we
might be able to discover that it was one table that wasn't being vacuumed
often enough, but presumably we can't, now.

Regrettably, this is something where "details count."  Vacuuming the wrong
table will do no good.

We had a case recently where doing a VACUUM FULL on pg_listener ran for just
a few seconds, and resolved a persistent situation of replication running
slowly and falling behind.  I'd go there, first, as it's easy for that
problem to arise, and it takes only a few seconds to see if it helps.
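To be concrete, the cheap thing to try looks like this (run from psql as a superuser; pg_listener is the system table that LISTEN/NOTIFY traffic, which Slony-I generates heavily, tends to bloat):

```sql
-- pg_listener is logically tiny, so even a FULL vacuum of it normally
-- completes in seconds; the VERBOSE output shows how bloated it was.
VACUUM FULL VERBOSE pg_listener;
```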

Personally, I'd be inclined to keep the statistics emitted by VACUUM FULL
(for weeks or months), as that information can be fabulously valuable for
diagnostic purposes, and not just for Slony-I...
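One simple way to keep those statistics around is to capture the verbose output into a dated file from within psql (the log path and date here are only illustrative):

```sql
-- psql meta-command \o redirects query and command output to a file;
-- the second \o restores output to the terminal.
\o /var/log/pg/vacuum-2005-11-21.log
VACUUM FULL VERBOSE;
\o
```

Comparing these logs week over week makes it obvious which table's page count is growing faster than vacuuming is reclaiming it.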
-- 
output = ("cbbrowne" "@" "ca.afilias.info")
<http://dev6.int.libertyrms.com/> Christopher Browne
(416) 673-4124 (land)

