Wed Aug 16 09:43:40 PDT 2006
- Previous message: [Slony1-general] high load on all nodes
- Next message: [Slony1-general] Slony-I build errors on Solaris 9
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Thanks for the suggestions! I've determined that it's the second idea.... killall -STOP slon (didn't run a sig 2) will bring the load down to about 0.8-1.2 every time I do it. Then, when I do killall -CONT slon, load climbs back up to about 6.00.

Guess I need a bigger machine?

--Richard

On Aug 10, 2006, at 8:53 PM, cbbrowne at ca.afilias.info wrote:

>> Hi all,
>>
>> I'm running a master with two slaves, and one of the slaves forwards
>> to an offsite node (so, 4 nodes in all). Each node is replicating 3
>> databases.
>>
>> Since finishing the setup, I've been experiencing high load on all 4
>> nodes, and I'm not sure what's causing the problem. A quick glance
>> at 'top' shows that the top CPU-consuming processes are postgres
>> processes run by the slon daemon. A handful of them say "notify
>> interrupt waiting".
>>
>> Any ideas why the load is so high (anywhere from 4.00 to 6.00)?
>> Before doing this setup, I experienced loads around 0.80 to 1.70.
>>
>> Any insight would be greatly appreciated!
>> --Richard
>
> With a large number of processes waiting, that explains the apparent
> high load average. They don't have to be working terribly hard for you
> to have a big queue of waiting processes.
>
> The one thing that leaps to mind as a plausible cause for them to be
> working hard would be if the table pg_listener has grown to significant
> size, and the notification system is waiting on it.
>
> You might try running, on various of the databases:
>
>   VACUUM FULL VERBOSE pg_catalog.pg_listener;
>
> If that has any evident effect, such as reporting that the table shrunk
> from thousands of pages in size to near nothing, then that suggests
> undervacuuming of pg_listener as a direct cause of the problem.
>
> Second thought... Consider stopping (sig 2, initially, to allow DB
> connections to be closed as cleanly as possible) and restarting all the
> slon processes, perhaps followed by vacuuming pg_listener. Recycling
> the database connections (which is the result of this) would be
> expected to clear notification/listen activity....
>
> Third thought / suspicion... If you shut down the slons for a few
> seconds, check to see if any DB connections remain. It's possible that
> your problem is that the network is a bit unreliable, and you're
> experiencing "zombied" connections. That is, a remote connection falls
> over but the PostgreSQL back end doesn't become aware of this for up to
> about 2 hours.
>
> In that case, you're left with a barrel of basically useless
> connections, possibly thinking they're in a transaction. Kill off those
> backends (signal 2), restart the slons, and that should clear things
> up, at least until the network flakes again...
>
> I think I most anticipate it's #3. Checking them in order should be
> easy enough, and the earlier steps won't preclude taking later ones...
>
> If you have a bunch of zombied connections, shutting off the "live"
> slons won't touch them...
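The checks discussed in this thread could be run roughly as below. This is only a sketch: "mydb" is a placeholder database name, "slon" is assumed to be the daemon's process name on your system, and the pg_stat_activity column names (procpid, current_query) assume a PostgreSQL release of that era.

```shell
# 1. Pause/resume the slons to confirm they are driving the load:
killall -STOP slon        # suspend the daemons (load should drop)
uptime                    # observe the load average
killall -CONT slon        # resume them (load climbs back up)

# 2. Check for pg_listener bloat; VERBOSE reports how far the table shrank:
psql -d mydb -c 'VACUUM FULL VERBOSE pg_catalog.pg_listener;'

# 3. Stop the slons cleanly (sig 2 = SIGINT), wait a moment, then look
#    for leftover "zombied" backends that are still connected:
killall -INT slon
sleep 5
psql -d mydb -c 'SELECT procpid, usename, current_query FROM pg_stat_activity;'
```

Run the VACUUM against each of the replicated databases in turn; if step 3 still shows slon-owned backends after the daemons are down, those are the zombied connections to kill off before restarting.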
More information about the Slony1-general mailing list