Thu Aug 10 20:53:55 PDT 2006
- Previous message: [Slony1-general] high load on all nodes
- Next message: [Slony1-general] high load on all nodes
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> Hi all,
>
> I'm running a master with two slaves, and one of the slaves forwards
> to an offsite node (so, 4 nodes in all). Each node is replicating 3
> databases.
>
> Since finishing the setup, I've been experiencing high load on all 4
> nodes, and I'm not sure what's causing the problem. A quick glance
> at 'top' shows that the top cpu-consuming processes are postgres
> processes run by the slon daemon. A handful of them say "notify
> interrupt waiting".
>
> Any ideas why the load is so high (anywhere from 4.00 to 6.00)?
> Before doing this setup, I experienced loads around 0.80 to 1.70.
>
> Any insight would be greatly appreciated!
> --Richard

With a large number of processes waiting, that explains the apparent high load average: they don't have to be working terribly hard for you to have a big queue of waiting processes.

The one thing that leaps to mind as a plausible cause for them working hard is that the table pg_listener has grown to a significant size, and the notification system is waiting on it. You might try running, on each of the databases:

  VACUUM FULL VERBOSE pg_catalog.pg_listener;

If that has any evident effect, such as reporting that the table shrank from thousands of pages to nearly nothing, that suggests under-vacuuming of pg_listener as a direct cause of the problem.

Second thought: consider stopping all the slon processes (SIGINT, i.e. "kill -2", initially, to allow the DB connections to be closed as cleanly as possible) and restarting them, perhaps followed by vacuuming pg_listener. Recycling the database connections (which is what this does) would be expected to clear out notification/listen activity.

Third thought / suspicion: if you shut down the slons for a few seconds, check whether any DB connections remain. It's possible that your network is a bit unreliable, and you're experiencing "zombied" connections.
That is, a remote connection falls over, but the PostgreSQL back end doesn't become aware of it for up to about two hours. In that case, you're left with a barrel of basically useless connections, possibly still thinking they're in a transaction. Kill off those backends (SIGINT again), restart the slons, and that should clear things up, at least until the network flakes again...

I most suspect it's #3. Checking them in order should be easy enough, and the earlier steps won't preclude taking the later ones... If you have a bunch of zombied connections, shutting off the "live" slons won't touch them.
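For what it's worth, the "zombied connection" check can be sketched as a filter over pg_stat_activity-style rows: a backend whose current query is "<IDLE> in transaction" and which has sat that way past some threshold is a candidate for killing. This is only an illustrative sketch, not anything from the thread — the row shape, the two-hour threshold, and the helper name are all assumptions:

```python
from datetime import datetime, timedelta

def find_zombied_backends(rows, now, max_idle=timedelta(hours=2)):
    """Return pids of backends idle in a transaction longer than max_idle.

    rows: (pid, current_query, started) tuples, roughly the shape of
    pre-9.2 pg_stat_activity output. Purely illustrative.
    """
    zombies = []
    for pid, query, started in rows:
        # Pre-9.2 servers report a stuck transaction as "<IDLE> in transaction"
        if query == "<IDLE> in transaction" and now - started > max_idle:
            zombies.append(pid)
    return zombies

now = datetime(2006, 8, 10, 20, 0)
rows = [
    (4301, "<IDLE> in transaction", datetime(2006, 8, 10, 17, 0)),   # stale: 3h
    (4302, "<IDLE>", datetime(2006, 8, 10, 17, 0)),                  # merely idle
    (4303, "<IDLE> in transaction", datetime(2006, 8, 10, 19, 30)),  # recent
]
print(find_zombied_backends(rows, now))  # -> [4301]
```

On the server itself, the equivalent would be querying pg_stat_activity after stopping the slons and sending kill -INT to any backend pids that remain.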