Mon Jan 10 20:09:09 PST 2005
- Previous message: [Slony1-general] [PATCH] Add --config and --help to more tools/altperl scripts
- Next message: [Slony1-general] Slony performance
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Greetings all, sorry for the long post, but hopefully you can shed some
light on my dilemma here:
Last Friday I started replication on our production servers after an
extensive playing period on some test servers. Everything seemed to
be fine before I left for the weekend: the data set had apparently
finished copying over from the subscribe set, and things seemed to
be humming along just fine.
I came into work this morning and found the master db server to be dog
slow (load average had been above 10 since Saturday sometime). After a
bit of investigation I think that the slony replication service is
causing it.
postgres 19983 22.8 3.4 83488 69508 ? S Jan07 1140:0 postgres: postgres pl ::ffff:216.239.10.115 FETCH
postgres 20073 23.5 3.4 83480 69524 ? S Jan07 1169:57 postgres: postgres pl ::ffff:216.239.10.116 FETCH
From postgres I found that those two processes are running the following
queries:
19983 | fetch 100 from LOG;
20073 | fetch 100 from LOG;
I'm guessing this is Slony grabbing things from the transaction log for
replication, so I figured I'd check to see how many tuples were in the
log.
pl=# select reltuples::int from pg_class where relname='sl_log_1';
reltuples
-----------
16945348
(1 row)
pl=# select reltuples::int from pg_class where relname='sl_log_2';
reltuples
-----------
0
To confirm that replication is falling behind, I checked one of the
replicated tables on the slaves to see if it was in sync with the master:
Master: pl=# select activity_id from pl02_activity_table order by activity_id desc limit 1;
activity_id
-------------
13835041
(1 row)
Slave1: pl=# select activity_id from pl02_activity_table order by activity_id desc limit 1;
activity_id
-------------
13818012
(1 row)
Slave2: pl=# select activity_id from pl02_activity_table order by activity_id desc limit 1;
activity_id
-------------
13818008
(1 row)
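To put a number on the gap, the three values above can simply be subtracted. The following is a minimal sketch in Python, not part of the original investigation. It assumes that activity_id only ever increases, so the difference is a rough indicator of lag rather than an exact row count (gaps in the id sequence would inflate it). The names `id_gap` and `gaps` are made up for illustration.

```python
# Highest activity_id seen on each node, copied from the query output above.
master = 13835041
slaves = {"Slave1": 13818012, "Slave2": 13818008}


def id_gap(master_id, slave_id):
    """Return how far a slave's highest id trails the master's."""
    return master_id - slave_id


# Assumption: ids are assigned in increasing order, so a larger gap
# means the slave is further behind.
gaps = {name: id_gap(master, slave_id) for name, slave_id in slaves.items()}

for name, gap in gaps.items():
    print(f"{name} trails the master by {gap} ids")
```

Both slaves come out roughly 17,000 ids behind the master, which is small next to the row estimate reported for sl_log_1.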
Software versions:
Slony-1.05
postgres 7.4.6
MasterDB: 3.8 GHz Pentium 4
6 GB of RAM
5 x 36 GB 10K SCSI in HW RAID 5
RedHat 9
Slave1 & Slave2: Dual Opteron 246
8 GB RAM
6 x 73 GB 15K FC drives in HW RAID 1+0 (512 MB battery-backed cache on controller)
SLES 8 for AMD64
DB size on Master: 78 GB
DB size on Slave1: 74 GB
DB size on Slave2: 69 GB
Are things removed from the log once they're replicated, or do they stay
in there for a while?
Am I correct in guessing that for some reason it's attempting to do a seq
scan on the sl_log_1 table to look for new rows to update, and that's why
those processes are taking forever?
Is this expected? Or is it possible that I goofed something during the
install?
If more information is needed, let me know and I'll do my best to provide
it.
-Joe Markwardt