Dan Falconer lists_slony1-general at avsupport.com
Fri Aug 17 06:47:30 PDT 2007
Replies in-line.

On Wednesday 15 August 2007 16:33, David Rees wrote:
> On 8/15/07, Dan Falconer <lists_slony1-general at avsupport.com> wrote:
> > REQUIRED INFO:
> >         Slony-1, v1.2.10 (both servers)
> >         PostgreSQL v8.0.4 (both servers)
> >         OS: SLES 8.1
>
> Why are you running such an old version of Pg? You should be running
> 8.0.13 at least...

Good point... I'm not a DBA, but I play one at my job.  ;) 

>
> >         Our inventory processor regularly handles ~4 million
> > updates/inserts when it processes inventory files.  Generally, we can
> > only run this process once a week, due to the time it takes for the slave
> > server to catch up: it usually takes about 5 days.
>
> When it's catching up, what does the load on the servers look like? Is
> it CPU bound, IO bound, network bound, etc? Have a look at top and
> vmstat on each machine to help determine this (and post the info as
> well if you have any questions).

The master server has a very worrisome load, running between 5-20.  I took a 
snapshot of it:
	load average: 13.18, 7.23, 4.33
There's also a FETCH that's been running for a very long time (a poorly timed 
vacuum of the whole database took 2 days while it was running).  Here is its 
line from top, reformatted for your screen:

USER=postgres
PR=25
NI=0
VIRT=109m
RES=109m
SHR=106m
S=R
%CPU=87.1
%MEM=1.5
TIME+=722:47.39
COMMAND=postgres: postgres pl ::ffff:192.168.1.200(43262) FETCH
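
If it helps to see exactly what that backend is running, one rough way is to ask 
pg_stat_activity for every non-idle query (this only shows the query text if 
stats_command_string is enabled in postgresql.conf; "pl" is the database name 
from the top line above):

psql -U postgres -d pl -c "SELECT procpid, usename, query_start, current_query FROM pg_stat_activity WHERE current_query <> '<IDLE>' ORDER BY query_start;"

My guess is the long-running FETCH is a slon pulling rows from its sl_log 
cursor, since the connection comes from 192.168.1.200.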


The slave server, however, is running at very little load (around 1).

VMSTAT FOR MASTER SERVER:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 4  1  62100  28880  94372 6953044    0    0     9    36    1     1 21 10 68  0
 3  0  62100  31592  94384 6953124    0    0    48   100  271  1765 55  3 42  0
 5  1  62100  28392  94516 6954308    0    0   844   992  722 11378 57 12 31  0
 2  0  62100  29072  94624 6954984    0    0   712   528  326   724 50  1 49  0
 3  1  62100  18376  94724 6951928    0    0   872   552  663  3291 56 10 34  0
 1  2  62100  16584  94752 6942792    0    0  1324   584  651  2507 69  5 26  0
 1  1  62100  26716  94800 6943372    0    0   612    64  344  1256 54  2 44  0
 2  1  62100  25916  94828 6944460    0    0   876   488  416   990 54  0 45  0
 1  1  62100  23980  94896 6945004    0    0   472   444  391 10439 52  7 40  0
 1  0  62100  35436  94908 6945404    0    0   360   300  603  1897 53  6 41  0
 2  0  62100  37792  94912 6945480    0    0    28   116  237   843 52  1 46  0
 1  0  62100  37972  94916 6945480    0    0     0    44  157   529 50  0 50  0
 1  1  62100  35512  94936 6945752    0    0   216   168  245   921 51  4 45  0

>
> What settings are you running your slon daemons with? There are some
> tweaks to be made to sync_group_maxsize which may help. Some
> additional vacuuming of the sl_log tables may help, too. If you run
> through the archives, this type of issue has come up before as well.
>
> -Dave


The slons aren't run with any special settings.  They're started using 
the "slon_start" bash script included with Slony (some items removed to 
protect the innocent):

/usr/local/pgsql/bin/slon -s 5000 -d2 replication host=******* dbname=pl user=postgres port=5432
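
(Following Dave's sync_group_maxsize pointer: if I'm reading the slon docs 
right, that corresponds to the -g option, so a tuned invocation might look 
something like the line below.  The -g 80 value is only a guess at letting 
slon group more SYNCs per transaction while it catches up, not something 
anyone has recommended to me.)

/usr/local/pgsql/bin/slon -s 5000 -g 80 -d2 replication "host=******* dbname=pl user=postgres port=5432"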

In the meantime, I believe I'm going to stop the slon daemons, vacuum the 
Slony tables (rough sketch below), and then restart them.  Any help with 
tuning the slons would be appreciated; I hadn't even thought about trying that 
before posting to the list.
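
For the record, this is roughly what I have in mind for the vacuum, assuming 
the usual 1.2 tables under the "_replication" schema (the cluster name from 
the slon command above) plus pg_listener, which I gather Slony also churns:

psql -U postgres -d pl -c "VACUUM ANALYZE _replication.sl_log_1;"
psql -U postgres -d pl -c "VACUUM ANALYZE _replication.sl_log_2;"
psql -U postgres -d pl -c "VACUUM ANALYZE _replication.sl_seqlog;"
psql -U postgres -d pl -c "VACUUM ANALYZE _replication.sl_event;"
psql -U postgres -d pl -c "VACUUM ANALYZE pg_catalog.pg_listener;"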

-- 
Best Regards,


Dan Falconer
"Head Geek", Avsupport, Inc. / Partslogistics.com
http://www.partslogistics.com

