Christopher Browne cbbrowne at ca.afilias.info
Wed Feb 6 10:34:43 PST 2008
Martin Fandel <martin.fandel at payzone.de> writes:
> I've installed a Slony replication setup with 2 nodes. Last year, the
> first node was the primary and the second node the secondary. This
> year, I switched over the replication, so right now the first node is
> the secondary and the second node the primary.
>
> Until yesterday, the replication was working fine. But right now all
> tables are locked while slon runs a vacuum process, and I have seen
> about 500 client processes hanging. A workaround for me was
> killing the Slony vacuum process periodically. After killing it, the
> replication works fine and all clients stop hanging.
>
> Here is my environment information:
>
> - Slony 1.2.9
>
> - Primary (Node 2) 
>   - 4 CPU Xeon 3.6 GHz
>   - 4GB RAM
>   - 15k disks RAID10 
>   - SLES10 
>   - PostgreSQL 8.2.5 (compiled from source)
>
> - Secondary (Node 1)
>   - 2 CPU Xeon 1.8 GHz
>   - 4GB RAM 
>   - 10k disks RAID10
>   - OpenSuse 10.2
>   - PostgreSQL 8.1.5 (suse rpm)
>
> I've tried this to disable autovacuum on the Slony tables:
>
>  insert into pg_catalog.pg_autovacuum (
> 	vacrelid, 
> 	enabled,
> 	vac_base_thresh, 
> 	vac_scale_factor,
> 	anl_base_thresh, 
> 	anl_scale_factor,
> 	vac_cost_delay, 
> 	vac_cost_limit,
> 	freeze_min_age,
> 	freeze_max_age
>  ) select 
> 	oid, 
> 	'f', 
> 	'5000', 
> 	'0.2', 
> 	'10000', 
> 	'0.2', 
> 	'-1', 
> 	'-1', 
> 	'0', 
> 	'0' 
>  from 
> 	pg_catalog.pg_class 
>  where 
> 	relnamespace = (select oid from pg_namespace
> 	                where nspname = '_isohost_cluster')
> 	and relhasindex;
>         
> Thanks!

This seems mighty odd.
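
As an aside, it is worth confirming that your pg_autovacuum entries
actually took effect; a minimal check (assuming the cluster schema is
_isohost_cluster, as in your insert above) would be something like:

select vacrelid::regclass as slony_table, enabled
from pg_catalog.pg_autovacuum
where vacrelid in (select oid from pg_class
                   where relnamespace = (select oid from pg_namespace
                                         where nspname = '_isohost_cluster'));

Note that those entries only tell the autovacuum daemon to skip the
tables; they would not, by themselves, stop slon's own cleanup thread
from issuing its vacuums.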

The internal cleanup thread will, when vacuuming, take out *some*
locks on the tables that it is vacuuming, but really only a
ShareUpdateExclusiveLock.  That shouldn't stall your application
connections.
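
You can verify that directly by watching pg_locks while the cleanup
vacuum is running; a quick sketch (nothing Slony-specific here):

select c.relname, l.pid, l.granted
from pg_locks l, pg_class c
where c.oid = l.relation
  and l.mode = 'ShareUpdateExclusiveLock';

ShareUpdateExclusiveLock conflicts with DDL and with other vacuums, but
not with ordinary reads and writes, so that lock on its own cannot
account for 500 stalled clients.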

The following query, if run at the time of this phenomenon, will show what table(s) are blocked.

test@[local]:5432=# select c.relname, l.mode from pg_locks l, pg_class c where not granted and c.oid = l.relation;
 relname | mode 
---------+------
(0 rows)
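
If that does turn up rows, it is worth pulling in the pid as well, so
you can see which backend is waiting for which lock:

select c.relname, l.pid, l.mode
from pg_locks l, pg_class c
where not l.granted and c.oid = l.relation;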

The following query will show additional information about what
connections are holding onto the locks that are evidently preventing
the connections above from accessing their tables:

test@[local]:5432=# select c.relname, l.* from pg_locks l, pg_class c where l.granted and c.oid = l.relation and l.relation in (select l1.relation from pg_locks l1 where not l1.granted);
 relname | locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid | mode | granted 
---------+----------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+-----+------+---------
(0 rows)

These two queries should give a much better idea of *exactly* what is
being locked.  We can't proceed further without knowing that.
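
If it comes to that, you can also see what each of those lock holders
is actually executing by joining through to pg_stat_activity; a sketch
using the 8.x column names (procpid, current_query), which assumes
stats_command_string is enabled:

select a.procpid, a.usename, a.current_query
from pg_stat_activity a
where a.procpid in
      (select l.pid from pg_locks l
        where l.granted
          and l.relation in (select relation from pg_locks
                              where not granted));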
-- 
let name="cbbrowne" and tld="ca.afilias.info" in name ^ "@" ^ tld;;
<http://dba2.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)

