Fri Mar 4 19:19:31 PST 2005
[Slony1-general] Slony load on master db
Dr. Christian Storm wrote:
>
> On Mar 4, 2005, at 10:58 AM, Darcy Buskermolen wrote:
>
>>> On Friday 04 March 2005 09:38, Dr. Christian Storm wrote:
>>
>>> I walked in this morning, looked at the load on our master db, and
>>> saw that between 4:30 and 6 PM last night the average load climbed
>>> steadily from 0.5 to 2.5. Using top, I see that the two FETCH
>>> queries from Slony are swallowing two of our four CPUs:
>>>
>>>   PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+   COMMAND
>>> 19029 postgres 16  0  671m 600m 668m R 73.0  7.9 788:58.38 postgres: postgres tii 192.168.1.42 FETCH
>>> 19206 postgres 16  0  671m 600m 668m R 72.7  7.9 788:46.65 postgres: postgres tii 192.168.1.43 FETCH
>>>
>>> The %CPU vacillates between 70% and 90%.
>>>
>>> A further inspection of pg_stat_activity confirms this:
>>>
>>> tii=# select procpid, query_start, age(now(), query_start), current_query
>>>       from pg_stat_activity
>>>       where current_query not like '%IDLE%'
>>>       order by query_start;
>>>  procpid |          query_start           |       age       |    current_query
>>> ---------+--------------------------------+-----------------+---------------------
>>>    19029 | 2005-03-04 09:29:44.71891-08   | 00:00:06.708432 | fetch 100 from LOG;
>>>    19206 | 2005-03-04 09:29:47.768228-08  | 00:00:03.659114 | fetch 100 from LOG;
>>>
>>> The age ranges anywhere from 0 to 9 seconds.
>>>
>>> The load on the two slaves hasn't changed.
>>
>>
>> What's the row count in _clustername.sl_log_1 and _clustername.sl_event?
>
>
> tii=# select count(*) from sl_log_1;
> count
> --------
> 375344
> (1 row)
>
> tii=# select count(*) from sl_event;
> count
> -------
> 746
> (1 row)
That is consistent with data not getting cleaned out.
Try this query:
select con_origin, con_received,
       min(con_seqno) as minseq, max(con_seqno) as maxseq,
       date_trunc('minutes', min(now() - con_timestamp)) as age1,
       date_trunc('minutes', max(now() - con_timestamp)) as age2
from sl_confirm
group by con_origin, con_received
order by con_origin, con_received;
On one of our systems, this generates the following:
Summary of sl_confirm aging

Origin  Receiver  Min SYNC  Max SYNC  Age of latest SYNC  Age of eldest SYNC
=============================================================================
     1         5   7934376   7934861            00:00:00            00:13:00
     1        36   7934375   7934861            00:00:00            00:13:00
     1        78   7934384   7934870            00:00:00            00:13:00
     1       501   7934382   7934870            00:00:00            00:13:00
     5         1    131472    131478            00:01:00            00:13:00
     5        36    131472    131478            00:01:00            00:13:00
     5        78    131472    131478            00:01:00            00:13:00
     5       501    131472    131478            00:01:00            00:13:00
    36         1     15230     15237            00:00:00            00:14:00
    36         5     15230     15237            00:00:00            00:14:00
    36        78     15230     15237            00:00:00            00:14:00
    36       501     15230     15237            00:00:00            00:14:00
    78         1     75423     75430            00:00:00            00:14:00
    78         5     75423     75430            00:00:00            00:14:00
    78        36     75423     75430            00:00:00            00:14:00
    78       501     75423     75430            00:00:00            00:14:00
   501         1    117875    117876            00:04:00            00:14:00
   501         5    117875    117876            00:04:00            00:14:00
   501        36    117875    117876            00:04:00            00:14:00
   501        78    117875    117876            00:04:00            00:14:00
I expect you'll find that there's a node that is a day or two behind in
confirming SYNCs.
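If you want to pin down which node is the laggard, a variation on the
query above should do it (an untested sketch; it assumes node 1 is your
origin, so substitute your master's node id in the con_origin test):

select con_received,
       max(con_seqno) as last_confirmed,
       date_trunc('minutes', min(now() - con_timestamp)) as newest_confirm_age
from sl_confirm
where con_origin = 1   -- substitute your origin node id here
group by con_received
order by newest_confirm_age desc;

The receiver at the top of that list, with hours or days of
newest_confirm_age rather than seconds, is the one holding everything
back.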
There is a script in CVS called "test_slony_state.pl" that rummages
around for this sort of information; it is quite useful for detecting
misconfigurations like this one.
I'd almost bet lunch that one of the subscriber nodes isn't confirming
SYNCs against the others, and that this is the result of there being no
suitable sl_listen path between the subscribers...
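To check that, look at which listen paths the cluster actually has
(again assuming the cluster schema is in your search_path, as with the
queries above):

select li_origin, li_provider, li_receiver
from sl_listen
order by li_origin, li_receiver;

Every node should appear as a receiver for every other node's events,
whether directly or via a provider; if a pair of subscribers is
missing, that's your culprit, and a STORE LISTEN in slonik will add
the missing path.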