Tue Feb 23 00:55:51 PST 2010
- Previous message: [Slony1-hackers] Cleaning the sl_confirm table
- Next message: [Slony1-hackers] Cleaning the sl_confirm table
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I'm sorry for the spam, but it seems you ML manager does'nt like lines
starting with "From ", it cuts the mail body just before it.
Here(s my full mail this time:
Hi,
I have a cool Slony cluster with 12 PGs running for years.
But I noticed some strange data in the origin's sl_confirm table:
SELECT *
FROM _ob2replication.sl_confirm
WHERE con_seqno NOT IN (SELECT ev_seqno FROM _ob2replication.sl_event);
con_origin | con_received | con_seqno | con_timestamp
------------+--------------+-----------+----------------------------
21 | 26 | 3277845 | 2009-09-26 15:01:12.61598
21 | 27 | 3277845 | 2009-10-22 15:13:26.370957
21 | 25 | 3277845 | 2009-09-26 14:54:16.162632
21 | 5 | 3277845 | 2009-10-22 15:13:26.669225
con_timestamp should be some date near today (february 2010)
Or from another server:
con_origin | con_received | con_seqno | con_timestamp
------------+--------------+-----------+----------------------------
21 | 26 | 3277845 | 2009-09-26 15:01:12.61598
21 | 27 | 3277845 | 2009-09-26 15:01:12.549303
21 | 22 | 3277845 | 2009-09-26 15:01:12.627592
21 | 5 | 3277845 | 2009-09-26 15:01:12.831765
21 | 25 | 3277845 | 2009-09-26 14:54:16.162632
And from another:
con_origin | con_received | con_seqno | con_timestamp
------------+--------------+-----------+----------------------------
21 | 26 | 3277845 | 2009-09-26 15:01:12.61598
21 | 5 | 3277845 | 2009-09-26 15:01:12.831765
21 | 27 | 3277845 | 2009-09-26 15:01:12.549303
21 | 25 | 3277845 | 2009-09-26 14:54:16.162632
21 | 22 | 3277845 | 2009-09-26 15:01:12.627592
17 | 25 | 0 | 2010-01-26 11:18:22.45564
17 | 26 | 0 | 2010-01-26 11:18:22.455672
17 | 27 | 0 | 2010-01-26 11:18:22.455704
17 | 22 | 0 | 2010-01-26 11:18:22.455729
17 | 5 | 0 | 2010-01-26 11:18:22.455753
17 | 21 | 0 | 2010-01-26 11:18:22.455794
17 | 12 | 0 | 2010-01-26 11:18:22.455835
17 | 16 | 0 | 2010-01-26 11:18:22.455861
17 | 2 | 0 | 2010-01-26 11:18:22.455898
17 | 24 | 0 | 2010-01-26 11:18:22.455926
On every server I have entries in the sl_confirm tables with events
references which do not exist anymore.
The cluster works, but I don't like thoses entries staying there. What
can I do ? Delete them ?
Those entries should be cleaned up but the cleanupEvent() plpgsql
function, but they are not.
I launched a debugged version of the cleanupEvent() function and here
what I got:
SELECT _ob2replication.mycleanup();
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND con_received=2
AND con_seqno < 1060869
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND con_received=5
AND con_seqno < 1060869
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND
con_received=12 AND con_seqno < 1060869
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND
con_received=16 AND con_seqno < 1060869
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND
con_received=17 AND con_seqno < 1060869
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND
con_received=22 AND con_seqno < 3277845
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND
con_received=23 AND con_seqno < 1060928
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND
con_received=24 AND con_seqno < 1060869
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND
con_received=25 AND con_seqno < 3277845
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND
con_received=26 AND con_seqno < 3277845
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND
con_received=27 AND con_seqno < 3277845
F*r*o*m origin 21, the legitimate seqno is ~1060869. But the bad entries
in sl_confirm with a seqno ~3277845 are preventing cleanupEvent() to
clean those.
And if I take a look at the seqno range on the master I get:
SELECT con_origin,
MIN(con_seqno) AS first_con_seqno,
MAX(con_seqno) AS last_con_seqno,
MAX(con_seqno) - MIN(con_seqno) AS delta
FROM _ob2replication.sl_confirm
GROUP BY con_origin
ORDER BY con_origin ASC;
con_origin | first_con_seqno | last_con_seqno | delta
------------+-----------------+----------------+---------
2 | 1850804 | 1850903 | 99
5 | 1437305 | 1437403 | 98
12 | 992789 | 992888 | 99
16 | 957046 | 957145 | 99
17 | 230040 | 230139 | 99
21 | 1060965 | 3277845 | 2216880
22 | 4790482 | 4790906 | 424
23 | 1050820 | 1050919 | 99
24 | 99661 | 99694 | 33
25 | 1858636 | 1858734 | 98
26 | 2629674 | 2629773 | 99
27 | 1788901 | 1789000 | 99
(12 rows)
As you can see, the node 21 is keeping a huge range of confirmations
while the usual is to keep less than a thousand.
My supposition: Last year in september, there was a replication
maintenance, it is possible we might have to reinstall slony on the
node 21. If this was the case, the seqno was reset. But not every
event on the cluster was cleaned, and some confirm lines in sl_confirm
stayed there.
As every one of those confirm lines in sl_confirm in the cluster refer
to unexistent sl_event, can I delete those sl_confirm lines without fearing
any replication problem ?
Thanks again for the help =)
--
Laurent Raufaste
<http://www.glop.org/>
- Previous message: [Slony1-hackers] Cleaning the sl_confirm table
- Next message: [Slony1-hackers] Cleaning the sl_confirm table
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-hackers mailing list