Tue Feb 23 00:55:51 PST 2010
- Previous message: [Slony1-hackers] Cleaning the sl_confirm table
- Next message: [Slony1-hackers] Cleaning the sl_confirm table
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I'm sorry for the spam, but it seems your ML manager doesn't like lines starting with "From ": it cuts the mail body just before such a line. Here's my full mail this time:

Hi,

I have a cool Slony cluster with 12 PGs that has been running for years. But I noticed some strange data in the origin's sl_confirm table:

SELECT * FROM _ob2replication.sl_confirm
WHERE con_seqno NOT IN (SELECT ev_seqno FROM _ob2replication.sl_event);

 con_origin | con_received | con_seqno |       con_timestamp
------------+--------------+-----------+----------------------------
         21 |           26 |   3277845 | 2009-09-26 15:01:12.61598
         21 |           27 |   3277845 | 2009-10-22 15:13:26.370957
         21 |           25 |   3277845 | 2009-09-26 14:54:16.162632
         21 |            5 |   3277845 | 2009-10-22 15:13:26.669225

con_timestamp should be some date near today (February 2010).

Or from another server:

 con_origin | con_received | con_seqno |       con_timestamp
------------+--------------+-----------+----------------------------
         21 |           26 |   3277845 | 2009-09-26 15:01:12.61598
         21 |           27 |   3277845 | 2009-09-26 15:01:12.549303
         21 |           22 |   3277845 | 2009-09-26 15:01:12.627592
         21 |            5 |   3277845 | 2009-09-26 15:01:12.831765
         21 |           25 |   3277845 | 2009-09-26 14:54:16.162632

And from another:

 con_origin | con_received | con_seqno |       con_timestamp
------------+--------------+-----------+----------------------------
         21 |           26 |   3277845 | 2009-09-26 15:01:12.61598
         21 |            5 |   3277845 | 2009-09-26 15:01:12.831765
         21 |           27 |   3277845 | 2009-09-26 15:01:12.549303
         21 |           25 |   3277845 | 2009-09-26 14:54:16.162632
         21 |           22 |   3277845 | 2009-09-26 15:01:12.627592
         17 |           25 |         0 | 2010-01-26 11:18:22.45564
         17 |           26 |         0 | 2010-01-26 11:18:22.455672
         17 |           27 |         0 | 2010-01-26 11:18:22.455704
         17 |           22 |         0 | 2010-01-26 11:18:22.455729
         17 |            5 |         0 | 2010-01-26 11:18:22.455753
         17 |           21 |         0 | 2010-01-26 11:18:22.455794
         17 |           12 |         0 | 2010-01-26 11:18:22.455835
         17 |           16 |         0 | 2010-01-26 11:18:22.455861
         17 |            2 |         0 | 2010-01-26 11:18:22.455898
         17 |           24 |         0 | 2010-01-26 11:18:22.455926

On every server I have entries in the sl_confirm tables with
event references that no longer exist. The cluster works, but I don't like those entries staying there. What can I do? Delete them?

Those entries should be cleaned up by the cleanupEvent() plpgsql function, but they are not. I ran a debug version of the cleanupEvent() function, and here is what I got:

SELECT _ob2replication.mycleanup();
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND con_received=2 AND con_seqno < 1060869
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND con_received=5 AND con_seqno < 1060869
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND con_received=12 AND con_seqno < 1060869
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND con_received=16 AND con_seqno < 1060869
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND con_received=17 AND con_seqno < 1060869
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND con_received=22 AND con_seqno < 3277845
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND con_received=23 AND con_seqno < 1060928
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND con_received=24 AND con_seqno < 1060869
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND con_received=25 AND con_seqno < 3277845
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND con_received=26 AND con_seqno < 3277845
NOTICE: DELETE FROM sl_confirm WHERE con_origin=21 AND con_received=27 AND con_seqno < 3277845

For origin 21, the legitimate seqno is ~1060869, but the bad entries in sl_confirm with seqno 3277845 prevent cleanupEvent() from cleaning them up.
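The NOTICE lines above suggest that the bound in each DELETE is the highest con_seqno recorded for that (con_origin, con_received) pair. Assuming that reading is right (it is inferred from the output, not from the function's source), the toy model below shows why a single stale row survives every pass. It uses Python's sqlite3 with a cut-down, hypothetical sl_confirm table; the rows are illustrative, and only the seqno values 1060869 and 3277845 are taken from the output above.

```python
import sqlite3


def per_pair_cleanup(conn):
    """Toy model of the pass seen in the NOTICE output (an assumption, not
    Slony's actual source): for each (origin, receiver) pair, delete every
    confirmation below that pair's maximum con_seqno."""
    pairs = conn.execute(
        "SELECT con_origin, con_received, MAX(con_seqno) "
        "FROM sl_confirm GROUP BY con_origin, con_received"
    ).fetchall()
    for origin, received, max_seqno in pairs:
        conn.execute(
            "DELETE FROM sl_confirm "
            "WHERE con_origin = ? AND con_received = ? AND con_seqno < ?",
            (origin, received, max_seqno),
        )


conn = sqlite3.connect(":memory:")
# Cut-down stand-in for sl_confirm (the real table also has con_timestamp).
conn.execute(
    "CREATE TABLE sl_confirm (con_origin INT, con_received INT, con_seqno INT)"
)
conn.executemany(
    "INSERT INTO sl_confirm VALUES (?, ?, ?)",
    [
        (21, 2, 1060868), (21, 2, 1060869),    # healthy receiver
        (21, 26, 1060868), (21, 26, 1060869),  # current confirmations
        (21, 26, 3277845),                     # stale row with a far higher seqno
    ],
)

per_pair_cleanup(conn)
remaining = conn.execute(
    "SELECT con_received, con_seqno FROM sl_confirm "
    "ORDER BY con_received, con_seqno"
).fetchall()
print(remaining)
```

In this model the stale row is always its own pair's maximum, so the `con_seqno < max` filter never matches it, while the current confirmations for that receiver are the ones that get deleted.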
And if I take a look at the seqno range on the master, I get:

SELECT con_origin,
       MIN(con_seqno) AS first_con_seqno,
       MAX(con_seqno) AS last_con_seqno,
       MAX(con_seqno) - MIN(con_seqno) AS delta
FROM _ob2replication.sl_confirm
GROUP BY con_origin
ORDER BY con_origin ASC;

 con_origin | first_con_seqno | last_con_seqno |  delta
------------+-----------------+----------------+---------
          2 |         1850804 |        1850903 |      99
          5 |         1437305 |        1437403 |      98
         12 |          992789 |         992888 |      99
         16 |          957046 |         957145 |      99
         17 |          230040 |         230139 |      99
         21 |         1060965 |        3277845 | 2216880
         22 |         4790482 |        4790906 |     424
         23 |         1050820 |        1050919 |      99
         24 |           99661 |          99694 |      33
         25 |         1858636 |        1858734 |      98
         26 |         2629674 |        2629773 |      99
         27 |         1788901 |        1789000 |      99
(12 rows)

As you can see, node 21 is keeping a huge range of confirmations, while the usual range is fewer than a thousand.

My supposition: last year, in September, there was a replication maintenance, and it is possible we had to reinstall Slony on node 21. If that was the case, the seqno was reset, but not every event on the cluster was cleaned up, and some confirm lines stayed in sl_confirm.

As every one of those confirm lines in sl_confirm in the cluster refers to a nonexistent sl_event, can I delete those sl_confirm lines without fearing any replication problem?

Thanks again for the help =)

--
Laurent Raufaste
<http://www.glop.org/>
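One way to inspect the candidate rows before deciding anything is a dry run: select the orphans, delete them inside a transaction, look at the row count, and roll back. The sketch below does this against an in-memory SQLite stand-in, not the real cluster. The table contents are invented, and matching on ev_origin as well as ev_seqno is an assumption about what "orphaned" should mean here (the query earlier in this mail compares con_seqno alone). It does not say whether the deletion is safe for replication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical stand-ins for sl_event and sl_confirm.
conn.executescript("""
    CREATE TABLE sl_event   (ev_origin INT, ev_seqno INT);
    CREATE TABLE sl_confirm (con_origin INT, con_received INT, con_seqno INT);
    INSERT INTO sl_event   VALUES (21, 1060965), (21, 1060966);
    INSERT INTO sl_confirm VALUES (21, 2, 1060965),
                                  (21, 26, 3277845),
                                  (21, 25, 3277845);
""")

# A confirmation counts as orphaned when no event with the same origin
# and seqno exists (assumption: origin is part of the match).
orphan_where = (
    "WHERE NOT EXISTS (SELECT 1 FROM sl_event "
    "WHERE ev_origin = con_origin AND ev_seqno = con_seqno)"
)

orphans = conn.execute(
    "SELECT con_origin, con_received, con_seqno FROM sl_confirm "
    + orphan_where + " ORDER BY con_received"
).fetchall()
print("orphans:", orphans)

# Dry run: the DELETE opens an implicit transaction, which is rolled back.
deleted = conn.execute("DELETE FROM sl_confirm " + orphan_where).rowcount
conn.rollback()

total_after_rollback = conn.execute(
    "SELECT COUNT(*) FROM sl_confirm"
).fetchone()[0]
print("would delete:", deleted, "rows; table still holds:", total_after_rollback)
```

On PostgreSQL the same pattern would be an explicit BEGIN, the DELETE, a check of the reported row count, and then ROLLBACK, repeated on each node since every node carries its own copy of sl_confirm.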