Fri Aug 15 14:34:30 PDT 2008
- Previous message: [Slony1-general] why my slony became invalidation?
- Next message: [Slony1-general] RE: duplicate key value violates unique constraint "sl_nodelock-pkey"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hello list, I've been using Slony in production for months and things have worked out pretty well so far. But this past week I ran into problems when we switched from Postgres 8.2 to Postgres 8.3 (still using Slony version 1.2.13). I don't think the error results from the version upgrade, but I could be wrong. I set up the master database (query-1) and then proceeded to set up replication to two slaves (query-2 and query-4). I kicked off the slon processes for all three simultaneously. Replication to query-4 got started right away and succeeded - it took 2 hours for the initial bulk copy of data, and soon after that it was caught up with incremental changes. However, replication to query-2 died right away. The slon process for query-2 quit immediately and I saw the following message in the log: 2008-08-15 13:57:01 PDT FATAL localListenThread: "select "_stingray_cluster".cleanupNodelock(); insert into "_stingray_cluster".sl_nodelock values ( 2, 0, "pg_catalog".pg_backend_pid()); " - ERROR: duplicate key value violates unique constraint "sl_nodelock-pkey" Now, I searched the interweb for solutions to this problem and I found several suggestions, but none of them have helped so far... 1. People suggest that this happens when you try to start multiple slon processes for the same node at the same time. I'm quite certain that this is not the case. I've checked and double-checked and quadruple-checked. For what it's worth, remember that I've successfully run Slony replication to multiple slaves for months. Here are the ones that are running: 25537 pts/3 S 0:00 slon -d 2 stingray_cluster host=query-1.colo.redfin.com dbname=my_db user=my_user password=my_password 25541 pts/3 Sl 0:00 \_ slon -d 2 stingray_cluster host=query-1.colo.redfin.com dbname= my_db user= my_user password= my_password 25544 pts/3 S 0:00 slon -d 2 stingray_cluster host=query-4.colo.redfin.com dbname= my_db user= my_user password= my_password 25545 pts/3 Sl 0:00 \_ slon -d 2 stingray_cluster host=query-4.colo.redfin.com dbname= my_db user= my_user password= my_password And the one that starts and then immediately dies has query-2 as the host. 2. Some people suggest that this happens when a slon process dies and leaves a stale connection, a connection so deep in the network stack that slony and postgres are not aware. For this people suggest either waiting about two hours for some timeout to kick in, or else having the admin kill -2 the zombie process/connection. Apparently this happens more often if you try to run slony across a wan. Well, our slon processes all run on the master node and the other nodes are right there in the same LAN. (we did once try to run Slony across a WAN but that never worked) This duplicate key issue first happened more than 12 hours ago, and it continues to happen. Each time I try to start a slon process for query-2, it dies right away with the same error. So, I don't think there's a zombie connection, but I also don't know exactly how to check for one. I can say that all the proc's in the pg_stat_activity have a xact_start that is less than an hour old. 3. One person mentioned having to go into the sl_nodelock table and manually whack the bad row. I'm a little concerned about doing this kind of surgery because I don't know the sl_nodelock table and therefore I don't know what would be the consequence of whacking a row (especially if I accidentally whack the wrong row). Here's what I see in that table: stingray_prod=# select * from _stingray_cluster.sl_nodelock; nl_nodeid | nl_conncnt | nl_backendpid -----------+------------+--------------- 2 | 11 | 14774 2 | 15 | 20569 1 | 0 | 25551 4 | 16 | 25568 4 | 17 | 25570 (5 rows) What do these columns mean exactly? Is there documentation on this table somewhere? Which one is likely to be the offending row? Is it crazy to whack the row manually? Would I need to also whack corresponding rows in other tables? Any help with this would be much appreciated. On a related note - is there a psql command (like \d) that I can use to list all the objects in my _stingray_cluster schema? Thanks, --S -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.slony.info/pipermail/slony1-general/attachments/20080815/0b06263b/attachment.htm
- Previous message: [Slony1-general] why my slony became invalidation?
- Next message: [Slony1-general] RE: duplicate key value violates unique constraint "sl_nodelock-pkey"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list