Sun Jan 22 02:23:18 PST 2012
- Previous message: [Slony1-general] [bug] config variable "quit_sync_finalsync" missing in slony 2.1.0 ?
- Next message: [Slony1-general] Slave can't catch up, postgres error 'stack depth limit exceeded'
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi all,

PostgreSQL 9.1.2
Slony 2.1.0

I am having some trouble getting a slon node caught up on events. It's a larger database, 350 or so gigs. I added a node to a replication set, and while it was doing the initial sync, the server that the slon daemons were running on died. It wasn't until about 5 hours later that we got the daemons running on a different node, and it restarted (I assume it restarted) the initial sync. From what I can tell, it finished the initial sync; however, it is now unable to catch up due to the following error line (reduced in size here; I don't know how many elements there actually were, but the single line had about 18 million characters):

2012-01-22 04:43:07 EST ERROR remoteWorkerThread_1: "declare LOG cursor for select log_origin, log_txid, log_tableid, log_actionseq, log_cmdtype, octet_length(log_cmddata), case when octet_length(log_cmddata) <= 1024 then log_cmddata else null end from "_myslonycluster".sl_log_1 where log_origin = 1 and log_tableid in (2,3,4,5,6,7,1,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122) and log_txid >= '34299501' and log_txid < '34311624' and "pg_catalog".txid_visible_in_snapshot(log_txid, '34311624:34311624:') and ( log_actionseq <> '2474682' and log_actionseq <> '2403310' and log_actionseq <> '2427861' and <SNIP, repeated many thousands of times with different numbers> and log_actionseq <> '2520797' and log_actionseq <> '2519348' and log_actionseq <> '2485828' and log_actionseq <> '2523367' and log_actionseq <> '2469096' and log_actionseq <> '2520589' and log_actionseq <> '2414071' and log_actionseq <> '2391417' ) order by log_actionseq" PGRES_FATAL_ERROR ERROR: stack depth limit exceeded

I found someone with a similar(ish) issue back in the day, and a function called compress_actionseq was mentioned. I turned the debug level up to 4 and can see that it is indeed compressing the actionseq; looking at the code, the output above also appears to BE the compressed sequence.

Now, the stack depth limit seems to be a tricky setting to tweak in Postgres, so I'd rather not touch it unless I have to. My thought was to force Slony to do smaller syncs at a time. I tried reducing (and, for the heck of it, increasing) the group size, desired_sync_time, sync_max_rowsize, and sync_max_largemem; however, nothing altered the size of the query being executed on the database.

Any thoughts or suggestions? The initial Slony sync takes about 14 hours, so I'd rather not drop the node and re-attach it. In fact, I have two nodes with the same issue, stuck at the same event, so I'd rather get them both synced up without doing another initial sync.

I also toyed with the idea of forcing the slon daemon to sync only up to a specific event, in hopes of doing blocks of, say, 500 events, but the quit_sync_finalsync parameter is not accepted correctly by Slony 2.1.0 (I've submitted an email to this list about that too).

Thanks in advance,
- Brian F
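[Editor's note: if no slon parameter shrinks the predicate, the server-side knob the error message points at is PostgreSQL's `max_stack_depth`. A hedged sketch, with illustrative values only (this is not a recommendation): the setting is superuser-only and must stay safely below the kernel stack limit reported by `ulimit -s`, or the server will reject it.]

```sql
-- Illustrative values; run as superuser. Per the PostgreSQL docs, keep this
-- below the OS stack limit (ulimit -s) minus a safety margin of ~1MB.
SHOW max_stack_depth;            -- commonly '2MB' by default
SET max_stack_depth = '6MB';     -- per-session override for the slon connection
-- To persist it instead, set max_stack_depth = 6MB in postgresql.conf.
```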
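[Editor's note: for a sense of why the query above fails, each additional `and log_actionseq <> '…'` term deepens the expression tree PostgreSQL must recurse through during parsing, which is what trips max_stack_depth once the exclusion list reaches tens of thousands of entries. A rough sketch of the chained-AND form the error shows, contrasted with a flat `<> ALL (ARRAY[...])` alternative that keeps the parse depth constant. The `array_form` builder is purely illustrative; nothing here suggests Slony 2.1.0 can emit it.]

```python
# Illustrative only: two ways to say "log_actionseq is none of these values".
# The chained-AND form matches the failing query; the ARRAY form is a flat
# hypothetical alternative, not something Slony 2.1.0 actually generates.

def chained_and(seqs):
    """One ANDed inequality per sequence number (parse depth grows with len(seqs))."""
    return " and ".join(f"log_actionseq <> '{s}'" for s in seqs)

def array_form(seqs):
    """A single flat predicate over an array (constant parse depth)."""
    return "log_actionseq <> ALL (ARRAY[%s])" % ",".join(f"'{s}'" for s in seqs)

# With ~30,000 skipped actions (the scale implied by an 18-million-character
# line), the chained form carries one AND node per entry.
skipped = range(2391417, 2391417 + 30000)
print("chained terms:", chained_and(skipped).count(" and ") + 1)
print(array_form([2474682, 2403310]))
```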