[Slony1-general] data copy for set 1 failed 3 times

Fri Nov 30 06:39:34 PST 2012

On 12-11-29 11:23 PM, Tory M Blue wrote:
>
> Well this is frustrating. So I was successful in replicating a smaller
> data set without issue. Once I try to replicate large amounts of data,
> it seems to fail and restart at what feels like the end of the biggest
> table in each set.
>
> 2012-11-29 19:54:06 PST CONFIG remoteWorkerThread_1: 16341.883 seconds
> to copy table "tracking"."spotimpressions"
> 2012-11-29 19:54:06 PST CONFIG remoteWorkerThread_1: copy table
> "tracking"."impressions"
> 2012-11-29 19:54:06 PST CONFIG remoteWorkerThread_1: Begin COPY of table
> "tracking"."impressions"
> 2012-11-29 19:54:06 PST ERROR  remoteWorkerThread_1: "select
> "_admissioncls".copyFields(19);"
> 2012-11-29 19:54:06 PST WARN   remoteWorkerThread_1: data copy for set 2
> failed 1 times - sleep 15 seconds
>
> This large table ran for 4+ hours, and the minute it starts on the very
> next table it "fails". Identical behavior when doing set 1, which has a
> large table.
>
>
> 2012-11-29 12:22:12 PST CONFIG remoteWorkerThread_1: Begin COPY
> of table "cls"."customers"
> 2012-11-29 12:22:12 PST ERROR  remoteWorkerThread_1: "select
> "_admissioncls".copyFields(8);"
> 2012-11-29 12:22:12 PST WARN   remoteWorkerThread_1: data copy
> for set 1 failed 1 times - sleep 15 seconds
> Followed some time later by this:
> 2012-11-29 12:22:28 PST DEBUG2 remoteWorkerThread_2: forward confirm
> 3,5001168772 received by 4
> 2012-11-29 12:22:28 PST INFO   copy_set 1 - omit=f - bool=0
> 2012-11-29 12:22:28 PST INFO   omit is FALSE
>
>
> So what's going on? It appears to have made it through the heavy
> lifting, but it fails immediately as it starts a much smaller
> table. Why does it wait until it has made it through the largest table
> in the set before it says "bahh, just kidding"?
>
> AHHH, interesting: yet again, at the moment of the "fail" a log switchover
> is starting, and this is identical for each and every failure. Why does a
> log switch appear right before every failure?!
>
> Can I disable this, i.e. disable the logswitch, for a test?

You can disable (or at least postpone) the log switch by setting the cleanup
interval for the slon on the master very high, longer than your
tests/subscriptions take to run (see cleanup_interval (interval) at
http://www.slony.info/documentation/2.1/slon-config-interval.html).

This is for testing purposes only; I'm not recommending it as a solution. I
also doubt this is the cause of your problem (but let us know if it does
turn out to be, because that would mean something is wrong somewhere).
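
If you want to check the log switch theory against your logs before changing
any configuration, a small script can pair each copy failure with the nearest
logswitch notice. This is a minimal sketch, assuming each log entry sits on a
single line (unlike the wrapped excerpts quoted here) and that the slon log and
the PostgreSQL log use the same time zone; failures_near_logswitch() and the
30-second window are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Leading "YYYY-MM-DD HH:MM:SS ZONE " as in the slon and PostgreSQL log
# excerpts in this thread. The zone is ignored, so both logs are assumed
# to be written in the same time zone.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ ")


def parse_ts(line):
    """Return the timestamp at the start of a log line, or None."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") if m else None


def failures_near_logswitch(lines, window=timedelta(seconds=30)):
    """Pair each 'data copy ... failed' entry with the closest
    'Logswitch ... initiated' entry within +/- window (or None)."""
    failures, switches = [], []
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if "data copy for set" in line and "failed" in line:
            failures.append(ts)
        elif "Logswitch to" in line and "initiated" in line:
            switches.append(ts)
    pairs = []
    for f in failures:
        near = [s for s in switches if abs(s - f) <= window]
        pairs.append((f, min(near, key=lambda s: abs(s - f)) if near else None))
    return pairs


# Entries quoted in this thread, each joined back onto a single line.
THREAD_LINES = [
    "2012-11-29 12:22:12 PST WARN   remoteWorkerThread_1: data copy for set 1 failed 1 times - sleep 15 seconds",
    "2012-11-29 12:22:13 PST admissionclsdb postgres [local] NOTICE:  Slony-I: Logswitch to sl_log_2 initiated",
    "2012-11-29 19:54:06 PST WARN   remoteWorkerThread_1: data copy for set 2 failed 1 times - sleep 15 seconds",
    "2012-11-29 19:54:11 PST admissionclsdb postgres [local] NOTICE:  Slony-I: Logswitch to sl_log_2 initiated",
]

for failed_at, switch_at in failures_near_logswitch(THREAD_LINES):
    gap = (switch_at - failed_at).total_seconds() if switch_at else None
    print(failed_at, "->", switch_at, f"({gap} s)")
```

On the four lines quoted in this thread, each failure is paired with a
logswitch notice 1 and 5 seconds after it, respectively; running it over the
full logs would show whether that holds for every retry.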

You never did send me the output of
select "_admissioncls".copyFields(19);  (or the equivalent) from your master.
You also never sent any information about the schema of the problem table.

You might want to turn on query logging for the origin/provider node (at
least for the slony user). This will tell us exactly what SQL is being
executed when the error occurs.
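
Once statement logging is on (for example by setting log_statement = 'all'
for the role slon connects as), something along these lines could pull the
COPY statements and any errors out of the server log. It is a hypothetical
helper, not part of Slony, and assumes one entry per line with the severity
tag (LOG:, ERROR:, STATEMENT:) following whatever log_line_prefix you use.

```python
import re

# Severity tag plus message, e.g. "LOG:  statement: copy ..." or
# "ERROR:  syntax error at or near ...". STATEMENT: lines are what
# PostgreSQL logs after an error to show the offending query.
ENTRY_RE = re.compile(r"\b(LOG|ERROR|STATEMENT):\s+(.*)$")


def copy_statements_and_errors(lines):
    """Return (kind, text) tuples for logged COPY statements, errors,
    and the statements that caused those errors, in log order."""
    out = []
    for line in lines:
        m = ENTRY_RE.search(line)
        if not m:
            continue
        kind, text = m.groups()
        if kind == "LOG" and text.lower().startswith("statement: copy"):
            out.append(("copy", text[len("statement: "):]))
        elif kind == "ERROR":
            out.append(("error", text))
        elif kind == "STATEMENT":
            out.append(("failed_statement", text))
    return out
```

Reading the "copy" entries immediately before an "error" entry should show
the exact column list that was sent for the failing table.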

Possibilities include:
1)  copyFields() is still returning something bad, e.g. ')' for this
table, so the SQL that later gets executed is
COPY ()) FROM "tracking"."impressions";

or some other bad SQL in the copy.
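
To illustrate possibility 1: copyFields() is supposed to return a
parenthesized column list that ends up spliced into the COPY statement, so a
malformed return value only surfaces as bad SQL once the statement runs. The
sketch below validates the list before building the statement; build_copy_out()
and the statement shape "copy <table> <columns> to stdout;" are assumptions for
illustration, not Slony's actual code.

```python
import re

# A column name: either double-quoted ("" is an escaped quote inside it)
# or a plain unquoted identifier.
IDENT = r'(?:"(?:[^"]|"")+"|[A-Za-z_][A-Za-z0-9_$]*)'
# Assumed shape of a good column list: (col1,col2,...), at least one column.
COLUMN_LIST_RE = re.compile(r"^\(\s*" + IDENT + r"(?:\s*,\s*" + IDENT + r")*\s*\)$")


def build_copy_out(table, fields):
    """Hypothetical helper: refuse to splice a malformed column list
    (such as ')' or '()') into a COPY statement."""
    fields = fields.strip()
    if not COLUMN_LIST_RE.match(fields):
        raise ValueError(f"suspicious column list for {table}: {fields!r}")
    return f"copy {table} {fields} to stdout;"


print(build_copy_out('"tracking"."impressions"', '("id","ts")'))
try:
    build_copy_out('"tracking"."impressions"', ")")
except ValueError as err:
    print(err)
```

Failing loudly on the column list points straight at copyFields(), instead of
at a syntax error in a statement you never see.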

2) The connection is actually aborting during the copy for
connection-related reasons. In the past, people have reported issues where
their firewall resets connections after x minutes. We have also had issues
in the past with OpenSSL, where some limit was reached and the connection
was killed.
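
To put possibility 2 in numbers: slon reported 16341.883 seconds for
"tracking"."spotimpressions", which is presumably how long the connections
used for the copy had to stay up for that one table. A small sketch of the
comparison; the 60-minute limit is a made-up placeholder, not anything known
about your network.

```python
from datetime import timedelta


def copy_duration(seconds):
    """Turn slon's 'N seconds to copy table' figure into a timedelta."""
    return timedelta(seconds=seconds)


def outlasts_timeout(copy_seconds, timeout_minutes):
    """True if a copy of this length would outlive a connection that
    something on the network path kills after timeout_minutes.
    The timeout is a placeholder; use whatever your firewall enforces."""
    return copy_duration(copy_seconds) > timedelta(minutes=timeout_minutes)


# Figure reported for "tracking"."spotimpressions" in the quoted log.
spot = 16341.883
print(copy_duration(spot))         # 4:32:21.883000
print(outlasts_timeout(spot, 60))  # True for a hypothetical 60-minute limit
```

If any device between the nodes enforces a limit shorter than this, the
failure would line up with the end of the largest table rather than with
anything in that table's data.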

3) Something else

>
> 2012-11-29 19:54:11 PST admissionclsdb postgres [local] NOTICE:
> Slony-I: Logswitch to sl_log_2 initiated
> 2012-11-29 19:54:11 PST admissionclsdb postgres [local] CONTEXT:  SQL
> statement "SELECT "_admissioncls".logswitch_start()"
>      PL/pgSQL function "cleanupevent" line 96 at PERFORM
>
> 2012-11-29 12:22:13 PST admissionclsdb postgres [local] NOTICE:
> Slony-I: Logswitch to sl_log_2 initiated
> 2012-11-29 12:22:13 PST admissionclsdb postgres [local] CONTEXT:  SQL
> statement "SELECT "_admissioncls".logswitch_start()"
>      PL/pgSQL function "cleanupevent" line 96 at PERFORM
>
>
> Thanks
> Tory
>
>
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
>