Karl Denninger karl at denninger.net
Sat Jul 31 21:19:31 PDT 2010
I upgraded fairly recently from Slony 2.0.2 to 2.0.4, and now I've got a
serious problem.

The reason for the "gotta do it now" was that somehow one of the tables
got out of sync, and a delete was failing to propagate - hanging the
process.

OK, ok, so 2.0.2 with Postgres 8.4.4 is a bit old and mismatched.  So I
upgraded to 2.0.4 on all the nodes and had the subscriber reload from
scratch: ditched the client config and re-subscribed the sets.

All went well until the copy reached a very large table, where it failed.

There's no error in the logs indicating why, other than the following:

Jul 31 22:52:53 dbms TICKER[70295]: [153-1] CONFIG remoteWorkerThread_3: copy table "public"."images"
Jul 31 22:52:53 dbms TICKER[70295]: [154-1] CONFIG remoteWorkerThread_3: Begin COPY of table "public"."images"
Jul 31 22:54:24 dbms TICKER[70295]: [155-1] ERROR  remoteWorkerThread_3: PGgetCopyData() server closed the connection unexpectedly
Jul 31 22:54:24 dbms TICKER[70295]: [155-2]     This probably means the server terminated abnormally
Jul 31 22:54:24 dbms TICKER[70295]: [155-3]     before or while processing the request.
Jul 31 22:54:24 dbms TICKER[70295]: [156-1] WARN   remoteWorkerThread_3: data copy for set 1 failed 1 times - sleep 15 seconds

And in 15 seconds, the entire process of trying to re-init the node
starts over - from the beginning!

Near as I can tell, it's failing pretty early on.
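
To put a number on "pretty early", here is a minimal Python sketch that
measures the gap between the "Begin COPY" line and the ERROR line in the
log excerpt above.  The helper name parse_syslog_time and the year are
assumptions made only for illustration (syslog timestamps carry no year);
nothing here is part of slon.

```python
from datetime import datetime

# Syslog timestamps have no year; one is assumed here only so the
# strings can be parsed. It does not affect the difference computed below.
ASSUMED_YEAR = 2010


def parse_syslog_time(stamp: str) -> datetime:
    """Parse a syslog-style timestamp such as 'Jul 31 22:52:53'."""
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


# Timestamps copied from the log excerpt in this message.
copy_started = parse_syslog_time("Jul 31 22:52:53")  # "Begin COPY of table"
copy_failed = parse_syslog_time("Jul 31 22:54:24")   # "server closed the connection"

elapsed_seconds = int((copy_failed - copy_started).total_seconds())
print(f"COPY ran for {elapsed_seconds} seconds before the error")
```

That works out to about a minute and a half, which is a short time for a
table of this size and fits with the copy dying early.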

The source host is fine.  This particular table contains a BYTEA field,
and it's BIG: roughly 20 GB.  But I've re-initialized in the past
without problems.  I tried going back to 2.0.2, and that still fails.
Both servers are running with encoding set to SQL_ASCII, if it matters.

When it fails, the SERVER's COPY is still running, so the error the
client reports is definitely wrong.  I have NOTHING in the server's
SLON log, and there are no comms problems between the two hosts.

I'm going to run a dump of the table and see if I can manually bring it
over to the other host and load it.  Nothing on the master suggests
that the data is damaged.

Ideas?

-- Karl