[Slony1-general] Help - replication has broken, best way to restart it without destroying data.

Wed Mar 6 12:24:58 PST 2013

Hi Jan

Sorry for not including version, yup im 2.0.7 and pg 8.4.11.

That's good, at least I know no data has been corrupted.

As you say, I think DDL was run against the Slave by itself and i think it is corrupted. I don't think i can repair it because things were done against the Slave since then.

I am presuming the UNINSTALL NODE will bypass the slony event queue (which is blocked up with an error) ?

Thanks very much for your advice :)

Jon
________________________________________
From: Jan Wieck [JanWieck at Yahoo.com]
Sent: 07 March 2013 00:09
To: Jonathan Soong
Cc: slony1-general at lists.slony.info
Subject: Re: [Slony1-general] Help - replication has broken, best way to restart it without destroying data.

On 3/6/2013 7:08 AM, Jonathan Soong wrote:
> Hi guys
>
> Replication has died on my master-slave.
>
> What i would really like to do is stop replication and then get it to clear out the replicated tables in the Slave and start again (i.e truncate the replicated tables ONLY ).
>
> The problem I have is that only SOME tables are replicated between the Master and the Slave.
>
> - i do NOT want the replication to destroy the un-replicated tables on the Slave.
>
> I have seen all the instructions on how to start replication but most of them have the step of drop the Slave db and re-import the schema from the Master. I can't do this as I don't want to clobber my un-replicated tables.
>
> Is it possible to do what I want?

Certainly.

>
> My initial thoughts were to do an slonik "unsubscribe set" and then do a slonik "subscribe set"
>
> The problem I have is that the slon daemon on the Slave is erroring (see below), and from what I understand the "unsubscribe set" event will just be put in the event queue. I.e. The Slave will never be able to unsubscribe because it is erroring on an earlier event.
>
> I was thinking there might be a way to delete all events in the queue so it could just continue ... is this possible? I realise that after that the data is not in sync, but im planning on blowing away the replicated tables and starting again anyway...
>
> Although there might be a way to fix replication, i kind of want to know how to just do a "clean replication" from this point, as I see it as an important task.
>
> Thanks for any thoughts/opinions.
>
> Cheers
>
> Jon
>
> ----------------------------------------------------------------
>
>
>
> Error in Slave log is:
> 2013-03-06 19:56:41 HKTINFO   remoteWorkerThread_1: syncing set 1 with 47 table(s) from provider 1
> 2013-03-06 19:56:42 HKTERROR  remoteWorkerThread_1: "select "_replication".sequenceSetValue(10,1,'5002597503','291048'); " PGRES_FATAL_ERROR ERROR:  Slony-I: sequenceSetValue(): sequence 10 not found

You did not tell us which version of Slony you are running. That is
always good to include in a problem report. In this case it is
especially relevant since the above is probably the result from doing
some DDL on the replica (slave) directly, without sending it through
slonik's EXECUTE SCRIPT functionality. My bet is that someone dropped
and recreated the sequence in question. You can find out what sequence
that is by doing

     SELECT * FROM "_replication".sl_sequence where seq_id = 10;

If you are still running Slony 1.2 then it is important to know what
other DDL may have been executed on the replica since doing that in 1.2
can lead to a permanently corrupted system catalog. There is no danger
of that in 2.0 or 2.1.

Operations harmful on replicated tables on a replica under 1.2 include
things like create/drop index, add/drop constraint, create/drop trigger
as well as dropping columns.

If you conclude that your system catalog must still be consistent
because you are running 2.0 or higher or none of the above has happened,
then you can try to fix the problem first by running the slonik command
REPAIR CONFIG against the replica node.

     http://slony.info/documentation/2.1/stmtrepairconfig.html

If that does not fix the problem, then you can uninstall Slony using the
UNINSTALL NODE command for all nodes.

     http://slony.info/documentation/1.2/stmtuninstallnode.html

This does not touch any data on any of the nodes. It only restores
things so that both systems at that point are stand alone databases.

After that you simply rebuild your replication cluster like it was done
originally. Creating nodes, paths, sets, tables and subscribing them.

You do not need to truncate anything on the replica. Slony will take
care of that.

If you have to go through that it would be a good opportunity to upgrade
to Slony 2.1 while at it.

Regards,
Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin