[Slony1-general] slony 1.2.x to 2.y upgrade status?

Tue Feb 17 08:32:57 PST 2009

Mark Hagger <mark.hagger at m-spatial.com> writes:
> Has anyone had any more thoughts on being able to upgrade from 1.2.x
> to 2.0, without having to drop all replication and recreate it?  I'm
> sort of reluctant to do a drop/recreate because we have a large
> number of databases being replicated and it sounds a little tedious,
> and I assume will mean it'll have to re-sync from scratch.

I had a chat with Jan and others recently about this, and arrived at a
strategy that I intend to refine this week.

The "grand challenge" in said upgrade is that the format of the
transaction identification information has changed, so that
sl_log_[12], sl_confirm, sl_setsync, and sl_event all change in
format.

Trying to convert the values in these tables would actually be
counterproductive, as it would mean that, rather than 2.0 eliminating
the need for the "xxid" type and functions of 1.2, it would *increase*
the set, as we'd need the existing type, as well as C functions to
translate xxid values and combinations into txid_snapshot values.

Therefore, trying a *direct* conversion seems like a terrible idea.

Instead, what I propose doing is writing a script (or perhaps scripts)
that, in rough terms, does the following:

 1.  Locks all replication sets (akin to how MOVE SET works), so we
     can make certain that replication stops for a bit.

 2.  Waits for that to propagate everywhere, therefore establishing
     that all nodes are up to date.

 3.  Tells the administrator, "go ahead, install upgraded Slony-I."

 4.  Then, we go to each node, in turn, and, within a transaction, do
     the following:

     - First, load a function that does the work that follows,
       transforming from 1.2.16-ish to a "pre-2.0" state, *for the
       tables.* It does *not* load new functions; that's a subsequent
       step, #6.

     - It redoes the stored triggers on the tables, dropping the old
       ones, cleaning up FK triggers, and such.  (There's more detail
       to fill in here - nothing frightening, I don't think, just more
       detail!)

     - It runs TRUNCATE against the tables mentioned earlier
       (sl_log_1, sl_log_2, sl_confirm, sl_setsync, and sl_event).

     - It inserts an sl_setsync entry, just as happens in copy_set()
       in src/slon/remote_worker.c, to indicate, for each replication
       set, that it is freshly copied on each subscriber node.

     - There will probably be a custom UPGRADE function; it can be
       dropped at this point, as it is never needed again.

     - Finally, we set this up as a prepared transaction:
        PREPARE TRANSACTION 'Slony-I 1.2 to 2.0 upgrade - @CLUSTER@';

     Note that at this point, that node is Pretty Locked Down.  This
     transaction has acquired locks on ALL tables involved in
     replication, including the application tables.

 5. If step #4 works successfully on all nodes, then we know we have a
    successful upgrade, and can safely go to each node and run COMMIT
    PREPARED on that transaction.

    If any of them fail, then we should prefer to roll back all of the
    prepared transactions, undoing all of the work of step #4, go fix
    whatever was broken, and retry.

 6. Now, we need to update the functions.

    A slonik script runs UPDATE FUNCTIONS against all nodes.
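
As a rough illustration of steps 4 and 5, here is a minimal Python
sketch of the all-or-nothing control flow.  The Node interface,
FakeNode, upgrade_gid, and run_upgrade are hypothetical names invented
for this sketch; they are not part of Slony-I or slonik.  A real
version would issue PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK
PREPARED against each node's database instead of appending to a log.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class Node(Protocol):
    """Hypothetical per-node handle; not a Slony-I or slonik API."""

    def prepare(self, gid: str) -> None: ...
    def commit_prepared(self, gid: str) -> None: ...
    def rollback_prepared(self, gid: str) -> None: ...


def upgrade_gid(cluster: str) -> str:
    # Mirrors the transaction identifier used at the end of step 4.
    return f"Slony-I 1.2 to 2.0 upgrade - {cluster}"


def run_upgrade(nodes: List[Node], cluster: str) -> bool:
    """Prepare on every node; commit everywhere only if all prepares succeed."""
    gid = upgrade_gid(cluster)
    prepared: List[Node] = []
    for node in nodes:
        try:
            # Step 4: the per-node work, ending in PREPARE TRANSACTION '<gid>'.
            node.prepare(gid)
        except Exception:
            # Step 5, failure path: undo every transaction prepared so far.
            # The failing node has nothing prepared, so it is skipped.
            for done in prepared:
                done.rollback_prepared(gid)  # ROLLBACK PREPARED '<gid>'
            return False
        prepared.append(node)
    # Step 5, success path: every node is prepared, so commit them all.
    for node in prepared:
        node.commit_prepared(gid)  # COMMIT PREPARED '<gid>'
    return True


@dataclass
class FakeNode:
    """In-memory stand-in used only to exercise the control flow."""

    name: str
    fail_on_prepare: bool = False
    log: List[str] = field(default_factory=list)

    def prepare(self, gid: str) -> None:
        if self.fail_on_prepare:
            raise RuntimeError(f"{self.name}: prepare failed")
        self.log.append("prepare")

    def commit_prepared(self, gid: str) -> None:
        self.log.append("commit")

    def rollback_prepared(self, gid: str) -> None:
        self.log.append("rollback")
```

Note that PREPARE TRANSACTION only works when the server's
max_prepared_transactions setting is above zero, so that would need
checking on every node before step 4 starts.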

At the "grand steps" level, that seems to cover what needs to be
done.  I'll be working through the details this week; I would
welcome comments on anything I may have missed.
-- 
"cbbrowne","@","cbbrowne.com"
http://cbbrowne.com/info/wp.html