[Slony1-general] "Blueprints for High Availability"

Mon Jan 23 09:59:55 PST 2006

On Fri, Jan 20, 2006 at 05:26:56PM -0600, Jim C. Nasby wrote:
> I'm not well versed enough in this stuff to know. But being able to show
> folks how they could setup HA that was guaranteed not to lose committed
> data... that would be a huge boost for the community. I'm pretty sure

There's no system commercially available, AFAIK, that can offer that
guarantee.  Here's what it would have to guarantee to make it
possible, in the usual N (so this is N+1) notation:

1.	The complete destruction of N machines cannot cause the loss
of any committed data.

2.	The complete destruction of N's data centre cannot cause the
loss of any committed data.

3.	The malicious compromise of N cannot cause the loss of any
committed data.

4.	The accidental compromise of N cannot cause the loss of any
committed data.

5.	Undetected bugs in code cannot cause the loss of any
committed data.

(2) is effectively impossible, because even light has latency.  For
most transactions, users will not wait for the latency of wide-area
COMMIT messages.  (Banking isn't even an exception any more: online
trading systems would be incapable of the speeds they achieve -- and
the resulting occasional meltdowns they create -- if they had to do
2PC across the country, which is to say across power and
network-provider points.)  And there's simply nothing you can do to
guarantee that the losses in (3) through (5) can't happen.
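
To put rough numbers on the latency point, here is a back-of-envelope
sketch.  The fibre speed (about two-thirds of c), the 4000 km distance,
and the assumption that 2PC costs two round trips are illustrative
choices, not figures from any real deployment; the function names are
made up for the example.

```python
# Rough lower bound on wide-area COMMIT latency from signal propagation alone.
# All figures are illustrative assumptions, not measurements.

FIBRE_KM_PER_S = 200_000.0  # assumed: light in optical fibre, roughly 2/3 of c


def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip time over fibre, ignoring routers and queuing."""
    return 2 * distance_km / FIBRE_KM_PER_S * 1000


def two_phase_commit_floor_ms(distance_km: float, round_trips: int = 2) -> float:
    """Floor for a 2PC that needs `round_trips` exchanges (prepare, then commit)."""
    return round_trips * round_trip_ms(distance_km)


if __name__ == "__main__":
    distance_km = 4000  # assumed: data centres on opposite sides of a continent
    floor_ms = two_phase_commit_floor_ms(distance_km)
    print(f"RTT floor: {round_trip_ms(distance_km):.1f} ms")
    print(f"2PC floor: {floor_ms:.1f} ms")
    print(f"Serialized commits per second, at best: {1000 / floor_ms:.1f}")
```

Under those assumptions every COMMIT waits at least 80 ms before any
real work is counted, and real networks will be worse than this floor.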

This is all about risk management.  What you need to do is evaluate
how much your data is worth in the aggregate, how much any particular
transaction may possibly be worth, and then make sure that you don't
spend any more than that for the provision of the data.  If you do
spend more, you're going to be bankrupt; the question isn't whether,
it's just how long it will take.  Companies will happily _tell_ you
that their system offers these "guarantees", but it turns out that
when you do the real analysis, there simply isn't a way to provide
guarantees in the way people usually mean the word.  What you get is
assurance and a greater or smaller assurance level.  

Even people who claim to provide "five nines" usually can't really. 
That's because, for the small probability that the 99.99% uptime
happens when 99.999% doesn't, it's likely to be cheaper to pay the
uptime penalty than it is to provision for the extra "9".  The same
thing is true with these "guarantees".
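
The arithmetic behind that trade-off can be sketched as follows.  The
downtime figures follow directly from the percentages; the probability,
penalty, and provisioning cost in the usage section are invented
placeholders, and the helper names are hypothetical.

```python
# Downtime budget per year for a given availability, plus a crude
# expected-cost comparison between paying a penalty and buying another "9".

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime in minutes per year, e.g. availability=0.9999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def cheaper_to_pay_penalty(p_miss: float, penalty: float,
                           extra_provisioning_cost: float) -> bool:
    """True when the expected penalty is below the cost of the extra "9"."""
    return p_miss * penalty < extra_provisioning_cost


if __name__ == "__main__":
    for availability in (0.9999, 0.99999):
        minutes = downtime_minutes_per_year(availability)
        print(f"{availability:.3%} uptime allows {minutes:.2f} min/year down")

    # Placeholder inputs, purely for illustration.
    p_miss = 0.05                      # assumed chance of missing five nines
    penalty = 100_000.0                # assumed contractual penalty
    extra_provisioning_cost = 50_000.0 # assumed cost of the extra "9"
    print(cheaper_to_pay_penalty(p_miss, penalty, extra_provisioning_cost))
```

Four nines allows about 52.6 minutes of downtime a year and five nines
about 5.3, so the extra "9" buys roughly 47 minutes; whether that is
worth provisioning for depends entirely on the inputs you plug in.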

> Yeah, it would be damn nice if there was a stronger alternative. From
> what I've read I think Slony-II might fit the bill (though I can't
> remember if there's a guarantee that a changeset will exist at least
> somewhere else before COMMIT returns), but I suspect it wouldn't perform
> well over a WAN.

Well, the idea of slony-2 is that when a COMMIT returns, you are
guaranteed that all then-participating nodes have the data.  The big
question is whether slony-2 is even possible, alas.  And no, it
certainly won't work over a WAN.

a

-- 
Andrew Sullivan  | ajs at crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what 
you told them to.  That actually seems sort of quaint now.
		--J.D. Baldwin