Mon Aug 24 07:27:39 PDT 2009
- Previous message: [Slony1-general] Ugh - New Issue with using Slony to upgrade Postgresql
- Next message: [Slony1-general] How To Install on w2k3 servers
Karl Denninger wrote:
>
> Jeff Frost wrote:
>> Karl Denninger wrote:
>>> But they should have switched the master to Node #4 when the move
>>> set command was executed. When they reconnect, they should be doing
>>> so to Node #4, not Node #2 - IF they saw the "move set" command
>>> (and it appears they did).
>>>
>>> Further, I ran the change in the paths on that node - that is,
>>> locally to that machine. No difference.
>>
>> When you indicate that you ran the store path on that node, can you
>> be specific about what you did?
>>
>>>>> I'm wondering what happened here. It is almost as if the "move
>>>>> set" never executed on the other subscribers - an impossibility,
>>>>> no? They WERE all replicating and current just before the
>>>>> shutdown - I checked them all. How does that happen under these
>>>>> circumstances?
>>>>>
>>>>> Is there a better way for the future? I'm back up now, but the
>>>>> entire point of this exercise was to AVOID having to copy the
>>>>> entire database over - while I avoided any material downtime for
>>>>> my users, I was left EXPOSED to a failure for the copy period,
>>>>> which was kinda nasty.
>>>>>
>>>>> Thoughts appreciated.
>>>>
>>>> Probably the way to avoid it would have been to issue the store
>>>> path changes before switching the ports. But if you forget to do
>>>> it in the future, you can fix it afterwards by going bare metal
>>>> and updating the paths in the _tickerform.sl_path table on the
>>>> nodes that don't have the correct information.
>>>
>>> I still don't understand why the node change wasn't picked up by
>>> these slaves when the move set executed; I would have expected that
>>> this would be the case (that is, they would be expecting Node #4 to
>>> be the master), and although it showed up on the "wrong" IP
>>> address, a store path should have fixed that.
>>>
>>> It APPEARS that it was looking for the old master on Node #2....
>>> implying (I think) that it never saw the move set.
>>>
>>> Or am I misunderstanding how the internals work here?
>>
>> I don't think the problem is that it didn't see the move set; I
>> think the problem is that it didn't get the store path commands,
>> because it didn't connect to the 'new' master after you changed the
>> ports out from under it. I don't think Slony is well designed for
>> having the paths changed out from under it, and you'll likely have
>> to fix them by hand when you do this.
>>
>> I'm pretty sure what happened (and hopefully someone will correct me
>> if I'm wrong) is that even though you ran the slonik store path
>> command on the broken node, slonik connected to the new master,
>> updated the master's DB with the store path info, and put this event
>> in the log to propagate out to the slaves. Unfortunately, because
>> the broken slave still had the old path in the sl_path table, it
>> didn't know how to connect to the new master and therefore never
>> received the new path information.
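>>
>> For concreteness, the kind of slonik script I mean would look
>> roughly like this - an untested sketch, where the conninfo strings
>> and the broken slave's node id (3 here) are made up, and the cluster
>> name is inferred from your _tickerform schema:
>>
>>   cluster name = tickerform;
>>   node 3 admin conninfo = 'dbname=ticker host=slave3 port=5432 user=slony';
>>   node 4 admin conninfo = 'dbname=ticker host=newmaster port=5433 user=slony';
>>
>>   # tell the broken slave (node 3) where the new master (node 4)
>>   # now lives, and vice versa, so their slons can reach each other
>>   store path (server = 4, client = 3,
>>               conninfo = 'dbname=ticker host=newmaster port=5433 user=slony');
>>   store path (server = 3, client = 4,
>>               conninfo = 'dbname=ticker host=slave3 port=5432 user=slony');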
>
> But the log says it DID receive the new path information - when I
> executed the "store paths" on the client, the log file for slon on
> that client immediately reflected that the path configuration had
> been changed. So clearly, it saw it on the local host.

Do you still have the logs sitting around? Can you post them?

> I have since dropped the old database (which was running as a
> "safety" overnight) using "drop node", and of note, that DID drop
> the schema as the replication was torn down......

Ah! You're right! From the docs: "If the replication daemon is still
running on that node (and processing events), it will attempt to
uninstall the replication system and terminate itself." Interestingly,
I've never seen it do that. I actually went and checked the docs all
the way back to 1.2.10, and it says the same thing. I suppose I've
almost always dropped a broken node, so it probably wouldn't have
processed the event.
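For the archives, the drop in question would be something like this -
the node ids are my guess from this thread (old master 2, new master
4):

  drop node (id = 2, event node = 4);

That removes node 2 from the cluster configuration, and per the docs
above, a slon still running on node 2 should then uninstall the
_tickerform schema there and exit. If the slon is already stopped, I
believe an explicit

  uninstall node (id = 2);

is the manual way to get the schema dropped on that node.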
> It's pretty clear to me that something went wrong during the move
> set - but exactly what and why I can't reproduce at the present
> time.
>
> I'll have to see if I can set up a "sandbox" and try this in an
> isolated environment to see if I can figure out why it happened and
> hopefully prevent myself from getting bit like this again.

Always an excellent idea. Report back with your results and how to
reproduce any strange behavior you find.

--
Jeff Frost <jeff at pgexperts.com>
COO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
http://www.pgexperts.com/