Sun Apr 24 03:47:12 PDT 2005
>>> LOCK SET (ID = 1, ORIGIN = 1);
>>> WAIT FOR EVENT (ORIGIN = 1, CONFIRMED = 2);
>>> MOVE SET (ID = 1, OLD ORIGIN = 1, NEW ORIGIN = 2);
>>> WAIT FOR EVENT (ORIGIN = 1, CONFIRMED = 2);
>>>
>>> I assume I'm missing something in my syntax somewhere.
>>
>> I'm accustomed to watching logs, and not using WAIT FOR EVENT :-).
>
> Well, I based my syntax on the example on YOUR website! ;-) (which
> has been a great help to me in my slony efforts, btw, so thank you!)
>
> I guess I wasn't understanding the whole forwarding concept - I had
> associated it only with a Publisher->Subscriber->Secondary-Subscriber
> kind of scenario.
>
> When you say "watching logs", do you mean the sl_log_1 table, or
> slon logs?

No, that's the least useful thing, in a sense... I'm thinking instead
of the log files generated by the slon daemons. While I can't claim
they're perfectly easy to interpret (there's a conspicuously missing
section on that in the docs!), there is information to be found about
what's going on, as events generally get recorded in the logs.

Looking at sl_event would probably be a decent "second place" to go,
though if I'm puzzling over what is happening, I usually have some
xterms open to look at log files...

>> If it's hanging, then some event is failing to propagate, and
>> reviewing sl_event may give some ideas.
>
> I've come to the conclusion that it was actually the LOCK SET that
> was hanging, due to some open transaction somewhere. I had not
> realized that LOCK SET actually went and put a trigger on every
> table in the set - shame on me for not reading the source code. For
> the use case I need to support, it looks like a FAILOVER is really
> the only option after all, because we can't make any assumptions
> about how busy the system might be, and having the switch/failover
> take an indeterminate time isn't going to be acceptable.

Ah, that makes WAY more sense. It seemed odd to me, and I didn't
recognize the case, because in all the scenarios where we do this
sort of thing here at Afilias, we schedule an outage, and so _don't_
have 25 users in the system blocking the process.

Note that "shame on me for not reading the source code" probably
means you ought to file a bug against the documentation or some such
thing ;-). I have certainly expanded the discussion of locks
vis-a-vis their effect on DDL propagation; evidently the same is true
for MOVE SET. If it's a case that can cause confusion, it's worth
documenting.

Come to think of it, this warrants some thought on methodology. There
has GOT to be a better answer than FAILOVER; that just seems too
awful. One approach, off the top of my head (a rough shell sketch
follows the list)...

1. Have slony connections run under the "slony" user so that you can
   carefully target users.

   (This definitely falls under a "best practices" notion; at
   Afilias, we have found it preferable to have users 'molly' (to do
   vacuums) and 'dumpy' (to do backups), which allows us to easily
   lock these users out when doing maintenance that needs to boot
   them out.)

2. At the time you're about to do the changeover, change pg_hba.conf
   to lock out ALL users but slony.

3. HUP the postmaster and watch new connections fail. Yes, that
   locks out users; you're in the process of switching nodes, so I
   say "tough luck!!!"

4. After a few seconds, if you still don't have the exclusive access
   needed to do what you need to do, issue a SIGQUIT (signal 3) to
   one of the backend processes, thereby knocking EVERYONE out.

5. Once done, restore pg_hba.conf, and SIGHUP again to let the users
   back in.
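Concretely, those steps might look something like this. This is a
minimal sketch only: it assumes the postmaster runs as user
"postgres", that $PGDATA points at the data directory, and that you
have prepared a pg_hba.conf.lockdown that admits only the slony user
(the file names here are hypothetical; adjust for your site).

    #!/bin/sh
    PGDATA=/var/lib/pgsql/data

    # Step 2: swap in the restrictive pg_hba.conf, keeping the original.
    cp $PGDATA/pg_hba.conf $PGDATA/pg_hba.conf.orig
    cp $PGDATA/pg_hba.conf.lockdown $PGDATA/pg_hba.conf

    # Step 3: HUP the postmaster so it rereads pg_hba.conf; from here
    # on, new connections from ordinary users are refused.
    kill -HUP `head -1 $PGDATA/postmaster.pid`

    # Step 4 (only if still blocked after a few seconds): SIGQUIT one
    # backend; the postmaster then terminates every active connection.
    # kill -QUIT <pid-of-some-backend>

    # ... run the LOCK SET / MOVE SET slonik script here ...

    # Step 5: restore the original pg_hba.conf and let everyone back in.
    cp $PGDATA/pg_hba.conf.orig $PGDATA/pg_hba.conf
    kill -HUP `head -1 $PGDATA/postmaster.pid`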
This probably needs to be done on both the former origin and the
prospective new one.

I'm not certain where pgpool functionality stands on this... A nifty
approach, if possible, would be to handle some of the above by having
pgpool "defer" the users' connections until you give that final
SIGHUP to let everyone back in. That could, for many cases, merely
defer database actions for a little while, making it look like things
were just slow for a while.

Actually, rereading that, I think the "top of my head" is working
pretty well this evening. I don't see a materially better answer, and
I think you need to plan to be about as "draconian" as that points
to.

>> I would point out that 1.0.5 has quite a lot of bug fixes over
>> 1.0.2; I'd suggest that you consider that upgrade, so as to be
>> only 6 months out of date rather than 9 :-).
>
> Moving to 1.0.5 is one of the fondest hopes in my life at the
> moment, actually, but it appears to take an international treaty to
> get anything new into our build system....

I understand that fairly well!

Actually, if it's not likely in the very short term, then you
probably shouldn't bother with 1.0.5, and should wait for 1.1
instead. It has an extra six months' worth of improvements, and
waiting avoids an "oh, we need to upgrade three times to get to that
point" sort of scenario.

And vis-a-vis your challenge of locking people out, I don't think
that should be an issue with a Slony-I upgrade. None of the upgrade
steps require users to disconnect, in that none of the objects that
get upgraded are ones that users would take locks on. The procedure
(sketched in the P.S. below) is:

1. Stop the old slons.

2. Run slonik's UPDATE FUNCTIONS, which updates a whole lot of stored
   functions but doesn't touch triggers or tables (well, it may touch
   one internal one, but that should be OK).

3. Start the new slons (i.e., the new versions of the slon binaries).

Good luck with the diplomatic process! :-)
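P.S. In case it helps the treaty negotiations, the upgrade dance
above, roughly, as a sketch. The cluster name, node id, conninfo,
and init script path are all made up for illustration; substitute
your own.

    #!/bin/sh
    # Step 1: stop the old slon daemon(s).
    /etc/init.d/slon stop

    # Step 2: have slonik upgrade the stored functions on the node.
    slonik <<_EOF_
    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=mydb host=localhost user=postgres';
    update functions (id = 1);
    _EOF_

    # Step 3: start the new slon binaries.
    /etc/init.d/slon start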