Wed May 25 08:01:32 PDT 2011
- Previous message: [Slony1-general] CESTERROR remoteListenThread_1: timeout (300 s) for event selection
- Next message: [Slony1-general] CESTERROR remoteListenThread_1: timeout (300 s) for event selection
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Wed, May 25, 2011 at 7:43 AM, Ger Timmens <Ger.Timmens at adyen.com> wrote:
> We are replicating a 500Gb database from postgresql 8.3 to
> postgresql 9.0 using slony1-2.0.6.
>
> We got the following error in our slon logs during the copy set of
> one of the bigger tables:
>
> CESTERROR remoteListenThread_1: timeout (300 s) for event selection
>
> The documentation:
>
> ERROR: remoteListenThread_%d: timeout for event selection
>
> This means that the listener thread (src/slon/remote_listener.c)
> timed out when trying to determine what events were outstanding for it.
>
> This could occur because network connections broke, in which
> case restarting the slon might help.
>
> Alternatively, this might occur because the slon for this node
> has been broken for a long time, and there are an enormous number of
> entries in sl_event on this or other nodes for the node to work
> through, and it is taking more than slon_conf_remote_listen_timeout
> seconds to run the query. In older versions of Slony-I, that
> configuration parameter did not exist; the timeout was fixed at 300
> seconds. In newer versions, you might increase that timeout in the
> slon config file to a larger value so that it can continue to
> completion. And then investigate why nobody was monitoring things
> such that replication broke for such a long time...
>
> Replication seems to continue fine after this error.
> Is it safe to continue ?
> Or should we start from scratch ?
> If so what do we have to do to prevent this error from happening again ?

Well, the documentation indicates that this error tends to come up for
two reasons:

a) Because there was some sort of network glitch, or
b) Because some kind of misconfiguration left the cluster behind by
   some stupendous number of events.

I think you encountered a), since the error didn't persist.
As such, I'd chalk it up to "network glitch," and while there may be some value in doing a network investigation as to why such things might be happening to you, it's not particularly important to replication itself. You shouldn't have any ongoing problem.
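
If the timeout ever did start recurring because of case b), the quoted documentation points at raising the listener timeout in the slon config file. A minimal sketch of what that might look like, assuming your slon release exposes the option under the name remote_listen_timeout (the quoted text calls it slon_conf_remote_listen_timeout, so check the runtime configuration docs for your version) and with 600 picked purely as an illustrative value:

```
# slon.conf fragment (sketch only; option name and value are assumptions
# to verify against the runtime configuration docs for your release)
#
# Seconds the remote listener may spend selecting outstanding events
# before giving up. 300 is the value reported in the log message above.
remote_listen_timeout=600
```

The slon would need to be restarted to pick up the change. For a one-off timeout like the one described here, leaving the setting alone seems reasonable.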