Sat Jun 15 01:46:03 PDT 2013
- Previous message: [Slony1-general] Slony cleanupEvent erroring out with "server closed the connection unexpectedly" - Soln version: 2.1.2
- Next message: [Slony1-general] Slony cleanupEvent erroring out with "server closed the connection unexpectedly" - Soln version: 2.1.2
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Jan,

You are right. Tweaking the tcp keep alive parameters helped.
Now slon.conf contains:

tcp_keepalive=true
tcp_keepalive_interval=45
tcp_keepalive_count=20
tcp_keepalive_idle=30
cleanup_interval=30

Thanks a lot for the timely response.

- Sridevi

On Thu, Jun 13, 2013 at 11:46 PM, Jan Wieck <JanWieck at yahoo.com> wrote:

> On 06/13/13 06:25, Sridevi R wrote:
> > Hello Jan,
> >
> > The Master and Slave DBs talk through a firewall.
> > VIP IPs and SNAT IPs are used in pg_hba.conf.
> >
> > The corresponding messages in the postgres server log:
> >
> > 2013-06-13 09:46:21.224 GMT,,,6630,"10.4.2.2:42031",51b994ed.19e6,1,"",2013-06-13 09:46:21 GMT,,0,LOG,08P01,"incomplete startup packet",,,,,,,,,""
> > 2013-06-13 09:57:38.596 GMT,"postgres","db01",6634,"<ip address printed here>:53924",51b994f7.19ea,1,"idle",2013-06-13 09:46:31 GMT,28/0,0,LOG,08006,"could not receive data from client: Connection reset by peer",,,,,,,,,"slon.node_1_listen"
> > 2013-06-13 09:57:38.596 GMT,"postgres","db01",6634,"<ip address printed here>:53924",51b994f7.19ea,2,"idle",2013-06-13 09:46:31 GMT,28/0,0,LOG,08P01,"unexpected EOF on client connection",,,,,,,,,"slon.node_1_listen"
> > 2013-06-13 09:57:38.607 GMT,"postgres","db01",6637,"<ip address printed here>:53926",51b994f9.19ed,1,"idle",2013-06-13 09:46:33 GMT,32/0,0,LOG,08006,"could not receive data from client: Connection reset by peer",,,,,,,,,"slon.subscriber_1_provider_1"
> > 2013-06-13 09:57:38.607 GMT,"postgres","db01",6637,"<ip address printed here>:53926",51b994f9.19ed,2,"idle",2013-06-13 09:46:33 GMT,32/0,0,LOG,08P01,"unexpected EOF on client connection",,,,,,,,,"slon.subscriber_1_provider_1"
> > 2013-06-13 09:57:38.608 GMT,"postgres","db01",6635,"<ip address printed here>:53925",51b994f7.19eb,1,"idle",2013-06-13 09:46:31 GMT,31/0,0,LOG,08006,"could not receive data from client: Connection reset by peer",,,,,,,,,"slon.node_1_listen"
> > 2013-06-13 09:57:38.608 GMT,"postgres","db01",6635,"<ip address printed here>:53925",51b994f7.19eb,2,"idle",2013-06-13 09:46:31 GMT,31/0,0,LOG,08P01,"unexpected EOF on client connection",,,,,,,,,"slon.node_1_listen"
> >
> > The client slon log contains:
> > 2013-06-13 09:57:38 GMT FATAL cleanupThread: "begin;lock table "_xx_cluster".sl_config_lock;select "_xx_cluster".cleanupEvent('10 minutes'::interval);commit;" - server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
>
> This all can very well be a slightly too eager firewall dropping idle
> connections. Have you tried to enable TCP keep alive options that kick
> in after something like 30 seconds? If not, enable them on both, the PG
> server and the Slony side. That usually prevents those firewall issues.
>
>
> Jan
>
> >
> > Thanks,
> > Sridevi
> >
> > On Thu, Jun 13, 2013 at 12:02 AM, Jan Wieck <JanWieck at yahoo.com> wrote:
> >
> > On 06/12/13 10:17, Sridevi R wrote:
> > > Jan,
> > >
> > > Thanks for the reply.
> > >
> > > The only errors in the slon log are failure of cleanupThread.
> > > child process is restarting right after the cleanupThread Failure.
> > > This occurs approximately every 10 minutes since cleanup_interval is set
> > > to 10 minutes.
> > >
> > > Here is a sample from the log again:
> > >
> > > 2013-06-06 14:23:27 GMT FATAL cleanupThread: "begin;lock table "_xx_cluster".sl_config_lock;select "_xx_cluster".cleanupEvent('10 minutes'::interval);commit;" - server closed the connection unexpectedly
> > > This probably means the server terminated abnormally
> > > before or while processing the request.
> > > 2013-06-06 14:23:27 GMT CONFIG slon: child terminated signal: 9; pid: 16135, current worker pid: 16135
> > > 2013-06-06 14:23:27 GMT CONFIG slon: restart of worker in 10 seconds
> >
> > "server closed the connection unexpectedly" ...
> >
> > Is this connection by any chance through some firewall or NAT gateway
> > that will drop idle connections?
> >
> > What are the corresponding postmaster server log entries? Since slony
> > reports an unexpected connection drop from the server, the server must
> > have some message in its log too, because the client never sent the 'X'
> > libpq protocol message.
> >
> >
> > Jan
> >
> > >
> > > Thanks ,
> > > Sridevi
> > >
> > > On Wed, Jun 12, 2013 at 7:33 PM, Jan Wieck <JanWieck at yahoo.com> wrote:
> > >
> > > On 06/12/13 07:14, Sridevi R wrote:
> > > > Hello,
> > > >
> > > > The slony logs are consistently posting this error:
> > > >
> > > > 2013-06-12 10:01:05 GMT FATAL cleanupThread: "begin;lock table "_xx_cluster".sl_config_lock;select "_xx_cluster".cleanupEvent('10 minutes'::interval);commit;" - server closed the connection unexpectedly
> > > > 2013-06-12 10:12:24 GMT FATAL cleanupThread: "begin;lock table "_xx_cluster".sl_config_lock;select "_xx_cluster".cleanupEvent('10 minutes'::interval);commit;" - server closed the connection unexpectedly
> > > >
> > > > checked and found that sl_confirm table is not cleaned up. cleanup event
> > > > never succeeds.
> > > > Additionally, the child processes terminates and restarts after each
> > > > such cleanup failure.
> > > >
> > > > 2013-06-11 11:20:04 GMT CONFIG slon: child terminated signal: 9; pid: 20172, current worker pid: 20172
> > > > 2013-06-11 11:20:04 GMT CONFIG slon: restart of worker in 10 seconds
> > > >
> > > > When cleanup is run manually, on the psql prompt it runs to completion
> > > > without any issues and cleans up sl_event and sl_confirm tables
> > > > "begin;lock table "_xx_cluster".sl_config_lock;select "_xx_cluster".cleanupEvent('10 minutes'::interval);commit;"
> > > >
> > > > Soln version: 2.1.2
> > > >
> > > > Any help/insight would be greatly appreciated.
> > >
> > > Slon kills its worker(s) with signal 9 (SIGKILL) when it needs to
> > > restart, like when there are errors in event processing or if it
> > > receives certain signals. Are there any other errors in the slon log or
> > > is something on the machine sending signals to slon?
> > >
> > >
> > > Jan
> > >
> > > >
> > > > Thanks,
> > > > Sridevi
> > > >
> > > > _______________________________________________
> > > > Slony1-general mailing list
> > > > Slony1-general at lists.slony.info
> > > > http://lists.slony.info/mailman/listinfo/slony1-general
> > > >
> > >
> > > --
> > > Anyone who trades liberty for security deserves neither
> > > liberty nor security. -- Benjamin Franklin
> > >
> >
> > --
> > Anyone who trades liberty for security deserves neither
> > liberty nor security. -- Benjamin Franklin
> >
> > _______________________________________________
> > Slony1-general mailing list
> > Slony1-general at lists.slony.info
> > http://lists.slony.info/mailman/listinfo/slony1-general
> >
>
> --
> Anyone who trades liberty for security deserves neither
> liberty nor security. -- Benjamin Franklin
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.slony.info/pipermail/slony1-general/attachments/20130615/cab5c0d1/attachment.htm
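
For readers wondering what the three keepalive timing values in the slon.conf above control, here is a minimal Python sketch of the same knobs applied to an ordinary TCP socket. It is not Slony's code: the function names `enable_keepalive` and `worst_case_detection_seconds` are hypothetical, and the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` socket options are the Linux names, so the sketch applies each one only if the platform exposes it. On the PostgreSQL server side the analogous postgresql.conf settings are `tcp_keepalives_idle`, `tcp_keepalives_interval` and `tcp_keepalives_count`.

```python
import socket

# Values copied from the slon.conf quoted in this message.
KEEPALIVE_IDLE = 30      # seconds of idleness before the first probe
KEEPALIVE_INTERVAL = 45  # seconds between unanswered probes
KEEPALIVE_COUNT = 20     # unanswered probes before the connection is dropped


def enable_keepalive(sock, idle, interval, count):
    """Turn on TCP keepalive and apply the timing knobs the OS exposes."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Assumption: Linux-style option names; skip any the platform lacks.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


def worst_case_detection_seconds(idle, interval, count):
    """Rough upper bound on how long a dead peer can go unnoticed."""
    return idle + interval * count


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        enable_keepalive(s, KEEPALIVE_IDLE, KEEPALIVE_INTERVAL, KEEPALIVE_COUNT)
        print("keepalive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
        print("worst case (s):", worst_case_detection_seconds(
            KEEPALIVE_IDLE, KEEPALIVE_INTERVAL, KEEPALIVE_COUNT))
    finally:
        s.close()
```

The idle value is the one that matters for the firewall problem discussed in this thread: a probe goes out after 30 seconds of silence, so the connection never looks idle for long. The interval and count only determine how quickly a peer that is really gone gets detected.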