[Slony1-general] Slony cleanupEvent erroring out with "server closed the connection unexpectedly"

Sat Jun 15 06:24:27 PDT 2013

On 06/15/13 04:46, Sridevi R wrote:
> Jan,
> 
> You are right. Tweaking the tcp keep alive parameters helped.
> 
> Now slon.conf contains:
> 
> tcp_keepalive=true
> tcp_keepalive_interval=45
> tcp_keepalive_count=20
> tcp_keepalive_idle=30
> cleanup_interval=30
> 
> Thanks a lot for the timely response.

You're welcome.
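
For the archives, here is a rough sketch of what those numbers mean in practice. It is Python purely for illustration (slon itself is C), the helper names are made up, and TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are the Linux option names, so the code skips any that the platform does not define. I am assuming the three slon.conf values map onto the usual kernel keepalive knobs.

```python
import socket

# Values copied from the slon.conf quoted above.
KEEPALIVE_IDLE = 30       # seconds of silence before the first probe
KEEPALIVE_INTERVAL = 45   # seconds between unanswered probes
KEEPALIVE_COUNT = 20      # unanswered probes before the kernel gives up


def dead_peer_detection_time(idle, interval, count):
    """Approximate seconds until a silent peer is declared dead."""
    return idle + interval * count


def enable_keepalive(sock, idle, interval, count):
    """Turn on TCP keepalive and apply the tuning values where supported."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux option names; skipped silently on platforms that lack them.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s, KEEPALIVE_IDLE, KEEPALIVE_INTERVAL, KEEPALIVE_COUNT)
    print(dead_peer_detection_time(KEEPALIVE_IDLE, KEEPALIVE_INTERVAL,
                                   KEEPALIVE_COUNT))
    s.close()
```

With these values the first probe goes out after 30 seconds of silence, which should be enough traffic to keep the firewall from treating the connection as idle, and a peer that has really gone away is given up on after roughly 30 + 45 * 20 = 930 seconds.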

Jan

> 
> - Sridevi
> 
> 
> 
> On Thu, Jun 13, 2013 at 11:46 PM, Jan Wieck <JanWieck at yahoo.com> wrote:
> 
>     On 06/13/13 06:25, Sridevi R wrote:
>     > Hello Jan,
>     >
>     > The Master and Slave DBs talk through a firewall.
>     > VIP IPs and SNAT IPs are used in pg_hba.conf.
>     >
>     > The corresponding messages in the postgres server log:
>     >
>     > 2013-06-13 09:46:21.224 GMT,,,6630,"10.4.2.2:42031",51b994ed.19e6,1,"",2013-06-13 09:46:21 GMT,,0,LOG,08P01,"incomplete startup packet",,,,,,,,,""
>     > 2013-06-13 09:57:38.596 GMT,"postgres","db01",6634,"<ip address printed here>:53924",51b994f7.19ea,1,"idle",2013-06-13 09:46:31 GMT,28/0,0,LOG,08006,"could not receive data from client: Connection reset by peer",,,,,,,,,"slon.node_1_listen"
>     > 2013-06-13 09:57:38.596 GMT,"postgres","db01",6634,"<ip address printed here>:53924",51b994f7.19ea,2,"idle",2013-06-13 09:46:31 GMT,28/0,0,LOG,08P01,"unexpected EOF on client connection",,,,,,,,,"slon.node_1_listen"
>     > 2013-06-13 09:57:38.607 GMT,"postgres","db01",6637,"<ip address printed here>:53926",51b994f9.19ed,1,"idle",2013-06-13 09:46:33 GMT,32/0,0,LOG,08006,"could not receive data from client: Connection reset by peer",,,,,,,,,"slon.subscriber_1_provider_1"
>     > 2013-06-13 09:57:38.607 GMT,"postgres","db01",6637,"<ip address printed here>:53926",51b994f9.19ed,2,"idle",2013-06-13 09:46:33 GMT,32/0,0,LOG,08P01,"unexpected EOF on client connection",,,,,,,,,"slon.subscriber_1_provider_1"
>     > 2013-06-13 09:57:38.608 GMT,"postgres","db01",6635,"<ip address printed here>:53925",51b994f7.19eb,1,"idle",2013-06-13 09:46:31 GMT,31/0,0,LOG,08006,"could not receive data from client: Connection reset by peer",,,,,,,,,"slon.node_1_listen"
>     > 2013-06-13 09:57:38.608 GMT,"postgres","db01",6635,"<ip address printed here>:53925",51b994f7.19eb,2,"idle",2013-06-13 09:46:31 GMT,31/0,0,LOG,08P01,"unexpected EOF on client connection",,,,,,,,,"slon.node_1_listen"
>     >
>     > The client slon log contains:
>     > 2013-06-13 09:57:38 GMT FATAL  cleanupThread: "begin;lock table "_xx_cluster".sl_config_lock;select "_xx_cluster".cleanupEvent('10 minutes'::interval);commit;" - server closed the connection unexpectedly
>     >         This probably means the server terminated abnormally
>     >         before or while processing the request.
> 
>     This could very well be a slightly too eager firewall dropping idle
>     connections. Have you tried enabling TCP keepalive options that kick
>     in after something like 30 seconds? If not, enable them on both the PG
>     server and the Slony side. That usually prevents those firewall issues.
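
One way to check the idle-timeout theory against the server log quoted above: the entries look like PostgreSQL csvlog output, so the session start time and the time of the reset can be pulled out of one line and subtracted. A small Python sketch follows; the column positions are my reading of the csvlog layout for this server version and the helper names are made up.

```python
import csv
from datetime import datetime

# One entry from the server log quoted above, joined back onto a single line.
LINE = ('2013-06-13 09:57:38.596 GMT,"postgres","db01",6634,'
        '"<ip address printed here>:53924",51b994f7.19ea,1,"idle",'
        '2013-06-13 09:46:31 GMT,28/0,0,LOG,08006,'
        '"could not receive data from client: Connection reset by peer"'
        ',,,,,,,,,"slon.node_1_listen"')

# Assumed 0-based csvlog column positions.
LOG_TIME, SESSION_START, SQLSTATE, MESSAGE, APP_NAME = 0, 8, 12, 13, 22


def parse_ts(text):
    """Parse 'YYYY-MM-DD HH:MM:SS[.mmm] GMT', ignoring the zone name."""
    stamp = text.rsplit(" ", 1)[0]
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in stamp else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(stamp, fmt)


def session_age(line):
    """Return (application_name, sqlstate, seconds the session had lived)."""
    fields = next(csv.reader([line]))
    age = parse_ts(fields[LOG_TIME]) - parse_ts(fields[SESSION_START])
    return fields[APP_NAME], fields[SQLSTATE], age.total_seconds()


if __name__ == "__main__":
    print(session_age(LINE))
```

For the slon.node_1_listen session this comes to a bit over 11 minutes between session start and the reset, which at least does not contradict a firewall that times out idle connections.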
> 
> 
>     Jan
> 
> 
>     >
>     >
>     > Thanks,
>     > Sridevi
>     >
>     >
>     >
>     >
>     >
>     > On Thu, Jun 13, 2013 at 12:02 AM, Jan Wieck <JanWieck at yahoo.com> wrote:
>     >
>     >     On 06/12/13 10:17, Sridevi R wrote:
>     >     > Jan,
>     >     >
>     >     > Thanks for the reply.
>     >     >
>     >     > The only errors in the slon log are failures of cleanupThread.
>     >     > The child process restarts right after each cleanupThread failure.
>     >     > This occurs approximately every 10 minutes, since cleanup_interval
>     >     > is set to 10 minutes.
>     >     >
>     >     > Here is a sample from the log again:
>     >     >
>     >     > 2013-06-06 14:23:27 GMT FATAL  cleanupThread: "begin;lock table "_xx_cluster".sl_config_lock;select "_xx_cluster".cleanupEvent('10 minutes'::interval);commit;" - server closed the connection unexpectedly
>     >     >     This probably means the server terminated abnormally
>     >     >     before or while processing the request.
>     >     > 2013-06-06 14:23:27 GMT CONFIG slon: child terminated signal: 9; pid: 16135, current worker pid: 16135
>     >     > 2013-06-06 14:23:27 GMT CONFIG slon: restart of worker in 10 seconds
>     >
>     >     "server closed the connection unexpectedly" ...
>     >
>     >     Is this connection by any chance through some firewall or NAT gateway
>     >     that will drop idle connections?
>     >
>     >     What are the corresponding postmaster server log entries? Since Slony
>     >     reports an unexpected connection drop from the server, the server must
>     >     have some message in its log too, because the client never sent the 'X'
>     >     libpq protocol message.
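
For reference, the 'X' message mentioned above is the frontend Terminate message of the PostgreSQL wire protocol: a single type byte 'X' followed by a four-byte length of 4. Below is a tiny Python sketch of the bytes a client sends on a clean disconnect (the function name is made up). A session that vanishes without sending this is what the server reports as an unexpected EOF.

```python
import struct


def terminate_message():
    """Frontend Terminate: type byte 'X' plus Int32 length (4, counting itself)."""
    return struct.pack("!ci", b"X", 4)


if __name__ == "__main__":
    print(terminate_message())
```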
>     >
>     >
>     >     Jan
>     >
>     >
>     >     >
>     >     > Thanks ,
>     >     > Sridevi
>     >     >
>     >     >
>     >     > On Wed, Jun 12, 2013 at 7:33 PM, Jan Wieck <JanWieck at yahoo.com> wrote:
>     >     >
>     >     >     On 06/12/13 07:14, Sridevi R wrote:
>     >     >     > Hello,
>     >     >     >
>     >     >     > The slony logs are consistently posting this error:
>     >     >     >
>     >     >     > 2013-06-12 10:01:05 GMT FATAL  cleanupThread: "begin;lock table "_xx_cluster".sl_config_lock;select "_xx_cluster".cleanupEvent('10 minutes'::interval);commit;" - server closed the connection unexpectedly
>     >     >     > 2013-06-12 10:12:24 GMT FATAL  cleanupThread: "begin;lock table "_xx_cluster".sl_config_lock;select "_xx_cluster".cleanupEvent('10 minutes'::interval);commit;" - server closed the connection unexpectedly
>     >     >     >
>     >     >     > I checked and found that the sl_confirm table is not being cleaned up;
>     >     >     > the cleanup event never succeeds.
>     >     >     > Additionally, the child process terminates and restarts after each
>     >     >     > such cleanup failure.
>     >     >     >
>     >     >     > 2013-06-11 11:20:04 GMT CONFIG slon: child terminated signal: 9; pid: 20172, current worker pid: 20172
>     >     >     > 2013-06-11 11:20:04 GMT CONFIG slon: restart of worker in 10 seconds
>     >     >     >
>     >     >     > When the cleanup is run manually at the psql prompt, it runs to completion
>     >     >     > without any issues and cleans up the sl_event and sl_confirm tables:
>     >     >     > "begin;lock table "_xx_cluster".sl_config_lock;select "_xx_cluster".cleanupEvent('10 minutes'::interval);commit;"
>     >     >     >
>     >     >     > Slony version: 2.1.2
>     >     >     >
>     >     >     > Any help/insight would be greatly appreciated.
>     >     >
>     >     >     Slon kills its worker(s) with signal 9 (SIGKILL) when it needs to
>     >     >     restart, like when there are errors in event processing or if it
>     >     >     receives certain signals. Are there any other errors in the slon log or
>     >     >     is something on the machine sending signals to slon?
>     >     >
>     >     >
>     >     >     Jan
>     >     >
>     >     >     >
>     >     >     > Thanks,
>     >     >     > Sridevi
>     >     >     >
>     >     >     >
>     >     >     >
>     >     >     > _______________________________________________
>     >     >     > Slony1-general mailing list
>     >     >     > Slony1-general at lists.slony.info
>     >     >     > http://lists.slony.info/mailman/listinfo/slony1-general
>     >     >     >
>     >     >
>     >     >
>     >     >     --
>     >     >     Anyone who trades liberty for security deserves neither
>     >     >     liberty nor security. -- Benjamin Franklin
>     >     >
>     >     >
>     >
>     >
>     >
>     >
>     >
>     >
>     >
> 
> 
> 
> 

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin