Tue May 18 12:44:40 PDT 2010
- Previous message: [Slony1-bugs] [Bug 125] New init script
- Next message: [Slony1-bugs] Events confirmed not removed
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
http://www.slony.info/bugzilla/show_bug.cgi?id=126

Summary: slon sometimes does not recover from a network outage
Product: Slony-I
Version: 1.2
Platform: Other
OS/Version: Other
Status: NEW
Severity: normal
Priority: low
Component: slon
AssignedTo: slony1-bugs at lists.slony.info
ReportedBy: ssinger at ca.afilias.info
CC: slony1-bugs at lists.slony.info
Estimated Hours: 0.0

We've received a report of slon not recovering properly from a network outage.

It appears that the remote listener thread (8341) encountered a network error while the network was down. No network errors were observed for the remote worker threads.

After the error, the remote listener for 8341 apparently continued to queue events (but no logs are available). Replication started to fall behind and did not proceed after the network was restored. Restarting slon made replication work again.

--

My theory is that we were waiting on a socket read() inside of libpq when the network died. Since we were not trying to send an event, no packets were generated to notify libpq that the network connection had died.

Setting KEEPALIVE on the connections to postgres should address this. We don't appear to be doing that currently.
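The keepalive idea above could be sketched roughly as follows. This is only an illustration, not slon's actual code: the helper names enable_keepalive() and keepalive_enabled() are hypothetical, and the timing values are left to the caller. It assumes slon would obtain the descriptor from libpq with PQsocket(conn) after connecting, and that the platform exposes the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options (they are skipped where they are not defined).

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helper: turn on TCP keepalive probes for an open socket so
 * that a blocked read() eventually fails when the peer becomes unreachable.
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
int enable_keepalive(int fd, int idle_secs, int interval_secs, int probe_count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return -1;

    /* The tuning knobs below are not available on every platform. */
#ifdef TCP_KEEPIDLE
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) != 0)
        return -1;
#else
    (void) idle_secs;
#endif
#ifdef TCP_KEEPINTVL
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof(interval_secs)) != 0)
        return -1;
#else
    (void) interval_secs;
#endif
#ifdef TCP_KEEPCNT
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probe_count, sizeof(probe_count)) != 0)
        return -1;
#else
    (void) probe_count;
#endif
    return 0;
}

/*
 * Hypothetical helper: report whether SO_KEEPALIVE is set on the socket.
 * Returns 1 if enabled, 0 if disabled, -1 on error.
 */
int keepalive_enabled(int fd)
{
    int on = 0;
    socklen_t len = sizeof(on);

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, &len) != 0)
        return -1;
    return on != 0;
}
```

In slon this would presumably be called once per database connection, for example enable_keepalive(PQsocket(conn), 60, 10, 5), so that an idle listener connection is probed and a dead one surfaces as a read error instead of blocking indefinitely.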
--

2010-05-16 20:59:35 UTC DEBUG2 remoteListenThread_8344: LISTEN
2010-05-16 20:59:35 UTC DEBUG2 remoteListenThread_8346: LISTEN
2010-05-16 20:59:35 UTC DEBUG2 remoteWorkerThread_8344: forward confirm 8394,112865 received by 8344
2010-05-16 20:59:35 UTC DEBUG2 remoteWorkerThread_8344: forward confirm 8394,112865 received by 8346
2010-05-16 20:59:38 UTC DEBUG2 remoteListenThread_8346: queue event 8346,157585 SYNC
2010-05-16 20:59:38 UTC DEBUG2 remoteListenThread_8346: UNLISTEN
2010-05-16 20:59:38 UTC DEBUG2 remoteWorkerThread_8346: Received event 8346,157585 SYNC
2010-05-16 20:59:38 UTC DEBUG2 calc sync size - last time: 1 last length: 10001 ideal: 5 proposed size: 3
2010-05-16 20:59:38 UTC DEBUG2 remoteWorkerThread_8346: SYNC 157585 processing
2010-05-16 20:59:38 UTC DEBUG2 remoteWorkerThread_8346: no sets need syncing for this event
2010-05-16 20:59:38 UTC DEBUG2 remoteListenThread_8344: queue event 8344,148724 SYNC
2010-05-16 20:59:38 UTC DEBUG2 remoteListenThread_8344: queue event 8346,157585 SYNC
2010-05-16 20:59:38 UTC DEBUG2 remoteWorker_event: event 8346,157585 ignored - duplicate
2010-05-16 20:59:38 UTC DEBUG2 remoteListenThread_8344: UNLISTEN
2010-05-16 20:59:38 UTC DEBUG2 remoteWorkerThread_8344: Received event 8344,148724 SYNC
2010-05-16 20:59:38 UTC DEBUG2 remoteWorkerThread_8344: SYNC 148724 processing
2010-05-16 20:59:38 UTC DEBUG2 remoteWorkerThread_8344: no sets need syncing for this event
2010-05-16 20:59:38 UTC ERROR remoteListenThread_8341: "select con_origin, con_received, max(con_seqno) as con_seqno, max(con_timestamp) as con_timestamp from "_oxrsin".sl_confirm where con_received <> 8394 group by con_origin, con_received" could not receive data from server: Connection timed out
2010-05-16 20:59:38 UTC DEBUG2 remoteWorkerThread_8344: forward confirm 8346,157585 received by 8344
2010-05-16 20:59:42 UTC DEBUG2 syncThread: new sl_action_seq 1 - SYNC 112866
2010-05-16 20:59:45 UTC DEBUG2 localListenThread: Received event 8394,112866 SYNC
2010-05-16 20:59:45 UTC DEBUG2 remoteListenThread_8344: LISTEN
2010-05-16 20:59:45 UTC DEBUG2 remoteListenThread_8346: LISTEN
2010-05-16 20:59:45 UTC DEBUG2 remoteListenThread_8344: LISTEN
2010-05-16 20:59:45 UTC DEBUG2 remoteListenThread_8346: LISTEN
2010-05-16 20:59:45 UTC DEBUG2 remoteWorkerThread_8346: forward confirm 8394,112866 received by 8344
2010-05-16 20:59:45 UTC DEBUG2 remoteWorkerThread_8346: forward confirm 8344,148724 received by 8346
2010-05-16 20:59:45 UTC DEBUG2 remoteWorkerThread_8344: forward confirm 8394,112866 received by 8346
2010-05-16 20:59:48 UTC DEBUG2 remoteListenThread_8346: queue event 8346,157586 SYNC
2010-05-16 20:59:48 UTC DEBUG2 remoteListenThread_8346: UNLISTEN
2010-05-16 20:59:48 UTC DEBUG2 remoteWorkerThread_8346: Received event 8346,157586 SYNC
2010-05-16 20:59:48 UTC DEBUG2 calc sync size - last time: 1 last length: 10002 ideal: 5 proposed size: 3
2010-05-16 20:59:48 UTC DEBUG2 remoteWorkerThread_8346: SYNC 157586 processing
2010-05-16 20:59:48 UTC DEBUG2 remoteWorkerThread_8346: no sets need syncing for this event
2010-05-16 20:59:48 UTC DEBUG2 remoteListenThread_8344: queue event 8344,148725 SYNC
2010-05-16 20:59:48 UTC DEBUG2 remoteListenThread_8344: queue event 8346,157586 SYNC
2010-05-16 20:59:48 UTC DEBUG2 remoteWorker_event: event 8346,157586 ignored - duplicate
2010-05-16 20:59:48 UTC DEBUG2 remoteListenThread_8344: UNLISTEN
2010-05-16 20:59:48 UTC DEBUG2 remoteWorkerThread_8344: Received event 8344,148725 SYNC
2010-05-16 20:59:48 UTC DEBUG2 remoteWorkerThread_8344: SYNC 148725 processing
2010-05-16 20:59:48 UTC DEBUG2 remoteWorkerThread_8344: no sets need syncing for this event
2010-05-16 20:59:48 UTC DEBUG2 remoteWorkerThread_8344: forward confirm 8346,157586 received by 8344
2010-05-16 20:59:49 UTC DEBUG2 remoteWorkerThread_8346: forward confirm 8344,148725 received by 8346
2010-05-16 20:59:52 UTC DEBUG2 syncThread: new sl_action_seq 1 - SYNC 112867
2010-05-16 20:59:55 UTC DEBUG2 remoteListenThread_8344: LISTEN
2010-05-16 20:59:55 UTC DEBUG2 remoteListenThread_8346: LISTEN
2010-05-16 20:59:55 UTC DEBUG2 remoteWorkerThread_8344: forward confirm 8394,112867 received by 8344
2010-05-16 20:59:55 UTC DEBUG2 remoteWorkerThread_8346: forward confirm 8394,112867 received by 8346

--
Configure bugmail: http://www.slony.info/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
You are the assignee for the bug.