Tignor, Tom ttignor at akamai.com
Fri Jul 8 12:27:15 PDT 2016
Hello slony group,

I'm testing with slony1-2.2.4 and have recently hit an error that effectively stops slon processing on a node A after another node B is dropped. The error reproduces only infrequently. As some will know, when the slon daemon for a node learns that its node has been dropped, it responds by dropping its cluster schema. There appears to be a race condition between the node B schema drop and the receipt of the disableNode (drop node) event by the surviving node A. If the schema drop occurs first, all the remote worker threads on node A enter an error state. See the log samples below; in them, node 1 is the surviving node A and node 999999 is the dropped node B. I resolved this the first time by deleting all the recent non-SYNC events from the sl_event tables, and more recently with a simple restart of the node A slon.

Please advise if there is an existing ticket I should add this information to, or if I should create a new one. Thanks.


---- node 1 log ----
2016-07-08 18:06:31 UTC [30382] INFO   remoteWorkerThread_999999: SYNC 5000000008 done in 0.002 seconds
2016-07-08 18:06:33 UTC [30382] INFO   remoteWorkerThread_999999: SYNC 5000000009 done in 0.002 seconds
2016-07-08 18:06:33 UTC [30382] INFO   remoteWorkerThread_2: SYNC 5000017869 done in 0.002 seconds
2016-07-08 18:06:33 UTC [30382] INFO   remoteWorkerThread_3: SYNC 5000018148 done in 0.004 seconds
2016-07-08 18:06:45 UTC [30382] CONFIG remoteWorkerThread_2: update provider configuration
2016-07-08 18:06:45 UTC [30382] ERROR  remoteWorkerThread_3: "select last_value from "_ams_cluster".sl_log_status" PGRES_FATAL_ERROR ERROR:  schema "_ams_cluster" does not exist
LINE 1: select last_value from "_ams_cluster".sl_log_status
                               ^

2016-07-08 18:06:45 UTC [30382] ERROR  remoteWorkerThread_3: SYNC aborted
2016-07-08 18:06:45 UTC [30382] CONFIG version for "dbname=ams
      host=198.18.102.45
      user=ams_slony
      sslmode=verify-ca
      sslcert=/usr/local/akamai/.ams_certs/complete-ams_slony.crt
      sslkey=/usr/local/akamai/.ams_certs/ams_slony.private_key
      sslrootcert=/usr/local/akamai/etc/ssl_ca/canonical_ca_roots.pem" is 90119
2016-07-08 18:06:45 UTC [30382] ERROR  remoteWorkerThread_2: "select last_value from "_ams_cluster".sl_log_status" PGRES_FATAL_ERROR ERROR:  schema "_ams_cluster" does not exist
LINE 1: select last_value from "_ams_cluster".sl_log_status
                               ^

2016-07-08 18:06:45 UTC [30382] ERROR  remoteWorkerThread_2: SYNC aborted
2016-07-08 18:06:45 UTC [30382] ERROR  remoteListenThread_999999: "select ev_origin, ev_seqno, ev_timestamp,        ev_snapshot,        "pg_catalog".txid_snapshot_xmin(ev_snapshot),        "pg_catalog".txid_snapshot_xmax(ev_snapshot),        ev_type,        ev_data1, ev_data2,        ev_data3, ev_data4,        ev_data5, ev_data6,        ev_data7, ev_data8 from "_ams_cluster".sl_event e where (e.ev_origin = '999999' and e.ev_seqno > '5000000009') or (e.ev_origin = '2' and e.ev_seqno > '5000017870') or (e.ev_origin = '3' and e.ev_seqno > '5000018151') order by e.ev_origin, e.ev_seqno limit 40" - ERROR:  schema "_ams_cluster" does not exist
LINE 1: ...v_data5, ev_data6,        ev_data7, ev_data8 from "_ams_clus...
                                                             ^
2016-07-08 18:06:55 UTC [30382] ERROR  remoteWorkerThread_3: "start transaction; set enable_seqscan = off; set enable_indexscan = on; " PGRES_FATAL_ERROR ERROR:  current transaction is aborted, commands ignored until end of transaction block
2016-07-08 18:06:55 UTC [30382] ERROR  remoteWorkerThread_3: SYNC aborted
2016-07-08 18:06:55 UTC [30382] ERROR  remoteWorkerThread_2: "start transaction; set enable_seqscan = off; set enable_indexscan = on; " PGRES_FATAL_ERROR ERROR:  current transaction is aborted, commands ignored until end of transaction block
2016-07-08 18:06:55 UTC [30382] ERROR  remoteWorkerThread_2: SYNC aborted
----
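
For anyone trying to spot the same condition in their own logs, the signature in the node 1 log above (an ERROR from a remote worker or listen thread reporting that the cluster schema does not exist) can be picked out mechanically. The following is only a rough sketch, not part of Slony: the function names are hypothetical, and the regular expressions assume the log line layout shown in this sample, including lines wrapped with a trailing backslash.

```python
import re

# Assumed layout, taken from the sample log above:
#   <timestamp> UTC [<pid>] ERROR  remoteWorkerThread_<n>: <message>
ENTRY = re.compile(r'\[\d+\] ERROR\s+(remote\w+Thread_\d+): (.*)')
MISSING_SCHEMA = re.compile(r'schema "([^"]+)" does not exist')


def unwrap(log_text):
    """Rejoin lines that were wrapped with a trailing backslash."""
    return log_text.replace("\\\n", "").splitlines()


def threads_hitting_dropped_schema(log_text):
    """Map each slon thread name to the missing schema it reported.

    Only ERROR lines from remoteWorkerThread_N / remoteListenThread_N
    are considered; everything else is ignored.
    """
    hits = {}
    for line in unwrap(log_text):
        entry = ENTRY.search(line)
        if entry is None:
            continue
        thread, message = entry.groups()
        missing = MISSING_SCHEMA.search(message)
        if missing:
            # Keep the first schema name reported by each thread.
            hits.setdefault(thread, missing.group(1))
    return hits
```

Run against the node 1 log, this should report remoteWorkerThread_3, remoteWorkerThread_2 and remoteListenThread_999999, each against the schema _ams_cluster. It only flags the symptom; it does not distinguish this race from any other cause of a missing schema.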


---- node 999999 log ----
2016-07-08 18:06:44 UTC [558] INFO   remoteWorkerThread_1: SYNC 5000081216 done in 0.004 seconds
2016-07-08 18:06:44 UTC [558] INFO   remoteWorkerThread_2: SYNC 5000017870 done in 0.004 seconds
2016-07-08 18:06:44 UTC [558] INFO   remoteWorkerThread_3: SYNC 5000018150 done in 0.004 seconds
2016-07-08 18:06:44 UTC [558] INFO   remoteWorkerThread_1: SYNC 5000081217 done in 0.003 seconds
2016-07-08 18:06:44 UTC [558] WARN   remoteWorkerThread_3: got DROP NODE for local node ID
NOTICE:  Slony-I: Please drop schema "_ams_cluster"
NOTICE:  drop cascades to 171 other objects
DETAIL:  drop cascades to table _ams_cluster.sl_node
drop cascades to table _ams_cluster.sl_nodelock
drop cascades to table _ams_cluster.sl_set
drop cascades to table _ams_cluster.sl_setsync
drop cascades to table _ams_cluster.sl_table
----

            Tom    ☺


