Jerry Sievers jerry at jerrysievers.com
Thu Jul 5 14:15:01 PDT 2007
Hi Jan;  here's a quick look at what sorts of events are in sl_event. 


Pager usage is off.
 ev_origin | ev_seqno | ev_timestamp | ev_minxid | ev_maxxid | ev_xip | ev_type | ev_data1 | ev_data2 | ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8 
-----------+----------+--------------+-----------+-----------+--------+---------+----------+----------+----------+----------+----------+----------+----------+----------
(0 rows)

Pager usage is off.
 ev_origin | ev_seqno | ev_timestamp | ev_minxid | ev_maxxid | ev_xip | ev_type | ev_data1 | ev_data2 | ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8 
-----------+----------+--------------+-----------+-----------+--------+---------+----------+----------+----------+----------+----------+----------+----------+----------
(0 rows)

Pager usage is off.
 ev_origin | ev_seqno | ev_timestamp | ev_minxid | ev_maxxid | ev_xip | ev_type | ev_data1 | ev_data2 | ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8 
-----------+----------+--------------+-----------+-----------+--------+---------+----------+----------+----------+----------+----------+----------+----------+----------
(0 rows)



Jan Wieck <JanWieck at Yahoo.com> writes:

> On 7/5/2007 4:34 PM, Jerry Sievers wrote:
> 
> > select * from sl_status on the three nodes still configured.
> 
> Apparently node 1 didn't receive any events from 3 or 4 for over 5
> hours. Well, what does
> 
>      select * from sl_event where ev_type = 'ENABLE_NODE';
> 
> give you on all 3 nodes?
> 
> 
> Jan
> 
> > Please advise.
> >
> > Pager usage is off.
> > Expanded display is on.
> > -[ RECORD 1 ]-------------+-----------------------------
> > st_origin                 | 1
> > st_received               | 3
> > st_last_event             | 2225235
> > st_last_event_ts          | 05-JUL-07 15:54:54.810343
> > st_last_received          | 2225131
> > st_last_received_ts       | 05-JUL-07 15:16:23.496708
> > st_last_received_event_ts | 05-JUL-07 14:58:07.240334
> > st_lag_num_events         | 104
> > st_lag_time               | @ 5 hours 21 mins 45.53 secs
> > -[ RECORD 2 ]-------------+-----------------------------
> > st_origin                 | 1
> > st_received               | 4
> > st_last_event             | 2225235
> > st_last_event_ts          | 05-JUL-07 15:54:54.810343
> > st_last_received          | 2225131
> > st_last_received_ts       | 05-JUL-07 15:14:07.409965
> > st_last_received_event_ts | 05-JUL-07 14:58:07.240334
> > st_lag_num_events         | 104
> > st_lag_time               | @ 5 hours 21 mins 45.53 secs
> > Pager usage is off.
> > Expanded display is on.
> > -[ RECORD 1 ]-------------+-----------------------------
> > st_origin                 | 3
> > st_received               | 4
> > st_last_event             | 1863901
> > st_last_event_ts          | 05-JUL-07 18:21:49.29024
> > st_last_received          | 1863896
> > st_last_received_ts       | 05-JUL-07 18:21:04.101713
> > st_last_received_event_ts | 05-JUL-07 18:20:59.06034
> > st_lag_num_events         | 5
> > st_lag_time               | @ 2 hours 2 mins 21.48 secs
> > -[ RECORD 2 ]-------------+-----------------------------
> > st_origin                 | 3
> > st_received               | 1
> > st_last_event             | 1863901
> > st_last_event_ts          | 05-JUL-07 18:21:49.29024
> > st_last_received          | 1862809
> > st_last_received_ts       | 05-JUL-07 14:57:21.461858
> > st_last_received_event_ts | 05-JUL-07 15:00:46.848899
> > st_lag_num_events         | 1092
> > st_lag_time               | @ 5 hours 22 mins 33.69 secs
> > Pager usage is off.
> > Expanded display is on.
> > -[ RECORD 1 ]-------------+-----------------------------
> > st_origin                 | 4
> > st_received               | 1
> > st_last_event             | 1864550
> > st_last_event_ts          | 05-JUL-07 18:21:01.700228
> > st_last_received          | 1863465
> > st_last_received_ts       | 05-JUL-07 14:57:21.23512
> > st_last_received_event_ts | 05-JUL-07 15:00:49.830356
> > st_lag_num_events         | 1085
> > st_lag_time               | @ 5 hours 22 mins 33.96 secs
> > -[ RECORD 2 ]-------------+-----------------------------
> > st_origin                 | 4
> > st_received               | 3
> > st_last_event             | 1864550
> > st_last_event_ts          | 05-JUL-07 18:21:01.700228
> > st_last_received          | 1864550
> > st_last_received_ts       | 05-JUL-07 18:20:56.67848
> > st_last_received_event_ts | 05-JUL-07 18:21:01.700228
> > st_lag_num_events         | 0
> > st_lag_time               | @ 2 hours 2 mins 22.09 secs
> > Jan Wieck <JanWieck at Yahoo.com> writes:
> >
> >> On 7/5/2007 3:03 PM, Jerry Sievers wrote:
> >> > Crisis today.  A complete power failure left a corrupt table on the
> >> > old master.  I did moveset() and dropnode() to reconfigure the
> >> > cluster.  The old master was node 2.  The new master is node 1.
> >> > There are now just 2 slaves, 3 and 4.
> >> >
> >> > For some reason, however, when I try to fire up the slon on the
> >> > master, it complains that node #2 does not exist right after
> >> > reporting having init'd node 4.  I have no clue what's going wrong
> >> > here and hope not to have to undo and reconfigure the cluster from
> >> > scratch.  These DBs are too large now for easy subscription during
> >> > live processing.  Any help much appreciated.
> >> >
> >> > -----------------------------------------
> >> > 2007-07-05 18:19:18 GMT CONFIG main: edb-replication version 1.1.5 starting up
> >> > 2007-07-05 18:19:19 GMT CONFIG main: local node id = 1
> >> > 2007-07-05 18:19:19 GMT CONFIG main: launching sched_start_mainloop
> >> > 2007-07-05 18:19:19 GMT CONFIG main: loading current cluster configuration
> >> > 2007-07-05 18:19:19 GMT CONFIG storeNode: no_id=3 no_comment='slave node 3'
> >> > 2007-07-05 18:19:19 GMT CONFIG storeNode: no_id=4 no_comment='slave node 4'
> >> > 2007-07-05 18:19:19 GMT CONFIG storePath: pa_server=3 pa_client=1 pa_conninfo="dbname=rt3_01 host=192.168.30.172 user=slonik password=foo.j1MiTikGop0rytQuedPid8 port=5432" pa_connretry=5
> >> > 2007-07-05 18:19:19 GMT CONFIG storePath: pa_server=4 pa_client=1 pa_conninfo="dbname=rt3_01 host=192.168.30.173 user=slonik password=foo.j1MiTikGop0rytQuedPid8 port=5432" pa_connretry=5
> >> > 2007-07-05 18:19:19 GMT CONFIG storeListen: li_origin=3 li_receiver=1 li_provider=3
> >> > 2007-07-05 18:19:19 GMT CONFIG storeListen: li_origin=4 li_receiver=1 li_provider=4
> >> > 2007-07-05 18:19:19 GMT CONFIG storeSet: set_id=1 set_origin=1 set_comment='RT3/VCASE replication set'
> >> > 2007-07-05 18:19:19 GMT CONFIG storeSet: set_id=2 set_origin=1 set_comment='new set for adding tables'
> >> > 2007-07-05 18:19:19 GMT CONFIG main: configuration complete - starting threads
> >> > NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=12520
> >> > 2007-07-05 18:19:19 GMT CONFIG enableNode: no_id=3
> >> > 2007-07-05 18:19:19 GMT CONFIG enableNode: no_id=4
> >> > 2007-07-05 18:19:19 GMT FATAL  enableNode: unknown node ID 2
> >> > 2007-07-05 18:19:19 GMT INFO   remoteListenThread_4: disconnecting from 'dbname=rt3_01 host=192.168.30.173 user=slonik password=foo.j1MiTikGop0rytQuedPid8 port=5432'
> >> > 2007-07-05 18:19:20 GMT INFO   remoteListenThread_3: disconnecting from 'dbname=rt3_01 host=192.168.30.172 user=slonik password=foo.j1MiTikGop0rytQuedPid8 port=5432'
> >> >
> >> It appears that there is an ENABLE_NODE event on either node 3 or 4
> >> which node 1 tries to replicate. How that could have been lurking
> >> around there forever is another question though.
> >> What is the content of sl_status for all three nodes?
> >> Also, you might now want to change the password for user slonik on
> >> those servers ;-)
> >> Jan
> >> -- 
> >> #======================================================================#
> >> # It's easier to get forgiveness for being wrong than for being right. #
> >> # Let's break this rule - forgive me.                                  #
> >> #================================================== JanWieck at Yahoo.com #
> >>
> >
> 

-- 
-------------------------------------------------------------------------------
Jerry Sievers   732 365-2844 (work)     Production Database Administrator
                305 321-1144 (mobile)   WWW E-Commerce Consultant

