Jan Wieck
Mon Aug 30 22:57:33 PDT 2004
On 8/30/2004 6:27 PM, DeJuan Jackson wrote:
> I seem to be making a habit of replying to my own posts.  Anyone seen 
> this before?  I'm beginning to wonder if I have a BO problem.

The problem is well known and is tracked as task ID 391:

http://gborg.postgresql.org/project/slony1/task/taskshow.php?391


Jan

> 
> DeJuan Jackson wrote:
> 
>> Well, bad form replying to my own post, but no one else has, so I'll 
>> have to.
>>
>> This is the slon output at -d 4 Node 1:
>> CONFIG main: local node id = 1
>> CONFIG main: loading current cluster configuration
>> CONFIG storeNode: no_id=2 no_comment='Node 2'
>> DEBUG2 setNodeLastEvent: no_id=2 event_seq=7032
>> CONFIG storePath: pa_server=2 pa_client=1 
>> pa_conninfo="dbname=test_destination host=river user=postgres" 
>> pa_connretry=10
>> CONFIG storeListen: li_origin=2 li_receiver=1 li_provider=2
>> CONFIG storeSet: set_id=1 set_origin=1 set_comment='All pgbench tables'
>> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
>> CONFIG storeSet: set_id=2 set_origin=1 set_comment='seq_test table'
>> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
>> DEBUG2 main: last local event sequence = 8095
>> CONFIG main: configuration complete - starting threads
>> DEBUG1 localListenThread: thread starts
>> FATAL  localListenThread: Another slon daemon is serving this node 
>> already
>>
>> Node 2:
>> CONFIG main: local node id = 2
>> CONFIG main: loading current cluster configuration
>> CONFIG storeNode: no_id=1 no_comment='Node 1'
>> DEBUG2 setNodeLastEvent: no_id=1 event_seq=8083
>> CONFIG storePath: pa_server=1 pa_client=2 
>> pa_conninfo="dbname=test_source host=river user=postgres" pa_connretry=10
>> CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1
>> CONFIG storeSet: set_id=1 set_origin=1 set_comment='All pgbench tables'
>> WARN   remoteWorker_wakeup: node 1 - no worker thread
>> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
>> CONFIG storeSet: set_id=2 set_origin=1 set_comment='seq_test table'
>> WARN   remoteWorker_wakeup: node 1 - no worker thread
>> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
>> CONFIG storeSubscribe: sub_set=1 sub_provider=1 sub_forward='f'
>> WARN   remoteWorker_wakeup: node 1 - no worker thread
>> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
>> CONFIG enableSubscription: sub_set=1
>> WARN   remoteWorker_wakeup: node 1 - no worker thread
>> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
>> CONFIG storeSubscribe: sub_set=2 sub_provider=1 sub_forward='f'
>> WARN   remoteWorker_wakeup: node 1 - no worker thread
>> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
>> CONFIG enableSubscription: sub_set=2
>> WARN   remoteWorker_wakeup: node 1 - no worker thread
>> DEBUG2 sched_wakeup_node(): no_id=1 (0 threads + worker signaled)
>> DEBUG2 main: last local event sequence = 7032
>> CONFIG main: configuration complete - starting threads
>> DEBUG1 localListenThread: thread starts
>> FATAL  localListenThread: Another slon daemon is serving this node 
>> already
>>
>> Don't know if that will help.
>> I looked at the pg_listener layout, and the only fix I can think of is 
>> to check for the PID in the query.  This would only work from the DB, 
>> and only if stats are on.  But assuming stats are on, the 
>> pg_stat_get_backend_idset function combined with the 
>> pg_stat_get_backend_pid function would tell you which PIDs are 
>> currently connected.  So you could filter your listener list by this 
>> data and get a more representative list of active listeners.  But this 
>> would necessitate a way to determine whether stats are on; you could 
>> just use pg_stat_get_backend_idset to see if any rows come back after 
>> an appropriate delay (I believe Tom said it can be as much as 500ms), 
>> because your own connection should always be there at minimum.
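>>
>> Something like this untested sketch is what I have in mind (it only 
>> returns rows while the stats collector is running, and <clustername> 
>> stands for the actual Slony cluster name):
>>
>>   -- keep only listener entries whose backend PID is still connected
>>   SELECT l.relname, l.listenerpid
>>     FROM pg_catalog.pg_listener l
>>    WHERE l.relname = '_<clustername>_Restart'
>>      AND l.listenerpid IN
>>          (SELECT pg_stat_get_backend_pid(s.backendid)
>>             FROM pg_stat_get_backend_idset() AS s(backendid));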
>>
>> So, should I file this as a bug, should I submit a patch, or should I 
>> just stick my problem somewhere dark and dank so that it's never heard 
>> from again?  Enquiring minds want to know.
>>
>> DeJuan Jackson wrote:
>>
>>> I've been putting Slony-I 1.0.2 through its paces, so to speak, and I 
>>> have a concern/question.
>>> select version();
>>>                              version
>>> ---------------------------------------------------------------------
>>>  PostgreSQL 7.4.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 
>>> 3.3.3 20040412 (Red Hat Linux 3.3.3-7)
>>>
>>> When I do the ever faithful pull-the-power on the test box while 
>>> pgbench and replication are running, once the box comes back up the 
>>> slons (both source and destination) die with a FATAL message 
>>> "localListenThread: Another slon daemon is serving this node 
>>> already".  I tracked this down to a check in 
>>> src/slon/local_listen.c.  The error message only happens when a row 
>>> exists in pg_catalog.pg_listener where relname = 
>>> '_<clustername>_Restart'.
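>>>
>>> You can see the leftover row with a query like this (a sketch; 
>>> substitute the real cluster name for <clustername>):
>>>
>>>   -- the stale entry that triggers the FATAL message
>>>   SELECT relname, listenerpid
>>>     FROM pg_catalog.pg_listener
>>>    WHERE relname = '_<clustername>_Restart';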
>>>
>>> I can clear the error up by issuing a NOTIFY "_<clustername>_Restart" 
>>> on both the source and the target, then issuing a kill -9 on the two 
>>> slons that are running and then re-launching them (I've waited 
>>> approximately 3 minutes with no response from the slons, and a normal 
>>> kill doesn't work).  The NOTIFY gets rid of the old pg_listener 
>>> entries, the kill gets rid of the current entries, and the restart 
>>> prompts the new slons to pick up where they left off before the 
>>> simulated outage.
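>>>
>>> Spelled out, the workaround is (again a sketch; substitute the real 
>>> cluster name, and run the NOTIFY on both source and target):
>>>
>>>   -- 1. clear the stale Restart entries from pg_listener
>>>   NOTIFY "_<clustername>_Restart";
>>>   -- 2. then, outside the database: kill -9 the unresponsive slons
>>>   -- 3. restart both slons; they resume where they left off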
>>>
>>> Need any more info?


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #

