[Slony1-general] sl_nodelock values ("pg_catalog".pg_backend_pid()); "

Fri May 22 11:33:55 PDT 2009

On Fri, 2009-05-22 at 12:49 -0400, Brian A. Seklecki wrote:
> All:
> 
> So this problem with slon(8) daemons continues to vex us.  During a
> switchover, we see "No Worker Thread" errors:
> 
>  2009 May 22 06:37:17 -04:00 bdb01 [slon][55352] [local2] [err] 
>  slon[55352]: [12-1] [55352] CONFIG storeSet: set_id=1 set_origin=3
>  set_comment='All CORES tables'
>  2009 May 22 06:37:17 -04:00 bdb01 [slon][55352] [local2] [warning]
>  slon[55352]: [13-1] [55352] WARN   remoteWorker_wakeup: node 3 - no
>  worker thread
> 
> Followed by:
> 
> 
>  2009 May 22 06:37:17 -04:00 bdb01 [slon][55352] [local2] [err]
>  slon[55352]: [19-1] [55352] FATAL  localListenThread: "select 
>  "_DBNAME".cleanupNodelock(); insert into
>  2009 May 22 06:37:17 -04:00 bdb01 [slon][55352] [local2] [err]
>  slon[55352]: [19-2]  "_DBNAME".sl_nodelock values (   
>  2, 0, "pg_catalog".pg_backend_pid()); " - ERROR:  duplicate key value
>  violates
> 
> The screwed up thing is that, as far as we know, all three slon(8)
> daemons on all three configurations are active, healthy, and responding
> before we execute the switchover.
> 
> We know because we have nagios watching SYNC events and watching that
> sl_log table row counts are within acceptable ranges.
> 
> Any advice on further troubleshooting this?    Maybe attach a ktrace(8)
> to the process and try to re-create the error.
> 
> We're running the latest Slony/PostgreSQL (postgresql-server-8.3.7 +
> slony1-1.2.15) on FBSD6/amd64.
> 
> ~BAS

This looks like the same issue that one of our guys was trying to figure
out.

Restarting the Slon lets the failover proceed, but it sort of sucks
that you have to do that.
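
One thing that may be worth checking before restarting (a rough sketch,
not something confirmed in this thread): the failing insert is into
sl_nodelock, so a row for node 2 left behind by a backend that is still
connected would be consistent with the duplicate key error. The query
below lists the node locks next to the backends that hold them. It
assumes "_DBNAME" is your cluster schema as in the log above, the usual
sl_nodelock columns (nl_nodeid, nl_conncnt, nl_backendpid), and
PostgreSQL 8.3, where pg_stat_activity calls the backend PID procpid.

```sql
-- Assumption: "_DBNAME" is the Slony cluster schema, as shown in the log.
-- Assumption: PostgreSQL 8.3 column names in pg_stat_activity (procpid).
select nl.nl_nodeid,
       nl.nl_conncnt,
       nl.nl_backendpid,
       sa.usename,
       sa.backend_start,
       sa.current_query
  from "_DBNAME".sl_nodelock nl
  left join pg_catalog.pg_stat_activity sa
         on sa.procpid = nl.nl_backendpid
 order by nl.nl_nodeid, nl.nl_conncnt;
```

A row whose pg_stat_activity columns come back NULL has no live backend
behind it. A row with a live backend suggests an older slon connection
for that node is still holding the lock.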

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.