Jan Wieck JanWieck at Yahoo.com
Wed May 28 08:55:08 PDT 2008
On 5/28/2008 11:51 AM, Christopher Browne wrote:
> Christopher Browne <cbbrowne at ca.afilias.info> writes:
>> Unfortunately, pg_stat_activity pulls the PIDs from the stats
>> collector, so that there is a delay in changes being reported.  (And I
>> have seen situations where the stats collector got "blown," in which
>> case, this wouldn't report anything even *nearly* correct.)
> 
> Actually, forget the concern: pg_stat_activity *is* good enough, for
> this purpose.

Unless someone has deactivated stat collection altogether.


Jan


> 
> The *OTHER* reference to pg_listener is a little bit later in the same
> function, and it takes place in the context of the following sort of
> loop:
> 
>   
>   for each node
>      get the PID of the slon
> 
>   [we run failednode() against each of the nodes...]
> 
>   while not done
>     for each node
>        make sure the slon PID has changed from the one found the first time
> 
> [Subtext to all of this: If the slon was "dead" during any of that,
> then having _no_ PID behaves rather like NULL, where NULL <> NULL, and
> the loop can terminate successfully.]
> 
> It is fine for these queries to be done based on statistical records
> in pg_stat_activity, as there are the following possibilities:
> 
>   1.  If the stats are up to date, then all is well.
> 
>   2.  If the stats are falling behind, then we may loop extra times in
>       the "while not done" portion of the logic, which is OK.
> 
>   3.  If the stats collector is broken altogether, then this will loop
>       perpetually, until the user does something to (say) restart the
>       offending database, which would certainly rectify the situation.
> 
> In any case, using the stats collector provides *consistent* results
> for all of these scenarios, so it is perfectly fine to use the
> previously-suggested query joining pg_nodelock with pg_stat_activity.
> 
> I think I'm inclined to add logic to the loop (that isn't there now)
> to report at least *something* back if it's encountering problems.
> I'm thinking that after 10 iterations, it should start reporting which
> nodes it is failing to see restarted.
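
For illustration only, here is a rough Python sketch of the loop as described above, with the proposed reporting added. It is not the actual slonik code. The callables get_slon_pid and run_failed_node are hypothetical stand-ins for the PID lookup and the failednode() call, and max_iterations is an added assumption so that the "stats collector broken" case can be bounded in a test.

```python
import time
from typing import Callable, Iterable, List, Optional


def wait_for_slon_restarts(
    nodes: Iterable[int],
    get_slon_pid: Callable[[int], Optional[int]],  # hypothetical: PID lookup, None if no slon seen
    run_failed_node: Callable[[int], None],        # hypothetical: runs failednode() on one node
    report: Callable[[str], None] = print,
    report_after: int = 10,
    poll_interval: float = 1.0,
    max_iterations: Optional[int] = None,          # assumption: optional cap, not in the original
) -> List[int]:
    """Return the nodes whose slon PID never changed (empty list on success)."""
    node_list = list(nodes)

    # Pass 1: remember the slon PID seen for each node before doing anything.
    old_pids = {node: get_slon_pid(node) for node in node_list}

    # Pass 2: run failednode() against each of the nodes.
    for node in node_list:
        run_failed_node(node)

    # Pass 3: poll until every slon shows a PID different from the first one.
    pending = node_list
    iteration = 0
    while pending:
        iteration += 1
        still_pending = []
        for node in pending:
            old, new = old_pids[node], get_slon_pid(node)
            # A missing PID on either side is treated as "changed", so a slon
            # that was dead at some point does not keep the loop alive.
            if old is not None and new is not None and old == new:
                still_pending.append(node)
        pending = still_pending
        if not pending:
            break
        # From the report_after-th iteration on, say which nodes are holding us up.
        if iteration >= report_after:
            report("iteration %d: slon not yet restarted on node(s) %s"
                   % (iteration, ", ".join(str(n) for n in pending)))
        if max_iterations is not None and iteration >= max_iterations:
            break
        time.sleep(poll_interval)
    return pending
```

With max_iterations left at None, this loops until every slon has restarted, matching case 3 above; the only difference from the current behaviour is that the user gets told which nodes are still being waited on.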


-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


