Jan Wieck JanWieck at Yahoo.com
Thu Sep 27 17:32:10 PDT 2012
On 9/27/2012 6:48 PM, Brian Fehrle wrote:
> On 09/27/2012 03:40 PM, Jan Wieck wrote:
>> On 9/27/2012 5:30 PM, Christopher Browne wrote:
>>> On Thu, Sep 27, 2012 at 5:26 PM, Jan Wieck <JanWieck at yahoo.com> wrote:
>>>> My guess is that the right solution to this is to clean out everything
>>>> again when a STORE NODE comes along. We had been thinking of making the
>>>> node ID non-reusable to prevent this sort of race condition.
>>>
>>> I'm not sure I'm totally comfortable with cleaning it all out
>>> instantly; as a step towards that, I'd think it a good idea for slonik
>>> to check all the nodes for existence of a node ID, and refuse if it's
>>> found anywhere.
>>>
>>> Under that circumstance, you might need to wait, to run the STORE
>>> NODE, until the cleanup thread has run on all the nodes to expunge the
>>> last bits of the node on all nodes' databases.
>>>
>>> Smells a bit safer to me...
>>>
>>
>> Check cleanupEvent(). I think it will never remove that stale event.
>>
> Yeah, it looks like it will only remove confirmed ones.
>
> --------------code from cleanupEvent()-----------------
>       -- ----
>       -- Then remove all events that are confirmed by all nodes in the
>       -- whole cluster up to the last SYNC
>       -- ----
>       for v_min_row in select con_origin, min(con_seqno) as con_seqno
>                   from sl_confirm
>                   group by con_origin
>       loop
>           select coalesce(max(ev_seqno), 0) into v_max_sync
>                   from sl_event
>                   where ev_origin = v_min_row.con_origin
>                   and ev_seqno <= v_min_row.con_seqno
>                   and ev_type = 'SYNC';
>           if v_max_sync > 0 then
>               delete from sl_event
>                       where ev_origin = v_min_row.con_origin
>                       and ev_seqno < v_max_sync;
>           end if;
>       end loop;
>
> The query that hits sl_confirm for the loop returns the following:
>  con_origin | con_seqno
> ------------+------------
>           1 | 5000242178
>           2 | 5000661718
>           4 | 5000060743
>
> So it never hits node 3 to do any deletes from sl_event on node three.
> This is the only place in cleanupEvent() that I believe will do any
> deletes from sl_event.
>
> So should I try to delete this row myself, or would that cause major
> issues also? I'm still wrapping my head around how sl_confirm and
> sl_event work together when adding/removing nodes.

Since there isn't even a node 3 in sl_node, it is safe to delete that row.
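
As a rough sketch only, the manual cleanup could look something like the
following. It assumes the cluster schema is called "_mycluster" (a
hypothetical name, substitute your own), that the stale row is the
sl_event entry with ev_origin = 3, and that it is run on each node that
still holds such a row. The "not exists" guard makes the delete a no-op
if node 3 does turn out to be present in sl_node.

```sql
-- Assumption: "_mycluster" is a placeholder for the real Slony schema name.
begin;

-- Sanity check: node 3 should not be listed (expect zero rows).
select no_id from "_mycluster".sl_node where no_id = 3;

-- Look at what is about to be removed.
select ev_origin, ev_seqno, ev_type
  from "_mycluster".sl_event
 where ev_origin = 3;

-- Delete the stale events, but only if their origin node no longer exists.
delete from "_mycluster".sl_event e
 where e.ev_origin = 3
   and not exists (select 1
                     from "_mycluster".sl_node n
                    where n.no_id = e.ev_origin);

commit;
```

Inspect the output of the two selects before issuing the commit; a
rollback at that point leaves everything untouched.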


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
