[Slony1-general] Listen path generation and bug 1485

Thu Mar 2 13:52:08 PST 2006

Florian G. Pflug wrote:

> Christopher Browne wrote:
>
>> I have a "followup question" concerning #1485...
>>
>> When you ran into problems with your five node example shaped like...
>>
>>        1  -->  10 --> 11
>>            -->  20 --> 21
>>
>> (e.g. - 1 feeds 10 and 20, then 10 feeds 11, 20 feeds 21, and we have
>> paths exactly corresponding to 2-way communications there...)
>
> The paths are "correct" (i.e. the same as in my configuration). The
> "feeds" are not - in my case, 10 is origin of set 1, which is subscribed
> by 11. Ditto for 20, set 2 and 21.

OK, great, that's why I'm asking the questions...

>> It's not entirely clear what your path/listener configuration looked
>> like.
>>
>> From what I can see, it should look like the following:
>>
>> /* cbbrowne@[local]/dba2 slonyregress1=*/ select * from
>> _slony_regress1.sl_path  order by pa_server, pa_client;
>>  pa_server | pa_client | pa_conninfo | pa_connretry
>> -----------+-----------+-------------+--------------
>>          1 |        10 |             |                     1 |       
>> 20 |             |                    10 |         1 |            
>> |                    10 |        11 |             |           
>>         11 |        10 |             |                    20
>> |         1 |             |                    20 |        21
>> |             |                    21 |        20 |            
>> |            (8 rows)
>
> yep, exactly
>
Good, parts were right :-).

>>
>> (Obviously, there would be more useful DSN values in a real system ;-).)
>>
>> Based on the above, and no subscription information, the current,
>> production listen path builder builds the following:
>
> <snipped table>
>
> This sl_listen seems to be ok, but I guess slony could benefit from
> "circle-elimination".
>  li_origin | li_receiver | li_provider
> -----------+-------------+------------
>         10 |          20 |          21
>         11 |           1 |          20
>         11 |          20 |          21
>         20 |          10 |          11
>         21 |           1 |          10
>         21 |          10 |          11
>
> Those sl_listen would implied event-flows like a -> b -> c -> b.
>
The term "circle elimination" is new to me, but I think we'll get past
that...

>> If I define the set and subscriptions as described:
>>
>> /* cbbrowne@[local]/dba2 slonyregress1=*/ select * from
>> _slony_regress1.sl_set;
>>  set_id | set_origin | set_locked | set_comment
>> --------+------------+------------+-------------
>>       1 |          1 |            |
>
> For my case, that should be:
>  set_id | set_origin | set_locked | set_comment
> --------+------------+------------+-------------
>       1 |         10 |            |
>       2 |         20 |            |
>
Understood.  I've revised my sets accordingly.

>> /* cbbrowne@[local]/dba2 slonyregress1=*/ select * from
>> _slony_regress1.sl_subscribe ;
>>  sub_set | sub_provider | sub_receiver | sub_forward | sub_active
>> ---------+--------------+--------------+-------------+------------
>>        1 |            1 |           10 |             |
>>        1 |            1 |           20 |             |
>>        1 |           10 |           11 |             |
>
> Again, this is not my case, mine would be:
>  sub_set | sub_provider | sub_receiver | sub_forward | sub_active
> ---------+--------------+--------------+-------------+------------
>        1 |           10 |           11 |             |
>        2 |           20 |           21 |             |
>
OK, revised...

>
>> (3 rows)
>>
>>  li_origin | li_receiver | li_provider
>> -----------+-------------+-------------
>>          1 |          10 |           1
>> ...
>>         21 |           1 |          20
>>         21 |          10 |           1
>>         21
>
> For your setup, those are correct, as far as I can see (Well, it
> could still benefit from "circle-elimination", similar to the
> no-sets case above, but thats just an optimization).
>
> The code works for your case, because of the dataflow of your example
> (1 -> 10 -> 11, and 1 -> 20 -> 21). When deciding, for example, how
> node 10 should listen for events from node 21, the subscritions (1 -> 10)
> and (10 -> 11) couse RebuildListenEntriesOne to use 11 and 1 as
> providers.
> Now, using 11 is useless, because node 11 won't get events from node 21
> unless 10 already has them, but using 1 works, because 1 gets them
> from 20,
> which in turn gets them from 21.
>
> In my case, however, there is not (1 -> 10) and (1 -> 20) subscription,
> so for the "how does node 10 get events originating from 21" case,
> RebuildListenEntriesOne now _only_ chooses 11 as a provider, which, due
> to the same reason as above, is a useless provider. The
> difference from your case is, that it is now the _only_ provider taken
> into accound by RebuildListenEntriesOne (because of the "if found then
> return statement").

OK, and when I recalculate listen paths, based on the above changes, the
paths for node 10 are thus:

/* cbbrowne@[local]/dba2 slonyregress1=*/ select * from
_slony_regress1.sl_listen where li_receiver = 10;
 li_origin | li_provider | li_receiver
-----------+-------------+-------------
         1 |           1 |          10
        11 |          11 |          10
        20 |          11 |          10
        21 |          11 |          10
(4 rows)

And yes, indeed, there  is a problem there.

The only way node 10 can get messages from nodes 20 and 21 is via node
1; that's the only thing connected to it that can feed from 20/21.

>
>> The difference is these two listener entries, which, analytically, it
>> makes sense can be dropped without adversely affecting the cluster.  (If
>> you think about it, node 21 is a *crummy* node for node 20 to use as a
>> provider for events from node 10, and ditto for 21 as a provider to 20
>> for events from node 11.)
>>
>>  li_origin | li_provider | li_receiver
>> -----------+-------------+-------------
>>         10 |          21 |          20
>>         11 |          21 |          20
>> (2 rows)
>>
>> Would it be possible for you to provide a more exact list of what the
>> (apparently wrong) listen network looked like?  I don't see the existing
>> calculation breaking down, in this case.
>
> Since I patches RebuildListenEntriesOne on all nodes of my cluster, I
> can't
> easily regenerate what vanilla slony was doing. But I hope that I
> explained
> the problem (or, rather, what I perceive to be the problem) clearly
> enough
> in the text above.

No problem; when you provided the additional subscription information,
that lets me replicate the broken configuration.  I've got what I need,
there.

>
>> By the way, the algorithm that you propose seems only to differ in one
>> material way from the existing one:  The existing one will prefer more
>> direct listen paths based on sl_subscribe to those of "lesser"
>> status. That is what led to the 2 row difference above.  That difference
>> shouldn't cause the problem you observed, in that all it does is to
>> eliminate some paths that wouldn't have been useful anyways.
>
> No, I don't think so. My suggested algorith was the following:
>
> "Telling node X via sl_listen to ask neighbour-nodes
> (Those for which a sl_path entry exists) for events from all other nodes,
> apart from those for which the events must have travelled via node X to
> reach the neighbour of X in question. aths that wouldn't have been
> useful anyways."
>
> A more formal description, assume that the slony cluster forms a
> directed graph
> where each node represents a vertex, and each sl_path entries
> represents a (directred)
> edge from pa_server to pa_client.
> 1) Generate all possible traversals of the graph, in the following form:
>    [origin, node1, node2, ... provider, receiver]
>    (There must be edges origin->node1, node2->node2, ...
> nodeN->provider, provider->receiver
>    in the graph).
> 2) Remove all traversals that include circles, or in other words where a
>    node is traversed more than once (e.g a->b->c->b->d).
> 3) Group those traversals by (origin, provider, receiver)
>
> According to someone here on this list, this will have to be refined a
> bit,
> because the events from a set-origin to the subscribers must travel by
> the same
> way as the data does. This amounts to adding the following step to the
> algorithm
> stated above
>
> 4) Remove all traversals (origin,provider,receiver) where receiver
> subscribes a set
>    originating from origin, but where provider is not the set-provider
> for receiver.
>
> The code I wrote, and that is included in the bugreport, was an
> attempt to implement
> that algorithm. The major problem is that the only "easy" way to do
> this, given the current
> structure of slonys system-tables, is an iterative approch, where the
> _already_grouped_
> traversals are build by extending them by one node in every step. This
> already-grouped
> representation, however, lacks the information needed to correctly
> satisfy the "circle-freeness"
> requirement. (Because you don't know exactly which nodes an event
> traverses when travelling
> from origin via provider to receiver).
>
> Maybe I working compromise would be to remove this "circle-freeness"
> requirement - I don't know
> how much of an performance impact a few unnecessary sl_listen entries
> generate.
>
> Another solution would be to add a table sl_listen_raw, with the
> fields (origin, way, receiver),
> and _no_ primary key. Using this table, coding the algorithm above
> would be trivial, and sl_listen
> could afterwards be created from sl_listen_raw in just one step (Using
> a group operation)
>
> I'd prefer the second solution, and would be willing to code it - but
> I guess adding a table
> to a stable release is quite a showstopper for you...

Well, one approach, for that, would be to create it as a temporary
table, which means no need for schema change.  That approach seems quite
attractive, regardless...

This change is targeted for version 1.2, and there are schema changes in
1.2, already, so it's pretty fair game to add a table, if need be :-).

I whiteboarded a couple of attempts at algorithms for this, this
afternoon.  It needs a bit more thought, I think, to come up with
something final...

In any case, if you have some code in mind, it certainly is "fair game"
to create an extra table or two for this.  I'm inclined to do an
iteration or two on this via email discussion before trying to commit
anything to CVS.

I think I need to come up with a query that detects broken
configuration, e.g. - that the results listed above for receiver #10 is
*wrong*...