Richard Yen richyen at iparadigms.com
Fri May 4 14:46:26 PDT 2012
On Wed, May 2, 2012 at 2:39 PM, Steve Singer <ssinger at ca.afilias.info> wrote:

> Are any of the above possible:
>
> 1. You had multiple slon daemons writing to the same log archive directory
> (maybe for different clusters?)
>
We have several clusters writing under one parent directory, but each
cluster has its own subdirectory.  For example:

/home/log_ship
/home/log_ship/cluster1/new_logfiles
/home/log_ship/cluster2/new_logfiles
/home/log_ship/cluster3/new_logfiles
...etc...

We don't have two daemons writing to the same directory.


> 2. The mechanism you used for copying the .sql files could have caused
> processes to try to write to the same file on the destination machine
>
I'm fairly certain this is not the case.  The files that I sent you were
directly from the origin machine, not from the destination machine.  Our
scheme is like this:

1. Node1 is the origin.
2. Node2 is a subscriber, with -a mode, writing files to
   /home/log_ship/cluster1/new_logfiles.
3. A cron job moves files from /home/log_ship/cluster1/new_logfiles to
   /home/log_ship/cluster1/log_staging (we filter out the *.sql.tmp files
   so that we can let them finish writing before we move them).
4. RemoteNode makes an rsync connection to Node2 and copies the files from
   Node2/home/log_ship/cluster1/log_staging to its local directory.
5. The log files are replayed.
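
For reference, the cron job step could be sketched roughly as below.  This
is not our actual script, just a minimal Python illustration of the idea;
the function name move_finished_archives is made up for this example, and
it assumes that finished archives end in .sql while in-progress ones end
in .sql.tmp.

```python
import shutil
from pathlib import Path


def move_finished_archives(src_dir, dst_dir):
    """Move completed *.sql files from src_dir to dst_dir.

    Hypothetical helper: returns the names of the files it moved.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(src.glob("*.sql")):
        # "*.sql" does not match names ending in ".sql.tmp", so files
        # that are still being written are left where they are.
        if path.is_file():
            shutil.move(str(path), str(dst / path.name))
            moved.append(path.name)
    return moved


# Example call, using the directories from the description above:
# move_finished_archives("/home/log_ship/cluster1/new_logfiles",
#                        "/home/log_ship/cluster1/log_staging")
```

The point of the sketch is only that the .sql.tmp files are never touched
by the move, so the copy step should not see partially written files.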


> If the answer to both of those is no then maybe there is a bug in how
> archive file numbers are assigned in remote_worker.c:archive_open.
> We don't YET see any obvious faults with this logic but if this logic
> somehow assigned 2 slon worker threads the same id then you could get a
> file like you sent us.
>

As I look at the files you sent me, the only differences I see are on the
third (Node X, Event XXXXXX) and seventh
(archiveTracking_offline(xxx,'xxxx-xx-xx xx:xx:xx')) lines.  I noticed that
the node number can vary per file, but only one daemon has the -a option
enabled.  Not sure why the node number changes--shouldn't it always
correspond to the node number of the daemon with the -a option turned on?
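
A comparison like this can be reproduced with a few lines of Python.  This
is only a sketch; differing_line_numbers is a made-up name, and it compares
two files' contents strictly line by line (it does not try to realign
lines the way a real diff would).

```python
from itertools import zip_longest


def differing_line_numbers(text_a, text_b):
    """Return the 1-based line numbers at which two texts differ.

    Hypothetical helper: a line present in only one text counts as a
    difference.
    """
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    pairs = zip_longest(lines_a, lines_b)  # pads the shorter text with None
    return [n for n, (a, b) in enumerate(pairs, start=1) if a != b]


# Example call on two archive files (paths are placeholders):
# with open("first.sql") as f1, open("second.sql") as f2:
#     print(differing_line_numbers(f1.read(), f2.read()))
```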

Aside from that, I tried poking around the sl_* tables that I had dumped,
but didn't really find anything.  One thing is certain, though--a given DML
statement shows up in sl_log_x only once, even though it shows up several
times in the various logship files that are generated.  I can't seem to
find the corresponding sl_event row, so I'm not sure if there might be
anything in that direction, in terms of duplicated events.
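
In case it helps anyone reproduce the check on the logship side, here is
roughly how the files containing a given statement can be listed.  Again
just a sketch: files_containing is a made-up name, it does a plain
substring search, and it assumes the archives are *.sql files in a single
directory.

```python
from pathlib import Path


def files_containing(statement, archive_dir):
    """Return the names of *.sql files in archive_dir that contain statement.

    Hypothetical helper: uses a plain substring match, so the statement
    text must appear in the file exactly as given.
    """
    hits = []
    for path in sorted(Path(archive_dir).glob("*.sql")):
        # errors="replace" keeps the scan going if a file has odd bytes.
        if statement in path.read_text(errors="replace"):
            hits.append(path.name)
    return hits


# Example call (the statement text is a placeholder):
# files_containing("insert into ...", "/home/log_ship/cluster1/log_staging")
```

More than one name in the result for a single DML statement is the
duplication I am describing.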

--Richard


More information about the Slony1-general mailing list