Mon May 7 12:22:58 PDT 2012
- Previous message: [Slony1-general] Logship files printing incorrectly
- Next message: [Slony1-general] Logship files printing incorrectly
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 12-05-04 05:46 PM, Richard Yen wrote:
> On Wed, May 2, 2012 at 2:39 PM, Steve Singer <ssinger at ca.afilias.info
> <mailto:ssinger at ca.afilias.info>> wrote:
>
>     Are any of the above possible:
>
>     1. You had multiple slon daemons writing to the same log archive
>     directory (maybe for different clusters?)
>
> We have several clusters writing to a directory, but there's a separate
> directory for each cluster.  For example:
>
> /home/log_ship
> /home/log_ship/cluster1/new_logfiles
> /home/log_ship/cluster2/new_logfiles
> /home/log_ship/cluster3/new_logfiles
> ...etc...
>
> We don't have two daemons writing to the same directory.
>
>     2. The mechanism you used for copying the .sql files could have
>     caused processes to try to write to the same file on the destination
>     machine
>
> I'm fairly certain this is not the case.  The files that I sent you were
> directly from the origin machine, not from the destination machine.  Our
> scheme is like this:
>
> Node1 is origin
> Node2 is subscriber, with -a mode, writing files to
> /home/log_ship/cluster1/new_logfiles
> Cronjob moves files from /home/log_ship/cluster1/new_logfiles to
> /home/log_ship/cluster1/log_staging (we filter out the *.sql.tmp files
> so that we can let them finish writing before we move them)
> RemoteNode makes rsync connection to Node2 and copies the files from
> Node2/home/log_ship/cluster1/log_staging to its local directory
> Log files are replayed
>
>     If the answer to both of those is no then maybe there is a bug in
>     how archive file numbers are assigned in remote_worker.c:archive_open.
>     We don't YET see any obvious faults with this logic but if this
>     logic somehow assigned 2 slon worker threads the same id then you
>     could get a file like you sent us.
>
> As I look at the files you sent me, I only see differences between the
> third (Node X, Event XXXXXX) and seventh
> (archiveTracking_offline(xxx,'xxxx-xx-xx xx:xx:xx')) lines.  I noticed
> that Node number can vary per file, but only one daemon has the -a
> option enabled.  Not sure why the node number changes--shouldn't it
> always correspond to the node number of the daemon with the -a option
> turned on?
>

The node whose slon has the -a option is the node number that shows up
in the file name.  I.e., slony1_log_3_XXXXXXXXX.sql is a log file
generated by slon # 3.

slon 3 has a remote_worker for each of the remote nodes.  These
remote_worker threads run concurrently.  Each one will generate a
tracking file for SYNC events from its remote_worker.  The archive
sequence numbers are supposed to be assigned on the node the slon is for
(node 3 in this case).  So two remote_worker threads inside of slon 3
SHOULDN'T ever get the same archive counter number (we should be
serializing on an update of the archive_counter table), thus they
shouldn't be writing to the same file.

One theory is that some of the "shouldn'ts" are actually happening (for
reasons we haven't determined).

> Aside from that, I tried poking around the sl_* tables that I had
> dumped, but didn't really find anything.  One thing is certain,
> though--a given DML statement shows up in sl_log_x only once, even
> though it shows up several times in the various logship files that are
> generated.  I can't seem to find the corresponding sl_event row, so I'm
> not sure if there might be anything in that direction, in terms of
> duplicated events.
>
> --Richard
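
As an illustration of the serialization described in the reply above, here is a minimal Python sketch. It is not Slony's code and does not reproduce remote_worker.c:archive_open. A lock-protected in-memory counter stands in for the serialized update of the archive_counter table, and several threads stand in for the remote_worker threads of slon 3. The names ArchiveCounter, archive_name, run_workers and duplicate_names are hypothetical, and the zero-padding width is an assumption based only on the slony1_log_3_XXXXXXXXX.sql pattern quoted in the message.

```python
import threading


class ArchiveCounter:
    """In-memory stand-in (hypothetical) for the archive counter row."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def next_number(self):
        # The lock plays the role of the serialized UPDATE on the counter
        # table: only one worker at a time may increment and read it.
        with self._lock:
            self._value += 1
            return self._value


def archive_name(node_id, number, width=9):
    # width=9 is an assumption taken from the X placeholders in the email.
    return f"slony1_log_{node_id}_{number:0{width}d}.sql"


def run_workers(node_id, remote_nodes, syncs_per_worker):
    """Run one thread per remote node; each asks for an archive number per SYNC."""
    counter = ArchiveCounter()
    names = []
    names_lock = threading.Lock()

    def worker(remote_id):
        for _ in range(syncs_per_worker):
            name = archive_name(node_id, counter.next_number())
            with names_lock:
                names.append((remote_id, name))

    threads = [threading.Thread(target=worker, args=(r,)) for r in remote_nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return names


def duplicate_names(names):
    """Return the sorted list of file names handed out more than once."""
    seen, dups = set(), set()
    for _, name in names:
        if name in seen:
            dups.add(name)
        seen.add(name)
    return sorted(dups)


if __name__ == "__main__":
    handed_out = run_workers(node_id=3, remote_nodes=[1, 2, 4], syncs_per_worker=200)
    print(len(handed_out), "names,", len(duplicate_names(handed_out)), "duplicates")
```

With the lock in place no file name is handed out twice, which is the "shouldn't" case. If the increment were not serialized, two workers could receive the same number and end up writing to the same file, which would be consistent with the interleaved files reported in this thread.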