Brian Fehrle brianf at consistentstate.com
Tue May 25 12:09:26 PDT 2010
I've run the slonik script that did the sync, and it finished with a
result of 0. The row counts remain unchanged, with the slave still
having fewer rows than the master.
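
For reference, the script was essentially the one Steve suggested below,
wrapped in the usual slonik preamble (the conninfo values here are
placeholders, not our real ones):

  cluster name = SLONY;
  node 1 admin conninfo = 'dbname=masterdb host=masterhost user=repuser';
  node 2 admin conninfo = 'dbname=slavedb host=slavehost user=repuser';

  sync (id = 1);
  wait for event (origin = 1, confirmed = all, wait on = 1);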

I've generated a list of rows that the master has that the slave does
not, and we're doing some investigating to see if it was something that
someone had done, or something else.
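
(For anyone curious, a list like that can be put together by dumping the
primary key of the table from each node and comparing the dumps; roughly
the following, where the table and column names are placeholders for the
real ones:

  -- run on the master, then again on the slave with a different file name
  COPY (SELECT id FROM my_replicated_table ORDER BY id)
    TO '/tmp/ids_node1.txt';

and then diff the two output files.)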

- Brian Fehrle

Brian Fehrle wrote:
> Steve Singer wrote:
>   
>> Brian Fehrle wrote:
>>
>>     
>>> Hi all,
>>>       
>> A few things I would look at
>>
>> Look at sl_event and sl_confirm.  Are there events in sl_event that 
>> are larger than what shows up as being confirmed in sl_confirm?  When 
>> were these events generated?  If so, look at the next unconfirmed 
>> event in sl_event and see what type of event it is.
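>>
>> For example, a query along these lines (just a sketch, untested; this
>> assumes node 1 is the origin, node 2 is the subscriber, and the cluster
>> schema is "_SLONY" to match your slon commands) should list any
>> unconfirmed events:
>>
>>   SELECT ev_seqno, ev_timestamp, ev_type
>>     FROM "_SLONY".sl_event
>>    WHERE ev_origin = 1
>>      AND ev_seqno > (SELECT coalesce(max(con_seqno), 0)
>>                        FROM "_SLONY".sl_confirm
>>                       WHERE con_origin = 1 AND con_received = 2)
>>    ORDER BY ev_seqno;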
>>
>>     
> All entries between sl_event and sl_confirm match exactly (each 
> con_seqno from sl_confirm matches an ev_seqno from sl_event)
>   
>> You can also look at sl_log_1 and sl_log_2; you should see your missing 
>> rows there.  In particular, the ev_snapshot from sl_event for the last 
>> unconfirmed SYNC event should give you the range (log_txid) covering 
>> some of the unreplicated rows.  The set of all of the unconfirmed SYNC 
>> events should give you all of the rows in sl_log_1 and sl_log_2 that 
>> still need to be sent.
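>>
>> A quick first check (again assuming the cluster schema is "_SLONY")
>> would be something like:
>>
>>   SELECT count(*) FROM "_SLONY".sl_log_1;
>>   SELECT count(*) FROM "_SLONY".sl_log_2;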
>>
>>     
> On both the master and the slave, there are zero entries in either 
> sl_log_1 or sl_log_2.
>   
>> You can also try a slonik script like
>>
>> sync(id=1);
>> wait for event(origin=1, confirmed=all, wait on=1);
>>
>>
>> This generates a sync event and waits until it gets replicated.  If 
>> slonik exits successfully and you're still missing those rows, then 
>> something strange is going on (I would start to wonder if you did 
>> something like an execute script on your replica that deleted rows 
>> just from the replica).
>>     
> I was wondering the same thing; however, doesn't the slave node refuse 
> updates/inserts/deletes via a locking system? There are quite a few 
> people who use the databases, and I can't account for everyone's 
> actions. I will give this sync command a try in a bit; I need to wait 
> on some things first.
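>
> (One way I can think of to sanity-check that protection is to list the 
> triggers on the table on the slave; the table name here is just a 
> placeholder:
>
>   SELECT tgname, tgenabled
>     FROM pg_trigger
>    WHERE tgrelid = 'public.my_table'::regclass;
>
> On a subscriber, Slony should have installed a deny-access trigger that 
> shows up in that list.)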
>
> Another thing that has come to mind: when we first added this table to 
> the replication set, we had a few problems with some of our scripts, which 
> resulted in a daemon attempting to start the slon daemons even if they 
> were already running. Normally the duplicate daemons are smart enough to 
> kill themselves; however, since this was going on during the initial 
> propagation of the data to the slave, it may have done something 
> unintentional.
>
> - Brian
>   
>> Steve
>>
>>
>>
>>
>>
>>     
>>>     I'm having some trouble determining why replication isn't 
>>> happening on a replicated table. I have a two-node Slony cluster. I 
>>> have a table in the Slony replication set that has 72332 records on 
>>> the master, but only 71225 records on the slave. It's been this 
>>> way for a few hours at least (could be more, as that is when we first 
>>> noticed it). This table was added to the replication set several 
>>> weeks ago, so it's not stalled mid-publish. The slon daemons are 
>>> running, and the logs for the daemons report no abnormalities. I've 
>>> restarted the slon daemons to see if it would clear anything up, but 
>>> it remains the same.
>>>
>>> Looking at sl_status, the lag events never go above 1, and the lag 
>>> time never goes above a couple of minutes.
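>>>
>>> (By sl_status I mean the view in the cluster schema, i.e. roughly:
>>>
>>>   SELECT st_lag_num_events, st_lag_time FROM "_SLONY".sl_status;
>>>
>>> which is where those lag numbers come from.)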
>>>
>>> The best explanations I can think of are: either something is putting 
>>> replication for this particular table on "hold" and not updating 
>>> the remaining rows on the slave, without alerting me via the slon 
>>> logs; or something went screwy, replication for that table is out 
>>> of sync, and I need to drop the table from the set and add it back 
>>> again and let it sync up (however, this solution is not ideal).
>>>
>>> Any tips on places I should look to see what may be going on?
>>>
>>> Thanks in advance.
>>>
>>>        - Brian Fehrle
>>>
>>> Data that may be important:
>>>
>>> Commands that start the slon daemons:
>>> /usr/local/pgsql/bin/slon -p /usr/local/pgsql/log/slon.node1.pid -s 
>>> 60000 -t 300000 SLONY "dbname=$MASTERDBNAME port=$MASTERPORT 
>>> host=$MASTERHOST user=$REPUSER"  > 
>>> /usr/local/pgsql/log/slon.node1.log 2>&1 &
>>> /usr/local/pgsql/bin/slon -p /usr/local/pgsql/log/slon.node2.pid -s 
>>> 60000 -a /usr/local/pgsql/slon_logs -t 300000 -x "log_parsing_script" 
>>> SLONY "dbname=$SLAVEDBNAME port=$SLAVEPORT host=$SLAVEHOST 
>>> user=$REPUSER"  > /usr/local/pgsql/log/slon.node2.log 2>&1 &
>>>
>>> slony version 1.2.20
>>> master PostgreSQL version 8.4.1
>>> slave PostgreSQL version 8.4.2
>>>
>>>
>>>       
>>     
>
>   


