[Slony1-general] initial copy going for 6 days now...

Wed Oct 6 19:12:10 PDT 2004

On 10/6/2004 11:51 AM, Vivek Khera wrote:

> Well, I'm now convinced that something is wrong.  There's just no way 
> in heck it should take 6 days to replicate my large table across two 
> machines, even if they are cheap-o IDE disks... :-)
> 
> Pg 7.4.5, Slony 1.0.2, FreeBSD 4.10 on slave, 5.3-BETA on master.
> 
> I have logging of every query turned on.
> 
> On the master, I see these queries of late:
> 
> Oct  6 11:39:58 crash postgres[72808]: [15677-1] LOG:  duration: 15.967 
> ms  statement: select "_mailermailer".createEvent('_mailermailer', 
> 'SYNC', NULL);
> Oct  6 11:39:58 crash postgres[72808]: [15678-1] LOG:  duration: 31.004 
> ms  statement: commit transaction;
> Oct  6 11:40:25 crash postgres[71293]: [7862-1] LOG:  duration: 36.651 
> ms  statement: notify "_mailermailer_Event"; notify 
> "_mailermailer_Confirm"; insert into
> Oct  6 11:40:25 crash postgres[71293]: [7862-2]  
> "_mailermailer".sl_event     (ev_origin, ev_seqno, ev_timestamp,      
> ev_minxid, ev_maxxid, ev_xip, ev_type     ) values ('1',
> Oct  6 11:40:25 crash postgres[71293]: [7862-3]  '8211', '2004-10-06 
> 11:40:25.085698', '55781', '651614', '''55781''', 'SYNC'); insert into 
> "_mailermailer".sl_confirm
> Oct  6 11:40:25 crash postgres[71293]: [7862-4]         (con_origin, 
> con_received, con_seqno, con_timestamp)    values (1, 2, '8211', 
> CURRENT_TIMESTAMP); commit transaction;
> 
> (yes, the machine is named "crash").
> 
> On the slave I'm seeing this:
> 
> Oct  6 11:40:32 staging postgres[48728]: [993064-1] LOG:  duration: 
> 0.653 ms  statement: start transaction;set transaction isolation level 
> serializable;select last_value from
> Oct  6 11:40:32 staging postgres[48728]: [993064-2]  
> "_mailermailer".sl_action_seq;
> Oct  6 11:40:32 staging postgres[48728]: [993065-1] LOG:  duration: 
> 0.519 ms  statement: rollback transaction;
> Oct  6 11:40:33 staging postgres[48728]: [993066-1] LOG:  duration: 
> 0.657 ms  statement: start transaction;set transaction isolation level 
> serializable;select last_value from
> Oct  6 11:40:33 staging postgres[48728]: [993066-2]  
> "_mailermailer".sl_action_seq;
> Oct  6 11:40:33 staging postgres[48728]: [993067-1] LOG:  duration: 
> 0.520 ms  statement: rollback transaction;
> 
> 
> this rollback transaction is worrisome to me...  it is happening once 
> per second.  the disk space used is not growing much, either.
> 
> The slony log on the slave reads like this for this table.  Several 
> other tables were successfully copied before:
> 
> DEBUG2 remoteWorkerThread_2: copy table public.msg_recipients
> DEBUG2 remoteListenThread_2: queue event 2,622 SYNC
> DEBUG2 syncThread: new sl_action_seq 1 - SYNC 563
> DEBUG2 localListenThread: Received event 1,563 SYNC
> DEBUG2 remoteListenThread_2: queue event 2,623 SYNC
> DEBUG2 syncThread: new sl_action_seq 1 - SYNC 564
> DEBUG2 localListenThread: Received event 1,564 SYNC
> DEBUG1 cleanupThread:    0.062 seconds for cleanupEvent()
> DEBUG1 cleanupThread:    0.045 seconds for delete logs
> DEBUG2 cleanupThread:    0.449 seconds for vacuuming
>   [[ elided ]]
> DEBUG1 cleanupThread:    0.006 seconds for cleanupEvent()
> DEBUG1 cleanupThread:    0.069 seconds for delete logs
> DEBUG2 cleanupThread:    0.465 seconds for vacuuming
> DEBUG2 syncThread: new sl_action_seq 1 - SYNC 8212
> DEBUG2 localListenThread: Received event 1,8212 SYNC
> DEBUG2 remoteListenThread_2: queue event 2,8397 SYNC
> DEBUG2 syncThread: new sl_action_seq 1 - SYNC 8213
> DEBUG2 localListenThread: Received event 1,8213 SYNC
> DEBUG2 remoteListenThread_2: queue event 2,8398 SYNC
> 
> which is the tail of the log right now.  The start of the copy above 
> happened at 00:02 last friday!

It started copying, but never actually get's that done. Is there 
anything holding any lock on that table, either on the master or the slave?

Jan

> 
> This table has 165M rows in it: two integers and a varchar(4), a PK 
> comprising both integers, and a regular index on the second integer.
> 
> Replicating the development copy of this same db worked flawlessly 
> (took just a few seconds, since it is tiny).  This is a snapshot of my 
> production db, which is why it is huge.
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Slony1-general mailing list
> Slony1-general at gborg.postgresql.org
> http://gborg.postgresql.org/mailman/listinfo/slony1-general

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck at Yahoo.com #