Tue Jul 21 16:43:52 PDT 2009
- Previous message: [Slony1-general] Re: Data loss in cleanupEvent()
- Next message: [Slony1-general] How mature is 2.0.2 with Postgres 8.4?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,

I did some additional testing and I now have a way of reproducing the data loss. It looks like the actual problem is in logswitch_finish(), but telling cleanupEvent() to leave data around for a longer period prevents it from happening.

Follow these steps:

1. Open 2 psql terminals and connect to the master database with both.
2. Make sure there is no log switch in progress. When I did this both sl_log tables were empty.
3. In terminal 1 run:
     BEGIN;
     INSERT INTO sometable VALUES (...);
4. In terminal 2 run:
     SELECT _clustername.logswitch_start();
     SELECT _clustername.logswitch_finish();
   (logswitch_finish() will just sit there waiting)
5. In terminal 1 run:
     COMMIT;
   (This will also cause logswitch_finish() in terminal 2 to complete)
6. Check both sl_log tables - they will be empty.
7. Check the table on the slave node - the new row won't be there.

On my test setup I get data loss every time.

-----------------------------------------------------------------------
WARNING: I'm just blindly guessing here. Do things even work that way?

What I think may be happening:

- my transaction starts
- logswitch_finish() is called
- there are no visible old rows around that logswitch_finish() could detect in order to determine that it should not truncate sl_log. My transaction is generating new rows at this time, but they are not visible to logswitch_finish().
- logswitch_finish() executes a TRUNCATE statement. This statement just sits there waiting for a lock on sl_log.
- my transaction commits
- TRUNCATE gets a lock on sl_log and immediately destroys all the rows generated by my transaction.

I had cleanup_interval set to 1 minute during testing but it didn't seem to affect the results - transactions lasting only 20 seconds were also lost.

The reason why setting cleanup_interval to 6 hours made this problem go away on our production cluster could be that it kept some statements from up to 6 hours ago visible to logswitch_finish(), so it knew it shouldn't truncate the log tables.

/End blind guessing.
-----------------------------------------------------------------------

Does any of this make sense?

Regards,
Aleksander
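
The guessed sequence above can be walked through with a small toy model. This is only a sketch of the hypothesis, not Slony-I or PostgreSQL code: ToyLogTable, simulate() and the transaction id 1 are made up for illustration, and the model simply assumes (as the guess does) that the emptiness check sees only committed rows while TRUNCATE discards every row regardless of visibility.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLogTable:
    """Hypothetical stand-in for an sl_log table (not Slony-I code)."""
    rows: list = field(default_factory=list)     # (txid, payload) pairs
    committed: set = field(default_factory=set)  # txids that have committed

    def insert(self, txid, payload):
        self.rows.append((txid, payload))

    def commit(self, txid):
        self.committed.add(txid)

    def visible_rows(self, snapshot):
        # Assumption: a snapshot sees only rows written by transactions
        # that had already committed when the snapshot was taken.
        return [row for row in self.rows if row[0] in snapshot]

    def truncate(self):
        # Assumption: TRUNCATE drops every row, visible or not.
        self.rows.clear()


def simulate():
    """Replay the guessed interleaving; return (trace, remaining rows)."""
    log = ToyLogTable()
    trace = []

    # Terminal 1: BEGIN; INSERT ... (still uncommitted)
    log.insert(txid=1, payload="new row")
    trace.append("t1: INSERT (uncommitted)")

    # Terminal 2: the log switch check runs against its own snapshot.
    snapshot = frozenset(log.committed)
    if log.visible_rows(snapshot):
        trace.append("t2: rows visible, keep log")
        return trace, list(log.rows)
    trace.append("t2: no visible rows, TRUNCATE waits for lock")

    # Terminal 1: COMMIT releases the lock that blocked TRUNCATE.
    log.commit(1)
    trace.append("t1: COMMIT")

    # Terminal 2: TRUNCATE acquires the lock and removes the committed row.
    log.truncate()
    trace.append("t2: TRUNCATE runs")
    return trace, list(log.rows)
```

Calling simulate() and printing the trace shows the order of events; in this model the row committed by transaction 1 is gone by the time the function returns, which matches what steps 6 and 7 observe.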