Mon Apr 4 14:07:28 PDT 2005
On Mon, 2005-04-04 at 15:12 +0300, Hannu Krosing wrote:
> On T, 2005-03-29 at 11:22 -0500, Christopher Browne wrote:
> > Rod Taylor wrote:
> > >
> > > It takes me about 12 to 24 hours to copy the largest tables (that's
> > > each, not collectively); so there are a number of blocks where it
> > > rescans from transaction ID X to some larger number many times if I
> > > use grouping sets less than 7000.
>
> The real solution would probably be doing the initial copy without
> indexes, especially the PK.

I did go without indexes for the initial copy, but I do keep the PK in
place since Slony seems to prefer it. It may be appropriate for Slony
itself to remove and re-add the primary key around the copy (see the DDL
sketch at the end of this mail), but indexes on integer columns don't
really get in the way too much: IO held at about 20MB/sec for the copy,
which is reasonable for a busy SAN.

There are lots of TB+ sized databases using PostgreSQL out there, and
Slony needs to be able to deal with them reasonably. So, with respect,
the real solution is to clean up Slony so that it can handle an active
200GB table without putting road blocks in its own way. Dealing with a
variable group size (extending the group size to transaction min/max
boundaries automatically) and query tuning for multiple sets are two of
the items I look forward to.

Here is a summary of the issues I've run into and worked around. All
have been posted to this list in pieces, and I understand that 1.1 takes
steps to correct some of them:

1. A large transaction (from pg_dump or the initial copy) makes the
sl_log_1 index far less useful, since the main restriction is the
transaction ID range that the group covers; the longer the transactions
run during the initial copy, the larger that range grows. The solution
is to ensure these ranges are scanned only a single time by growing the
number of SYNCs grouped together to cover the largest transaction (-g 10
might be normal, but when hitting one of these areas Slony could
automatically grow to the equivalent of -g 10000). See the query sketch
at the end of this mail.

2. Subscribing to multiple sets causes PostgreSQL to use the origin id
as the only parameter for the index lookup. Obviously, if sl_log_1
holds millions of tuples from that origin, this is a problem. The
solution is tuning of the queries used for multi-set subscriptions from
a single origin. A workaround is to manually bump the group size up (to,
say, 10000) to get the sl_log_1 size down.

3. The initial copy process takes longer on tables with complex primary
keys than on those with simple ones. Perhaps Slony could automatically
drop and re-add the primary key during the copy process. I'm assuming
most people already remove their own indexes from large structures
before starting the copy, but leave the primary key in place because
Slony likes it to exist.

4. Slony can easily run out of memory when copying large tuples (say,
3MB TOASTed graphics). The memory allocated to a slot is never
recovered, so on a 32-bit Intel machine it takes only 100 3MB tuples
(one hitting each slot, either simultaneously or over the life of the
process) to exceed the Linux 32-bit process limit. The solution is for
Slony to improve its memory management so that abnormally large slots
are recovered; a workaround is to use 64-bit memory management. I've had
the slon process grow to 5GB.
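
For what it's worth, the drop/re-add dance above (and in item 3) is just
ordinary DDL around the copy. A minimal sketch, with a made-up table and
key column ("big_table", "id"), would be:

    -- Hypothetical names; drop the PK before the initial copy starts.
    ALTER TABLE big_table DROP CONSTRAINT big_table_pkey;

    -- ... the bulk COPY of the data runs here ...

    -- Rebuild the primary key once the copy has finished.
    ALTER TABLE big_table ADD CONSTRAINT big_table_pkey PRIMARY KEY (id);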
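
And to make item 1 concrete, the per-group log scan is restricted by
origin and by the transaction ID range the group covers, roughly along
these lines (a simplified sketch, not Slony's actual generated SQL; the
column names approximate sl_log_1, and the xid bounds are invented):

    -- Simplified illustration of the per-group scan of sl_log_1.
    SELECT log_actionseq, log_cmddata
      FROM sl_log_1
     WHERE log_origin = 1              -- provider node id
       AND log_xid >= '1000000'        -- lower bound of the group
       AND log_xid <  '1007000'        -- upper bound of the group
     ORDER BY log_actionseq;

While a long initial-copy transaction is open, the lower bound cannot
advance past it, so with a small -g the same wide range gets scanned
over and over; the multi-set case in item 2 is this same scan degrading
to a lookup on the origin alone.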