Christopher Browne cbbrowne at ca.afilias.info
Thu Apr 5 12:12:53 PDT 2007
Here's an approach that could substantially improve data load speed
for the "COPY_SET" operation, at the (regrettable, and I can't see a
way around it) cost of a somewhat increased possibility that
interfering activities will make the load fail...

The general idea is that we know building indices takes a while, so
we might as well kick off index generation in parallel with loading
the data for subsequent tables.

Doing so means opening multiple concurrent connections.  Let us
suppose the number of connections is a configurable parameter, which
we set to 4.

On the soon-to-be subscriber...

-> loop through all the tables, locking them, and transforming them
    into "replicated mode."

    Close the transaction (alas).
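For concreteness, here is a minimal sketch of that setup transaction
in Python/psycopg2 (not what the engine would actually run - the
engine is C/libpq).  set_replicated_mode() is a hypothetical
placeholder for whatever switching a table into "replicated mode"
really involves, and the ACCESS EXCLUSIVE lock level is an assumption
on my part:

    import psycopg2

    def set_replicated_mode(cur, table):
        # Hypothetical placeholder: this is where the replication
        # triggers / table alterations would be installed.
        pass

    def prepare_tables(dsn, tables):
        conn = psycopg2.connect(dsn)
        cur = conn.cursor()
        try:
            for table in tables:
                # Hold off conflicting activity while the table is
                # switched over.  (Identifiers are interpolated
                # naively here just to keep the sketch short.)
                cur.execute("LOCK TABLE %s IN ACCESS EXCLUSIVE MODE" % table)
                set_replicated_mode(cur, table)
            conn.commit()   # ...and here the locks go away, alas
        finally:
            conn.close()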

Then, for each table...
-> Grab a connection from the pool (might have up to 4 connections in the pool)
-> Break off the indices
-> COPY data into the table
-> Restore the indices, and submit a request to reindex the table
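Here's one way the per-table worker might look, again as a rough
Python/psycopg2 sketch.  It assumes the indices are "broken off" by
toggling pg_class.relhasindex (a superuser-only catalog hack);
capturing the index definitions and dropping/re-creating them would be
another way to read that step.  load_table() and copy_source are
illustrative names, not anything that exists in the code today:

    import psycopg2

    def load_table(dsn, table, copy_source):
        conn = psycopg2.connect(dsn)
        cur = conn.cursor()
        try:
            # "Break off" the indices: with relhasindex false, the
            # backend stops maintaining them during the load.
            cur.execute(
                "UPDATE pg_catalog.pg_class SET relhasindex = false "
                "WHERE oid = %s::regclass", (table,))
            # Bulk-load without paying index maintenance costs.
            cur.copy_expert("COPY %s FROM STDIN" % table, copy_source)
            # Restore the indices and ask for them to be rebuilt; the
            # REINDEX runs on this worker's connection, asynchronously
            # with respect to the main loop.
            cur.execute(
                "UPDATE pg_catalog.pg_class SET relhasindex = true "
                "WHERE oid = %s::regclass", (table,))
            cur.execute("REINDEX TABLE %s" % table)
            conn.commit()
        finally:
            conn.close()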

At that point, the reindex is running asynchronously, and the loop
continues.

- If there is a connection free that isn't busy, another COPY can
  occur immediately.

- If not, then a new connection might need to be created (up to 4).

- If all 4 connections are busy, then we need to sleep 10s and retry.
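That scheduling policy might be sketched like so, reusing load_table()
from the sketch above; copy_set(), table_sources (a mapping from table
name to a COPY input stream), and max_workers are all illustrative
names:

    import threading
    import time

    def copy_set(dsn, table_sources, max_workers=4):
        workers = []                 # at most max_workers live threads
        for table, source in table_sources.items():
            while True:
                # Reap finished workers so their slots can be reused.
                workers = [w for w in workers if w.is_alive()]
                if len(workers) < max_workers:
                    t = threading.Thread(target=load_table,
                                         args=(dsn, table, source))
                    t.start()
                    workers.append(t)
                    break
                # All connections busy: sleep 10s and retry.
                time.sleep(10)
        for w in workers:
            w.join()                 # wait for the last reindexes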

This should largely eliminate indexing time from the process, unless
the table with the most expensive indices comes at the end of the
replication set, in which case its reindex still has to run after the
final COPY completes.
-- 
select 'cbbrowne' || '@' || 'ca.afilias.info';
<http://dba2.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)

