[Slony1-commit] By cbbrowne: Added some comments about how adding lots of sequences gets

Mon Feb 7 21:44:53 PST 2005

Log Message:
-----------
Added some comments about how adding lots of sequences gets pathological

Modified Files:
--------------
    slony1-engine/doc/adminguide:
        defineset.sgml (r1.8 -> r1.9)

-------------- next part --------------
Index: defineset.sgml
===================================================================
RCS file: /usr/local/cvsroot/slony1/slony1-engine/doc/adminguide/defineset.sgml,v
retrieving revision 1.8
retrieving revision 1.9
diff -Ldoc/adminguide/defineset.sgml -Ldoc/adminguide/defineset.sgml -u -w -r1.8 -r1.9

--- doc/adminguide/defineset.sgml
+++ doc/adminguide/defineset.sgml
@@ -20,18 +20,20 @@
 
 <sect2><title>Primary Keys</title>
 
-<para><productname>Slony-I</productname> <emphasis>needs</emphasis> to have a
-primary key or candidate thereof on each table that is replicated.  PK
-values are used as the primary identifier for each tuple that is
-modified in the source system.  There are three ways that you can get
-<productname>Slony-I</productname> to use a primary key:</para>
+<para><productname>Slony-I</productname> <emphasis>needs</emphasis> to
+have a primary key or candidate thereof on each table that is
+replicated.  PK values are used as the primary identifier for each
+tuple that is modified in the source system.  There are three ways
+that you can get <productname>Slony-I</productname> to use a primary
+key:</para>
 
 <itemizedlist>
 
-<listitem><para> If the table has a formally identified primary key, <command><link linkend="stmtsetaddtable">SET ADD
+<listitem><para> If the table has a formally identified primary key,
+<command><link linkend="stmtsetaddtable">SET ADD
 TABLE</link></command> can be used without any need to reference the
-primary key.  <productname>Slony-I</productname> will pick up that there is a
-primary key, and use it.</para></listitem>
+primary key.  <productname>Slony-I</productname> will pick up that
+there is a primary key, and use it.</para></listitem>
 
 <listitem><para> If the table hasn't got a primary key, but has some
 <emphasis>candidate</emphasis> primary key, that is, some index on a
@@ -70,7 +72,8 @@
 a new failure mode for your application, and this implies that you had
 a way to enter confusing data into the database.</para>
 </sect2>
-<sect2><title>Grouping tables into sets</title>
+
+<sect2 id="definesets"><title>Grouping tables into sets</title>
 
 <para> It will be vital to group tables together into a single set if
 those tables are related via foreign key constraints.  If tables that
@@ -80,14 +83,83 @@
 <quote>master</quote> can't be updated properly because it is missing
 the contents of dependent tables.</para>
 
-<para> If a database schema has been designed cleanly, it is likely
-that replication sets will be virtually synonymous with namespaces.
-All of the tables and sequences in a particular namespace will be
-sufficiently related that you will want to replicate them all.
-Conversely, tables found in different schemas will likely
-<emphasis>not</emphasis> be related, and therefore should be replicated in
-separate sets.</para>
+<para> There are also several reasons why you might
+<emphasis>not</emphasis> want to have all of the tables in one
+replication set:</para>
+
+<itemizedlist>
+
+<listitem><para> Replicating a large set leads to a <link
+linkend="longtxnsareevil"> long-running transaction </link> on the
+provider node.  The FAQ outlines a number of ways in which
+long-running transactions injure system performance.</para>
+
+<para> If you can split a large set into several pieces, that will
+shorten the length of each of the transactions, lessening the degree
+of <quote>injury</quote> to performance.</para></listitem>
+
+<listitem><para> Any time you invoke <link linkend="stmtddlscript">
+<command> EXECUTE SCRIPT </command></link>, this requests a lock on
+<emphasis> every single table in the replication set. </emphasis></para>
+
+<para> There have been reports <quote>in the field</quote> of this
+leading to deadlocks such that the <link linkend="stmtddlscript">
+<command> EXECUTE SCRIPT </command></link> request had to be submitted
+many times in order for it to actually complete successfully.</para>
+
+<para> The more tables you have in a set, the more tables need to be
+locked, and the greater the chances of deadlocks. </para>
+
+<para> By the same token, if a particular DDL script only needs to
+affect a couple of tables, you might use <link
+linkend="stmtsetmovetable"> <command>SET MOVE TABLE</command></link>
+to move them temporarily to a new replication set.  By diminishing the
+number of locks needed, this should make it easier to get the DDL
+change into place.</para>
+</listitem>
+
+</itemizedlist>
+
 </sect2>
+
+<sect2> <title> The Pathology of Sequences </title>
+
+<para> Each time a SYNC is processed, values are recorded for
+<emphasis>all</emphasis> of the sequences in the set.  If there are a
+lot of sequences, this can cause <envar>sl_seqlog</envar> to grow
+rather large.</para>
+
+<para> This points to an important difference between tables and
+sequences: if you add tables that see little or no activity, this
+does not add any material load to the work being done by
+replication.  For a replicated sequence, by contrast, values must
+<emphasis>regularly</emphasis> be propagated to subscribers.  Consider
+the effects:</para>
+
+<itemizedlist>
+<listitem><para> A replicated table that is never updated does not introduce much work to the system.</para>
+
+<para> If it is not updated, the trigger on the table on the origin
+never fires, and no entries are added to <envar>sl_log_1</envar>.  The
+table never appears in any of the further replication queries
+(<emphasis>e.g.</emphasis> in the <command>FETCH 100 FROM
+LOG</command> queries used to find replicable data), as they only
+look for tables for which there are entries in
+<envar>sl_log_1</envar>.</para></listitem>
+
+<listitem><para> In contrast, each replicated sequence introduces a
+fixed amount of work to every SYNC.</para>
+
+<para> Replicate 300 sequences, and 300 rows need to be added to
+<envar>sl_seqlog</envar> with every SYNC.</para>
+
+<para> If the value of a particular sequence hasn't changed since it
+was last checked, it is more than likely that the same value need not
+be stored over and over; some thought needs to go into how to do that
+safely.</para></listitem>
+
+</itemizedlist>
+</sect2>
 </sect1>
 
 <!-- Keep this comment at the end of the file