Slony-I 2.1.4 Documentation
Chapter 5. Deployment Considerations
- Table of Contents
- 5.1. Cluster Monitoring
- 5.1.1. test_slony_state
- 5.1.2. Nagios Replication Checks
- 5.1.3. Monitoring Slony-I using MRTG
- 5.1.4. Bucardo-related Monitoring
- 5.1.5. search-logs.sh
- 5.1.6. Building MediaWiki Cluster Summary
- 5.1.7. Analysis of a SYNC
- 5.2. Component Monitoring
- 5.2.1. Looking at pg_stat_activity view
- 5.2.2. Looking at sl_components view
- 5.2.3. Notes On Interpreting Component Activity
- 5.3. Partitioning Support
- 5.4. Slony-I Upgrade
- 5.5. Log Analysis
- 5.5.1. CONFIG notices
- 5.5.2. INFO notices
- 5.5.3. DEBUG Notices
- 5.5.4. Thread name
- 5.5.5. How to read Slony-I logs
- 5.5.6. Log Messages and Implications
- 5.6. Performance Considerations
- 5.6.1. Vacuum Concerns
- 5.6.2. Log Switching
- 5.6.3. Long Running Transactions
- 5.7. Security Considerations
5.1. Cluster Monitoring
As a prelude to the discussion, it is worth pointing out that since the bulk of Slony-I functionality is implemented via running database functions and SQL queries against tables within a Slony-I schema, most of the things that one might want to monitor about replication may be found by querying tables in the schema created for the cluster in each database in the cluster.
Here are some of the tables that contain information likely to be particularly interesting from a monitoring and diagnostic perspective.
- sl_status
This view is the first, most obviously useful thing to look at from a monitoring perspective. It looks at the local node's events, and checks to see how quickly they are being confirmed on other nodes.
The view is primarily useful to run against an origin ("master") node, since that is where generated events are generally expected to require interesting work. Events generated on non-origin nodes tend to be SYNC events that require no replication work, and are nearly no-ops as a result. An example query against this view appears just after this list.
- sl_confirm
Contains confirmations of replication events; this may be used to infer which events have, and have not been processed.
- sl_event
Contains information about the replication events processed on the local node.
- sl_log_1 and sl_log_2
These tables contain replicable data. On an origin node, this is the "queue" of data that has not necessarily been replicated everywhere. By examining these tables, you can see the details of the data being replicated.
- sl_node
The list of nodes in the cluster.
- sl_path
This table holds connection information indicating how slon processes are to connect to remote nodes, whether to access events, or to request replication data.
- sl_listen
This configuration table indicates how nodes listen for events coming from other nodes. Usually this is automatically populated; generally you can detect configuration problems by this table being "underpopulated."
- sl_registry
A configuration table that may be used to store miscellaneous runtime data. Presently used only to manage switching between the two log tables.
- sl_seqlog
Contains the "last value" of replicated sequences.
- sl_set
Contains definition information for replication sets, which is the mechanism used to group together related replicable tables and sequences.
- sl_setsync
Contains information about the state of synchronization of each replication set, including transaction snapshot data.
- sl_subscribe
Indicates what subscriptions are in effect for each replication set.
- sl_table
Contains the list of tables being replicated.
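As an illustration of the sort of ad hoc query these tables support, here is a small sketch (not taken from the Slony-I distribution) that reports confirmation lag from sl_status and the amount of replicable data still queued in the log tables. It assumes a cluster schema named _mycluster, as in the MRTG example later in this section; st_received and st_lag_time are the columns used in that example, while st_lag_num_events and log_origin are assumed and should be checked against your own schema.

    -- Confirmation lag per receiving node, as seen from the origin node.
    -- st_lag_num_events is assumed to be available alongside st_lag_time.
    SELECT st_received, st_lag_num_events, st_lag_time
      FROM _mycluster.sl_status
     ORDER BY st_lag_time DESC;

    -- Rough size of the replication "queue": rows awaiting replication in the
    -- two log tables, per origin node (log_origin is assumed to be the origin column).
    SELECT 'sl_log_1' AS log_table, log_origin, count(*) AS pending_rows
      FROM _mycluster.sl_log_1 GROUP BY log_origin
    UNION ALL
    SELECT 'sl_log_2', log_origin, count(*)
      FROM _mycluster.sl_log_2 GROUP BY log_origin;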
5.1.1. test_slony_state
This invaluable script does various sorts of analysis of the state of a Slony-I cluster. Some administrators recommend running these scripts frequently (hourly seems suitable) to find problems as early as possible.
You specify arguments including database, host, user, cluster, password, and port to connect to any of the nodes on a cluster. You also specify a mailprog command (which should be a program equivalent to Unix mailx) and a recipient of email.
You may alternatively specify database connection parameters via the environment variables used by libpq, e.g. - using PGPORT, PGDATABASE, PGUSER, PGSERVICE, and such.
The script then rummages through sl_path to find all of the nodes in the cluster, and the DSNs to allow it to, in turn, connect to each of them.
For each node, the script examines the state of things, including such things as:
Checking sl_listen for some "analytically determinable" problems. It lists paths that are not covered.
Providing a summary of events by origin node
If a node hasn't submitted any events in a while, that likely suggests a problem.
Summarizes the "aging" of table sl_confirm
If one or another of the nodes in the cluster hasn't reported back recently, that tends to lead to cleanups of tables like sl_log_1, sl_log_2 and sl_seqlog not taking place.
Summarizes what transactions have been running for a long time
This only works properly if the statistics collector is configured to collect command strings, as controlled by the option stats_command_string = true in postgresql.conf.
If you have broken applications that hold connections open, this check will find them; such long-held connections have several unsalutary effects, as described in the FAQ.
The script does some diagnosis work based on parameters in the script; if you don't like the values, pick your favorites!
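To give a concrete flavour of two of these checks, the sketches below show the general shape of such queries; they are illustrations, not the script's exact SQL. The first summarizes the aging of sl_confirm (assuming the usual con_origin, con_received, and con_timestamp columns and a cluster schema named _mycluster); the second lists long-running transactions from pg_stat_activity, using column names from PostgreSQL releases of that era (procpid, xact_start, current_query); newer releases rename these to pid and query.

    -- Aging of sl_confirm: newest and oldest confirmation per origin/receiver pair.
    SELECT con_origin, con_received,
           now() - max(con_timestamp) AS age_of_newest_confirm,
           now() - min(con_timestamp) AS age_of_oldest_confirm
      FROM _mycluster.sl_confirm
     GROUP BY con_origin, con_received
     ORDER BY age_of_newest_confirm DESC;

    -- Transactions that have been open for more than, say, half an hour.
    -- Requires the statistics collector to be collecting command strings.
    SELECT procpid, usename, now() - xact_start AS transaction_age, current_query
      FROM pg_stat_activity
     WHERE xact_start < now() - interval '30 minutes'
     ORDER BY xact_start;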
Note: There are two versions of the script: one using the "classic" Pg.pm Perl module for accessing PostgreSQL databases, and one, with dbi in its name, that uses the newer Perl DBI interface. It is likely to be easier to find packaging for DBI.
5.1.2. Nagios Replication Checks
The script in the tools directory called psql_replication_check.pl represents some of the best answers arrived at in attempts to build replication tests to plug into the Nagios system monitoring tool.
A former script, test_slony_replication.pl, took a "clever" approach: a "test script" was run periodically, rummaging through the Slony-I configuration to find the origin and subscribers, injecting a change, and watching for its propagation through the system. It had two problems:
Connectivity problems to the single host where the test ran would make it look as though replication was destroyed. Overall, this monitoring approach proved fragile in the face of numerous error conditions.
Nagios has no ability to benefit from the "cleverness" of automatically exploring the set of nodes. You need to set up a Nagios monitoring rule for each and every node being monitored.
The new script, psql_replication_check.pl, takes the minimalist approach of assuming that the system is an online system that sees regular "traffic," so that you can define a view specifically for the replication test, called replication_status, which is expected to see regular updates. The view simply looks for the youngest "transaction" on the node, and lists its timestamp, age, and whatever bit of application information might be useful to see.
In an inventory system, that might be the order number for the most recently processed order.
In a domain registry, that might be the name of the most recently created domain.
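Continuing the inventory example, a replication_status view might be sketched roughly as follows; the orders table and its order_timestamp and order_number columns are purely hypothetical stand-ins for whatever application table sees regular traffic on your system.

    -- Hypothetical view exposing the youngest "transaction" on the node:
    -- its timestamp, its age, and a bit of application information.
    CREATE VIEW replication_status AS
      SELECT order_timestamp,
             now() - order_timestamp AS age,
             order_number
        FROM orders
       ORDER BY order_timestamp DESC
       LIMIT 1;

The idea is that the age reported by such a view should stay small whenever replication is keeping up.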
An instance of the script will need to be run for each node that is to be monitored; that is the way Nagios works.
5.1.3. Monitoring Slony-I using MRTG
One user reported on the Slony-I mailing list how to configure mrtg (Multi Router Traffic Grapher) to monitor Slony-I replication.
... Since I use mrtg to graph data from multiple servers, I use snmp (net-snmp, to be exact). On the database server, I added the following line to the snmpd configuration:
    exec replicationLagTime /cvs/scripts/snmpReplicationLagTime.sh 2

where /cvs/scripts/snmpReplicationLagTime.sh looks like this:

    #!/bin/bash
    /home/pgdba/work/bin/psql -U pgdba -h 127.0.0.1 -p 5800 -d _DBNAME_ -qAt -c "select cast(extract(epoch from st_lag_time) as int8) FROM _irr.sl_status WHERE st_received = $1"
Then, in mrtg configuration, add this target:
    Target[db_replication_lagtime]:extOutput.3&extOutput.3:public@db::30:::
    MaxBytes[db_replication_lagtime]: 400000000
    Title[db_replication_lagtime]: db: replication lag time
    PageTop[db_replication_lagtime]: <H1>db: replication lag time</H1>
    Options[db_replication_lagtime]: gauge,nopercent,growright
Alternatively, Ismail Yenigul points out how he managed to monitor slony using MRTG without installing SNMPD.
Here is the mrtg configuration:

    Target[db_replication_lagtime]:`/bin/snmpReplicationLagTime.sh 2`
    MaxBytes[db_replication_lagtime]: 400000000
    Title[db_replication_lagtime]: db: replication lag time
    PageTop[db_replication_lagtime]: <H1>db: replication lag time</H1>
    Options[db_replication_lagtime]: gauge,nopercent,growright
and here is the modified version of the script:

    # cat /bin/snmpReplicationLagTime.sh
    #!/bin/bash
    output=`/usr/bin/psql -U slony -h 192.168.1.1 -d endersysecm -qAt -c "select cast(extract(epoch from st_lag_time) as int8) FROM _mycluster.sl_status WHERE st_received = $1"`
    echo $output
    echo $output
    echo
    echo
    # end of script
Note: MRTG expects four lines of output from the script; since the query produces only a single value, the script echoes it twice and pads the output with two blank lines.
5.1.4. Bucardo-related Monitoring
The Bucardo replication system includes a script, check_postgres.pl, which can monitor a variety of aspects of PostgreSQL status, including Slony-I replication status; see check_postgres.pl - slony_status
5.1.5. search-logs.sh
This script searches for Slony-I log files under a given path (LOGHOME), based on the naming conventions used by both the Section 6.1.4 and Section 6.1.1.20 systems for launching slon processes.
Errors, if found, are listed, by log file, and emailed to the specified user (LOGRECIPIENT); if no email address is specified, output goes to standard output.
LOGTIMESTAMP allows overriding what hour to evaluate (rather than the last hour).
An administrator might run this script once an hour to monitor for replication problems.
5.1.6. Building MediaWiki Cluster Summary
The script mkmediawiki.pl, in tools, may be used to generate a cluster summary compatible with the popular MediaWiki software. Note that the --categories option permits the user to specify a set of (comma-delimited) categories with which to associate the output. If you have a series of Slony-I clusters, passing in the option --categories=slony1 leads to the MediaWiki instance generating a category page listing all Slony-I clusters so categorized on the wiki.
The gentle user might use the script as follows:
    ~/logtail.en> mvs login -d mywiki.example.info -u "Chris Browne" -p `cat ~/.wikipass` -w wiki/index.php
    Doing login with host: logtail and lang: en
    ~/logtail.en> perl $SLONYHOME/tools/mkmediawiki.pl --host localhost --database slonyregress1 --cluster slony_regress1 --categories=Slony-I > Slony_replication.wiki
    ~/logtail.en> mvs commit -m "More sophisticated generated Slony-I cluster docs" Slony_replication.wiki
    Doing commit Slony_replication.wiki with host: logtail and lang: en
Note that mvs is a client written in Perl; on Debian GNU/Linux, the relevant package is called libwww-mediawiki-client-perl; other systems may have a packaged version of this under some similar name.
5.1.7. Analysis of a SYNC
The following is (as of 2.0) an extract from the slon log for node #2 in a run of "test1" from the regression tests.
    DEBUG2 remoteWorkerThread_1: SYNC 19 processing
    INFO   about to monitor_subscriber_query - pulling big actionid list 134885072
    INFO   remoteWorkerThread_1: syncing set 1 with 4 table(s) from provider 1
    DEBUG2 ssy_action_list length: 0
    DEBUG2 remoteWorkerThread_1: current local log_status is 0
    DEBUG2 remoteWorkerThread_1_1: current remote log_status = 0
    DEBUG1 remoteHelperThread_1_1: 0.028 seconds delay for first row
    DEBUG1 remoteHelperThread_1_1: 0.978 seconds until close cursor
    INFO   remoteHelperThread_1_1: inserts=144 updates=1084 deletes=0
    INFO   remoteWorkerThread_1: sync_helper timing: pqexec (s/count)- provider 0.063/6 - subscriber 0.000/6
    INFO   remoteWorkerThread_1: sync_helper timing: large tuples 0.315/288
    DEBUG2 remoteWorkerThread_1: cleanup
    INFO   remoteWorkerThread_1: SYNC 19 done in 1.272 seconds
    INFO   remoteWorkerThread_1: SYNC 19 sync_event timing: pqexec (s/count)- provider 0.001/1 - subscriber 0.004/1 - IUD 0.972/248
Here are some notes to interpret this output:
- Note the line that indicates
      inserts=144 updates=1084 deletes=0
  This indicates how many tuples were affected by this particular SYNC.
- Note the line indicating
      0.028 seconds delay for first row
  This indicates the time it took for the LOG cursor to get to the point of processing the first row of data. Normally, this takes a long time if the SYNC is a large one requiring sorting of a sizable result set.
- Note the line indicating
      0.978 seconds until close cursor
  This indicates how long processing took against the provider.
- sync_helper timing: large tuples 0.315/288
  This breaks off, as a separate item, the number of large tuples (e.g. - where size exceeded the configuration parameter sync_max_rowsize), which had to be processed individually.
- SYNC 19 done in 1.272 seconds
  This indicates that it took 1.272 seconds, in total, to process this set of SYNCs.
- SYNC 19 sync_event timing: pqexec (s/count)- provider 0.001/1 - subscriber 0.004/1 - IUD 0.972/248
  This records how many queries were issued against the provider and the subscriber in function sync_event(), and how long they took. Note that 248 does not match the numbers of inserts, updates, and deletes described earlier, as I/U/D requests are clustered into groups of queries that are submitted via a single pqexec() call on the subscriber.
- sync_helper timing: pqexec (s/count)- provider 0.063/6 - subscriber 0.000/6
  This records how many queries were issued against the provider and the subscriber in function sync_helper(), and how long they took.