Log Analysis

4.7. Log Analysis

Here are some of things that you may find in your Slony-I logs, and explanations of what they mean.

4.7.1. CONFIG notices

These entries are pretty straightforward. They are informative messages about your configuration.

Here are some typical entries that you will probably run into in your logs:

CONFIG main: local node id = 1
CONFIG main: loading current cluster configuration
CONFIG storeNode: no_id=3 no_comment='Node 3'
CONFIG storePath: pa_server=5 pa_client=1 pa_conninfo="host=127.0.0.1 dbname=foo user=postgres port=6132" pa_connretry=10
CONFIG storeListen: li_origin=3 li_receiver=1 li_provider=3
CONFIG storeSet: set_id=1 set_origin=1 set_comment='Set 1'
CONFIG main: configuration complete - starting threads

4.7.2. INFO notices

Events that take place that seem like they will generally be of interest are recorded at the INFO level, and, just as with CONFIG notices, are always listed.

4.7.3. DEBUG Notices

Debug notices are of less interest, and will quite likely only need to be shown if you are running into some problem with Slony-I.

4.7.4. Thread name

Notices are always prefaced by the name of the thread from which the notice originates. You will see messages from the following threads:

localListenThread

This is the local thread that listens for events on the local node.

remoteWorkerThread-X

The thread processing remote events. You can expect to see one of these for each node that this node communicates with.

remoteListenThread-X

Listens for events on a remote node database. You may expect to see one of these for each node in the cluster.

cleanupThread

Takes care of things like vacuuming, cleaning out the confirm and event tables, and deleting old data.

syncThread

Generates SYNC events.

How much information they display is controlled by the log_level slon parameter; ERROR/WARN/CONFIG/INFO messages will always be displayed, while choosing increasing values from 1 to 4 will lead to additional DEBUG level messages being displayed.

4.7.5. How to read Slony-I logs

Note that as far as slon is concerned, there is no "master" or "slave." They are just nodes.

What you can expect, initially, is to see, on both nodes, some events propagating back and forth. Firstly, there should be some events published to indicate creation of the nodes and paths. If you don't see those, then the nodes aren't properly communicating with one another, and nothing else will happen...

  • Create the two nodes.

    No slons are running yet, so there are no logs to look at.

  • Start the two slons

    The logs for each will start out very quiet, as neither node has much to say, and neither node knows how to talk to another node. Each node will periodically generate a SYNC event, but recognize nothing about what is going on on other nodes.

  • Do the SLONIK STORE PATH to set up communications paths. That will allow the nodes to start to become aware of one another.

    The slon logs should now start to receive events from "foreign" nodes.

    If you look at the contents of the tables sl_node and sl_path and sl_listen, on each node, that should give a good idea as to where things stand. Until the slon starts, each node may only be partly configured. If there are two nodes, there should be two entries in all three of these tables once the communications configuration is set up properly. If there are fewer entries than that, well, that should give you some idea of what is missing.

  • You'll set up the set (SLONIK CREATE SET), add tables (SLONIK SET ADD TABLE), and sequences (SLONIK SET ADD SEQUENCE), and will see relevant events Section 4.7.6.7 only in the logs for the origin node for the set.

  • Then, when you submit the SLONIK SUBSCRIBE SET request, the event should go to both nodes.

    The origin node has little more to do, after that... The subscriber will then have a COPY_SET event, which will lead to logging information about adding each table and copying its data. See Section 4.7.6.4 for more details.

After that, you'll mainly see two sorts of behaviour:

  • On the origin, there won't be too terribly much logged, just indication that some SYNC events are being generated and confirmed by other nodes. See Section 4.7.6.6 to see the sorts of log entries to expect.

  • On the subscriber, there will be reports of SYNC events, and that the subscriber pulls data from the provider for the relevant set(s). This will happen infrequently if there are no updates going to the origin node; it will happen frequently when the origin sees heavy updates.

4.7.6. Log Messages and Implications

This section lists numerous of the error messages found in Slony-I, along with a brief explanation of implications. It is a fairly comprehensive list, only leaving out some of the DEBUG4 messages that are almost always uninteresting.

4.7.6.1. Log Messages Associated with Log Shipping

Most of these represent errors that come up if the Section 4.4 functionality breaks. You may expect things to break if the filesystem being used for log shipping fills, or if permissions on that directory are wrongly set.

  • ERROR: remoteWorkerThread_%d: log archive failed %s - %s\n

    This indicates that an error was encountered trying to write a log shipping file. Normally the slon will retry, and hopefully succeed.

  • DEBUG2: remoteWorkerThread_%d: writing archive log...

    This indicates that a log shipping archive log is being written for a particular SYNC set.

  • INFO: remoteWorkerThread_%d: Run Archive Command %s

    If slon has been configured (-x aka command_on_logarchive) to run a command after generating each log shipping archive log, this reports when that process is spawned using system().

  • ERROR: remoteWorkerThread_%d: Could not open COPY SET archive file %s - %s

    Seems pretty self-explanatory...

  • ERROR: remoteWorkerThread_%d: Could not generate COPY SET archive header %s - %s

    Probably means that we just filled up a filesystem...

  • ERROR remoteWorkerThread_%d: "update "_slony_regress1".sl_archive_counter set ac_num = ac_num + 1, ac_timestamp = CURRENT_TIMESTAMP; select ac_num, ac_timestamp from "_slony_regress1".sl_archive_counter; " PGRES_FATAL_ERROR ERROR: could not serialize access due to concurrent update

    This may occasionally occur when using logshipping; this will typically happen if there are 3 or more nodes, and there is an attempt to concurrently process events sourced from different nodes. This does not represent any serious problem; Slony-I will retry the event which failed without the need for administrative intervention.

4.7.6.2. Log Messages - DDL scripts

The handling of DDL is somewhat fragile, as described in Section 3.2; here are both informational and error messages that may occur in the progress of an SLONIK EXECUTE SCRIPT request.

  • ERROR: remoteWorkerThread_%d: DDL preparation failed - set %d - only on node %

    Something broke when applying a DDL script on one of the nodes. This is quite likely indicates that the node's schema differed from that on the origin; you may need to apply a change manually to the node to allow the event to proceed. The scary, scary alternative might be to delete the offending event, assuming it can't possibly work...

  • SLON_CONFIG: remoteWorkerThread_%d: DDL request with %d statements

    This is informational, indicating how many SQL statements were processed.

  • SLON_ERROR: remoteWorkerThread_%d: DDL had invalid number of statements - %d

    Occurs if there were < 0 statements (which should be impossible) or > MAXSTATEMENTS statements. Probably the script was bad...

  • ERROR: remoteWorkerThread_%d: malloc() failure in DDL_SCRIPT - could not allocate %d bytes of memory

    This should only occur if you submit some extraordinarily large DDL script that makes a slon run out of memory

  • CONFIG: remoteWorkerThread_%d: DDL Statement %d: [%s]

    This lists each DDL statement as it is submitted.

  • ERROR: DDL Statement failed - %s

    Oh, dear, one of the DDL statements that worked on the origin failed on this remote node...

  • CONFIG: DDL Statement success - %s

    All's well...

  • ERROR: remoteWorkerThread_%d: Could not generate DDL archive tracker %s - %s

    Apparently the DDL script couldn't be written to a log shipping file...

  • ERROR: remoteWorkerThread_%d: Could not submit DDL script %s - %s

    Couldn't write the script to a log shipping file.

  • ERROR: remoteWorkerThread_%d: Could not close DDL script %s - %s

    Couldn't close a log shipping file for a DDL script.

  • ERROR: Slony-I ddlScript_prepare(): set % not found

    Set wasn't found on this node; you probably gave the wrong ID number...

  • ERROR: Slony-I ddlScript_prepare_int(): set % not found

    Set wasn't found on this node; you probably gave the wrong ID number...

  • ERROR: Slony-I: alterTableForReplication(): Table with id % not found

    Apparently the table wasn't found; could the schema be messed up?

  • ERROR: Slony-I: alterTableForReplication(): Table % is already in altered state

    Curious... We're trying to set a table up for replication a second time?

  • ERROR: Slony-I: alterTableRestore(): Table with id % not found

    This runs when a table is being restored to "non-replicated" state; apparently the replicated table wasn't found.

  • ERROR: Slony-I: alterTableRestore(): Table % is not in altered state

    Hmm. The table isn't in altered replicated state. That shouldn't be, if replication had been working properly...

  • NOTICE: Slony-I: alterTableForReplication(): multiple instances of trigger % on table %'',

    This normally happens if you have a table that had a trigger attached to it that replication hid due to this being a subscriber node, and then you added a trigger by the same name back to replication. Now, when trying to "fix up" triggers, those two triggers conflict.

    The DDL script will keep running and rerunning, or the UNINSTALL NODE will keep failing, until you drop the "visible" trigger, by hand, much as you must have added it, by hand, earlier.

  • ERROR: Slony-I: Unable to disable triggers

    This is the error that follows the "multiple triggers" problem.

4.7.6.3. Threading Issues

There should not be any "user-serviceable" aspects to the Slony-I threading model; each slon creates a quite well-specified set of helper threads to manage the various database connections that it requires. The only way that anything should break on the threading side is if you have not compiled PostgreSQL libraries to "play well" with threading, in which case you will be unable to compile Slony-I in the first place.

  • FATAL: remoteWorkerThread_%d: pthread_create() - %s

    Couldn't create a new remote worker thread.

  • DEBUG1 remoteWorkerThread_%d: helper thread for provider %d created

    This normally happens when the slon starts: a thread is created for each node to which the local node should be listening for events.

  • DEBUG1: remoteWorkerThread_%d: helper thread for provider %d terminated

    If subscriptions reshape such that a node no longer provides a subscription, then the thread that works on that node can be dropped.

  • DEBUG1: remoteWorkerThread_%d: disconnecting from data provider %d

    A no-longer-used data provider may be dropped; if connection information is changed, the slon needs to disconnect and reconnect.

  • DEBUG2: remoteWorkerThread_%d: ignore new events due to shutdown

    If the slon is shutting down, it is futile to process more events

  • DEBUG1: remoteWorkerThread_%d: node %d - no worker thread

    Curious: we can't wake up the worker thread; there probably should already be one...

4.7.6.4. Log Entries At Subscription Time

Subscription time is quite a special time in Slony-I. If you have a large amount of data to be copied to subscribers, this may take a considerable period of time. Slony-I logs a fairly considerable amount of information about its progress, which is sure to be useful to the gentle reader. In particular, it generates log output every time it starts and finishes copying data for a given table as well as when it completes reindexing the table. That may not make a 28 hour subscription go any faster, but at least helps you have some idea of how it is progressing.

  • DEBUG1: copy_set %d

    This indicates the beginning of copying data for a new subscription.

  • ERROR: remoteWorkerThread_%d: set %d not found in runtime configuration

    slon tried starting up a subscription; it couldn't find conninfo for the data source. Perhaps paths are not properly propagated?

  • ERROR: remoteWorkerThread_%d: node %d has no pa_conninfo

    Apparently the conninfo configuration was wrong...

  • ERROR: copy set %d cannot connect to provider DB node %d

    slon couldn't connect to the provider. Is the conninfo wrong? Or perhaps authentication is misconfigured? Or perhaps the database is down?

  • DEBUG1: remoteWorkerThread_%d: connected to provider DB

    Excellent: the copy set has a connection to its provider

  • ERROR: Slony-I: sequenceSetValue(): sequence % not found

    Curious; the sequence object is missing. Could someone have dropped it from the schema by hand (e.g. - not using SLONIK EXECUTE SCRIPT)?

  • ERROR: Slony-I: subscribeSet() must be called on provider

    This function should only get called on the provider node. slonik normally handles this right, unless one had wrong DSNs in a slonik script...

  • ERROR: Slony-I: subscribeSet(): set % not found

    Hmm. The provider node isn't aware of this set. Wrong parms to a slonik script?

  • ERROR: Slony-I: subscribeSet(): set origin and receiver cannot be identical

    Duh, an origin node can't subscribe to itself.

  • ERROR: Slony-I: subscribeSet(): set provider and receiver cannot be identical

    A receiver must subscribe to a different node...

  • Slony-I: subscribeSet(): provider % is not an active forwarding node for replication set %

    You can only use a live, active, forwarding provider as a data source.

  • Slony-I: subscribeSet_int(): set % is not active, cannot change provider

    You can't change the provider just yet...

  • Slony-I: subscribeSet_int(): set % not found

    This node isn't aware of the set... Perhaps you submitted wrong parms?

  • Slony-I: unsubscribeSet() must be called on receiver

    Seems obvious... This probably indicates a bad slonik admin DSN...

  • Slony-I: Cannot unsubscribe set % while being provider

    This should seem obvious; SLONIK UNSUBSCRIBE SET will fail if a node has dependent subscribers for which it is the provider

  • Slony-I: cleanupEvent(): Single node - deleting events < %

    If there's only one node, the cleanup event will delete old events so that you don't get "build-up of crud."

  • Slony-I: determineIdxnameUnique(): table % not found

    Did you properly copy over the schema to a new node???

  • Slony-I: table % has no primary key

    This likely signifies a bad loading of schema...

  • Slony-I: table % has no unique index %

    This likely signifies a bad loading of schema...

  • WARN: remoteWorkerThread_%d: transactions earlier than XID %s are still in progress

    This indicates that some old transaction is in progress from before the earliest available SYNC on the provider. Slony-I cannot start replicating until that transaction completes. This will repeat until thetransaction completes...

  • DEBUG2: remoteWorkerThread_%d: prepare to copy table %s

    This indicates that slon is beginning preparations to set up subscription for a table.

  • ERROR: remoteWorkerThread_%d: Could not lock table %s on subscriber

    For whatever reason, the table could not be locked, so the subscription needs to be restarted. If the problem was something like a deadlock, retrying may help. If the problem was otherwise, you may need to intervene...

  • DEBUG2: remoteWorkerThread_%d: all tables for set %d found on subscriber

    An informational message indicating that the first pass through the tables found no problems...

  • DEBUG2: remoteWorkerThread_%d: copy sequence %s

    Processing some sequence...

  • DEBUG2: remoteWorkerThread_%d: copy table %s

    slon is starting to copy a table...

  • DEBUG3: remoteWorkerThread_%d: table %s Slony-I serial key added local

    Just added new column to the table to provide surrogate primary key.

  • DEBUG3: remoteWorkerThread_%d: local table %s already has Slony-I serial key

    Did not need to add serial key; apparently it was already there.

  • DEBUG3: remoteWorkerThread_%d: table %s does not require Slony-I serial key

    Apparently this table didn't require a special serial key...

  • DEBUG3: remoteWorkerThread_%d: table %s Slony-I serial key added local

  • DEBUG2: remoteWorkerThread_%d: Begin COPY of table %s

    slon is about to start the COPY on both sides to copy a table...

  • ERROR: remoteWorkerThread_%d: Could not generate copy_set request for %s - %s

    This indicates that the delete/copy requests failed on the subscriber. The slon will repeat the COPY_SET attempt; it will probably continue to fail..

  • ERROR: remoteWorkerThread_%d: copy to stdout on provider - %s %s

    Evidently something about the COPY to stdout on the provider node broke... The event will be retried...

  • ERROR: remoteWorkerThread_%d: copy from stdin on local node - %s %s

    Evidently something about the COPY into the table on the subscriber node broke... The event will be retried...

  • DEBUG2: remoteWorkerThread_%d: %d bytes copied for table %s

    This message indicates that the COPY of the table has completed. This is followed by running ANALYZE and reindexing the table on the subscriber.

  • DEBUG2: remoteWorkerThread_%d: %.3f seconds to copy table %s

    After this message, copying and reindexing and analyzing the table on the subscriber is complete.

  • DEBUG2: remoteWorkerThread_%d: set last_value of sequence %s (%s) to %s

    As should be no surprise, this indicates that a sequence has been processed on the subscriber.

  • DEBUG2: remoteWorkerThread_%d: %.3 seconds to copy sequences

    Summarizing the time spent processing sequences in the COPY_SET event.

  • ERROR: remoteWorkerThread_%d: query %s did not return a result

    This indicates that the query, as part of final processing of COPY_SET, failed. The copy will restart...

  • DEBUG2: remoteWorkerThread_%d: copy_set no previous SYNC found, use enable event

    This takes place if no past SYNC event was found; the current event gets set to the event point of the ENABLE_SUBSCRIPTION event.

  • DEBUG2: remoteWorkerThread_%d: copy_set SYNC found, use event seqno %s

    This takes place if a SYNC event was found; the current event gets set as shown.

  • ERROR: remoteWorkerThread_%d: sl_setsync entry for set %d not found on provider

    SYNC synchronization information was expected to be drawn from an existing subscriber, but wasn't found. Something replication-breakingly-bad has probably happened...

  • DEBUG1: remoteWorkerThread_%d: could not insert to sl_setsync_offline

    Oh, dear. After setting up a subscriber, and getting pretty well everything ready, some writes to a log shipping file failed. Perhaps disk filled up...

  • DEBUG1: remoteWorkerThread_%d: %.3f seconds to build initial setsync status

    Indicates the total time required to get the copy_set event finalized...

  • DEBUG1: remoteWorkerThread_%d: disconnected from provider DB

    At the end of a subscribe set event, the subscriber's slon will disconnect from the provider, clearing out connections...

  • DEBUG1: remoteWorkerThread_%d: copy_set %d done in %.3f seconds

    Indicates the total time required to complete copy_set... This indicates a successful subscription!

4.7.6.5. Log Entries Associated with MERGE SET

These various exceptions cause SLONIK MERGE SET to be rejected; something ought to be corrected before submitting the request again.

  • ERROR: Slony-I: merged set ids cannot be identical

    It is illogical to try to merge a set with itself.

  • ERROR: Slony-I: set % not found

    A missing set cannot be merged.

  • ERROR: Slony-I: set % does not originate on local node

    The SLONIK MERGE SET request must be submitted to the origin node for the sets that are to be merged.

  • ERROR: Slony-I: subscriber lists of set % and % are different

    Sets can only be merged if they have identical subscriber lists.

  • ERROR: Slony-I: set % has subscriptions in progress - cannot merge

    SLONIK MERGE SET cannot proceed until all subscriptions have completed processing. If this message arises, that indicates that the subscriber lists are the same, but that one or more of the nodes has not yet completed setting up its subscription. It may be that waiting a short while will permit resubmitting the SLONIK MERGE SET request.

4.7.6.6. Log Entries Associated With Normal SYNC activity

Some of these messages indicate exceptions, but the "normal" stuff represents what you should expect to see most of the time when replication is just plain working.

  • DEBUG2: remoteWorkerThread_%d: forward confirm %d,%s received by %d

    These events should occur frequently and routinely as nodes report confirmations of the events they receive.

  • DEBUG1: remoteWorkerThread_%d: SYNC %d processing

    This indicates the start of processing of a SYNC

  • ERROR: remoteWorkerThread_%d: No pa_conninfo for data provider %d

    Oh dear, we haven't connection information to connect to the data provider. That shouldn't be possible, normally...

  • ERROR: remoteListenThread_%d: timeout for event selection

    This means that the listener thread (src/slon/remote_listener.c) timed out when trying to determine what events were outstanding for it.

    This could occur because network connections broke, in which case restarting the slon might help.

    Alternatively, this might occur because the slon for this node has been broken for a long time, and there are an enormous number of entries in sl_event on this or other nodes for the node to work through, and it is taking more than slon_conf_remote_listen_timeout seconds to run the query. In older versions of Slony-I, that configuration parameter did not exist; the timeout was fixed at 300 seconds. In newer versions, you might increase that timeout in the slon config file to a larger value so that it can continue to completion. And then investigate why nobody was monitoring things such that replication broke for such a long time...

  • ERROR: remoteWorkerThread_%d: cannot connect to data provider %d on 'dsn'

    Oh dear, we haven't got correct connection information to connect to the data provider.

  • DEBUG1: remoteWorkerThread_%d: connected to data provider %d on 'dsn'

    Excellent; the slon has connected to the provider.

  • WARN: remoteWorkerThread_%d: don't know what ev_seqno node %d confirmed for ev_origin %d

    There's no confirmation information available for this node's provider; need to abort the SYNC and wait a bit in hopes that that information will emerge soon...

  • DEBUG1: remoteWorkerThread_%d: data provider %d only confirmed up to ev_seqno %d for ev_origin %d

    The provider for this node is a subscriber, and apparently that subscriber is a bit behind. The slon will need to wait for the provider to catch up until it has new data.

  • DEBUG2: remoteWorkerThread_%d: data provider %d confirmed up to ev_seqno %s for ev_origin %d - OK

    All's well; the provider should have the data that the subscriber needs...

  • DEBUG2: remoteWorkerThread_%d: syncing set %d with %d table(s) from provider %d

    This is declaring the plans for a SYNC: we have a set with some tables to process.

  • DEBUG2: remoteWorkerThread_%d: ssy_action_list value: %s length: %d

    This portion of the query to collect log data to be applied has been known to "bloat up"; this shows how it has gotten compressed...

  • DEBUG2: remoteWorkerThread_%d: Didn't add OR to provider

    This indicates that there wasn't anything in a "provider" clause in the query to collect log data to be applied, which shouldn't be. Things are quite likely to go bad at this point...

  • DEBUG2: remoteWorkerThread_%d: no sets need syncing for this event

    This will be the case for all SYNC events generated on nodes that are not originating replication sets. You can expect to see these messages reasonably frequently.

  • DEBUG3: remoteWorkerThread_%d: activate helper %d

    We're about to kick off a thread to help process SYNC data...

  • DEBUG4: remoteWorkerThread_%d: waiting for log data

    The thread is waiting to get data to consume (e.g. - apply to the replica).

  • ERROR: remoteWorkerThread_%d: %s %s - qualification was %s

    Apparently an application of replication data to the subscriber failed... This quite likely indicates some sort of serious corruption.

  • ERROR: remoteWorkerThread_%d: replication query did not affect one row (cmdTuples = %s) - query was: %s qualification was: %s

    If SLON_CHECK_CMDTUPLES is set, slon applies changes one tuple at a time, and verifies that each change affects exactly one tuple. Apparently that wasn't the case here, which suggests a corruption of replication. That's a rather bad thing...

  • ERROR: remoteWorkerThread_%d: SYNC aborted

    The SYNC has been aborted. The slon will likely retry this SYNC some time soon. If the SYNC continues to fail, there is some continuing problem, and replication will likely never catch up without intervention. It may be necessary to unsubscribe/resubscribe the affected slave set, or, if there is only one set on the slave node, it may be simpler to drop and recreate the slave node. If application connections may be shifted over to the master during this time, application downtime may not be necessary.

  • DEBUG2: remoteWorkerThread_%d: new sl_rowid_seq value: %s

    This marks the progression of this internal Slony-I sequence.

  • DEBUG2: remoteWorkerThread_%d: SYNC %d done in %.3f seconds

    This indicates the successful completion of a SYNC. Hurray!

  • DEBUG1: remoteWorkerThread_%d_d:%.3f seconds delay for first row

    This indicates how long it took to get the first row from the LOG cursor that pulls in data from the sl_log tables.

  • ERROR: remoteWorkerThread_%d_d: large log_cmddata for actionseq %s not found

    slon could not find the data for one of the "very large" sl_log table tuples that are pulled individually. This shouldn't happen.

  • DEBUG2: remoteWorkerThread_%d_d:%.3f seconds until close cursor

    This indicates how long it took to complete reading data from the LOG cursor that pulls in data from the sl_log tables.

  • DEBUG2: remoteWorkerThread_%d_d: inserts=%d updates=%d deletes=%d

    This reports how much activity was recorded in the current SYNC set.

  • DEBUG3: remoteWorkerThread_%d: compress_actionseq(list,subquery) Action list: %s

    This indicates a portion of the LOG cursor query that is about to be compressed. (In some cases, this could grow to enormous size, blowing up the query parser...)

  • DEBUG3: remoteWorkerThread_%d: compressed actionseq subquery %s

    This indicates what that subquery compressed into.

4.7.6.7. Log Entries - Adding Objects to Sets

These entries will be seen on an origin node's logs at the time you are configuring a replication set; some of them will be seen on subscribers at subscription time.

  • ERROR: Slony-I: setAddTable_int(): table % has no index %

    Apparently a PK index was specified that is absent on this node...

  • ERROR: Slony-I setAddTable_int(): table % not found

    Table wasn't found on this node; did you load the schema in properly?.

  • ERROR: Slony-I setAddTable_int(): table % is not a regular table

    You tried to replicate something that isn't a table; you can't do that!

  • NOTICE: Slony-I setAddTable_int(): table % PK column % nullable

    You tried to replicate a table where one of the columns in the would-be primary key is allowed to be null. All PK columns must be NOT NULL. This request is about to fail.

    A check for this condition was introduced in Slony-I version 1.2. If you have a 1.1 replica, it will continue to function after upgrading to 1.2, but you will experience this complaint when you try to add new subscribers.

    You can look for table/index combinations on an existing node that have NULLABLE columns in the primary key via the following query:

    select c.relname as table_name, ic.relname as index_name, att.attname, att2.attnotnull
    from _cluster.sl_table t, pg_catalog.pg_class c, pg_index i, pg_catalog.pg_class ic, pg_catalog.pg_attribute att, pg_catalog.pg_attribute att2
    where t.tab_reloid = c.oid
        and t.tab_idxname = ic.relname
        and  ic.oid = i.indexrelid
        and att.attrelid = i.indexrelid
        and att2.attname = att.attname
        and att2.attrelid = c.oid
        and att2.attnotnull = 'f';

    These may be rectified via submitting, for each one, a query of the form: alter table mytable alter column nullablecol set not null; Running this against a subscriber where the table is empty will complete very quickly. It will take longer to apply this change to a table that already contains a great deal of data, as the alteration will scan the table to verify that there are no tuples where the column is NULL.

  • ERROR: Slony-I setAddTable_int(): table % not replicable!

    This happens because of the NULLable PK column.

  • ERROR: Slony-I setAddTable_int(): table id % has already been assigned!

    The table ID value needs to be assigned uniquely in SLONIK SET ADD TABLE; apparently you requested a value already in use.

  • ERROR: Slony-I setAddSequence(): set % not found

    Apparently the set you requested is not available...

  • ERROR: Slony-I setAddSequence(): set % has remote origin

    You may only add things at the origin node.

  • ERROR: Slony-I setAddSequence(): cannot add sequence to currently subscribed set %

    Apparently the set you requested has already been subscribed. You cannot add tables/sequences to an already-subscribed set. You will need to create a new set, add the objects to that new set, and set up subscriptions to that.

  • ERROR: Slony-I setAddSequence_int(): set % not found

    Apparently the set you requested is not available...

  • ERROR: Slony-I setAddSequence_int(): sequence % not found

    Apparently the sequence you requested is not available on this node. How did you set up the schemas on the subscribers???

  • ERROR: Slony-I setAddSequence_int(): % is not a sequence

    Seems pretty obvious :-).

  • ERROR: Slony-I setAddSequence_int(): sequence ID % has already been assigned

    Each sequence ID added must be unique; apparently you have reused an ID.

4.7.6.8. Logging When Moving Objects Between Sets

  • ERROR: Slony-I setMoveTable_int(): table % not found

    Table wasn't found on this node; you probably gave the wrong ID number...

  • ERROR: Slony-I setMoveTable_int(): set ids cannot be identical

    Does it make sense to move a table from a set into the very same set?

  • ERROR: Slony-I setMoveTable_int(): set % not found

    Set wasn't found on this node; you probably gave the wrong ID number...

  • ERROR: Slony-I setMoveTable_int(): set % does not originate on local node

    Set wasn't found to have origin on this node; you probably gave the wrong EVENT NODE...

  • ERROR: Slony-I setMoveTable_int(): subscriber lists of set % and % are different

    You can only move objects between sets that have identical subscriber lists.

  • ERROR: Slony-I setMoveSequence_int(): sequence % not found

    Sequence wasn't found on this node; you probably gave the wrong ID number...

  • ERROR: Slony-I setMoveSequence_int(): set ids cannot be identical

    Does it make sense to move a sequence from a set into the very same set?

  • ERROR: Slony-I setMoveSequence_int(): set % not found

    Set wasn't found on this node; you probably gave the wrong ID number...

  • ERROR: Slony-I setMoveSequence_int(): set % does not originate on local node

    Set wasn't found to have origin on this node; you probably gave the wrong EVENT NODE...

  • ERROR: Slony-I setMoveSequence_int(): subscriber lists of set % and % are different

    You can only move objects between sets that have identical subscriber lists.

4.7.6.9. Issues with Dropping Objects

  • ERROR: Slony-I setDropTable(): table % not found

    Table wasn't found on this node; are you sure you had the ID right?

  • ERROR: Slony-I setDropTable(): set % not found

    The replication set wasn't found on this node; are you sure you had the ID right?

  • ERROR: Slony-I setDropTable(): set % has remote origin

    The replication set doesn't originate on this node; you probably need to specify an EVENT NODE in the SLONIK SET DROP TABLE command.

  • ERROR: Slony-I setDropSequence_int(): sequence % not found

    Could this sequence be in another set?

  • ERROR: Slony-I setDropSequence_int(): set % not found

    Could you have gotten the set ID wrong?

  • ERROR: Slony-I setDropSequence_int(): set % has origin at another node - submit this to that node

    This message seems fairly self-explanatory...

4.7.6.10. Issues with MOVE SET, FAILOVER, DROP NODE

Many of these errors will occur if you submit a slonik script that describes a reconfiguration incompatible with your cluster's current configuration. Those will lead to the feeling: "Whew, I'm glad slonik caught that for me!"

Some of the others lead to a slon telling itself to fall over; all should be well when you restart it, as it will read in the revised, newly-correct configuration when it starts up.

Alas, a few indicate that "something bad happened," for which the resolution may not necessarily be easy. Nobody said that replication was easy, alas...

  • ERROR: Slony-I: DROP_NODE cannot initiate on the dropped node

    You need to have an EVENT NODE other than the node that is to be dropped....

  • ERROR: Slony-I: Node % is still configured as a data provider

    You cannot drop a node that is in use as a data provider; you need to reshape subscriptions so no nodes are dependent on it first.

  • ERROR: Slony-I: Node % is still origin of one or more sets

    You can't drop a node if it is the origin for a set! Use SLONIK MOVE SET or SLONIK FAILOVER first.

  • ERROR: Slony-I: cannot failover - node % has no path to the backup node

    You cannot failover to a node that isn't connected to all the subscribers, at least indirectly.

  • ERROR: Slony-I: cannot failover - node % is not subscribed to set %

    You can't failover to a node that doesn't subscribe to all the relevant sets.

  • ERROR: Slony-I: cannot failover - subscription for set % is not active

    If the subscription has been set up, but isn't yet active, that's still no good.

  • ERROR: Slony-I: cannot failover - node % is not a forwarder of set %

    You can only failover or move a set to a node that has forwarding turned on.

  • NOTICE: failedNode: set % has no other direct receivers - move now

    If the backup node is the only direct subscriber, then life is a bit simplified... No need to reshape any subscriptions!

  • NOTICE: failedNode set % has other direct receivers - change providers only

    In this case, all direct subscribers are pointed to the backup node, and the backup node is pointed to receive from another node so it can get caught up.

  • NOTICE: Slony-I: Please drop schema _@CLUSTERNAME@

    A node has been uninstalled; you may need to drop the schema...

4.7.6.11. Log Switching

These messages relate to the new-in-1.2 facility whereby Slony-I periodically switches back and forth between storing data in sl_log_1 and sl_log_2.

  • Slony-I: Logswitch to sl_log_2 initiated'

    Indicates that slon is in the process of switching over to this log table.

  • Slony-I: Logswitch to sl_log_1 initiated'

    Indicates that slon is in the process of switching over to this log table.

  • Previous logswitch still in progress

    An attempt was made to do a log switch while one was in progress...

  • ERROR: remoteWorkerThread_%d: cannot determine current log status

    The attempt to read from sl_log_status, which determines whether we're working on sl_log_1 or sl_log_2 got no results; that can't be a good thing, as there certainly should be data here... Replication is likely about to halt...

  • DEBUG2: remoteWorkerThread_%d: current local log_status is %d

    This indicates which of sl_log_1 and sl_log_2 are being used to store replication data.

4.7.6.12. Miscellanea

Perhaps these messages should be categorized further; that remains a task for the documentors.

  • ERROR: Slonik version: @MODULEVERSION@ != Slony-I version in PG build %

    This is raised in checkmoduleversion() if there is a mismatch between the version of Slony-I as reported by slonik versus what the PostgreSQL build has.

  • ERROR: Slony-I: registry key % is not an int4 value

    Raised in registry_get_int4(), this complains if a requested value turns out to be NULL.

  • ERROR: registry key % is not a text value

    Raised in registry_get_text(), this complains if a requested value turns out to be NULL.

  • ERROR: registry key % is not a timestamp value

    Raised in registry_get_timestamp(), this complains if a requested value turns out to be NULL.

  • NOTICE: Slony-I: cleanup stale sl_nodelock entry for pid=%

    This will occur when a slon starts up after another has crashed; this is routine cleanup.

  • ERROR: Slony-I: This node is already initialized

    This would typically happen if you submit SLONIK STORE NODE against a node that has already been set up with the Slony-I schema.

  • ERROR: Slony-I: node % not found

    An attempt to mark a node not listed locally as enabled should fail...

  • ERROR: Slony-I: node % is already active

    An attempt to mark a node already marked as active as active should fail...

  • DEBUG4: remoteWorkerThread_%d: added active set %d to provider %d

    Indicates that this set is being provided by this provider.

  • DEBUG2: remoteWorkerThread_%d: event %d ignored - unknown origin

    Probably happens if events arrive before the STORE_NODE event that tells that the new node now exists...

  • WARN: remoteWorkerThread_%d: event %d ignored - origin inactive

    This shouldn't occur now (2007) as we don't support the notion of deactivating a node...

  • DEBUG2: remoteWorkerThread_%d: event %d ignored - duplicate

    This might be expected to happen if the event notification comes in concurrently from two sources...

  • DEBUG2: remoteWorkerThread_%d: unknown node %d

    Happens if the slon is unaware of this node; probably a sign of STORE_NODE requests not propagating...

  • insert or update on table "sl_path" violates foreign key constraint "pa_client-no_id-ref". DETAIL: Key (pa_client)=(2) is not present on table "s1_node

    This happens if you try to do SLONIK SUBSCRIBE SET when the node unaware of a would-be new node; probably a sign of STORE_NODE and STORE_PATH requests not propagating...