Locking Issues

4.4. Locking Issues

One of the usual merits of the use, by PostgreSQL, of Multi-Version Concurrency Control (MVCC) is that this eliminates a whole host of reasons to need to lock database objects. On some other database systems, you need to acquire a table lock in order to insert data into the table; that can severely hinder performance. On other systems, read locks can impede writes; with MVCC, PostgreSQL eliminates that whole class of locks in that "old reads" can access "old tuples." Most of the time, this allows the gentle user of PostgreSQL to not need to worry very much about locks. Slony-I configuration events normally grab locks on an internal table, sl_config_lock, which should not be visible to applications unless they are performing actions on Slony-I components.

Unfortunately, there are several sorts of Slony-I events that do require exclusive locks on PostgreSQL tables, with the result that modifying Slony-I configuration can bring back some of those "locking irritations." In particular:

  • set add table

    A momentary exclusive table lock must be acquired on the "origin" node in order to add the trigger that collects updates for that table. It only needs to be acquired long enough to establish the new trigger.

  • move set

    When a set origin is shifted from one node to another, exclusive locks must be acquired on each replicated table on both the old origin and the new origin in order to change the triggers on the tables.

  • lock set

    This operation expressly requests locks on each of the tables in a given replication set on the origin node.

  • During the SUBSCRIBE_SET event on a new subscriber.

    all tables in the replication set will be locked via an exclusive lock for the entire duration of the process of subscription. By locking the tables early, this means that the subscription cannot fail after copying some of the data due to some other process having held on to a table.

    In any case, note that this one began with the wording "on a new subscriber." The locks are applied on the new subscriber. They are not applied on the provider or on the origin.

  • Each time an event is generated (including SYNC events) Slony-I obtains an exclusive lock on the sl_event table long enough to insert the event into sl_event. This is not normally an issue as Slony-I should be the only program using sl_event. However this means that any non-slony transactions that read from sl_event can cause replication to pause. If you pg_dump your database avoid dumping your Slony schemas or else pg_dump's locking will compete with Slony's own locking which could stop Slony replication for the duration of the pg_dump. Exclude the Slony schemas from pg_dump with --exclude-schema=schemaname to specifically exclude your Slony schema.

When Slony-I locks a table that a running application later tries to access the application will be blocked waiting for the lock. It is also possible for a running application to create a deadlock situation with Slony-I when they each have obtained locks that the other is waiting for.

Several possible solutions to this are:

  • Announce an application outage to avoid deadlocks

    If you can temporarily block applications from using the database, that will provide a window of time during which there is nothing running against the database other than administrative processes under your control.

  • Try the operation, hoping for things to work

    Since nothing prevents applications from leaving access locks in your way, you may find yourself deadlocked. But if the number of remaining locks are small, you may be able to negotiate with users to "get in edgewise."

  • Use pgpool

    If you can use this or some similar "connection broker", you may be able to tell the connection manager to stop using the database for a little while, thereby letting it "block" the applications for you. What would be ideal would be for the connection manager to hold up user queries for a little while so that the brief database outage looks, to them, like a period where things were running slowly.

  • Rapid Outage Management

    The following procedure may minimize the period of the outage:

    • Modify pg_hba.conf so that only the slony user will have access to the database.

    • Issue a kill -SIGHUP to the PostgreSQL postmaster.

      This will not kill off existing possibly-long-running queries, but will prevent new ones from coming in. There is an application impact in that incoming queries will be rejected until the end of the process.

    • If "all looks good," then it should be safe to proceed with the Slony-I operation.

    • If some old query is lingering around, you may need to kill -SIGQUIT one of the PostgreSQL processes. This will restart the backend and kill off any lingering queries. You probably need to restart the slon processes that attach to the node.

      At that point, it will be safe to proceed with the Slony-I operation; there will be no competing processes.

    • Reset pg_hba.conf to allow other users in, and kill -SIGHUP the postmaster to make it reload the security configuration.