[Slony1-general] Logship files printing incorrectly

Tue May 8 12:02:49 PDT 2012

On 12-05-07 07:33 PM, Richard Yen wrote:
> On Mon, May 7, 2012 at 12:22 PM, Steve Singer
> <ssinger at ca.afilias.info> wrote:
>
>
>     The archive sequence numbers are supposed to be assigned on the node
>     the slon is for (node 3 in this case).  So two remote_worker threads
>     inside of slon 3 SHOULDN'T ever get the same archive counter number
>     (we should be serializing on an update of the archive_counter
>     table), thus they shouldn't be writing to the same file.   One
>     theory is that some of the "shouldn'ts" are actually happening (for
>     reasons we haven't determined).
>
>
> I notice in my logs that I see very frequent occurrences of these (maybe
> once per minute):
>
> 16:28]myhost:/home/richyen# head /var/log/local0
> Apr 29 20:13:31 myhost.example.com postgres[21243]: [4876-1]
> 2012-04-29 20:13:31.873 PDT [user=###,db=### localhost(57453)
> PID:21243 XID:1971641858]ERROR:  could not serialize access due to
> concurrent update
> Apr 29 20:13:31 myhost.example.com postgres[21243]: [4876-2]
> 2012-04-29 20:13:31.873 PDT [user=###,db=### localhost(57453)
> PID:21243 XID:1971641858]STATEMENT:  update
> "_slony".sl_archive_counter     set ac_num = ac_num + 1,
> ac_timestamp = CURRENT_TIMESTAMP; select ac_num, ac_timestamp from
> "_slony".sl_archive_counter;
> Apr 29 20:13:31 myhost.example.com postgres[21243]: [4877-1]
> 2012-04-29 20:13:31.873 PDT [user=###,db=### localhost(57453)
> PID:21243 XID:1971641858]ERROR:  current transaction is aborted,
> commands ignored until end of transaction block
>

When two remote_worker threads try to get an archive sequence number at
the same time, they will hit this.  We are depending on PostgreSQL
aborting one of the transactions, as you see above, to ensure two
threads don't get the same ID number for the filename.  So the logs you
pasted don't indicate a problem (or the source of your issue).
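
To illustrate why the aborted transaction is the expected outcome, here
is a rough Python sketch of the idea.  It is a toy model and not slon's
code: ArchiveCounter, SerializationFailure and next_archive_number are
made-up names, the in-memory "version" check only stands in for what
PostgreSQL does on a concurrent update of the counter row, and the retry
loop is an assumption about how a caller could react to the error.

```python
import threading


class SerializationFailure(Exception):
    """Stands in for 'could not serialize access due to concurrent update'."""


class ArchiveCounter:
    """Toy stand-in for the single-row sl_archive_counter table."""

    def __init__(self):
        self._lock = threading.Lock()
        self.ac_num = 0
        self.version = 0  # bumped on every committed update

    def snapshot(self):
        # Roughly: the row version this "transaction" saw when it started.
        with self._lock:
            return self.version

    def increment(self, snapshot_version):
        # Roughly: update sl_archive_counter set ac_num = ac_num + 1.
        # If another worker committed since our snapshot, abort instead of
        # handing out a number based on stale state.
        with self._lock:
            if self.version != snapshot_version:
                raise SerializationFailure(
                    "could not serialize access due to concurrent update"
                )
            self.version += 1
            self.ac_num += 1
            return self.ac_num


def next_archive_number(counter, max_attempts=1000):
    """Hypothetical caller: retry in a fresh 'transaction' after an abort."""
    for _ in range(max_attempts):
        snap = counter.snapshot()
        try:
            return counter.increment(snap)
        except SerializationFailure:
            continue  # the other worker won; start over
    raise RuntimeError("could not obtain an archive number")
```

In this model the loser of a race never receives a number from the
attempt that failed, so two workers cannot end up with the same archive
file name; the error in the log is the visible side of that guarantee.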

Does anyone (Chris, Jan?) remember why we didn't just use a sequence on
the master for this purpose?

> Apr 29 20:13:31 myhost.example.com postgres[21243]: [4877-2]
> 2012-04-29 20:13:31.873 PDT [user=###,db=### localhost(57453)
> PID:21243 XID:1971641858]STATEMENT:  notify "_slony_Restart";
>
> Could this be contributing to the issue?
> --Richard