Bug 289 - Invalid txid_snapshot in sl_event
Summary: Invalid txid_snapshot in sl_event
Status: RESOLVED FIXED
Alias: None
Product: Slony-I
Classification: Unclassified
Component: slon
Version: 2.0
Hardware: PC Linux
Importance: low critical
Assignee: Jan Wieck
URL:
Duplicates: 340
Depends on:
Blocks:
 
Reported: 2013-04-12 04:04 UTC by Tomasz Karlik
Modified: 2014-07-30 12:12 UTC
CC List: 2 users

See Also:


Attachments
Slony log file (3.04 KB, text/plain)
2013-04-12 04:20 UTC, Tomasz Karlik

Description Tomasz Karlik 2013-04-12 04:04:04 UTC
Hi,

Every so often, an invalid ev_snapshot value appears in table sl_event. Some txid is duplicated (like 379383437 below), and PostgreSQL complains about invalid input for the txid_snapshot type:

select * from "pg_catalog".txid_snapshot_xip('379383396:379383451:379383396,379383437,379383437,379383442') ) order by log_actionseq" PGRES_FATAL_ERROR ERROR:  invalid input for txid_snapshot: "379383396:379383451:379383396,379383437,379383437,379383442"
LINE 1: ...d "pg_catalog".txid_visible_in_snapshot(log_txid, '379383396...


The only way to get rid of the replication lag is to update the sl_event table. For example, replacing
"379383396:379383451:379383396,379383437,379383437,379383442"
with
"379383396:379383451:379383396,379383437,379383442"
solves the problem.
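The manual workaround above amounts to removing repeated txids from the in-progress (xip) list of the snapshot text. The sketch below assumes the txid_snapshot text form "xmin:xmax:xip1,xip2,..." with an ascending xip list; `dedupe_snapshot` is a hypothetical helper, not part of Slony or PostgreSQL. It only rewrites the string and does not touch sl_event.

```python
def dedupe_snapshot(snap: str) -> str:
    """Drop repeated txids from the xip list of a txid_snapshot text value.

    Hypothetical helper: assumes the "xmin:xmax:xip1,xip2,..." text format,
    where the xip list may be empty (e.g. "10:20:").
    """
    xmin_s, xmax_s, xip_s = snap.split(":")
    xmin, xmax = int(xmin_s), int(xmax_s)
    xips = [int(x) for x in xip_s.split(",") if x]
    # Keep each txid once; sorting keeps the list in ascending order.
    unique = sorted(set(xips))
    for x in unique:
        # Every in-progress txid must lie in [xmin, xmax).
        if not (xmin <= x < xmax):
            raise ValueError(f"txid {x} outside [{xmin}, {xmax})")
    return f"{xmin}:{xmax}:" + ",".join(str(x) for x in unique)


if __name__ == "__main__":
    bad = "379383396:379383451:379383396,379383437,379383437,379383442"
    print(dedupe_snapshot(bad))
    # prints 379383396:379383451:379383396,379383437,379383442
```

Applied to the value from this report, it produces exactly the replacement string quoted above.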

Best regards
Comment 1 Tomasz Karlik 2013-04-12 04:20:15 UTC
Created attachment 162
Slony log file
Comment 2 Jan Wieck 2013-04-18 11:19:11 UTC
What are the exact PostgreSQL and Slony versions?
Comment 3 Tomasz Karlik 2013-04-20 01:53:53 UTC
PostgreSQL 9.2.4
Slony 2.1.3
Comment 4 Jan Wieck 2014-01-29 13:40:13 UTC
I am still hunting this one.

Running a multi-client pgbench for hours, while calling the txid snapshot in/out functions constantly alongside it, I have not yet been able to produce a single occurrence of this problem. Does your application by any chance use subtransactions, like "exceptions" in PL/pgSQL?
Comment 5 Tomasz Karlik 2014-02-04 23:48:09 UTC
(In reply to comment #4)
> I am still hunting this one.
> 
> Running a multi-client pgbench for hours and using txid snapshot in-out
> functions all the time alongside, I was not yet able to create a single
> occurrence of this problem. Does your application by any chance use
> subtransactions, like "exceptions" in PL/pgSQL?

Yes, we use "exceptions" in PL/pgSQL. I'll try to check whether this could be the source of the problem.

We also use prepared transactions managed by the application server. Could two-phase commit affect Slony's operation?
Comment 6 Jan Wieck 2014-04-11 09:00:33 UTC
I don't think this bug has anything to do with Slony in particular. The txid_snapshot used by Slony is simply the output of txid_current_snapshot(), as rendered by the txid_snapshot data type's output function. This seems to be a bug in PostgreSQL itself, which I still haven't been able to reproduce.
Comment 7 Jan Wieck 2014-04-12 09:44:58 UTC
A discussion on pgsql-hackers has revealed that this is a bug in PostgreSQL connected to two-phase commit. There is apparently a small window in which two PGPROC entries are visible with the same xid.

I am proposing a patch to PostgreSQL that will remove duplicate xip entries in txid_current_snapshot() and ignore existing duplicates in txid_snapshot_in().
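The two halves of the proposed patch can be illustrated with a simplified Python model. This is not PostgreSQL's C code: `current_snapshot_text` stands in for txid_current_snapshot() rendering its xip list from the running xids (a plain list here, which may hold the same xid twice, as in the two-phase-commit window described above), and `parse_snapshot` stands in for txid_snapshot_in(). The `dedupe` and `accept_duplicates` flags are hypothetical switches contrasting a strict ascending-order check, which rejects the snapshot from this report, with the proposed behavior.

```python
from typing import List, Optional, Tuple


def current_snapshot_text(xmin: int, xmax: int, running: List[int], dedupe: bool) -> str:
    """Model of the output side: render "xmin:xmax:xip,..." from running xids.

    With dedupe=True the xip list is sorted and repeats are dropped,
    mirroring the proposed change to txid_current_snapshot().
    """
    xips = sorted(set(running)) if dedupe else sorted(running)
    return f"{xmin}:{xmax}:" + ",".join(map(str, xips))


def parse_snapshot(text: str, accept_duplicates: bool) -> Tuple[int, int, List[int]]:
    """Model of the input side: validate a snapshot string and parse it.

    Without accept_duplicates, the xip list must be strictly ascending, so a
    repeated txid is rejected. With it, repeats are skipped, mirroring the
    proposed change to txid_snapshot_in().
    """
    xmin_s, xmax_s, xip_s = text.split(":")
    xmin, xmax = int(xmin_s), int(xmax_s)
    xips: List[int] = []
    last: Optional[int] = None
    for tok in filter(None, xip_s.split(",")):
        val = int(tok)
        out_of_range = not (xmin <= val < xmax)
        out_of_order = last is not None and (
            val < last or (val == last and not accept_duplicates)
        )
        if out_of_range or out_of_order:
            raise ValueError(f"invalid input for txid_snapshot: {text!r}")
        if val != last:  # skip a repeated txid
            xips.append(val)
        last = val
    return xmin, xmax, xips


if __name__ == "__main__":
    running = [379383396, 379383437, 379383437, 379383442]  # same xid seen twice
    old = current_snapshot_text(379383396, 379383451, running, dedupe=False)
    try:
        parse_snapshot(old, accept_duplicates=False)
    except ValueError as err:
        print("strict input rejects it:", err)
    print("lenient input:", parse_snapshot(old, accept_duplicates=True))
    print("deduplicated output:", current_snapshot_text(379383396, 379383451, running, dedupe=True))
```

Fixing either side alone is enough for new snapshots to round-trip. The input-side change also lets snapshots already stored with duplicates (such as the sl_event row in this report) be read again.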
Comment 8 Vikram 2014-07-16 12:12:40 UTC
I'm curious whether this patch has been applied. It would help us get rid of the manual process of removing the duplicate entries in sl_event.

Thanks

(In reply to comment #7)
> A discussion on pgsql-hackers has revealed that this is a bug in PostgreSQL
> connected to two-phase commit. There is apparently a small window in which two
> PGPROC entries are visible with the same xid.
> 
> I am proposing a patch to PostgreSQL that will remove duplicate xip entries in
> txid_current_snapshot() and ignore existing duplicates in txid_snapshot_in().
Comment 9 Steve Singer 2014-07-16 12:32:32 UTC

The fix is in this commit on HEAD:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=8f9b9590d79fc1fc1ad08b207401acfdbb0bfac7

If you search around, you should be able to find the corresponding commit on REL9.2, but I would just recommend upgrading to the latest 9.2 minor release (I hear a new PG release is coming next week).
Comment 10 Vikram 2014-07-17 09:58:12 UTC
We recently hit this again; we are running PostgreSQL 9.3.4 with Slony 2.2.2.

(In reply to comment #9)
> 
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=8f9b9590d79fc1fc1ad08b207401acfdbb0bfac7
> on head
> 
> If you search around you should be able to find the commit on REL9.2 but I
> would just recommend upgrading to the latest 9.2 minor release (I hear a new PG
> release is coming next week)
Comment 11 Steve Singer 2014-07-17 10:16:42 UTC
Looking at the git logs, this patch is not included in 9.3.4. I would expect it to be included in 9.3.5.
Comment 12 Vikram 2014-07-24 08:48:30 UTC
PostgreSQL 9.3.5 was released today. I went through the release notes and they do not mention this fix. Could you confirm from the git logs whether it actually went into this release? Thanks for your help.

(In reply to comment #11)
> Looking at the git logs,
> 
> This patch is not included in 9.3.4 I would expect it to be included in 9.3.5
Comment 13 Steve Singer 2014-07-24 10:11:02 UTC
It is in 9.3.5, according to the git logs:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=log;h=refs/tags/REL9_3_5

Search for "Handle duplicate XIDs in txid_snapshot."

I am marking this bug as resolved, since the fix is now included in a released version of PG.
Comment 14 Steve Singer 2014-07-30 12:12:49 UTC
*** Bug 340 has been marked as a duplicate of this bug. ***