Mon Nov 16 05:52:00 PST 2015
- Previous message: [Slony1-general] Network connection from slaves to the master
- Next message: [Slony1-general] remote listener serializability
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hello slony1 community,

I'm part of a team at Akamai working on a notification service based on postgres. (We call it an Alert Management System.) We're at the point where we need to scale past a single-instance DB, and so have been working with slony1-2.2.4 (and postgresql-9.1.18) to make that happen.

Most tests in the past few months have been great, but in recent tests the reassuring SYNC-event-output-per-two-seconds suddenly disappeared. Throughout the day, it returns for a few minutes (normally less than 5, never 10) and then re-enters limbo.

Vigorous debugging ensued, and the problem was proven to be the serializable isolation level set in slon/remote_listen.c. Our recent test environment doesn't have a tremendous write rate (measured in KB/s), but it does have 200-400 clients at any one time, which may be a factor. Below is the stack shown in gdb of the postgres server proc (identified via pg_stat_activity) while slon is in limbo.

What are the thoughts on possible changes to the remote listener isolation level and their impact? I've tested two changes: using repeatable read instead, and keeping serializable but dropping the deferrable option. The latter offers little improvement, if any, but the former seems to return us to healthy replication.

In searching around, I found that Jan W filed Bug 336 last year (link below), which suggests we could relax the isolation level here and elsewhere. If it would be helpful, I could verify an agreed solution and submit it back as a patch. (Not really in the slony community yet, just looking at the process now.)
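For readers comparing the three variants described above, here is a minimal sketch that spells them out as transaction-start statements. The exact statement issued by slon/remote_listen.c is not quoted in this message, so the SQL strings, the READ ONLY mode, and the helper names (`VARIANTS`, `begin_statement`, `may_wait_for_safe_snapshot`) are illustrative assumptions rather than the slony source. The one behaviour encoded here is that PostgreSQL only waits in GetSafeSnapshot (the frame in the backtrace) when a transaction is SERIALIZABLE, READ ONLY, and DEFERRABLE all at once.

```python
# Hypothetical names throughout; these strings are NOT copied from slony.
# They are valid PostgreSQL transaction-start statements for the three
# configurations compared in the message.
VARIANTS = {
    # Assumed original behaviour: serializable with the deferrable option.
    "serializable_deferrable":
        "BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE;",
    # Tested change 1: serializable without the deferrable option.
    "serializable":
        "BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY;",
    # Tested change 2: repeatable read.
    "repeatable_read":
        "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY;",
}


def begin_statement(variant: str) -> str:
    """Return the transaction-start statement for a named variant."""
    return VARIANTS[variant]


def may_wait_for_safe_snapshot(variant: str) -> bool:
    """True if the variant can block while waiting for a safe snapshot.

    Only a SERIALIZABLE READ ONLY DEFERRABLE transaction takes that path
    when it acquires its snapshot.
    """
    return variant == "serializable_deferrable"


if __name__ == "__main__":
    for name in VARIANTS:
        print(name, "->", begin_statement(name),
              "| may wait:", may_wait_for_safe_snapshot(name))
```

Running each statement by hand in psql against a busy test database, followed by a trivial SELECT, is one way to see which variant stalls before returning its first row.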
Thanks in advance,

http://www.slony.info/bugzilla/show_bug.cgi?id=336

(gdb) thread apply all bt

Thread 1 (process 13052):
#0  0xffffe430 in __kernel_vsyscall ()
#1  0xf76d2c0f in semop () from /lib32/libc.so.6
#2  0x08275a26 in PGSemaphoreLock (sema=0xf69d6784, interruptOK=1 '\001') at pg_sema.c:424
#3  0x082b52cb in ProcWaitForSignal () at proc.c:1443
#4  0x082bb57a in GetSafeSnapshot (origSnapshot=<optimized out>) at predicate.c:1520
#5  RegisterSerializableTransaction (snapshot=0x88105a0) at predicate.c:1580
#6  0x083b3f35 in GetTransactionSnapshot () at snapmgr.c:138
#7  0x082c460a in exec_simple_query (
    query_string=0xa87d248 "select ev_origin, ev_seqno, ev_timestamp, ev_snapshot, \"pg_catalog\".txid_snapshot_xmin(ev_snapshot), \"pg_catalog\".txid_snapshot_xmax(ev_snapshot), ev_type, ev_data1,"...) at postgres.c:948
#8  PostgresMain (argc=1, argv=0xa7cd1e0, dbname=0xa7cd1d0 "ams", username=0xa7cd1b8 "ams_slony") at postgres.c:4021
#9  0x08284a58 in BackendRun (port=0xa808118) at postmaster.c:3657
#10 BackendStartup (port=0xa808118) at postmaster.c:3330
#11 ServerLoop () at postmaster.c:1483
#12 0x082854d8 in PostmasterMain (argc=3, argv=0xa7ccb58) at postmaster.c:1144
#13 0x080cb430 in main (argc=3, argv=0xa7ccb58) at main.c:210
(gdb)

Tom :-)