Tue Oct 12 17:57:10 PDT 2004
- Previous message: [Slony1-general] How to change Slave database to be the Primary one?
- Next message: [Slony1-general] Slony stops replicating during nightly periodic + small patch
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,

First of all, many thanks for the great work on slony!

I use slony 1.0.2 to replicate two PostgreSQL 7.4.3 databases running on FreeBSD 5.2.1-p9, and I see that slony stops replicating every night (with a couple of minor exceptions) during the periodic run that does the backups, vacuuming, etc. I use the standard 502.pgsql script that comes with the postgresql port on FreeBSD (I am not sure whether it is part of the port or of the original PostgreSQL source tree), which basically does a pg_dump and a vacuum analyze.

Every night, I get this on stdout from slon:

    ERROR remoteListenThread_1: timeout for event selection

And this on stderr:

    sched_mainloop: select(): Bad file descriptor

Setting the debug level to 4 does not give much more information; it just says, after the timeout, that the remoteListenThread is done.

While trying to figure out the whole scheduling mechanism, I found this little issue: in scheduler.c, a temporary copy of the fdsets for select is made first, and only then are some checks done that remove FDs which may no longer be needed from the global fdsets. The copy handed to select can therefore still contain descriptors that were just removed. I believe this must be an oversight, and that it is the reason for the select error, which in turn sets sched_status to an error value, causes sched_msleep to return with an error value, and makes the remote listener thread stop.

I moved the copy further down (to just before the select), and last night slony did not stop replicating even though it logged several of the "timeout for event selection" errors.

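The failure mode described above can be reproduced outside of slon. The following is only a minimal sketch, not code from scheduler.c: the names demo_fdset_read, demo_numfd, demo_snapshot and demo_select_order are hypothetical stand-ins for the scheduler's global state, and the sketch assumes that a descriptor removed from the global set has also been closed by the time select(2) runs.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-ins for the scheduler's global read set and fd count. */
static fd_set demo_fdset_read;
static int    demo_numfd;

/* Copy the global read set into a private set, like the loop in the patch. */
static void demo_snapshot(fd_set *rfds)
{
    int i;

    FD_ZERO(rfds);
    for (i = 0; i < demo_numfd; i++)
        if (FD_ISSET(i, &demo_fdset_read))
            FD_SET(i, rfds);
}

/*
 * Register two pipe read ends, drop and close one of them, then poll.
 * copy_first != 0: old order (snapshot, then prune).
 * copy_first == 0: patched order (prune, then snapshot).
 * Returns 0 if select() succeeded, otherwise the errno it failed with.
 */
int demo_select_order(int copy_first)
{
    int            doomed[2], keep[2];
    fd_set         rfds;
    struct timeval tv = {0, 0};     /* poll only, never block */
    int            err = 0;

    if (pipe(doomed) < 0)
        return errno;
    if (pipe(keep) < 0)
    {
        err = errno;
        close(doomed[0]);
        close(doomed[1]);
        return err;
    }
    assert(keep[0] < FD_SETSIZE);   /* fd_set cannot hold larger values */

    FD_ZERO(&demo_fdset_read);
    FD_SET(doomed[0], &demo_fdset_read);
    FD_SET(keep[0], &demo_fdset_read);
    demo_numfd = keep[0] + 1;       /* keep[0] was allocated last, so it is the highest */

    if (copy_first)
        demo_snapshot(&rfds);       /* the copy still contains doomed[0] */

    /* Prune: the connection is gone (assumed closed, see lead-in). */
    FD_CLR(doomed[0], &demo_fdset_read);
    close(doomed[0]);
    close(doomed[1]);

    if (!copy_first)
        demo_snapshot(&rfds);       /* the copy only contains keep[0] */

    if (select(demo_numfd, &rfds, NULL, NULL, &tv) < 0)
        err = errno;                /* save before close() can change errno */

    close(keep[0]);
    close(keep[1]);
    return err;
}
```

Under these assumptions, demo_select_order(1) fails with EBADF, which is the "Bad file descriptor" message seen on stderr, while demo_select_order(0) returns 0 because the snapshot is taken after the stale descriptor has left the global set.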
I should probably wait a couple more periodic runs before claiming victory, but I believe the patch should at the very least not cause any problems and should solve a few, so here it is (including a couple of typo fixes):

%diff -u scheduler.c.orig scheduler.c
--- scheduler.c.orig    Mon Oct 11 17:00:30 2004
+++ scheduler.c Tue Oct 12 18:54:09 2004
@@ -452,21 +452,8 @@
     struct timeval timeout;
 
     /*
-     * Make copies of the file descriptor sets for select(2)
-     */
-    FD_ZERO(&rfds);
-    FD_ZERO(&wfds);
-    for (i = 0; i < sched_numfd; i++)
-    {
-        if (FD_ISSET(i, &sched_fdset_read))
-            FD_SET(i, &rfds);
-        if (FD_ISSET(i, &sched_fdset_write))
-            FD_SET(i, &wfds);
-    }
-
-    /*
      * Check if any of the connections in the wait queue
-     * have reached there timeout. While doing so, we also
+     * have reached their timeout. While doing so, we also
      * remember the closest timeout in the future.
      */
     tv = NULL;
@@ -560,6 +547,19 @@
     }
 
     /*
+     * Make copies of the file descriptor sets for select(2)
+     */
+    FD_ZERO(&rfds);
+    FD_ZERO(&wfds);
+    for (i = 0; i < sched_numfd; i++)
+    {
+        if (FD_ISSET(i, &sched_fdset_read))
+            FD_SET(i, &rfds);
+        if (FD_ISSET(i, &sched_fdset_write))
+            FD_SET(i, &wfds);
+    }
+
+    /*
      * Do the select(2) while unlocking the master lock.
      */
     pthread_mutex_unlock(&sched_master_lock);
@@ -776,7 +776,7 @@
 
 
 /* ----------
- * sched_add_fdset
+ * sched_remove_fdset
  *
  * Remove a file descriptor from one of the global scheduler sets and
  * adjust sched_numfd accordingly.

Hope that helps,
Jacques.