Jorgen Austvik - Sun Norway Jorgen.Austvik at Sun.COM
Mon May 26 03:26:03 PDT 2008
Hi,

I get problems running Slony-I tests with some automated users on
Solaris, and have traced the problem down to _check_pid() in
support_funcs.sh, which tries to parse ps(1) output.

There are just too many ps'es available on Solaris, and they even give
different output. (So the root problem is really Solaris, but I think I
have found a way to simplify Slony-I too.)
-----------8<----------------8<----------------8<----------------8<-----
[jaustvik at host:~] /bin/ps ; /usr/bin/ps ; /usr/ucb/ps
    PID TTY         TIME CMD
15048 pts/29      0:00 ps
14690 pts/29      0:00 bash
    PID TTY         TIME CMD
15049 pts/29      0:00 ps
14690 pts/29      0:00 bash
    PID TT       S  TIME COMMAND
  14529          Z  0:00
   5888 pts/2    S  0:00 [ sdt_shell ]
   5889 pts/2    S  0:00 [ bash ]
   5908 pts/2    S  0:00 /bin/ksh /usr/dt/config/Xsession2.jds
   5910 pts/2    S  0:00 /usr/bin/gnome-session
   5933 pts/2    S  0:00 /usr/lib/esd -nobeeps
<snip>
-----------8<----------------8<----------------8<----------------8<-----

Attached is a patch that tries to fix the problem on Solaris in a
slightly more robust manner.

However, I think this could be done in a better way. _check_pid() is
called from three places in run_test.sh, in a pattern where a process is
started, the script waits for a second, and then we check if the process
is still alive. I think a better way to check this is to use kill:

http://www.opengroup.org/onlinepubs/009695399/functions/kill.html
-----------8<----------------8<----------------8<----------------8<-----
If sig is 0 (the null signal), error checking is performed but no signal
is actually sent. The null signal can be used to check the validity of pid.
-----------8<----------------8<----------------8<----------------8<-----

Example:
-----------8<----------------8<----------------8<----------------8<-----
[jaustvik at host:/] ps
    PID TTY         TIME CMD
15061 pts/28      0:00 ps
14549 pts/28      0:00 bash
[jaustvik at host:/] kill -0 14549
[jaustvik at host:/] echo $?
0
[jaustvik at host:/] kill -0 1454988
bash: kill: (1454988) - No such process
[jaustvik at host:/] echo $?
1
-----------8<----------------8<----------------8<----------------8<-----
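
As a rough sketch of the idea (this is not the attached patch, and the
function name pid_is_alive is hypothetical, not something that exists in
support_funcs.sh), the check could be wrapped like this. Note that
kill -0 also fails when the process exists but belongs to another user,
which should not matter for processes the test script started itself.

```shell
#!/bin/sh
# Hypothetical helper; name and arguments are illustrative only and do
# not match the existing _check_pid() signature.
pid_is_alive() {
    # kill -0 sends no signal; the exit status only tells us whether
    # the pid exists and could be signalled by this user.
    kill -0 "$1" 2>/dev/null
}

# Stand-in for a process started by the test script.
sleep 5 &
child=$!

if pid_is_alive "$child"; then
    echo "process $child is running"
else
    echo "process $child is gone"
fi

# Clean up the stand-in process.
kill "$child" 2>/dev/null
wait "$child" 2>/dev/null || true
```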

The drawback with this approach is that we do not check that the process
has the correct name - but I think the risk of a pid being reused within
1 second is very small.
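
The start, wait, check pattern used in run_test.sh could then look
roughly like the following. This is only a sketch under assumptions:
start_and_check and started_pid are made-up names, and "sleep 10" stands
in for whatever process the test script really starts.

```shell
#!/bin/sh
# start_and_check is a hypothetical helper, not part of run_test.sh.
# It starts a command in the background, waits one second, and then
# uses the null signal to see whether the process is still there.
start_and_check() {
    "$@" &
    started_pid=$!
    sleep 1
    kill -0 "$started_pid" 2>/dev/null
}

if start_and_check sleep 10; then
    echo "pid $started_pid is still running"
else
    echo "pid $started_pid exited early"
fi

# Clean up the stand-in process.
kill "$started_pid" 2>/dev/null
wait "$started_pid" 2>/dev/null || true
```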

I can write and test such a patch on Solaris (x86, SPARC) and Linux (Red
Hat), if it is of interest.

-J
-- 
Jørgen Austvik, Software Engineering - QA
Sun Microsystems Database Technology Group

http://blogs.sun.com/austvik/, http://www.austvik.net
-------------- next part --------------
A non-text attachment was scrubbed...
Name: support_functions.sh.patch
Type: text/x-patch
Size: 1297 bytes
Desc: not available
Url : http://lists.slony.info/pipermail/slony1-hackers/attachments/20080526/07572212/support_functions.sh.bin
