[vox-tech] Crontab oddity - server timeout?

Brian Lavender brian at brie.com
Thu Mar 26 20:10:56 PDT 2009


On Mon, Mar 23, 2009 at 03:22:06PM -0700, Bill Kendrick wrote:
> 
> So I'm using lighttpd and fast_cgi, which occasionally has a problem where
> it gets 'stuck'.  (Unable to bring fast_cgi back to life, even though
> resources are once again available.)  Usually this results in Error 500s
> that never go away until lighttpd is restarted.
> 
> So to avoid having to manually go in and resurrect the server, I created
> a shell script that tries to hit the site, checks for an HTTP 200 response,
> and if it doesn't see that, it does a 'tail' of the access and error logs
> (so that I can see what was happening at the time), and then invokes an
> "/etc/init.d/lighttpd restart" to kick the server.
> 
> I've got the following crontab entry:
> 
> */2 * * * * root THE_SCRIPT
> 
> meaning it should run once every 2 minutes, all the time.  I only get an
> email when it produces output, and it only does that if it fails to
> contact the webserver.
> 
> However, when it does fail, I get numerous reports at once.  Could this
> be because the server isn't responding immediately when I check the status?
> 
> I'm doing that via, in the shell script:
> 
>   STATUS=`wget --save-headers http://www.MYSITE.com/ -O - 2> /dev/null | head -1 | cut -d " " -f 2`
> 
> In other words, hit the site, save the headers, save them out to stdout,
> chop off the "HTTP/1.1" to get the delicious "200" (hopefully) status.
> 
> 
> I guess maybe I need to give it a "--timeout" argument, and something
> less than 120 seconds, so that the jobs don't run over each other...?

If the server is running and accepts a connection, but does not report
back a 200, then I would imagine wget will hang on. Is it accepting a
socket connection, but not reporting back? What if you put a lock file in
your script, so that it exits if another one is already running?
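
On the --timeout idea, here is a rough sketch of how the status check
could be bounded. The names fetch_status, parse_status and TIMEOUT are
made up for illustration, and 30 seconds is only an assumed value. Note
that wget's --timeout limits the DNS, connect and read-idle phases
separately rather than the total run time, and that wget retries by
default, hence --tries=1.

```shell
#!/bin/sh
# Assumed value: comfortably below the 120 second cron interval.
TIMEOUT="${TIMEOUT:-30}"

# Read an HTTP response on stdin and print the status code from its
# first line, e.g. "HTTP/1.1 200 OK" gives "200".
parse_status() {
    head -n 1 | cut -d ' ' -f 2
}

# Fetch the URL given as $1 with a bounded wait and a single attempt,
# then print its status code. Prints nothing if wget gives up.
fetch_status() {
    wget --timeout="$TIMEOUT" --tries=1 --save-headers -O - "$1" \
        2>/dev/null | parse_status
}

# Example use inside the check script:
#   STATUS=$(fetch_status "http://www.MYSITE.com/")
#   [ "$STATUS" = "200" ] || /etc/init.d/lighttpd restart
```

An empty STATUS (no answer within the timeout) is treated the same as any
other non-200 result. If a hard cap on total run time is wanted, wrapping
the wget call in the coreutils "timeout" command is another option.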

20.9.1 Locking a mailbox file
http://rute.2038bug.com/node23.html.gz#SECTION002390000000000000000
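
A minimal sketch of the lock idea in plain sh, using a lock directory
instead of a lock file because mkdir is atomic. The path and the function
names acquire_lock and release_lock are hypothetical, not taken from the
page above.

```shell
#!/bin/sh
# Hypothetical lock path; pick any location writable by the cron user.
LOCKDIR="${LOCKDIR:-/tmp/check_lighttpd.lock}"

# mkdir either creates the directory or fails, atomically, so only one
# caller at a time can succeed.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null
}

if acquire_lock; then
    # The wget status check and the lighttpd restart would go here.
    release_lock
else
    # A previous run still holds the lock, so skip this round quietly.
    :
fi
```

One caveat: if the script is killed before release_lock runs, the
directory stays behind and every later run will skip, so a stale lock
needs to be cleaned up by hand or by some age check.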

Have you thought about using NAGIOS? It's tricky to configure, but there
is a NAGIOS book available through http://safari.oreilly.com. I believe
NAGIOS has a place where you can configure it to take action if the
service is down.

Nagios, 2nd Edition
by Wolfgang Barth
Publisher: No Starch Press
Pub Date: October 28, 2008
Print ISBN-13: 978-1-593-27179-4
Pages: 720

There is also the Linux Networking Cookbook. It has some fast, easy
methods for monitoring your httpd service.

Linux Networking Cookbook
by Carla Schroder
Publisher: O'Reilly Media, Inc.
Pub Date: November 26, 2007
Print ISBN-10: 0-596-10248-8
Print ISBN-13: 978-0-596-10248-7
Pages: 456

It has a NAGIOS section. It is also available through the safari site. I
imagine you might have some different sources as well. ;-)

Or, you could write your own socket client using select. Create your
socket file descriptor and pass it to the following.
http://www.gnu.org/software/hello/manual/libc/Waiting-for-I_002fO.html

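     /* TEMP_FAILURE_RETRY is a GNU extension; glibc's <unistd.h> only
        defines it when _GNU_SOURCE is set before the first include. */
     #define _GNU_SOURCE
     #include <errno.h>
     #include <stdio.h>
     #include <unistd.h>
     #include <sys/types.h>
     #include <sys/time.h>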
     
     int
     input_timeout (int filedes, unsigned int seconds)
     {
       fd_set set;
       struct timeval timeout;
     
       /* Initialize the file descriptor set. */
       FD_ZERO (&set);
       FD_SET (filedes, &set);
     
       /* Initialize the timeout data structure. */
       timeout.tv_sec = seconds;
       timeout.tv_usec = 0;
     
       /* select returns 0 if timeout, 1 if input available, -1 if error. */
       return TEMP_FAILURE_RETRY (select (FD_SETSIZE,
                                          &set, NULL, NULL,
                                          &timeout));
     }
     
     int
     main (void)
     {
       fprintf (stderr, "select returned %d.\n",
                input_timeout (STDIN_FILENO, 5));
       return 0;
     }

brian
-- 
Brian Lavender
http://www.brie.com/brian/

