[vox-tech] Crontab oddity - server timeout?

Bill Kendrick nbs at sonic.net
Mon Mar 23 15:22:06 PDT 2009


So I'm using lighttpd and fast_cgi, which occasionally has a problem where
it gets 'stuck'.  (lighttpd is unable to bring fast_cgi back to life, even
though resources are once again available.)  Usually this results in Error
500s that never go away until lighttpd is restarted.

So to avoid having to manually go in and resurrect the server, I created
a shell script that tries to hit the site, checks for an HTTP 200 response,
and if it doesn't see that, it does a 'tail' of the access and error logs
(so that I can see what was happening at the time), and then invokes an
"/etc/init.d/lighttpd restart" to kick the server.

I've got the following crontab entry:

*/2 * * * * root THE_SCRIPT

meaning it should run once every 2 minutes, all the time.  I only get an
email when it produces output, and it only does that if it fails to
contact the webserver.

However, when it does fail, I get numerous reports at once.  Could this
be because the server isn't responding immediately when I check the status?

In the shell script, I'm checking the status via:

  STATUS=`wget --save-headers http://www.MYSITE.com/ -O - 2> /dev/null | head -1 | cut -d " " -f 2`

In other words: hit the site, write the headers out to stdout, and chop off
the "HTTP/1.1" to get the delicious "200" (hopefully) status.
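
For reference, since --save-headers sticks the response headers in front of
the page body, the first line of that output is the status line itself,
something like:

  HTTP/1.1 200 OK

so field 2 (after the cut -d " " -f 2) is the "200".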


I guess maybe I need to give it a "--timeout" argument set to something
less than 120 seconds, so that the jobs don't run over each other...?
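
Something like this, maybe (a sketch, not tested; the 30 seconds is an
arbitrary number, just well under the 2-minute cron interval):

  # --timeout covers DNS/connect/read; --tries=1 keeps wget from retrying
  # on its own (its defaults are pretty patient).
  STATUS=`wget --timeout=30 --tries=1 --save-headers http://www.MYSITE.com/ -O - 2> /dev/null | head -1 | cut -d " " -f 2`

And as a belt-and-suspenders measure, the crontab entry could refuse to start
a new run while the previous one is still going (assuming flock(1) from
util-linux is available; the lock file path is arbitrary):

  */2 * * * * root flock -n /var/lock/check-lighttpd.lock THE_SCRIPT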

-bill!

