[vox-tech] help with signals and C

Wed, 13 Mar 2002 10:52:17 -0800

begin Jeff Newmiller <jdnewmil@dcn.davis.ca.us> 
> On Mon, 11 Mar 2002, Peter Jay Salzman wrote:
> 
> > hi all,
> > 
> > can someone post some example code of how to trap a SIGFPE signal and
> > abort execution during the course of a C program on linux?
> 
> Is this what you are looking for?
> 
> http://www.csl.mtu.edu/cs4411.ck/www/NOTES/signal/install.html

nice link!

ok, i now have some test code, but it's not working:

   #include <stdio.h>
   #include <stdlib.h>
   #include <signal.h>
   void Exception(int signum);

   int main(void)
   {
      float bignum = 9E350;
      float smallnum = 9E-350;
      signal(SIGFPE, Exception);

      printf("%Lf\n", 2.0L / 0.0L);
      printf("%Lf\n", bignum / smallnum);

      return 0;
   }

   void Exception(int signum)
   {
      printf("Caught SIGFPE: %d.\n", signum);
      exit(-1);
   }

the signal isn't caught at all.  the output is:

  inf
  nan

looking through the man pages, i found that signal returns a type
sighandler_t:

   sighandler_t signal(int signum, sighandler_t handler);

gcc didn't recognize this type at all, so i went to the
header files and saw references to a type __sighandler_t.  gcc liked
that.  is this an error in the man pages or are we supposed to know that
an internal type is preceded with "__"?

anyway, i modified the program:

   #include <stdio.h>
   #include <stdlib.h>
   #include <signal.h>
   void Exception(int signum);

   int main(void)
   {
      __sighandler_t sigreturn;
      float bignum = 9E350;
      float smallnum = 9E-350;

      sigreturn = signal(SIGFPE, Exception);

      printf("%p\n", (void *) sigreturn);
      printf("%Lf\n", 2.0L / 0.0L);
      printf("%Lf\n", bignum / smallnum);

      return 0;
   }

   void Exception(int signum)
   {
      printf("Caught SIGFPE: %d.\n", signum);
      exit(-1);
   }

now the output is:

   (nil)
   inf
   nan

i was expecting the first line to be an address (the address of
Exception()).  i take this to mean that the installation of the signal
failed.

my needs are simple.  i just want to catch overflow, underflow and
divide by zero.  here's an example of bad behavior of my real code:

   temperature = 1.0

   trials        Energy per site
   1,000          -2.000000
   10,000         -1.998448
   100,000        -1.997171
   1,000,000      -1.997530
   10,000,000     -1.997312
   100,000,000   -14.163610

here, the floating point error is obvious.  the program homes in on the
real energy per site of a 4x4 quantum spin lattice until i reach
100,000,000 trials.  then one of my variables overflows and the number
becomes absurd.

however, i have other code where this kind of overflow is EXTREMELY
non-obvious.  and i'm working with a domain of parameters where this
sort of thing can easily happen.  i need a way to detect it, because
unless something becomes nan or inf, i'll have absolutely no way of
knowing that something went wrong.

can anyone tell me why my little signal test code doesn't work?

thanks,
pete