[vox-tech] FPE signals, sample code, and why they are bad.

vox-tech@lists.lugod.org vox-tech@lists.lugod.org
Fri, 15 Mar 2002 14:42:45 -0500


--yrj/dFKFPuw6o+aM
Content-Type: text/plain; charset=us-ascii

On Mon, 11 Mar 2002, Peter Jay Salzman wrote:
> can someone post some example code of how to trap a SIGFPE signal and
> abort execution during the course of a C program on linux?

On Wed, Mar 13, 2002 at 11:03:49AM -0800, Peter Jay Salzman wrote:
> i /just/ posted some signal test code before getting this email, 
> and it looks like we're getting the same results;
> SIGFPE just isn't being caught.

  In reality the FPE's aren't being raised because the standards say 
they should not be, which I found surprising.  However, it makes sense:
the idea of raising unix signals in response to any FP problems must 
predate the current standards by decades.

  When using FPE signals with longjmp it is very difficult to resume a
calculation in response to FPE's, because every non-volatile local
variable changed after a setjmp has an indeterminate value once longjmp
happens _and_ the expression that caused the problem never completes.

  The current standards provide for results like Inf, -Inf, and NaN,
and give a program the ability to test which exceptions have
happened after running a long series of calculations.

  There are three programs attached and a Makefile.

#1 - attempts to do a divide by zero, prints an error and exits.
       highlights: sigaction and enabling FPE signals

#2 - attempts to do a series of FPEs (divide by zero, overflow, underflow),
     traps and recovers from the errors.  With this kind of trapping the
     execution of each offending floating point expression is aborted,
     and variables referenced after the jmp landing point must be
     declared volatile.
       highlights: stuff from #1 and setjmp/longjmp

#3 - does same series of FPEs from #2, but doesn't use signals to 
     detect the problems, instead the results are calculated and a
     warning is printed along with the actual results.
       highlights: this is how it's "supposed" to be done.


  Now the Real Challenge will be figuring how to _use_ the knowledge
that a FPE has happened to do something useful about it at runtime... :)

    Later,
      Mike


Okay the whole story...

  So when I originally read this email I thought it would take a quick
30 mins to whip up some sigaction sample code then mail it off.  However,
I saw FPE's were not being raised; for some reason the code was executing
and generating things like Inf, -Inf, and NaN (Infinity and Not-A-Number).

  Well first off I went grepping around the header files and stumbled
upon /usr/include/fpu_control.h which shows how one can enable each
individual exception to raise a signal.  There wasn't much documentation
but with only a GET and a SET macro defined it's not hard to use.

  I integrated that into the signal sample code and found it started
raising SIGFPE on divide by zero.  The original sample code would 
only print a message about the type of signal received in the signal 
handler and return.  (Note: doing much of anything in a signal handler 
is generally bad style and, depending on what you do, very dangerous.
For FPE it was fine to print because no C library calls are going to
cause this signal, so it's "safe" to do this much in this _particular_
handler.)

  However, since the instruction that the signal handler returns to 
is the one causing the FPE, it will happen again and you'll be stuck in
an endless loop executing the signal handler, so I put an exit in the
signal handler...

  Then, I went back to vox-tech and noticed Jeff's post 

On Mon, Mar 11, 2002 at 06:13:21PM -0800, Jeff Newmiller wrote:
> I have never tried to make sensible use of SIGFPE in particular...

  I remembered having a short animated discussion about how 
setjmp/longjmp were really very cool, so I decided to provide code
that would trap the divide by zero, print what error happened, then 
continue execution to the next type of FPE.  It gave me a chance to 
use longjmp for the first time *<:).

  So a little while later I was trapping and printing the first FPE,
but I found the next several FPE's were not being raised... this
means you need to re-enable raising of FPE's after each signal goes off,
which is really lame.  I think someone decided that clearing the list
of enabled FPE's would be a good idea, since it would prevent you from
getting into the endless loop I mentioned above.  However, if that was
the objective they screwed it up somehow: the FPE is only disabled
*after* some signal handler effectively traps and recovers from it, so
that future FPE's don't happen... this is certainly worth filing a bug
report for, to see if there is an explanation.

  To explain this more clearly: if you don't exit or do a longjmp in
the signal handler but you enable FPE signals once, at the beginning 
of the program, your code will start looping endlessly the first time 
a FPE goes off.  However, if you do a longjmp from the signal handler
to recover from the FPE signal, FPE signals are turned off for all 
future code.

  So after re-enabling FPE's I was having trouble finding an expression
that would generate a FPE underflow.  A short google search later I found
a document which is 2233 lynx pages long and contains All one could 
possibly Want To Know about libc.  The following URL is from a 
different source that has the document broken into html sections...
  http://linux.csua.berkeley.edu/doc/glibc-doc/html/chapters_20.html#SEC411
It was extremely helpful in figuring out how to generate other types
of FPE's... but I learned that fpu_control.h is not the correct way to 
enable signals.  Run "man feraiseexcept" (in section 3); you'll see 
the standard technically doesn't provide a way for the system to 
automatically raise signals at execution time, but the GNU libc people
provide that enhancement.
  So I changed the enable signal code to do it the "correct" GNU way.

  However, reading through the document above I discovered that the
standard authors actually seem to have had a very good idea.  They 
provide a method to *check if* a FPE has happened after execution of
a large complex operation, _if_ one actually cares about FPE's...
this is so much cleaner than using longjmp you can't imagine (not 
that longjmp isn't really cool :).  It's just not practical to try to
catch and recover from each FPE if you are doing a long series of
operations where you only care at the _end_ whether something bad
happened, so that you can provide the answer and a warning.

  So, ripping out longjmp from my code, I produced a way to "trap FPE's"
without all the magic of a signal handler and jumps.  I discovered
that the compiler is just too darn smart: it was doing most of the
work at compile time, so the exceptions were not going off.  You'll
see references to "time(0)" in a bunch of places; that was so that
I could trick the compiler into doing the expressions at runtime.

--yrj/dFKFPuw6o+aM
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=Makefile

TARGET := s1 s2 s3

CC := gcc
CFLAGS := -g -Wall -W -D_GNU_SOURCE -O
CFLAGS := -g -Wall -W -D_GNU_SOURCE -O9
LFLAGS := -lm

all: ${TARGET}

s1: s1.o
	gcc ${CFLAGS} -o $@ $^ ${LFLAGS}
s2: s2.o
	gcc ${CFLAGS} -o $@ $^ ${LFLAGS}
s3: s3.o
	gcc ${CFLAGS} -o $@ $^ ${LFLAGS}

################
# no user serviceable parts below here... so don't change below here
################
%.o: %.c
	gcc ${CFLAGS} -c $^
	
.%.d: %.c
	@gcc -MM ${CFLAGS} -o $@ $^

clean:
	rm -rf *.o

nuke: clean
	rm -rf ${DFILES} $(TARGET)

CFILES := ${wildcard *.c}
OFILES := ${patsubst %.c,%.o,${CFILES}}
DFILES := ${patsubst %.c,.%.d,${CFILES}}

ifneq ($(MAKECMDGOALS),nuke)
-include ${DFILES}
endif

--yrj/dFKFPuw6o+aM
Content-Type: text/plain
Content-Disposition: attachment; filename="s1.c"

#include <errno.h>
#include <fenv.h>
#include <float.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* by default all FPE's are masked off... "fix that" */
void fpe_trap_enable(void)
{ /* Enable FPE's, by default all FPE's will not raise a signal when 
   * they happen... see fenv.h for magic constants. */
#if 0
  FE_INEXACT           inexact result
  FE_DIVBYZERO         division by zero
  FE_UNDERFLOW         result not representable due to underflow
  FE_OVERFLOW          result not representable due to overflow
  FE_INVALID           invalid operation

  feenableexcept(FE_INEXACT | FE_DIVBYZERO | FE_UNDERFLOW | 
                 FE_OVERFLOW | FE_INVALID);
#endif
  feenableexcept(FE_ALL_EXCEPT);
}

void fpe_print_cause(FILE *file, siginfo_t *info)
{
  if (info->si_signo != SIGFPE)                      /* should never happen */
    fprintf(stderr, "error\n%s:%d %s error: "
      "somehow got a wrong signo = %d\n",
      __FILE__, __LINE__, __FUNCTION__, info->si_signo);
  else
    fprintf(file, 
      "FPE reason %d = \"%s\", from address %p\n",
      info->si_code,
      info->si_code == FPE_INTDIV ? "integer divide by zero"           :
      info->si_code == FPE_INTOVF ? "integer overflow"                 :
      info->si_code == FPE_FLTDIV ? "FP divide by zero"                :
      info->si_code == FPE_FLTOVF ? "FP overflow"                      :
      info->si_code == FPE_FLTUND ? "FP underflow"                     :
      info->si_code == FPE_FLTRES ? "FP inexact result"                :
      info->si_code == FPE_FLTINV ? "FP invalid operation"             :
      info->si_code == FPE_FLTSUB ? "subscript out of range"           :
      "unknown",
      (void *) info->si_addr     /* %p: no truncation on 64-bit systems */
    );
}

/* print a message and exit */
void fpe_callback(int sig_number, siginfo_t *info, void *data)
{
  sig_number = sig_number;         /* duplicated information is inside info */
  data = data;                    /* used for SIGIO (see F_SETSIG in fcntl) */

  if (sig_number != SIGFPE)
  {
    fprintf(stderr, "%s:%d %s error: "
            "received wrong signal number %d not %d\n",
            __FILE__, __LINE__, __FUNCTION__, sig_number, SIGFPE);
    exit(2);
  }

  fprintf(stderr, "%s:%d %s warn: ", __FILE__, __LINE__, __FUNCTION__);
  fpe_print_cause(stderr, info);

  exit(1);
}

int main(void)
{
  { /* setup a signal handler for SIGFPE */
    struct sigaction action;
  
    memset(&action, 0, sizeof(action));
  
    action.sa_sigaction = fpe_callback;          /* which callback function */
    sigemptyset(&action.sa_mask);                 /* other signals to block */
    action.sa_flags = SA_SIGINFO;               /* give details to callback */
  
    if (sigaction(SIGFPE, &action, 0))
    {
      fprintf(stderr, "%s:%d %s error: "
              "failed to register signal handler %d (%s)\n",
              __FILE__, __LINE__, __FUNCTION__, errno, strerror(errno));
      return 1;
    }
  }

  fpe_trap_enable();

  { /* provide a bunch of examples */
    double a;
    double b;

    /* divide by zero */
    { /* "try" our code */
      a = 1.0; b = time(0) - time(0); a /= b;
  
      fprintf(stderr, "%s:%d %s warn: "
              "FP divide by zero not caught result is %e\n",
              __FILE__, __LINE__, __FUNCTION__, a);
    }

    /* overflow */
    { /* "try" our code */
      a = DBL_MAX; b = DBL_MAX; a += b;

      fprintf(stderr, "%s:%d %s warn: "
              "FP overflow not caught result is %e\n",
              __FILE__, __LINE__, __FUNCTION__, a);
    }

    /* underflow */
    { /* "try" our code */
      a = DBL_MIN; b = 2; a /= b;

      fprintf(stderr, "%s:%d %s warn: "
              "FP underflow not caught result is %e\n",
              __FILE__, __LINE__, __FUNCTION__, a);
    }

    /* inexact result */
    { /* "try" our code */
      a = DBL_MIN; b = DBL_MAX; a /= b;

      fprintf(stderr, "%s:%d %s warn: "
              "FP inexact result not caught result is %e\n",
              __FILE__, __LINE__, __FUNCTION__, a);
    }
  }

  return 0;
}

--yrj/dFKFPuw6o+aM
Content-Type: text/plain
Content-Disposition: attachment; filename="s2.c"

#include <errno.h>
#include <fenv.h>
#include <float.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* by default all FPE's are masked off... "fix that" */
void fpe_trap_enable(void)
{ /* Enable FPE's, by default all FPE's will not raise a signal when 
   * they happen... see fenv.h for magic constants. */
#if 0
  FE_INEXACT           inexact result
  FE_DIVBYZERO         division by zero
  FE_UNDERFLOW         result not representable due to underflow
  FE_OVERFLOW          result not representable due to overflow
  FE_INVALID           invalid operation

  feenableexcept(FE_INEXACT | FE_DIVBYZERO | FE_UNDERFLOW | 
                 FE_OVERFLOW | FE_INVALID);
#endif
  feenableexcept(FE_ALL_EXCEPT);
}

void fpe_print_cause(FILE *file, siginfo_t *info)
{
  if (info->si_signo != SIGFPE)                      /* should never happen */
    fprintf(stderr, "error\n%s:%d %s error: "
      "somehow got a wrong signo = %d\n",
      __FILE__, __LINE__, __FUNCTION__, info->si_signo);
  else
    fprintf(file, 
      "FPE reason %d = \"%s\", from address %p\n",
      info->si_code,
      info->si_code == FPE_INTDIV ? "integer divide by zero"           :
      info->si_code == FPE_INTOVF ? "integer overflow"                 :
      info->si_code == FPE_FLTDIV ? "FP divide by zero"                :
      info->si_code == FPE_FLTOVF ? "FP overflow"                      :
      info->si_code == FPE_FLTUND ? "FP underflow"                     :
      info->si_code == FPE_FLTRES ? "FP inexact result"                :
      info->si_code == FPE_FLTINV ? "FP invalid operation"             :
      info->si_code == FPE_FLTSUB ? "subscript out of range"           :
      "unknown",
      (void *) info->si_addr     /* %p: no truncation on 64-bit systems */
    );
}

/* save the siginfo information then call siglongjmp
 *
 * So this Sucks: it appears the trap needs to be re-enabled each time
 * a trap is caught, because the mask is reset to mask all FPE's after
 * each raise event.  The funny thing is the mask is only messed up
 * *after* the longjmp protects us from the problem instruction; unless
 * we longjmp or exit we are caught in an infinite loop.
 *   I suspect someone was trying to be nice and disable traps to prevent 
 * the endless loop but screwed up... :{ */
void fpe_callback(int sig_number, siginfo_t *info, void *data)
{
  extern sigjmp_buf fpe_landing_zone;
  extern siginfo_t fpe_info;

  sig_number = sig_number;         /* duplicated information is inside info */
  data = data;                    /* used for SIGIO (see F_SETSIG in fcntl) */

  memcpy(&fpe_info, info, sizeof(fpe_info)); /* copy to global for reference */

  fpe_trap_enable();                            /* ARGH: re-enable the trap */

  siglongjmp(fpe_landing_zone, 1);                /* jump out to catch area */
}

int main(void)
{
  extern sigjmp_buf fpe_landing_zone;
  extern siginfo_t fpe_info;

  if (sigsetjmp(fpe_landing_zone, 1))
  { /* this should never happen, but allows us to catch a non-catch jump */
    fprintf(stderr, "%s:%d %s error: "
            "longjmp called without valid target\n",
            __FILE__, __LINE__, __FUNCTION__);
    return 1;
  }

  { /* setup a signal handler for SIGFPE */
    struct sigaction action;
  
    memset(&action, 0, sizeof(action));
  
    action.sa_sigaction = fpe_callback;          /* which callback function */
    sigemptyset(&action.sa_mask);                 /* other signals to block */
    action.sa_flags = SA_SIGINFO;               /* give details to callback */
  
    if (sigaction(SIGFPE, &action, 0))
    {
      fprintf(stderr, "%s:%d %s error: "
              "failed to register signal handler %d (%s)\n",
              __FILE__, __LINE__, __FUNCTION__, errno, strerror(errno));
      return 1;
    }
  }

  fpe_trap_enable();

  { /* provide a bunch of examples */
    volatile double a;
    volatile double b;

    /* divide by zero */
    if (!sigsetjmp(fpe_landing_zone, 1))
    { /* "try" our code */
      a = 1.0; b = time(0) - time(0); a /= b;
  
      fprintf(stderr, "%s:%d %s warn: "
              "FP divide by zero not caught result is %e\n",
              __FILE__, __LINE__, __FUNCTION__, a);
    }
    else
    { /* longjmp was called, "catch" the failure */
      fprintf(stderr, "%s:%d %s status: "
              "FP divide by zero trapped (%e / %e)\n  ",
              __FILE__, __LINE__, __FUNCTION__, a, b);
      fpe_print_cause(stderr, &fpe_info);
    }

    /* overflow */
    if (!sigsetjmp(fpe_landing_zone, 1))
    { /* "try" our code */
      a = DBL_MAX; b = DBL_MAX; a += b;

      fprintf(stderr, "%s:%d %s warn: "
              "FP overflow not caught result is %e\n",
              __FILE__, __LINE__, __FUNCTION__, a);
    }
    else
    { /* longjmp was called, "catch" the failure */
      fprintf(stderr, "%s:%d %s status: "
              "FP overflow trapped (%e + %e)\n  ",
              __FILE__, __LINE__, __FUNCTION__, a, b);
      fpe_print_cause(stderr, &fpe_info);
    }

    /* underflow */
    if (!sigsetjmp(fpe_landing_zone, 1))
    { /* "try" our code */
      a = DBL_MIN; b = 2; a /= b;

      fprintf(stderr, "%s:%d %s warn: "
              "FP underflow not caught result is %e\n",
              __FILE__, __LINE__, __FUNCTION__, a);
    }
    else
    { /* longjmp was called, "catch" the failure */
      fprintf(stderr, "%s:%d %s status: "
              "FP underflow trapped (%e / %e)\n  ",
              __FILE__, __LINE__, __FUNCTION__, a, b);
      fpe_print_cause(stderr, &fpe_info);
    }

    /* inexact result */
    if (!sigsetjmp(fpe_landing_zone, 1))
    { /* we just saved the state, "try" our code */
      a = DBL_MIN; b = DBL_MAX; a /= b;

      fprintf(stderr, "%s:%d %s warn: "
              "FP inexact result not caught result is %e\n",
              __FILE__, __LINE__, __FUNCTION__, a);
    }
    else
    { /* longjmp was called, "catch" the failure */
      fprintf(stderr, "%s:%d %s status: "
              "FP inexact result trapped (%e / %e)\n  ",
              __FILE__, __LINE__, __FUNCTION__, a, b);
      fpe_print_cause(stderr, &fpe_info);
    }

    memset(fpe_landing_zone, 0, sizeof(fpe_landing_zone));
  }

  return 0;
}

/****************************************************************************/
/* boo hiss!  global variables... sadly they are required to handle signals */
/****************************************************************************/

sigjmp_buf fpe_landing_zone;                /* jmp point for SIGFPE handler */
siginfo_t fpe_info;                     /* state storage for SIGFPE handler */

--yrj/dFKFPuw6o+aM
Content-Type: text/plain
Content-Disposition: attachment; filename="s3.c"

#include <errno.h>
#include <float.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#include <fenv.h>

void fpe_print_cause(FILE *file)
{
  char *prefix = "";

  if (fetestexcept(FE_DIVBYZERO))
    fprintf(file, "%sFP divide by zero", prefix),    prefix = ", ";
  if (fetestexcept(FE_INEXACT))
    fprintf(file, "%sFP inexact result", prefix),    prefix = ", ";
  if (fetestexcept(FE_INVALID))
    fprintf(file, "%sFP invalid operation", prefix), prefix = ", ";
  if (fetestexcept(FE_OVERFLOW))
    fprintf(file, "%sFP overflow", prefix),          prefix = ", ";
  if (fetestexcept(FE_UNDERFLOW))
    fprintf(file, "%sFP underflow", prefix),         prefix = ", ";

  fputc('\n', file);
}

int main(void)
{
  feclearexcept(FE_ALL_EXCEPT);  /* flags are always recorded; start clean */

  { /* provide a bunch of examples */
    double a;
    double b;

    /* divide by zero */
    { /* "try" our code */
      feclearexcept(FE_ALL_EXCEPT);
      a = 1.0; b = time(0) - time(0); a /= b;
  
      if (fetestexcept(FE_ALL_EXCEPT))
      {
        fprintf(stderr, "%s:%d %s status: trapped ", 
                __FILE__, __LINE__, __FUNCTION__);
        fpe_print_cause(stderr);
      }
      else
        fprintf(stderr, "%s:%d %s warn: "
                "FP divide by zero not caught\n",
                __FILE__, __LINE__, __FUNCTION__);

      fprintf(stderr, "divide by zero result: %e\n", a);
    }

    /* overflow */
    { /* "try" our code */
      feclearexcept(FE_ALL_EXCEPT);
      a = DBL_MAX; b = time(0); a *= b;
  
      if (fetestexcept(FE_ALL_EXCEPT))
      {
        fprintf(stderr, "%s:%d %s status: trapped ", 
                __FILE__, __LINE__, __FUNCTION__);
        fpe_print_cause(stderr);
      }
      else
        fprintf(stderr, "%s:%d %s warn: "
                "FP overflow not caught\n",
                __FILE__, __LINE__, __FUNCTION__);

      fprintf(stderr, "overflow result: %e\n", a);
    }

    /* underflow */
    { /* "try" our code */
      feclearexcept(FE_ALL_EXCEPT);
      a = DBL_MIN; b = time(0); a /= b;
  
      if (fetestexcept(FE_ALL_EXCEPT))
      {
        fprintf(stderr, "%s:%d %s status: trapped ", 
                __FILE__, __LINE__, __FUNCTION__);
        fpe_print_cause(stderr);
      }
      else
        fprintf(stderr, "%s:%d %s warn: "
                "FP underflow not caught\n",
                __FILE__, __LINE__, __FUNCTION__);

      fprintf(stderr, "underflow result: %e\n", a);
    }

    /* inexact result */
    { /* "try" our code */
      feclearexcept(FE_ALL_EXCEPT);
      a = DBL_MIN; b = time(0); a += b;
  
      if (fetestexcept(FE_ALL_EXCEPT))
      {
        fprintf(stderr, "%s:%d %s status: trapped ", 
                __FILE__, __LINE__, __FUNCTION__);
        fpe_print_cause(stderr);
      }
      else
        fprintf(stderr, "%s:%d %s warn: "
                "FP inexact result not caught\n",
                __FILE__, __LINE__, __FUNCTION__);

      fprintf(stderr, "inexact result: %e\n", a);
    }
  }

  return 0;
}

--yrj/dFKFPuw6o+aM--