[vox-tech] FPE signals, sample code, and why they are bad.

Jeff Newmiller vox-tech@lists.lugod.org
Sat, 16 Mar 2002 13:05:37 -0800 (PST)


On Fri, 15 Mar 2002 msimons@moria.simons-clan.com wrote:

> On Mon, 11 Mar 2002, Peter Jay Salzman wrote:
> > can someone post some example code of how to trap a SIGFPE signal and
> > abort execution during the course of a C program on linux?
> 
> On Wed, Mar 13, 2002 at 11:03:49AM -0800, Peter Jay Salzman wrote:
> > i /just/ posted some signal test code before getting this email, 
> > and it looks like we're getting the same results;
> > SIGFPE just isn't being caught.
> 
>   In reality the FPE's aren't being raised because the standards say 
> they should not, which I found surprising.  However, it makes sense that
> the idea of raising unix signals in response to any FP problems must 
> predate the current standards by decades.
> 
>   When using FPE signals with longjmp it is very difficult to resume a
> calculation in response to FPE's, because all non-volatile local
> variables changed after a setjmp are indeterminate when longjmp happens
> _and_ the expression that caused the problem wouldn't ever complete.

What puzzles me is... what would you do to fix the problem? And if you
knew you were going to do it, wouldn't you test before performing the
calculation anyway?  If you have an erroneous calculation you want to
handle with FPE, it must be because it was too complicated to wrap with
"if"s, so you might as well throw it out.
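
To make the "test first" idea concrete, here is a minimal sketch of guarding a single division with "if"s instead of relying on SIGFPE. The helper name checked_div and its return convention are made up for illustration (it is not from Mike's attachments), and it only guards against a zero divisor and an overflowing quotient.

```c
#include <float.h>

/* Hypothetical helper (not from the attached programs): performs num / den
 * only when the divisor is nonzero and the quotient cannot overflow.
 * Returns 0 on success, -1 if the division was skipped. */
static int checked_div(double num, double den, double *out)
{
    double abs_num = num < 0.0 ? -num : num;
    double abs_den = den < 0.0 ? -den : den;

    if (abs_den == 0.0)
        return -1;                      /* would be a divide by zero */
    if (abs_den < 1.0 && abs_num > DBL_MAX * abs_den)
        return -1;                      /* quotient would exceed DBL_MAX */
    *out = num / den;
    return 0;
}
```

With this, checked_div(1.0, 0.0, &r) returns -1 without ever executing the division. Wrapping every operation of a long calculation this way is the part that quickly gets impractical.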

>   The current standards provide for results like Inf, -Inf, NaN,
> and provide a program with the ability to test which exceptions have
> happened after running a long series of calculations.

But if underflow or overflow are occurring in Peter's program and yielding
a normal but invalid answer, is the NaN or Inf handling not working
right?

>   There are three programs attached and a Makefile.
> 
> #1 - attempts to do a divide by zero, prints an error and exits.
>        highlights: sigaction and enabling FPE signals

sounds good.

> #2 - attempts to do a series of FPEs (divide by zero, overflow, underflow),
>      traps and recovers from the errors, but with this kind of trapping
>      execution of each invalid floating point expression is aborted, and
>      variables referenced after the jmp landing point must be declared
>      volatile.
>        highlights: stuff from #1 and setjmp/longjmp

See my next post for another approach to handling setjmp.

> 
> #3 - does same series of FPEs from #2, but doesn't use signals to 
>      detect the problems, instead the results are calculated and a
>      warning is printed along with the actual results.
>        highlights: this is how it's "supposed" to be done.

but underflow doesn't generate a propagating NaN.

>   Now the Real Challenge will be figuring how to _use_ the knowledge
> that a FPE has happened to do something useful about it at runtime... :)

Indeed. :)

> 
>     Later,
>       Mike
> 
> 
> Okay the whole story...
> 
>   So when I originally read this email I thought it would take a quick
> 30 mins to whip up some sigaction sample code then mail it off.  However,
> I saw FPE's were not being raised; for some reason the code was executing
> and generating things like Inf, -Inf, and NaN (Infinity and Not-A-Number).
> 
>   Well first off I went grepping around the header files and stumbled
> upon /usr/include/fpu_control.h which shows how one can enable each
> individual exception to raise a signal.  There wasn't much documentation
> but with only a GET and a SET macro defined it's not hard to use.
> 
>   I integrated that into the signal sample code and found it started
> raising SIGFPE on divide by zero.  The original sample code would 
> only print a message about the type of signal received in the signal 
> handler and return.  (Note: doing much of anything in a signal handler 
> is generally bad style and, depending on what you do, very dangerous.  For 
> FPE it was fine to print: because no C library calls are going to cause 
> this signal, it's "safe" to do more in this _particular_ handler.)
> 
>   However, since the instruction that the signal handler returns to 
> is the one that caused the FPE, it will happen again and you'll be stuck
> in an endless loop executing the signal handler, so I put an exit in
> the signal handler...
> 
>   Then, I went back to vox-tech and noticed Jeff's post 
> 
> On Mon, Mar 11, 2002 at 06:13:21PM -0800, Jeff Newmiller wrote:
> > I have never tried to make sensible use of SIGFPE in particular...
> 
>   I remembered having a short animated discussion about how 
> setjmp/longjmp were really very cool, so I decided to provide code
> that would trap the divide by zero and print what error happened then 
> continue execution to the next type of FPE.  That gave me a chance to 
> use longjmp for the first time *<:).
> 
>   So a little while later I was trapping and printing the first FPE,
> but I found the next several FPE's were not being raised... this
> means you need to re-enable raising of FPE's after each signal goes off.
>   Which is really lame.  I think someone decided that clearing the list
> of FPE's would be a good idea since it would prevent you from getting
> into the endless loop I mentioned above.  However, if that was the
> objective they screwed it up somehow: the FPE is disabled only *after*
> some signal handler effectively traps and recovers from it, so that
> future FPE's don't happen... this is certainly worth filing a bug report
> for to see if there is an explanation.

"man signal" attributes this signal behavior to BSD, and glibc follows the
BSD lead. I too think this is a wart... but it is an old one.

> 
>   To explain this more clearly: if you don't exit or do a longjmp in
> the signal handler but you enable FPE signals, once, at the beginning 
> of the program, your code will start looping endlessly the first time 
> a FPE goes off.  However, if you do a longjmp from the signal handler
> to recover from the FPE signal, FPE signals are turned off for all 
> future code.

I cannot think of a general way for a signal handler to "fix" the values
being used in a computation... so returning from the signal handler
directly just doesn't seem practical.

>   So after re-enabling FPE's I was having trouble finding an expression
> that would generate FPE underflow, so a short google search later I found
> a document which is 2233 lynx pages long and contains All one could 
> possibly Want To Know about libc.  The following URL is from a 
> different source that has the document broken into html sections...
>   http://linux.csua.berkeley.edu/doc/glibc-doc/html/chapters_20.html#SEC411
> It was extremely helpful in figuring out how to generate other types
> of FPE's... but I learned that fpu_control.h is not the correct way to 
> enable signals.  Run "man feraiseexcept" (in section 3); you'll see 
> the standard technically doesn't provide a way for the system to 
> automatically raise signals at execute time; the GNU libc people provide
> that enhancement.
>   So I changed the enable signal code to do it the "correct" GNU way.
> 
>   However, reading through the document above I discovered that the
> standard authors actually seem to have had a very good idea.  They 
> provide a method to *check if* a FPE has happened after execution of
> a large complex operation, _if_ one actually cares about FPE's...
> this is so much cleaner than using longjmp you can't imagine (not 
> that longjmp isn't really cool :).  It's just not practical to try to
> catch and recover from FPE's if you are doing a long series of operations
> where you only care at the _end_ whether something bad happened, so that
> you can provide the answer along with a warning.

Much more usable than signal handling for FPE... this appears to be the
result of the numerical computations workgroup that I mentioned earlier.

>   So ripping out longjmp from my code I produced a way to "trap FPE's"
> without all the magic of a signal handler and jumps.  I discovered
> that the compiler is just too darn smart: it was doing most of the
> work at compile time, so the exceptions were not going off.  You'll
> see references to "time(0)" in a bunch of places; that was so that
> I could trick the compiler into doing the expressions at runtime.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...2k
---------------------------------------------------------------------------