Topic: Unexpected FP div-by-zero with /64 in 64-bit

mecej4

Posts: 1911

Back to Top

6 Mar 2021 7:15 #27216

This short program runs when it is compiled with FTN95 8.70 as a 32-bit EXE, and prints -5.000000E+07.

When the source is then compiled with /64 and run, the run is aborted after reporting 'floating point division by zero'.

program Utanh
implicit none
real :: disp = 0.0125, vel =-2.5e-10, wetab, z, mytanh
!
mytanh(z) = (exp(z)-exp(-z))/(exp(z)+exp(-z))
z = vel/disp
!print *,z
wetab = 1.0/mytanh(z)-1.0/z
print *,wetab
end program

Is the difference attributable to the range of values that the intrinsic EXP() is capable of in each case?

JohnCampbell

Posts: 2526 Sydney

Back to Top

7 Mar 2021 4:36 #27219

Ver 8.62 produces an 'Access violation' for /64 /lgo Ver 8.62 produces an 'Floating point division by zero' for /64 /debug /lgo

Gfortran 10.2 produces : GCC version 10.2.0 Infinity

mecej4

Posts: 1911

Back to Top

7 Mar 2021 9:01 #27222

Thanks, John.

The test program is an offshoot of an attempt to understand some odd behavior (in the Hydrus6 software from USGS). Really old Fortran did not have hyperbolic functions as intrinsic functions, so the authors used an ASF to implement it, and large parts of these old codes are in single precision.

Here is a different attempt on the issue (approximate 1/tanh(x) - 1/x for small x).

program mytanh
real x,y,z
integer i
x = 1.0
do i=1,15
   y = 1.0/x - 1.0/tanh(x) !error in 1st approx
   z = y + x/3.0           !error in 2nd approx
   print '(i3,3ES12.4)',i,x,y,z
   x=0.2*x
end do
end program mytanh

Results from Gfortran (may vary with version, compile options):

  1  1.0000E+00 -3.1304E-01  2.0298E-02
  2  2.0000E-01 -6.6489E-02  1.7745E-04
  3  4.0000E-02 -1.3332E-02  9.6764E-07
  4  8.0000E-03 -2.6627E-03  4.0082E-06
  5  1.6000E-03 -4.8828E-04  4.5052E-05
  6  3.2000E-04 -2.4414E-04 -1.3747E-04
  7  6.4000E-05  0.0000E+00  2.1333E-05
  8  1.2800E-05  0.0000E+00  4.2667E-06
  9  2.5600E-06  0.0000E+00  8.5333E-07
 10  5.1200E-07  0.0000E+00  1.7067E-07
 11  1.0240E-07  1.0000E+00  1.0000E+00
 12  2.0480E-08  0.0000E+00  6.8267E-09
 13  4.0960E-09  0.0000E+00  1.3653E-09
 14  8.1920E-10  0.0000E+00  2.7307E-10
 15  1.6384E-10  0.0000E+00  5.4613E-11

Line-11 has rather surprising results. On this line x became < eps_mach.

JohnCampbell

Posts: 2526 Sydney

Back to Top

7 Mar 2021 10:17 #27225

Your results are unusual, but I can not reproduce your results. I changed the code slightly to program mytanh !$ use iso_fortran_env real x,y,z integer i !$ write (,) compiler_version() x = 1.0 do i=1,15 y = 1.0/x - 1.0/tanh(x) !error in 1st approx z = y + x/3.0 !error in 2nd approx print '(i3,3ES12.4)',i,x,y,z x=0.2*x end do end program mytanh

I compiled with: gfortran mytanh.f90 -o mytanh.exe -ffast-math -fopenmp

resulting in GCC version 10.2.0 1 1.0000E+00 -3.1304E-01 2.0298E-02 2 2.0000E-01 -6.6489E-02 1.7745E-04 3 4.0000E-02 -1.3332E-02 9.6764E-07 4 8.0000E-03 -2.6627E-03 4.0082E-06 5 1.6000E-03 -4.8828E-04 4.5052E-05 6 3.2000E-04 0.0000E+00 1.0667E-04 7 6.4000E-05 0.0000E+00 2.1333E-05 8 1.2800E-05 0.0000E+00 4.2667E-06 9 2.5600E-06 0.0000E+00 8.5333E-07 10 5.1200E-07 0.0000E+00 1.7067E-07 11 1.0240E-07 0.0000E+00 3.4133E-08 12 2.0480E-08 0.0000E+00 6.8267E-09 13 4.0960E-09 0.0000E+00 1.3653E-09 14 8.1920E-10 0.0000E+00 2.7307E-10 15 1.6384E-10 0.0000E+00 5.4613E-11

I also printed out 't = tanh(x)' and tried a variety of alternative compile options.

I get similar results on 5900x and i5-2300 (with gf 9.2.0) line 6 is also different

PaulLaidler

Posts: 7971 Salford, UK

Back to Top

7 Mar 2021 10:35 #27226

This is just a general comment...

In the first program z is small and tanh(z) is approximately z. So wetab is the difference of two large values that are almost equal. When the calculation is successful it will be highly suseptible to round-off error.

32 bit extended FTN95 precision real(3) using the intrinsic tanh may give the best result. 32 bit double FTN95 precision real(2) might use extended precision internally for part of the calculation.

Single precision is not an option here and double precision probably gives a very poor result.

mecej4

Posts: 1911

Back to Top

7 Mar 2021 10:55 #27227

Thanks again, John.

I obtained the peculiar line-11 with Gfortran 9.3-64-bit on Cygwin, 4.8.5-64-bit as well as 6.2.0-64-bit on WSL1/Ubuntu14.04.1, and 7.1.1 on a Linux cloud service.

I'll have to install a newer 10.x version and see.

JohnCampbell

Posts: 2526 Sydney

Back to Top

7 Mar 2021 11:35 #27228

mecej4,

I only use a pre-built gFortran for windows from equation.com, which is what clearwin_64 uses. Don't have 9.3, 4.8.5 or 7.1.1 but do have 6.2.0 which produces on i5-2300

 GCC version 6.2.0
  1  1.0000E+00 -3.1304E-01  2.0298E-02
  2  2.0000E-01 -6.6489E-02  1.7745E-04
  3  4.0000E-02 -1.3332E-02  9.6764E-07
  4  8.0000E-03 -2.6627E-03  4.0082E-06
  5  1.6000E-03 -4.8828E-04  4.5052E-05
  6  3.2000E-04  0.0000E+00  1.0667E-04
  7  6.4000E-05  0.0000E+00  2.1333E-05
  8  1.2800E-05  0.0000E+00  4.2667E-06
  9  2.5600E-06  0.0000E+00  8.5333E-07
 10  5.1200E-07  0.0000E+00  1.7067E-07
 11  1.0240E-07  0.0000E+00  3.4133E-08
 12  2.0480E-08  0.0000E+00  6.8267E-09
 13  4.0960E-09  0.0000E+00  1.3653E-09
 14  8.1920E-10  0.0000E+00  2.7307E-10
 15  1.6384E-10  0.0000E+00  5.4613E-11

looks to be identical also tried mingw-w64 6.2.0 with same results, although mingw-w64 appears to have limited support. A shame 9.3.0 is not available for windows (or ZEN3 support)

It is surprising you can get similar results on a variety of versions, but I get my similar results. Could it be different hardware instructions being selected ? It does look to be different round-off which I thought IEEE standardised ? I tested on i5-2300 and Ryzen 5900X which do use different instructions. -ffast-math did not appear to modify my results.

mecej4

Posts: 1911

Back to Top

7 Mar 2021 2:04 (Edited: 8 Mar 2021 12:13) #27230

John, there IS something strange happening here, but it seems to affect only Gfortran, and probably depends on how various versions of Gfortran react to the precision loss that Paul commented upon.

On my Dell XPS-17 laptop with an i7-2720QM, running W10-64, Gfortran 9.3 from Cygwin gives for y the strange value 1.0000E+00 on Line-11, but Gfortran 9.2 from Equation.com gives 0.0000E+00. I even asked the two compilers to generate an assembler file. The two assembler files were identical except for the lines containing

    .ascii \'GCC version 9.3.0\'

in one and

   .ascii \'GCC version 9.2.0\'

in the other. Therefore, the differences must be caused by the Gfortran RTLs and how those differ in the two distributions of GCC.

JohnCampbell

Posts: 2526 Sydney

Back to Top

8 Mar 2021 12:05 #27242

Could you include: print '(i3,4ES12.4)',i,x,y,z,tanh(x) or t = tanh(x) print '(i3,4ES12.4)',i,x,y,z,t

It is a strange value of x to produce a different result !

mecej4

Posts: 1911

Back to Top

8 Mar 2021 1:01 #27243

John, I did what you asked. Starting from line-5, the values of tanh(x) were the same as that of x to the five significant digits printed. Here are lines 10-12, as produced by Gfortran 9.3.0 (Cygwin):

 10  5.1200E-07  0.0000E+00  1.7067E-07  5.1200E-07
 11  1.0240E-07  1.0000E+00  1.0000E+00  1.0240E-07
 12  2.0480E-08  0.0000E+00  6.8267E-09  2.0480E-08

JohnCampbell

Posts: 2526 Sydney

Back to Top

8 Mar 2021 3:46 #27245

y = 1/x - 1/tanh = (1-x/tanh)/x

So x/tanh is not 1.0 for 1.024e-7 but is for 5.12e-7 and 2.048e-8 ?

tanh - x must be non-zero in your compiler, which implies an error in the TANH function interpolation. See if the following identifies the non-zero 'tanh - x' (due to rounding ?)

program mytanh_test
!$ use iso_fortran_env
real*4 x,y,z,t
real*4 :: one = 1, three = 3, five = 5
integer i

!$ write (*,*) compiler_version()
write (*,*) ' i       X           Y           Z         tanh-x'
x = one
do i=1,15 
   y = one/x - one/tanh(x)   !error in 1st approx
   z = y + x/three           !error in 2nd approx
   t = tanh(x)
   print '(i3,6ES12.4)',i,x,y,z, t-x 
   x = x/five
end do
end program mytanh_test

mecej4

Posts: 1911

Back to Top

8 Mar 2021 9:54 #27249

John, even with those versions of Gfortran that gave the two 1.0000 s on Line-11, I find that replacing

 x = x*0.2

by

 x = x/5.0

restores expected/normal results.

Strange to find the difference between 0.2 and 1/5.0 to cause a change in a result from 0.0E0 to 1.0E0. Perhaps, a striking example for use in a course on programming?

JohnCampbell

Posts: 2526 Sydney

Back to Top

8 Mar 2021 10:31 #27250

I am thinking that the actual value x = 1.0240E-07, that is generated using x = x*0.2 is causing a roundoff error in the interpolation formula for tanh (x), so that tanh(x) /= x, so y /= 0

The value of x is very close to epsilon(x), so that a roundoff error would produce y = 1

If tanh(x) is returned as x*(1-epsilon) ; the nearest representable value less than x, ie roundoff error in the calculation of tanh

y = one/x - one/tanh = 1/x - 1 / ( x*(1-epsilon) ) = ( x*(1-eps) - x ) / ( x * x*(1-eps) ) = x*eps / ( x * x ) = 1 when x = epsilon (x)

you have 'stumbled' on a unique value of x where tanh (x) fails due to roundoff, which is only present in some versions of the gFortran tanh library.

mecej4

Posts: 1911

Back to Top

8 Mar 2021 11:08 #27251

John, the following program may help us get closer to the heart of the problem. It does the same calculation as before, but with values of x close to the problematic value. With versions of Gfortran on Windows that use the implementation of tanh() in the Microsoft VC RTL, such as Mingw, the last column is always zero. With Cygwin on Windows, MS WSL-1 and on Linux, we can speculate that the underlying GNU math library causes different behavior, with the last column giving either 1.00000E+00 or 0.00000E+00.

program mterr
use iso_fortran_env
implicit none
integer i,ix
real x,tnhx,f
!
write (*,*) compiler_version()
print *
print *,'           x      x(IEEE hex)    tanh(x)    1/x - 1/tanh(x)'
print *,'     ----------   --------    -----------   ---------------'     
ix = Z'33DBE6F8'
do i=1,16
   x = transfer(ix,x)
   tnhx = tanh(x)
   f = 1.0/x - 1.0/tnhx
   print 10,i,x,ix,tnhx,f
   ix = ix+1
end do
10 format(i2,ES15.5,2x,Z8,2ES15.5)
end program

I'd appreciate your trying this program with your Gfortran 10.x. On a Linux server in the Cloud, here is what I got:

$gfortran -std=gnu *.f95 -o main
$main
 GCC version 7.1.1 20170622 (Red Hat 7.1.1-3)

            x      x(IEEE hex)    tanh(x)    1/x - 1/tanh(x)
      ----------   --------    -----------   ---------------
 1    1.02400E-07  33DBE6F8    1.02400E-07    1.00000E+00
 2    1.02400E-07  33DBE6F9    1.02400E-07    1.00000E+00
 3    1.02400E-07  33DBE6FA    1.02400E-07    0.00000E+00
 4    1.02400E-07  33DBE6FB    1.02400E-07    1.00000E+00
 5    1.02400E-07  33DBE6FC    1.02400E-07    1.00000E+00
 6    1.02400E-07  33DBE6FD    1.02400E-07    0.00000E+00
 7    1.02400E-07  33DBE6FE    1.02400E-07    1.00000E+00
 8    1.02400E-07  33DBE6FF    1.02400E-07    1.00000E+00
 9    1.02400E-07  33DBE700    1.02400E-07    0.00000E+00
10    1.02400E-07  33DBE701    1.02400E-07    1.00000E+00
11    1.02400E-07  33DBE702    1.02400E-07    1.00000E+00
12    1.02400E-07  33DBE703    1.02400E-07    1.00000E+00
13    1.02400E-07  33DBE704    1.02400E-07    0.00000E+00
14    1.02400E-07  33DBE705    1.02400E-07    1.00000E+00
15    1.02400E-07  33DBE706    1.02400E-07    1.00000E+00
16    1.02400E-07  33DBE707    1.02400E-07    0.00000E+00

For readers who have not been following this thread closely: FTN95 is not afflicted with this problem.

JohnCampbell

Posts: 2526 Sydney

Back to Top

8 Mar 2021 12:04 #27253

I used 10 format(i2,ES16.7,3x,Z8,ES16.7,ES15.5)

and got on i5-2300 Z:\Temp\mecej4>set prog=mytanh4

Z:\Temp\mecej4>del mytanh4.exe 

Z:\Temp\mecej4>gfortran mytanh4.f90 -o mytanh4.exe -ffast-math -fopenmp -v 

Z:\Temp\mecej4>mytanh4
 GCC version 9.2.0

             x       x(IEEE hex)     tanh(x)    1/x - 1/tanh(x)
     -------------   --------   -------------   ---------------
 1   1.0239995E-07   33DBE6F8   1.0239995E-07    0.00000E+00
 2   1.0239996E-07   33DBE6F9   1.0239996E-07    0.00000E+00
 3   1.0239997E-07   33DBE6FA   1.0239997E-07    0.00000E+00
 4   1.0239997E-07   33DBE6FB   1.0239997E-07    0.00000E+00
 5   1.0239998E-07   33DBE6FC   1.0239998E-07    0.00000E+00
 6   1.0239999E-07   33DBE6FD   1.0239999E-07    0.00000E+00
 7   1.0239999E-07   33DBE6FE   1.0239999E-07    0.00000E+00
 8   1.0240000E-07   33DBE6FF   1.0240000E-07    0.00000E+00
 9   1.0240001E-07   33DBE700   1.0240001E-07    0.00000E+00
10   1.0240002E-07   33DBE701   1.0240002E-07    0.00000E+00
11   1.0240002E-07   33DBE702   1.0240002E-07    0.00000E+00
12   1.0240003E-07   33DBE703   1.0240003E-07    0.00000E+00
13   1.0240004E-07   33DBE704   1.0240004E-07    0.00000E+00
14   1.0240004E-07   33DBE705   1.0240004E-07    0.00000E+00
15   1.0240005E-07   33DBE706   1.0240005E-07    0.00000E+00
16   1.0240006E-07   33DBE707   1.0240006E-07    0.00000E+00

mecej4

Posts: 1911

Back to Top

8 Mar 2021 12:15 #27254

OK, thanks! It must be the math library used by Gfortran/GCC, then.

JohnCampbell

Posts: 2526 Sydney

Back to Top

8 Mar 2021 11:22 #27256

The hex value for TANH would clarify any problem. program mterr use iso_fortran_env implicit none integer i,ix,it real x,tnhx,f ! write (,) compiler_version() print * print *,' x x(IEEE hex) tanh(hex) tanh(x) 1/x - 1/tanh(x)' print *,' ------------- -------- -------- ------------- ---------------'
ix = Z'33DBE6F8' do i=1,16 x = transfer(ix,x) tnhx = tanh(x) it = transfer(tnhx,it) f = 1.0/x - 1.0/tnhx print 10,i,x,ix,it,tnhx,f ix = ix+1 end do 10 format(i2,ES16.7,1x,2Z10,ES16.7,ES15.5) end program

mecej4

Posts: 1911

Back to Top

9 Mar 2021 11:07 #27257

John, I raised this issue on CLF.

A few of the responders who had older versions of Gfortran on Linux did report seeing the 1.0s in the output.

Most of them had a recent version of Gfortran on Linux, MacOS, or even Android (on a Nokia phone!), and did not see the error.