forums.silverfrost.com
Welcome to the Silverfrost forums

Author Message
silverfrost
Site Admin

Joined: 29 Nov 2006
Posts: 169
Location: Manchester

Posted: Tue Sep 07, 2004 9:42 pm    Post subject: Unexpected floating point results

Summary
It is possible that a program running on your system will not produce the expected results from floating point calculations due to the differences between 64 bit and 80 bit precision.

Details

The following program is an example of this problem and produces unexpected results:
 Code: INTEGER*4 MAXROW, I PARAMETER (MAXROW = 100) COMMON /BLOCK/ VOLUME COMMON /BOUND/ L(MAXROW), U(MAXROW) DOUBLE PRECISION VOLUME, L, U VOLUME = 100D0 DO 100 I = 1, MAXROW L(I) = .2D0 U(I) = .2D0 100 CONTINUE CALL SUB(L, U, MAXROW) END SUBROUTINE SUB(L, U, ROWS) INTEGER*4 ROWS, I DOUBLE PRECISION L, U, VOLUME COMMON /BLOCK/ VOLUME DIMENSION L(*), U(*) DO 100 I = 1, ROWS L(I) = L(I) / VOLUME U(I) = U(I) / VOLUME - L(I) WRITE(*, *) I, L(I), U(I) READ(*, *) 100 CONTINUE RETURN END

The problem occurs with the line U(I) = U(I) / VOLUME - L(I).

At this point each element in the array L should be 0.002D0, and the expected result of this assignment is that each element in the array U should be 0. Running the program results in the following output:
 Code: 1 2.0E-3 6.95E-20 2 2.0E-3 6.95E-20

and so on

The values output from the program have been rounded and trailing zeroes removed. The third column is for each element of the array U and should always be 0. If you examine the assembly code generated by the compiler you will find the following code has been generated:

 Code: L(I) = L(I) / VOLUME DFLD L(I) DFDIV VOLUME DFSTP L(I) U(I) = U(I) / VOLUME - L(I) DFLD U(I) DFDIV VOLUME DFSUB L(I) DFSTP U(I)

All the arithmetic within each statement is performed to 80 bit precision. However, when the data is saved to memory it is truncated to 64 bits for the double precision variable. If this were decimal and L(I) and U(I) were both 2/3 and volume was 2, you would effectively be performing 0.3333333333 - 0.333333 yielding a result of 0.3333e-6. In this case the value is 0.002 which is not exactly representable in binary. The result is truncated to the 64 bits when saved to memory. When the DFSUB is performed, the 80 bit value of 0.002 has a 64 bit representation of the same number subtracted from it. This leaves the residue in the coprocessor which is still a valid 64 bit number.

In order to obtain the expected answers the above code would need to be re-coded as follows:

 Code: L(I) = L(I) / VOLUME U(I) = U(I) / VOLUME U(I) = U(I) - L(I)

For FTN77, the program would also need to be coded with the floating point tracking option turned off (i.e. /NO_FLOATING_TRACKING)

Note that, in order to allow you to obtain the best possible floating point accuracy within expressions, Salford compilers are coded so that the coprocessor works in 80 bit mode.
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First
 All times are GMT + 1 Hour Page 1 of 1

 Jump to: Select a forum Admin----------------Announcements FTN95----------------GeneralKBaseSupportSuggestionsClearWin+Plato64-bit FTN77----------------Support
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

Powered by phpBB © 2001, 2005 phpBB Group