Topic: New Kinds in Suggestions

JohnCampbell

Posts: 2526 Sydney

26 Oct 2009 5:35 #5249

Paul,

Following on from the discussion of KIND, is it an option to provide REAL*6 or INTEGER*6? There was a time when all reals were calculated in the co-processor, and I thought that real*4 (and real*8) was just a truncated 80-bit real*10. Is this the case? If so, would REAL*6 be a simple extension of managing REAL*4? There is certainly a big gap between R*4 and R*8 in precision, and R*6 would provide about 11 significant digits (precision).

I'm not sure of the basis of INTEGER*8 from INTEGER*4, but INTEGER*6 could be a useful alternative ?

Just a thought !

John
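The "about 11 significant digits" estimate can be checked with a little arithmetic: a binary significand of p bits carries roughly p*log10(2) decimal digits. The sketch below is only illustrative. No REAL*6 format is defined anywhere in this thread, so the significand widths of 37 to 40 bits are assumptions about how 48 bits might be split between exponent and mantissa.

```fortran
program r6_digits
  implicit none
  integer :: p   ! significand bits, counting the hidden leading bit

  ! Known IEEE formats for comparison: single has p = 24, double has p = 53.
  print '(a,f6.2)', ' real*4  (p = 24) digits ~', 24*log10(2.0)
  print '(a,f6.2)', ' real*8  (p = 53) digits ~', 53*log10(2.0)

  ! Hypothetical 48-bit layouts (assumed, not a real format):
  ! 1 sign bit, 8 to 11 exponent bits, remainder for the mantissa.
  do p = 37, 40
     print '(a,i2,a,f6.2)', ' real*6? (p = ', p, ') digits ~', p*log10(2.0)
  end do
end program r6_digits
```

Under these assumed layouts the estimate lands between 11 and 12 digits, which is consistent with the figure quoted above.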

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

26 Oct 2009 2:11 #5251

John,

I'm a real believer (no pun intended) in REAL*6 and INTEGER*6. The problem is that they aren't native to (x87) coprocessors, and all the operations would need to be coded from scratch (i.e. done in software).

When I used MS Fortran, they had 2 libraries one could link with - one where the math was done largely in software, and one where it was done largely in hardware. They didn't always give the same result! In part, this was because REAL*8 math was done in 64 bits in software, whereas the coprocessor operations loaded things into 80-bit registers, so that the round-off was potentially different.

Eddie

PaulLaidler

Posts: 7975 Salford, UK

26 Oct 2009 7:37 #5252

selected_integer_kind and selected_real_kind allow you to select the precision etc (within certain hardware limits) but these are mapped to those provided by the processor and co-processor. In other words if you asked for the equivalent of *6 then you would get *8 anyway. Providing *6 via software would be slower than the *8 provided by the hardware.
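The mapping described here can be seen with a minimal sketch using only standard intrinsics. It assumes an x86 compiler whose native real kinds are the usual single and double precision; the actual kind numbers printed are compiler-specific, and the names k11 and x are just for illustration.

```fortran
program kind_demo
  implicit none
  ! Ask for at least 11 decimal digits, i.e. roughly what a "*6" real might offer.
  integer, parameter :: k11 = selected_real_kind(11)
  real(kind=k11) :: x

  print *, 'kind number returned     :', k11
  print *, 'decimal digits delivered :', precision(x)
  ! On typical x86 compilers this is expected to be true:
  ! the request is satisfied by the native 64-bit real.
  print *, 'same kind as double?     :', k11 == kind(1.0d0)
end program kind_demo
```

The compiler never invents an intermediate type: a request for 11 digits is rounded up to the next kind the hardware provides.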

JohnCampbell

Posts: 2526 Sydney

27 Oct 2009 2:54 #5255

Paul,

I was under the impression that real*4 and real*8 were done in the 80-bit math co-processor. Results were stored in the word address, with truncation of the accuracy. So my assumption for real*6 would be that the calcs would be in the coprocessor, but the truncation would be different. This is not consistent with the statement 'providing real*6 via software'. I have also seen past references to 64-bit rather than 80-bit arithmetic (SSE?) instructions, which would change this assumption. Is the co-processor no longer used, and are real*4 and real*8 calculations now done differently ?

John

Sebastian

Posts: 177

27 Oct 2009 7:04 #5256

So my assumption for real*6 would be that the calcs would be in the coprocessor, but the truncation would be different.

The fpu has no support for that. It handles 32bit (single precision), 64bit (double precision) and 80bit (extended precision) operations. If you need more information just post or read through some hardware docs like http://sandpile.org/ia32/opc_fpu.htm or the intel (amd) instruction set references.

JohnCampbell

Posts: 2526 Sydney

27 Oct 2009 7:26 #5258

Is 80-bit extended precision the same as real*10 or is real*10 software implemented ?

PaulLaidler

Posts: 7975 Salford, UK

27 Oct 2009 8:10 #5259

Yes extended precision is the same as real*10.

Sebastian

Posts: 177

27 Oct 2009 8:24 #5262

The *x usually specifies the number of bytes required for the data type (this may be awfully wrong for non-x86/non-PC Fortran implementations), so real*10 is the 10 byte = 80 bit floating point type, as Paul said.

JohnCampbell

Posts: 2526 Sydney

28 Oct 2009 12:24 #5270

Paul,

I am trying to understand how real*6 could be done and real*10 is done. My question re real*10 is : Is it hardware implemented, with all calculations done in the 80-bit math co-processor, or is that an obsolete technology?

To test this I wrote a program that repeated a vector dot product on 1000-element arrays as real*8 or real*10, using either the dot_product intrinsic or a simple function which has a loop:-

      REAL*10 FUNCTION VECSUM_10 (A, B, N)
!
!     Performs a vector dot product  VECSUM =  [A] . [B]
!     account is taken of the leading zero terms in the vectors
!
      integer*4,                 intent (in)    :: n
      real*10,   dimension(n),   intent (in)    :: a
      real*10,   dimension(n),   intent (in)    :: b
!
      real*10   c
      integer*4 i
!
      c = 0
      do i = 1,n
         if (a(i) /= 0) exit
      end do
      do i = i,n
         c = c + a(i)*b(i)
      end do
!
      vecsum_10 = c
      return
!
      end

Compiling without /opt, the results are :-

Test Type       Routine        Seconds    Ratio
real*8 test     vecsum_8        4.28      1.00
real*8 test     dot_product     4.276     1.00
real*10 test    vecsum_10       5.515     1.29
real*10 test    dot_product     7.432     1.74
real*4 test     vecsum_4        2.923     0.68

Real*10 takes 30% longer than real*8, but 74% longer using the dot_product intrinsic. Real*4 takes only 68% of the real*8 computation time.

This indicates to me that real*10 is not simply taking the 80-bit result from the math co-processor while real*8 and real*4 truncate the output. Either this or the instructions to move 4, 8 or 10 bytes take a lot of time.

Any advice ?

John
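The driver program for this test was not posted. A minimal sketch of how such a timing loop might look for the real*8 dot_product case is given below; the repeat count, the random test data and all names (time_dot, nrep, and so on) are assumptions, not the original benchmark.

```fortran
program time_dot
  implicit none
  integer, parameter :: dp = kind(1.0d0)       ! the real*8 case
  integer, parameter :: n = 1000               ! vector length, as in the post
  integer, parameter :: nrep = 1000000         ! assumed repeat count
  real(dp) :: a(n), b(n), s
  real :: t0, t1
  integer :: k

  call random_number(a)
  call random_number(b)

  s = 0
  call cpu_time(t0)
  do k = 1, nrep
     ! Accumulate the result so the work cannot be optimised away.
     s = s + dot_product(a, b)
  end do
  call cpu_time(t1)

  print *, 'seconds :', t1 - t0
  print *, 'checksum:', s
end program time_dot
```

Changing dp to another real kind, or replacing dot_product with a hand-written loop, gives the other rows of the table. Timings will of course depend on the compiler, its options and the processor.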

Sebastian

Posts: 177

28 Oct 2009 7:14 #5271

This indicates to me that real*10 is not simply taking the 80-bit result from the math co-processor while real*8 and real*4 truncate the output.

How do you come to that conclusion? There are a lot of implementation details in the fpu that make 80bit usage the non-standard case. For example, there is no operation like 'add an 80bit value from memory to an fpu register' as there is for 32bit and 64bit; 80bit values always have to be loaded into a temp fpu register first. Also keep in mind that reading 10 bytes from memory takes longer than reading only 4 or 8 bytes, especially since 10-byte values are usually laid out to occupy 16 bytes for better access speeds.

JohnCampbell

Posts: 2526 Sydney

28 Oct 2009 7:30 #5272

Sebastian wrote 'How do you come to that conclusion?' I also said that 'Either this or the instructions to move 4, 8 or 10 bytes take a lot of time.' I just find that the ratios of 130% and 68% are big spreads for just moving bytes, as compared to floating point calculation times. Is an 80-bit fpu always used for real calculations ?

Sebastian also wrote :

Also keep in mind that of course reading 10 bytes from memory obviously takes longer than only reading 4 or 8 bytes, especially since 10 bytes usually are laid out to occupy 16 bytes due to better access speeds.

Again I'm surprised how much longer it takes for reading values and when is this 16 byte claim true ?

John

PaulLaidler

Posts: 7975 Salford, UK

28 Oct 2009 8:20 #5273

The answer to these questions can be researched by using /explist on the command line. This will show the assembly instructions generated by FTN95. You will then need to look up these instructions in an Intel manual.

There will be little or no software intervention except perhaps in the case of INTEGER*8. The native 32, 64 and 80 bit instructions will not be truncated unless your source code stipulates this. You will also be able to look up the timing of the native instructions.

Basically FTN95 will aim to give you the maximum precision that is available in any given situation, even to the point of sometimes using 80 bits internally when a 64 bit result is being generated.

With the speed of modern processors, the speed of a native 32 bit multiply (say) as against a 64 bit native multiply is rarely an issue.

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

28 Oct 2009 10:37 #5274

Speed may not be an issue, but storage is, and if (say) REAL*6 was good enough for (again, say) FE calculations, then each value would take 25% less storage and one could fit arrays a third longer into the same memory for the matrix operations - while sticking with a 32-bit OS and the limitations of that. That puts off the evil moment when the solution has to use the hard disk .... which slows the process down hugely.

It's a very long time since I knew my way round the 8087 fpu book (8087 applications and programming) and my understanding is that first MMX and later SSE provided alternate ways to do certain math operations. I got lost at that point. None of the standard methods countenance REAL*6.

Eddie

JohnCampbell

Posts: 2526 Sydney

28 Oct 2009 1:09 #5275

Thanks Eddie for providing the names of the more recent MMX and later SSE instructions. I apologise, but I am not sufficiently familiar with assembler to understand what is happening in /explist. Can't I get a clear answer to my question: is the real*x maths done in the co-processor, or with the more recent instructions ? I am surprised by the difference in gross computation time between real*4, *8 and *10. Is the only explanation the difference in moving the necessary bytes ? Any clear advice would be appreciated. John

Sebastian

Posts: 177

28 Oct 2009 3:26 #5276

As far as I know MMX/SSE/SSE2 do not support 80bit registers.

I am surprised by the difference in gross computation time between real*4, *8 and *10. Is the only explanation the difference in moving the necessary bytes ?

As I've already noted above there are fundamental differences in how 80bit data can be used in the fpu compared to 32bit and 64bit. And the differences between 32bit and 64bit access are data loading and the time required for the respective instruction which depends on the CPU's implementation. So you'd have to ask Intel/AMD why 64bit operations are slower than 32bit.

PaulLaidler

Posts: 7975 Salford, UK

28 Oct 2009 8:15 #5277

I do not have the answer to your question but I can ask and get back to you later. It would help if you could explain why the question is important to you.

PaulLaidler

Posts: 7975 Salford, UK

29 Oct 2009 7:32 #5278

Here is some information from our expert:

All the data types, Real*4, Real*8, Real*10, Integer*1, Integer*2, Integer*4, and Integer*8 are implemented in hardware - although some operations may involve more than one instruction. The coprocessor actually contains instructions to process Integer*8 data. It can't process Integer*10 data, because the 10 bytes are partly consumed by the exponent.

Silverfrost Fortran only supports native data types, so Integer*6 is not supported. However, if you need to pack data using this precision, why not equivalence an Integer*8 to a Character*8 data item (a Silverfrost Fortran extension) and just keep the first 6 bytes (i.e. characters) of the result? (The PC architecture takes the first byte as the one with least significance.)

IanLambley

Posts: 501 Sunderland

30 Oct 2009 12:23 #5290

That's OK for storing, but on retrieval, the two most significant bytes of the integer*8 would need to be set to 0000h or FFFFh for positive and negative numbers respectively.
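The store-and-restore scheme discussed in the last two posts might be sketched as follows. Note that this version swaps in the standard TRANSFER intrinsic in place of the EQUIVALENCE extension suggested above, and it assumes a little-endian machine, a 64-bit integer kind, and that ICHAR returns values in the range 0 to 255. The program and variable names are purely illustrative.

```fortran
program pack6
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)   ! assumed to be a 64-bit integer
  integer(i8) :: v, w
  character(len=8) :: c8
  character(len=6) :: c6

  v = -123456789012_i8          ! must fit in 48 bits for the scheme to work

  ! Store: copy the 8 bytes into a string and keep the 6 low-order bytes
  ! (the first six on a little-endian machine).
  c8 = transfer(v, c8)
  c6 = c8(1:6)

  ! Retrieve: the top bit of the sixth byte is the sign of the 48-bit value,
  ! so pad with FFFFh for negative numbers and 0000h otherwise.
  if (iand(ichar(c6(6:6)), 128) /= 0) then
     c8 = c6 // char(255) // char(255)
  else
     c8 = c6 // char(0) // char(0)
  end if
  w = transfer(c8, w)

  print *, 'original :', v
  print *, 'restored :', w
  print *, 'match    :', v == w
end program pack6
```

The sign test is the step that is easy to forget: without it, every negative value comes back as a large positive one.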

Secondly, Eddie referred to Real*6 for FEA work, which is where the real saving would be. Truncating a Real*8 to Real*6 and reinstating it would be difficult with respect to the size of the exponent and its sign bit, and less so for the mantissa and its sign bit, the mantissa usually being left-justified with an appropriate increment or decrement of the exponent.

What is the format of a Real*8?

Ian

PaulLaidler

Posts: 7975 Salford, UK

30 Oct 2009 4:40 #5292

I would have to look this up.

Try http://en.wikipedia.org/wiki/IEEE_754-2008.

Or equivalence to an INTEGER*8 and display the result in hexadecimal. This will show you the sign bit, the exponent and the mantissa. The help file gives you some idea of the range under the heading of Real KINDs.
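A minimal sketch of that hexadecimal inspection is given below. It uses the standard TRANSFER intrinsic instead of EQUIVALENCE, and assumes that double precision is the 64-bit IEEE 754 format (1 sign bit, 11 exponent bits biased by 1023, 52 mantissa bits) and that selected_int_kind(18) gives a 64-bit integer. The names are illustrative only.

```fortran
program show_bits
  implicit none
  integer, parameter :: dp = kind(1.0d0)             ! assumed 64-bit IEEE double
  integer, parameter :: i8 = selected_int_kind(18)   ! assumed 64-bit integer
  real(dp) :: x
  integer(i8) :: bits

  x = -1.5_dp
  bits = transfer(x, bits)      ! reinterpret the 8 bytes, no numeric conversion

  write (*, '(a, z16.16)') ' raw hex            : ', bits
  write (*, '(a, i0)')     ' sign bit           : ', ibits(bits, 63, 1)
  write (*, '(a, i0)')     ' unbiased exponent  : ', ibits(bits, 52, 11) - 1023
  write (*, '(a, z13.13)') ' mantissa (52 bits) : ', ibits(bits, 0, 52)
end program show_bits
```

Under those assumptions, -1.5 should show a set sign bit, an unbiased exponent of 0, and a mantissa with only its top bit set, since 1.5 is 1.1 in binary and the leading 1 is implicit.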