Topic: Does this code work under debugger and without ? in 64-bit

DanRRight

Posts: 2877 South Pole, Antarctica

2 Mar 2017 10:44 #18948

i=1
k=2

do while (k>1)	
i=i+1.1
if(i/1000000*1000000.eq.i) print*,i
enddo

end

compile and run it:

FTN95 a.f95 /64 /debug
sdbg64 a.exe

This is demonstration code reduced to the minimum. It should give an integer overflow error message but it does not. It goes into an infinite loop around 33M.

In the larger code that I reduced this from, it gives the wrong error (invalid floating point operation), and the debugger stops on the wrong line (the line after the offending one).

mecej4

Posts: 1911

3 Mar 2017 12:55 #18951

Perhaps you did not intend to do so, Dan, but you have exposed a property of the code generated by FTN95-64 for processing mixed integer and real expressions in the XMM registers using floating point instructions. The following adaptation of your program shows the problem in a striking way.

program danx
implicit none
integer i

i=33554430
i=i+1.1
print*,i

end

The printed output is 33554432, instead of the expected 33554431, and the reason is that the expression i+1.1 is calculated using single-precision floating point arithmetic. The value of the expression is such that 24 bits are no longer sufficient to provide the correct conversion to integer. Note that the correct result is 2^25 - 1. For related reasons, the following code will not increment i beyond 2^25, so if you have a DO with a condition on i that depends on such values, the condition may never be satisfied and the program will have an infinite loop.

program dany
implicit none
integer i,j

i=33554430
do j=1,5
   i=i+1.1
   print*,i
end do
end

The Fortran standard puts the responsibility on the programmer to avoid overflow, and you forced floating point evaluation by adding 1.1 instead of 1. If you write 1.1d0, instead, you will find that the correct result is shown, since the expression is then evaluated using double precision reals.
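
A minimal sketch of the comparison described above, assuming a 32-bit default INTEGER and IEEE single and double REAL kinds. The program name and the variables i_single and i_double are illustrative only. The values printed can depend on the compiler and on whether x87 or SSE instructions are generated, so no particular output is claimed here.

```fortran
program mixed_precision_sketch
  implicit none
  integer :: i, i_single, i_double

  i = 33554430                 ! 2**25 - 2

  ! Default REAL constant: the sum is formed in single precision
  ! (24-bit significand) unless the compiler keeps extra precision.
  i_single = i + 1.1

  ! DOUBLE PRECISION constant: the sum is formed with a 53-bit
  ! significand, enough for any 32-bit integer (assumed default kind).
  i_double = i + 1.1d0

  print *, 'i + 1.1   ->', i_single
  print *, 'i + 1.1d0 ->', i_double

  ! SPACING reports the gap between neighbouring representable values.
  print *, 'gap between REALs near i  :', spacing(real(i))
  print *, 'gap between DOUBLEs near i:', spacing(real(i, kind(1.0d0)))
end program mixed_precision_sketch
```

If the gap between neighbouring REAL values near i is larger than the increment, the truncated result cannot advance by exactly 1.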

DanRRight

Posts: 2877 South Pole, Antarctica

3 Mar 2017 2:30 #18952

WOW! One more dead monster besides 16 and 32 bits, the 24-bit one, got out of its grave in the 64-bit code... And we have to keep this devilry in mind? That is a source of numerous big troubles in the future, because this is a rare thing and the red flags will always be forgotten. Due to this feature such hidden errors in 64-bit codes will never be found. Thanks Mecej4, I am shocked.

After getting out of shock, here is what I was initially trying to demonstrate:

i=1
k=2
do while (k>1)	
i=i+1.1d0
if(i/10000000*10000000.eq.i) print*,i

enddo

end

Besides that red double-line in SDBG64 is also good to fix

mecej4

Posts: 1911

3 Mar 2017 4:36 #18953

Quoted from DanRRight

Besides that red double-line in SDBG64 is also good to fix.

I don't know what that is. I am waiting for the personal edition of 8.1 to be made available. The 8.05 version of SDBG64 is hardly of any use to me. I cannot view assembly and even attempting to make the font larger causes SDBG64 to self-destruct.

JohnCampbell

Posts: 2526 Sydney

3 Mar 2017 7:17 #18954

It is surprising that 'i = i + 1.1' would round down below 'i = i + 1'. Definitely something to remember, although I rarely use real*4 constants with 24-bit accuracy.

I did some other changes, to make the print test 'better', which again resulted in a different round-off problem. Something else to avoid.

64-bit with expectations on larger problems and values is going to throw up more of these.

 integer i,k, next
 i=1 
 k=2 
 next = 0

 do while (k>1)    
   i=i+1.1d00
   if (i >= next) then
     print*,i
     next = i+1000000
   end if
 end do 

 end

PaulLaidler

Posts: 7975 Salford, UK

3 Mar 2017 7:47 #18956

At first sight I think that we should be able to fix this. I have made a note that it needs investigating.

mecej4

Posts: 1911

3 Mar 2017 1:22 #18960

Paul, my initial reaction on running Dan's example was that a compiler bug was involved. However, further consideration leads me to think that this is a programmer error in the sense that a calculation is performed that causes overflow on some processor (or processor FPU).

Here are some results for my first test program of this thread from the competition, on Windows XP-SP3, 32-bit, Athlon X2 4200+ ('S' = sequential numbers, last digit 1, 2, 3, 4, 5; 'F' = fixed last digit 2).

Compiler                Option         Result
gfortran 4.5, 32 bit    -march=i386    S
                        -march=i686    S
                        -msse2         F
ifort 2013SP1U6         -Qxhost        F
                        -QxSSE2        F
                        -QxSSE3        S

As you can see, the results are 'processor-dependent'. Perhaps the best solution is to write code such as

     IVAR = <int expr> + INT(<real expr>)

instead of

     IVAR = <int expr> + <real expr>
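
A small sketch of the two forms, using the starting value from earlier in the thread. The names step, i and j are illustrative only. Note that the two forms are not equivalent in general: they agree here because i is positive, but for negative values truncating the sum and truncating the real term first can give different integers.

```fortran
program int_first_sketch
  implicit none
  integer :: i, j
  real    :: step

  step = 1.1
  i = 33554430
  j = 33554430

  ! Mixed mode: i is converted to REAL, the sum is formed in REAL
  ! arithmetic, and the result is truncated back to INTEGER.
  i = i + step

  ! Integer arithmetic only: step is truncated to 1 first, so the
  ! addition never passes through a REAL value.
  j = j + int(step)

  print *, 'mixed mode  :', i
  print *, 'INT() first :', j
end program int_first_sketch
```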

PaulLaidler

Posts: 7975 Salford, UK

3 Mar 2017 2:26 #18962

mecej4

Thanks for the feedback. I understand that the code is not good and that the result may be processor dependent but I am fairly sure that there is also a bug in FTN95 in this context.

PaulLaidler

Posts: 7975 Salford, UK

3 Mar 2017 3:04 #18963

I understand it now.

i=33554430 
i=i+1.1

In the second line, i is converted to real, then 1.1 is added, then the result is truncated to an integer.

But the key is that the real value suffers from round-off error.

So, yes it is a programming error.
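
The three steps can be written out with explicit temporaries, as a rough sketch. The names r_i and r_sum are made up for illustration. A compiler that keeps intermediate values in wider x87 registers may still show different numbers, so treat the printed values as something to inspect rather than a guaranteed result.

```fortran
program conversion_steps_sketch
  implicit none
  integer :: i
  real    :: r_i, r_sum

  i     = 33554430
  r_i   = real(i)        ! step 1: INTEGER converted to default REAL
  r_sum = r_i + 1.1      ! step 2: REAL addition, rounded to a REAL value
  i     = int(r_sum)     ! step 3: truncation toward zero back to INTEGER

  print *, 'real(i)       =', r_i
  print *, 'real(i) + 1.1 =', r_sum
  print *, 'int(result)   =', i
end program conversion_steps_sketch
```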

DanRRight

Posts: 2877 South Pole, Antarctica

3 Mar 2017 6:12 #18964

But the problem is that there is no such error in 32-bit mode! Converting older 32-bit code to 64 bits must be straightforward, without any side effects, because formally we still stay with 32-bit arithmetic. The weird 24-bit mode must be excluded as the default, period, because this is far too obscure a feature; no one will remember to avoid it. Just try to realize this: you took legacy 32-bit code which worked OK, switched from 32 bits to 64, and got a 24-bit downgrade in what programmers use most, mixing real and integer numbers. Wow, how absurd the situation is. Worst idiocy ever. Is this what the Standard prescribes??? No words. If this craziness is kept, the compiler must at least report it as a warning.

PaulLaidler

Posts: 7975 Salford, UK

3 Mar 2017 6:53 #18965

Dan

I think that you misunderstand the issue. If it works for 32 bits then it's just luck. The round-off error must turn out to be different. A REAL value has only a limited number of significant figures. As far as the compiler is concerned, the code is treated in the same way and any difference is in the associated assembler instructions and the way in which these are implemented by the central processor in use.

mecej4

Posts: 1911

3 Mar 2017 6:59 (Edited: 5 Mar 2017 7:16) #18966

Quoted from DanRRight

But the problem is that there is no such error in 32 bit mode!

That is true for FTN95, but not for Gfortran, Intel or Lahey, all in 32-bit mode (see my previous post for results from those compilers).

FTN95 uses only X87 instructions for FP in 32-bit mode. The other compilers let you choose between X87 and SSE/SSE2/SSE3. The X87 FPU has only 80 bit registers (64 bit mantissa, 15 bit biased exponent, 1 bit sign), so the overflow problem would occur only with much larger numbers than in your test program.

It only adds to the confusion when the terms '32-bit' and '64-bit' are used in vain. Those are address sizes, and have very little to do with FPU registers, X87 or SSE/XMM.

DanRRight

Posts: 2877 South Pole, Antarctica

3 Mar 2017 8:24 (Edited: 3 Mar 2017 10:43) #18967

Quoted from PaulLaidler

I think that you misunderstand the issue

How could I misunderstand if I fixed it in my example 2, a few messages up? 😃

Quoted from PaulLaidler

As far as the compiler is concerned, the code is treated in the same way and any difference is in the associated assembler instructions and the way in which these are implemented by the central processor in use.

Obviously this could not be the same. Just comparing the speed of integer and FP operations in the old 32-bit and the new 64-bit compiler shows that the new one is way faster (about 6x for integers and 2.5x for FP in the timings below; compile with /opt).

CALL CPU_TIME(tStart)
k=1
1 j=1
do i=1,10000000
j=j+1
enddo
k=k+1
if(k.lt.100) goto 1

CALL CPU_TIME(tFinish)
RunTime=tFinish-tStart
OpPerSInt = 1e9/Runtime
Print*, RunTime, OpPerSInt

k=1
CALL CPU_TIME(tStart)
2 a=1.
do i=1,10000000
a=a+1.
enddo
k=k+1
if(k.lt.100) goto 2

CALL CPU_TIME(tFinish)
RunTime=tFinish-tStart
OpPerSfp = 1e9/Runtime
Print*, RunTime, OpPerSfp

end

32bit   Time (s)     Op/second
INT     1.64063      6.095238E+08
FP      2.20313      4.539007E+08

64bit   Time (s)     Op/second
INT     0.281250     3.555556E+09     6x   speedup
FP      0.890625     1.122807E+09     2.5x speedup

Quoted from mecej4

It only adds to the confusion when the terms '32-bit' and '64-bit' are used in vain. Those are address sizes, and have very little to do with FPU registers, X87 or SSE/XMM.

One more confusion is added here: all his life the user expected 32-bit integer + 32-bit FP to run to 2B before crashing, not to 30M. These integers were used as array indices, so the change in accuracy directly influences the address space. The compiler manufacturers, together with the processor designers, chose speed. They switched to faster FP units with a smaller mantissa and to SSE for integer operations. Did they warn in the compilation LOG file that INT4 + FP4 could now be misleading?

OK, this way is faster, but then the compiler must warn, when real*4 and integer*4 are used together, that the song may end way sooner than expected, and suggest switching at least to real*8, because with new FP processors this has no performance penalty (not sure about SSE), while 64-bit vs 32-bit resolves the memory space penalty. Or it must implement a runtime crash of the integer at 33M. Hell, otherwise you will never find the hidden bugs in large codes, this one specifically.
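
Until a compiler warning exists, a hand-written guard is one possible workaround. The sketch below is only an illustration: the module mixed_mode_guard and the function exact_in_real are hypothetical names, not part of FTN95 or any library. The test is conservative (sufficient, not necessary), since some larger integers are also exactly representable.

```fortran
module mixed_mode_guard        ! hypothetical helper, not an FTN95 feature
  implicit none
contains
  logical function exact_in_real(i)
    integer, intent(in) :: i
    ! Every integer with magnitude up to radix**digits fits exactly in
    ! the significand of a default REAL; beyond that, gaps appear.
    exact_in_real = abs(i) <= radix(1.0)**digits(1.0)
  end function exact_in_real
end module mixed_mode_guard

program guard_demo
  use mixed_mode_guard
  implicit none
  integer :: i

  i = 33554430
  if (.not. exact_in_real(i)) then
    print *, 'warning: mixing', i, 'with a default REAL may lose precision'
  end if
end program guard_demo
```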

JohnCampbell

Posts: 2526 Sydney

3 Mar 2017 10:13 #18968

Dan,

There is a bug in the code : It is a real*4 round-off error, but doesn't appear with some configurations.

I have experienced lots of examples of coding bugs that don't appear with some compilers, but do with others. This often happens when moving a code to different compilers or hardware.

It's good you now know that real*4 is only accurate to 24 bits and not 32 bits. That's 7 figures of accuracy, compared to '9' for Integer*4.

John
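
The figures above can be queried from the compiler itself with the standard numeric inquiry intrinsics. A short sketch follows; the program name is illustrative. The PRECISION intrinsic uses a conservative definition of guaranteed decimal digits, so the number it reports may be lower than the "7 figures" rule of thumb.

```fortran
program model_inquiry_sketch
  implicit none

  ! DIGITS gives the number of significand digits in the model radix.
  print *, 'significand bits, default REAL     :', digits(1.0)
  print *, 'significand bits, DOUBLE PRECISION :', digits(1.0d0)
  print *, 'value bits, default INTEGER        :', digits(1)

  ! PRECISION and RANGE report decimal figures.
  print *, 'decimal precision, default REAL    :', precision(1.0)
  print *, 'decimal range, default INTEGER     :', range(1)
  print *, 'largest default INTEGER            :', huge(1)
end program model_inquiry_sketch
```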

DanRRight

Posts: 2877 South Pole, Antarctica

3 Mar 2017 10:19 #18969

John, see? Even the most experienced people here, like you, did not know about that damn feature from hell. Would you like to revisit Polyhedron with /64 /opt and maybe catch a few possible bugs? The new debugger is much better but sometimes misses the offending place by 1-2 lines, causing confusion.

JohnCampbell

Posts: 2526 Sydney

3 Mar 2017 11:02 #18970

Dan,

Thanks for the recommendation. I shall look at the 64-bit debugger.

I was reviewing your quoted run times. I don't get the performance you quoted; must be related to the processor. I assume you used /64 /opt. It looks like /64 /opt is performing well on these loops.

Paul, Any documentation of what /64 /opt can achieve? Does it include SSE or AVX instructions in inner loops, which could support more complex calculations than supported by (the very useful) DOT_PRODUCT8@ or AXPY8@ ?

DanRRight

Posts: 2877 South Pole, Antarctica

3 Mar 2017 11:35 #18971

Do you have 8.10? Use /opt then

mecej4

Posts: 1911

4 Mar 2017 1:04 #18972

Dan, here is a constructive comment. If you have code that you suspect will give you trouble when you switch to the FTN95-64 compiler, you can do a 32-bit run in which you partly simulate the 64-bit behavior of mixed-mode arithmetic expressions.

This is easy to do with, for example, Intel Fortran, which gives you the /Qpc32 option. This option forces the x87 FPU into dropping the bit size of the mantissae down from 64 to 24. Thus, if you make two 32-bit runs, one compiled with /Qpc32 and another without, if the results are the same, you have a high likelihood that moving to 64-bits will not cause loss of precision in mixed mode expressions. If the results differ, you can pinpoint and fix the problem in 32-bit mode, where the debugger may be more convenient to use.

Doing the equivalent of Intel's /Qpc32 with FTN95-32 is possible, but not so convenient. You have to manipulate the X87 control word, and get FTN95 to ignore precision loss at run time (i.e., mask the precision exception).

DanRRight

Posts: 2877 South Pole, Antarctica

4 Mar 2017 7:15 #18973

Thanks Mecej4, I am sure that Silverfrost can easily offer a similar option, a switch which will issue a warning or even report an error in such cases. Besides parallelization, Intel has no appeal to me. I need a compiler which has unbeatable bug-sniffing capabilities, because code development, debugging and validation actually account for 99% of the time, health and brain cell losses.

PaulLaidler

Posts: 7975 Salford, UK

4 Mar 2017 8:57 #18974

Here is a program that illustrates the point made by Mecej4 that '32 bit' FTN95 gives greater precision than one might expect because it uses x87 floating point coprocessor instructions to add a REAL*4 to an INTEGER*4.

program test
i=16777200 ! A little under 16777216 which is 2 raised to the power 24
do
  k=i+1.0 
  i=i+1
  if(k /= i)then
    print*,'Last good value:',i-1
    exit
  endif
  if(i > 2147483646)then
    print*, 'Good for all integer*4' 
    exit
  endif  
enddo  
end

Dan: At the moment I don't know if it would be feasible to get '64 bit' FTN95 to do 'better' than it does at the moment. Maybe a warning could be provided.

John: I don't have any details at the moment about what /opt does, other than that in part it aims to minimise the number of assembly instructions. The use of SSE2 and AVX for the dot product etc. has not changed since 8.05 except that (in some cases) you will now get SSE2 for a DOT_PRODUCT anyway (with or without /opt). I will ask for further details.