|
forums.silverfrost.com Welcome to the Silverfrost forums
|
View previous topic :: View next topic |
Author |
Message |
mecej4
Joined: 31 Oct 2006 Posts: 1886
|
Posted: Wed Feb 18, 2015 1:40 pm Post subject: Slow handling of FPU underflow interrupts in SALFLIBC.DLL |
|
|
Here is a small code that displays the huge slowdowns caused by underflow interrupts occurring in DAXPY type code in Fortran. These slowdowns occur when underflow exceptions are handled in the default mode of code compiled with FTN95. Most of the issues were written up in another long thread, see http://forums.silverfrost.com/viewtopic.php?t=2465&postdays=0&postorder=asc&start=0, but here is a compact test code.
Code: |
program xundfl
implicit none
integer, parameter :: NMAX=2000, NREP=50000
double precision, dimension(NMAX) :: X,Y,Z
double precision A,t1,t2
integer :: i,k,n,cnt1,cnt2
open(15,file='vecsub.bin',form='unformatted',status='old')
read(15)n,a,(x(i),y(i),i=1,n)
close(15)
call dclock@(t1)
call underflow_count@(cnt1)
do k=1,NREP
do i = 1,n
z(i) = y(i) - a*x(i)
end do
end do
call underflow_count@(cnt2)
call dclock@(t2)
write(*,10)n,t2-t1,(cnt2-cnt1)/NREP
10 format('Vectors of size ',I4,2x,' time = ',f6.3,' ufls = ',i4)
end program
|
The data file for the test code is an unformatted Fortran data file that contains two double precision vectors x and y of length n =1679, and the test program performs the operation z = y - a x fifty thousand times. You can download the file (27 KB) from the public link https://dl.dropboxusercontent.com/u/88464747/vecsub.bin .
Compile with /opt /p6 and link. Run the program and record the output. Set the environment variable SALFENVAR=MASK_UNDERFLOW, and run again. You will see something similar to the following:
Code: |
s:\lang\JCampBell\SAL>set SALFENVAR=
s:\lang\JCampBell\SAL>xundfl
Vectors of size 1679 time = 33.860 ufls = 88
s:\lang\JCampBell\SAL>set SALFENVAR=MASK_UNDERFLOW
s:\lang\JCampBell\SAL>xundfl
Vectors of size 1679 time = 0.688 ufls = 0
|
From these results we can estimate the time spent per one execution of the FPU underflow interrupt handler to be the equivalent of about 12,000 CPU cycles, which agrees with a similar estimate made in John Campbell's thread linked above, where the program can take hours to run with the default underflow processing versus about 20 seconds with SSE2 code.
Last edited by mecej4 on Wed Feb 18, 2015 3:33 pm; edited 1 time in total |
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1886
|
Posted: Wed Feb 18, 2015 3:28 pm Post subject: |
|
|
Here is an even simpler example, for those who may be a bit worried about downloading a binary file from an "unknown" person. This example does not need a data file, and does not do a DAXMY operation.
Code: |
module globvars
logical :: uflseen = .false.
integer :: count = 0
end module
program testunderflow
use globvars
implicit none
integer i,k
!
real :: x,y,dx,s
double precision t1,t2,c1,c2
!
call dclock@(t1)
c1=cpu_clock@(c1)
do k=1,200000
x=1e-30
y=x+1e-36
s=0
do i=1,20
dx=y-x
s=s+dx
! write(*,*)i,dx
y=y*0.5d0
x=x*0.5d0
end do
end do
call dclock@(t2)
call underflow_count@(count)
write(*,10)'count = ',count,' time = ',t2-t1
c2=cpu_clock@()
write(*,20)'CPU cycles used = ',c2-c1,', cycles/underflow = ',(c2-c1)/count
10 format(1x,A,I20,3x,A,2x,F8.3,' s')
20 format(1x,A,EN10.2,A,F8.0)
end program
|
Make two runs, one with SALFENVAR set to MASK_UNDERFLOW, and one without. |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2816 Location: South Pole, Antarctica
|
Posted: Wed Feb 18, 2015 5:19 pm Post subject: |
|
|
Interesting. Why
call mask_underflow@()
does not do the same thing? |
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1886
|
Posted: Wed Feb 18, 2015 6:53 pm Post subject: |
|
|
Exception handling is an arcane and complicated art. Interrupts/traps do not follow the orderly flow of logic in a program. If an exception is unmasked by calling a service routine within the program, the exception flags have to be restored before calling a library routine and when that routine returns. There may be OS conventions about saving exception control registers, setting your desired values and restoring the old values at subprogram return or other places. Conventions are also needed as to who (the user program or library routine that caused the trap, the OS, the default signal handler or the user-supplied replacement handler) is responsible for doing the saves and restores.
Finally, when your exception handler decides to increment a count of , say, underflow, it has to supply a reasonable value (such as zero) for the result of the expression whose evaluation caused the exception, return to the point in the user program or library routine where the exception was taken, and restore the program state; moreover, this cannot be done directly but through the OS and the compiler's runtime, and requires facilities and cooperation from the OS.
A scholarly article about this topic: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_handle.html . |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2816 Location: South Pole, Antarctica
|
Posted: Wed Feb 18, 2015 11:49 pm Post subject: |
|
|
One more interesting thing: when i changed env.variables as above the small code above was compiled OK but compilation of my own ones started producing this error
*** Internal compiler error - floating point exception
Only removing env.variable and rebooting computer helped... |
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1886
|
Posted: Thu Feb 19, 2015 12:11 am Post subject: |
|
|
Interesting.
The compiler itself depends on Salflibc.DLL. If you have a constant expression of type REAL in your program, the compiler may invoke FPU routines in the DLL to evaluate them once for all, and put the resulting value into the OBJ and EXE. If the code in the DLL checks the environment variable, and the evaluation of the expression causes an FPU exception, and the compiler expects exceptions to be masked and handled in the default way, it may get stuck.
I am speculating, of course, since I have no real information about the innards of the DLL. |
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
|
Posted: Thu Feb 19, 2015 1:47 am Post subject: |
|
|
Mecej4,
Thanks very much for the simplified example of the problem.
I enhanced the first example above to:
include the use of MASK_UNDERFLOW@, and
"estimate" the delay due to floating point exception handling
the revised program is: Code: | program xundfl
implicit none
integer, parameter :: NMAX=2000, NREP=50000
double precision, dimension(NMAX) :: X,Y,Z
double precision A,t1,t2
integer :: i,k,n,cnt1,cnt2, pass, nfpe
real*4 :: op_count, mflops
real*10 :: c1,c2
open (15,file='vecsub.bin',form='unformatted',status='old')
read (15) n,a,(x(i),y(i),i=1,n)
close(15)
write (*,*) 'n =',n
do pass = 1,2
write (*,*) ' '
if (pass==2) then
write (*,*) 'Mask Underflow enabled'
call MASK_UNDERFLOW@
end if
call dclock@ (t1) ! elapsed time (seconds)
c1 = cpu_clock@ () ! elapsed time (CPU cycles)
call underflow_count@(cnt1)
do k=1,NREP
do i = 1,n
z(i) = y(i) - a*x(i)
end do
end do
call underflow_count@(cnt2)
call dclock@ (t2)
c2 = cpu_clock@ ()
!
c2 = c2 - c1 ! clock cycles
t2 = t2 - t1 ! elapsed seconds
nfpe = cnt2 - cnt1 ! number of FPE
op_count = float(nrep) * float(n) * 2 ! floating operation count
mflops = op_count / t2 / 1.e6 ! million floating point operations per second
!
write (*,10) n, t2, nfpe/NREP
write (*,*) ' mflops =',mflops
if ( nfpe > 0) &
write (*,*) ' Floating point exception takes', c2/float(nfpe),' cycles'
end do ! pass
10 format('Vectors of size ',I4,2x,' time = ',f6.3,' ufls = ',i4)
end program |
The batch file I am using is: Code: | ftn95 xundfl /link
set SALFENVAR=
xundfl
set SALFENVAR=MASK_UNDERFLOW
xundfl
ftn95 xundfl /p6 /opt /link
set SALFENVAR=
xundfl
set SALFENVAR=MASK_UNDERFLOW
xundfl
|
Unfortunately I an not generating the other reported errors
*** Internal compiler error - floating point exception
floating point stack fault in IO_convert_long_double_to_ascii
I am estimating the FPE takes about 22,000 CPU cycles to process !!
The next stage will be to get statistics on how frequently this is happening in real programs.
John
ps: I would not recommend the use of dclock@ (based on GetTickCount) as it is accurate to only 64 ticks per second.
The intrinsic System_Clock (based on QueryPerformanceCounter) is a more accurate timer with FTN95.
real*10 cpu_clock@ is based on RDTSC and typically provides the time in processor cycles. |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2816 Location: South Pole, Antarctica
|
Posted: Thu Feb 19, 2015 8:15 am Post subject: |
|
|
John
Are you getting the compiler crash too or not? I misunderstood.
I was changing env variable in ControlPanel/System/Settings and seems changing it back by removing the line did not work until computer reboot.
By the way if you remember it was these damn underflows which caused the crash of the running code if you do multitasking jalih' s or Paul's way. Masking them by calling Silverfrost's @ utility helped to avoid crashes, that was good suggestion by Paul.
Last edited by DanRRight on Thu Feb 19, 2015 10:44 am; edited 1 time in total |
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
|
Posted: Thu Feb 19, 2015 8:59 am Post subject: |
|
|
Dan,
I am setting the environment variable in the batch file or at the cmd.exe prompt.
I did not get the compiler error response using Ver. 7.10.0, that you have reported.
When I run my profile_v6.exe program, for both gauss reduction and skyline solution, the program:
runs the gauss reduction
reports the results
runs the skyline reduction
produces the error during the report when attempting to write real*8 values.
At first it failed with "write (*,*) 'mflop =', mflop" so I replaced ",*)" with a format statement number and went further, but further into the reporting the use of format statement numbers also failed. It looks like there is a stack corruption when there are lots of FPE's. (looks can be deceiving!)
I am going to try the "call MASK_UNDERFLOW@" in the program and see if that changes the behaviour.
To test if it is related to the number of FP exceptions, I shall try a range of /d:n options which modifies the number of FPE's being generated.
It looks like my profile_v6.exe program is being slowed down by FPE's. I now need to test the FPE count in other programs that have slow DAXMY performance.
John |
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1886
|
Posted: Thu Feb 19, 2015 1:07 pm Post subject: |
|
|
DanRRight wrote: "Only removing env.variable and rebooting computer helped...". If you are working from a command line, you can easily change the local environment with a one line statement. Changing the environment in the Windows System:Advanced:Environment panel makes the changes apply to all applications from then on, and is probably not a good idea. Furthermore, if you are working in a command window and change the environment in the control panel, the changes do not take effect in the command window.
When you think that you want to run programs with frequent changes to environment variables, as we did in this thread, it is safest to do so within a batch file which changes the environment as desired and then runs your program(s) in the changed environment. When your program is run from the batch file and quits, the batch file also quits and the old environment is "restored". Following this procedure also keeps you safe from the changed environment causing disruptions to subsequent compilations, as you experienced. You can have even finer control over changes to the environment within a batch file using the builtin SetLocal...EndLocal commands. |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2816 Location: South Pole, Antarctica
|
Posted: Thu Feb 19, 2015 4:52 pm Post subject: |
|
|
I cornered the problem of internal compiler crash in relatively short piece of code which probably Paul has to look at closely. because if we will start using masking routinely we will get eventually this problem. But first please confirm that you see the effect. The fortran file has two subroutines written 30 years ago which worked OK all the time. Together they produce compiler crash if underflows masked. If compiled separately they end up OK.
The ZIP file contains fortran code and batch file which compiles with and without masking underflows (thanks John and mecej4 for the hint)
When you run the batch it produces two files zzzMask and zzzNotMask where compilation process is logged. At the end of them you will see either compilation was OK or failed. Here is the link on Microsoft Onedrive website
Code: | https://onedrive.live.com/?cid=51A5F9B536E1368D&id=51A5F9B536E1368D!105 |
|
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1886
|
Posted: Thu Feb 19, 2015 5:31 pm Post subject: |
|
|
I can reproduce the compile time error. The error message came from the compiler rather than SALFLIBC.DLL.
I observe that in your program there is quite a bit of mixing of REAL*4 and REAL*8 variables and constants in expressions. For example, line 370 has
Code: | if(aIz_09.le.0d0) aIz_09 = 1.d-100
| This line of code does nothing useful if the previous value of aIz_09 is zero, because aIz_09 is a REAL*4 variable, and conversion of 1.d-100 to REAL*4 causes underflow. If underflow is masked, the conversion just yields zero. The right hand side expression in this statement is a constant and the compiler probably tried to convert and save the value as a REAL*4, since that is the type of the variable on the left hand side. |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2816 Location: South Pole, Antarctica
|
Posted: Thu Feb 19, 2015 7:15 pm Post subject: |
|
|
Thanks Mecej4, i think it was good explanation. Changing this variable to real*8 solved the problem.
The only what FTN95 missing in this case is to diagnose such things with exact position and clear message, not just crash usually in the remote place from the offending subroutine. That type of errors will be hard to anticipate by the programmer.
But have you noticed that if there is no second subroutine in the fortran file which is clean the error does not appear? How this can be explained? |
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1886
|
Posted: Thu Feb 19, 2015 7:22 pm Post subject: |
|
|
I am glad that you were able to fix the problem based on my suspicions.
I agree that the compiler should issue a more helpful message, but often programs (including compilers) are not given the capability to respond to exceptions, and therefore they give give cryptic messages, or fail with not even a cryptic message, when they unexpectedly find themselves facing an exception.
The response of a compiler to a program with errors is often unpredictable.
Quote: | But have you noticed that if there is no second subroutine in the fortran file which is clean the error does not appear? | I could not understand that sentence. Will you please rephrase it? |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2816 Location: South Pole, Antarctica
|
Posted: Thu Feb 19, 2015 8:34 pm Post subject: |
|
|
My wish always was that all recent generations of compilers must have to be way more verbose and suggestive. User-fiendly is another word. Specifically Fortran has to be very simple and user-friendly if we want it to be actively accepted by the new programmers
Returning to this error, hopefully that was right guess and not some other devilry . The question i asked in previous messages was this: the R.FOR file has two subroutines. The second is fine and has no problems. All the problems are in the first one. But the compiler crash only happens if both are in the same file like it is now.
So if i separate two these subroutines there is no compiler crash. Do you see that too and can you explain this devilry? |
|
Back to top |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
|