forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Slow handling of FPU underflow interrupts in SALFLIBC.DLL
Goto page 1, 2, 3  Next
 
Post new topic   Reply to topic    forums.silverfrost.com Forum Index -> Support
View previous topic :: View next topic  
Author Message
mecej4



Joined: 31 Oct 2006
Posts: 1886

PostPosted: Wed Feb 18, 2015 1:40 pm    Post subject: Slow handling of FPU underflow interrupts in SALFLIBC.DLL Reply with quote

Here is a small code that displays the huge slowdowns caused by underflow interrupts occurring in DAXPY type code in Fortran. These slowdowns occur when underflow exceptions are handled in the default mode of code compiled with FTN95. Most of the issues were written up in another long thread, see http://forums.silverfrost.com/viewtopic.php?t=2465&postdays=0&postorder=asc&start=0, but here is a compact test code.
Code:

program xundfl
implicit none
integer, parameter :: NMAX=2000, NREP=50000
double precision, dimension(NMAX) :: X,Y,Z
double precision A,t1,t2
integer :: i,k,n,cnt1,cnt2

open(15,file='vecsub.bin',form='unformatted',status='old')
read(15)n,a,(x(i),y(i),i=1,n)
close(15)

call dclock@(t1)
call underflow_count@(cnt1)
do k=1,NREP
   do i = 1,n
      z(i) = y(i) - a*x(i)
   end do
end do
call underflow_count@(cnt2)
call dclock@(t2)
write(*,10)n,t2-t1,(cnt2-cnt1)/NREP
10 format('Vectors of size ',I4,2x,' time = ',f6.3,' ufls = ',i4)
end program

The data file for the test code is an unformatted Fortran data file that contains two double precision vectors x and y of length n =1679, and the test program performs the operation z = y - a x fifty thousand times. You can download the file (27 KB) from the public link https://dl.dropboxusercontent.com/u/88464747/vecsub.bin .

Compile with /opt /p6 and link. Run the program and record the output. Set the environment variable SALFENVAR=MASK_UNDERFLOW, and run again. You will see something similar to the following:
Code:

s:\lang\JCampBell\SAL>set SALFENVAR=

s:\lang\JCampBell\SAL>xundfl
Vectors of size 1679   time = 33.860 ufls =   88

s:\lang\JCampBell\SAL>set SALFENVAR=MASK_UNDERFLOW

s:\lang\JCampBell\SAL>xundfl
Vectors of size 1679   time =  0.688 ufls =    0


From these results we can estimate the time spent per one execution of the FPU underflow interrupt handler to be the equivalent of about 12,000 CPU cycles, which agrees with a similar estimate made in John Campbell's thread linked above, where the program can take hours to run with the default underflow processing versus about 20 seconds with SSE2 code.


Last edited by mecej4 on Wed Feb 18, 2015 3:33 pm; edited 1 time in total
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 1886

PostPosted: Wed Feb 18, 2015 3:28 pm    Post subject: Reply with quote

Here is an even simpler example, for those who may be a bit worried about downloading a binary file from an "unknown" person. This example does not need a data file, and does not do a DAXMY operation.
Code:

module globvars
logical :: uflseen = .false.
integer :: count = 0
end module

program testunderflow
use globvars
implicit none
integer i,k
!
real :: x,y,dx,s
double precision t1,t2,c1,c2
!
call dclock@(t1)
c1=cpu_clock@(c1)
do k=1,200000
   x=1e-30
   y=x+1e-36
   s=0
   do i=1,20
      dx=y-x
      s=s+dx
   !   write(*,*)i,dx
      y=y*0.5d0
      x=x*0.5d0
   end do
end do
call dclock@(t2)
call underflow_count@(count)
write(*,10)'count = ',count,' time = ',t2-t1
c2=cpu_clock@()
write(*,20)'CPU cycles used = ',c2-c1,',  cycles/underflow = ',(c2-c1)/count
10 format(1x,A,I20,3x,A,2x,F8.3,' s')
20 format(1x,A,EN10.2,A,F8.0)
end program

Make two runs, one with SALFENVAR set to MASK_UNDERFLOW, and one without.
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

PostPosted: Wed Feb 18, 2015 5:19 pm    Post subject: Reply with quote

Interesting. Why

call mask_underflow@()

does not do the same thing?
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 1886

PostPosted: Wed Feb 18, 2015 6:53 pm    Post subject: Reply with quote

Exception handling is an arcane and complicated art. Interrupts/traps do not follow the orderly flow of logic in a program. If an exception is unmasked by calling a service routine within the program, the exception flags have to be restored before calling a library routine and when that routine returns. There may be OS conventions about saving exception control registers, setting your desired values and restoring the old values at subprogram return or other places. Conventions are also needed as to who (the user program or library routine that caused the trap, the OS, the default signal handler or the user-supplied replacement handler) is responsible for doing the saves and restores.

Finally, when your exception handler decides to increment a count of , say, underflow, it has to supply a reasonable value (such as zero) for the result of the expression whose evaluation caused the exception, return to the point in the user program or library routine where the exception was taken, and restore the program state; moreover, this cannot be done directly but through the OS and the compiler's runtime, and requires facilities and cooperation from the OS.

A scholarly article about this topic: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_handle.html .
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

PostPosted: Wed Feb 18, 2015 11:49 pm    Post subject: Reply with quote

One more interesting thing: when i changed env.variables as above the small code above was compiled OK but compilation of my own ones started producing this error

*** Internal compiler error - floating point exception

Only removing env.variable and rebooting computer helped...
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 1886

PostPosted: Thu Feb 19, 2015 12:11 am    Post subject: Reply with quote

Interesting.

The compiler itself depends on Salflibc.DLL. If you have a constant expression of type REAL in your program, the compiler may invoke FPU routines in the DLL to evaluate them once for all, and put the resulting value into the OBJ and EXE. If the code in the DLL checks the environment variable, and the evaluation of the expression causes an FPU exception, and the compiler expects exceptions to be masked and handled in the default way, it may get stuck.

I am speculating, of course, since I have no real information about the innards of the DLL.
Back to top
View user's profile Send private message
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

PostPosted: Thu Feb 19, 2015 1:47 am    Post subject: Reply with quote

Mecej4,

Thanks very much for the simplified example of the problem.

I enhanced the first example above to:
include the use of MASK_UNDERFLOW@, and
"estimate" the delay due to floating point exception handling

the revised program is:
Code:
program xundfl
 implicit none
 integer, parameter :: NMAX=2000, NREP=50000
 double precision, dimension(NMAX) :: X,Y,Z
 double precision A,t1,t2
 integer   :: i,k,n,cnt1,cnt2, pass, nfpe
 real*4    :: op_count, mflops
 real*10   :: c1,c2

 open (15,file='vecsub.bin',form='unformatted',status='old')
 read (15) n,a,(x(i),y(i),i=1,n)
 close(15)
 write (*,*) 'n =',n

 do pass = 1,2
    write (*,*) ' '
    if (pass==2) then
      write (*,*) 'Mask Underflow enabled'
      call MASK_UNDERFLOW@
    end if
   
    call dclock@ (t1)                ! elapsed time (seconds)
    c1 = cpu_clock@ ()               ! elapsed time (CPU cycles)
   
    call underflow_count@(cnt1)
    do k=1,NREP
       do i = 1,n
          z(i) = y(i) - a*x(i)
       end do
    end do
    call underflow_count@(cnt2)
    call dclock@ (t2)
    c2 = cpu_clock@ ()
!
    c2       = c2 - c1                     ! clock cycles
    t2       = t2 - t1                     ! elapsed seconds
    nfpe     = cnt2 - cnt1                 ! number of FPE
    op_count = float(nrep) * float(n) * 2  ! floating operation count
    mflops   = op_count / t2 / 1.e6        ! million floating point operations per second
!
    write (*,10) n, t2, nfpe/NREP
    write (*,*) ' mflops =',mflops
    if ( nfpe > 0) &
    write (*,*) ' Floating point exception takes', c2/float(nfpe),' cycles'
 end do ! pass
 10 format('Vectors of size ',I4,2x,' time = ',f6.3,' ufls = ',i4)
 
 end program


The batch file I am using is:
Code:
ftn95 xundfl /link
set SALFENVAR=

xundfl

set SALFENVAR=MASK_UNDERFLOW

xundfl

ftn95 xundfl /p6 /opt /link
set SALFENVAR=

xundfl

set SALFENVAR=MASK_UNDERFLOW

xundfl


Unfortunately I an not generating the other reported errors
*** Internal compiler error - floating point exception
floating point stack fault in IO_convert_long_double_to_ascii

I am estimating the FPE takes about 22,000 CPU cycles to process !!
The next stage will be to get statistics on how frequently this is happening in real programs.

John

ps: I would not recommend the use of dclock@ (based on GetTickCount) as it is accurate to only 64 ticks per second.
The intrinsic System_Clock (based on QueryPerformanceCounter) is a more accurate timer with FTN95.
real*10 cpu_clock@ is based on RDTSC and typically provides the time in processor cycles.
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

PostPosted: Thu Feb 19, 2015 8:15 am    Post subject: Reply with quote

John
Are you getting the compiler crash too or not? I misunderstood.

I was changing env variable in ControlPanel/System/Settings and seems changing it back by removing the line did not work until computer reboot.

By the way if you remember it was these damn underflows which caused the crash of the running code if you do multitasking jalih' s or Paul's way. Masking them by calling Silverfrost's @ utility helped to avoid crashes, that was good suggestion by Paul.


Last edited by DanRRight on Thu Feb 19, 2015 10:44 am; edited 1 time in total
Back to top
View user's profile Send private message
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

PostPosted: Thu Feb 19, 2015 8:59 am    Post subject: Reply with quote

Dan,

I am setting the environment variable in the batch file or at the cmd.exe prompt.

I did not get the compiler error response using Ver. 7.10.0, that you have reported.
When I run my profile_v6.exe program, for both gauss reduction and skyline solution, the program:
runs the gauss reduction
reports the results
runs the skyline reduction
produces the error during the report when attempting to write real*8 values.

At first it failed with "write (*,*) 'mflop =', mflop" so I replaced ",*)" with a format statement number and went further, but further into the reporting the use of format statement numbers also failed. It looks like there is a stack corruption when there are lots of FPE's. (looks can be deceiving!)

I am going to try the "call MASK_UNDERFLOW@" in the program and see if that changes the behaviour.
To test if it is related to the number of FP exceptions, I shall try a range of /d:n options which modifies the number of FPE's being generated.

It looks like my profile_v6.exe program is being slowed down by FPE's. I now need to test the FPE count in other programs that have slow DAXMY performance.

John
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 1886

PostPosted: Thu Feb 19, 2015 1:07 pm    Post subject: Reply with quote

DanRRight wrote: "Only removing env.variable and rebooting computer helped...". If you are working from a command line, you can easily change the local environment with a one line statement. Changing the environment in the Windows System:Advanced:Environment panel makes the changes apply to all applications from then on, and is probably not a good idea. Furthermore, if you are working in a command window and change the environment in the control panel, the changes do not take effect in the command window.

When you think that you want to run programs with frequent changes to environment variables, as we did in this thread, it is safest to do so within a batch file which changes the environment as desired and then runs your program(s) in the changed environment. When your program is run from the batch file and quits, the batch file also quits and the old environment is "restored". Following this procedure also keeps you safe from the changed environment causing disruptions to subsequent compilations, as you experienced. You can have even finer control over changes to the environment within a batch file using the builtin SetLocal...EndLocal commands.
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

PostPosted: Thu Feb 19, 2015 4:52 pm    Post subject: Reply with quote

I cornered the problem of internal compiler crash in relatively short piece of code which probably Paul has to look at closely. because if we will start using masking routinely we will get eventually this problem. But first please confirm that you see the effect. The fortran file has two subroutines written 30 years ago which worked OK all the time. Together they produce compiler crash if underflows masked. If compiled separately they end up OK.

The ZIP file contains fortran code and batch file which compiles with and without masking underflows (thanks John and mecej4 for the hint)

When you run the batch it produces two files zzzMask and zzzNotMask where compilation process is logged. At the end of them you will see either compilation was OK or failed. Here is the link on Microsoft Onedrive website

Code:
https://onedrive.live.com/?cid=51A5F9B536E1368D&id=51A5F9B536E1368D!105
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 1886

PostPosted: Thu Feb 19, 2015 5:31 pm    Post subject: Reply with quote

I can reproduce the compile time error. The error message came from the compiler rather than SALFLIBC.DLL.

I observe that in your program there is quite a bit of mixing of REAL*4 and REAL*8 variables and constants in expressions. For example, line 370 has
Code:
      if(aIz_09.le.0d0) aIz_09 = 1.d-100
This line of code does nothing useful if the previous value of aIz_09 is zero, because aIz_09 is a REAL*4 variable, and conversion of 1.d-100 to REAL*4 causes underflow. If underflow is masked, the conversion just yields zero. The right hand side expression in this statement is a constant and the compiler probably tried to convert and save the value as a REAL*4, since that is the type of the variable on the left hand side.
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

PostPosted: Thu Feb 19, 2015 7:15 pm    Post subject: Reply with quote

Thanks Mecej4, i think it was good explanation. Changing this variable to real*8 solved the problem.

The only what FTN95 missing in this case is to diagnose such things with exact position and clear message, not just crash usually in the remote place from the offending subroutine. That type of errors will be hard to anticipate by the programmer.

But have you noticed that if there is no second subroutine in the fortran file which is clean the error does not appear? How this can be explained?
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 1886

PostPosted: Thu Feb 19, 2015 7:22 pm    Post subject: Reply with quote

I am glad that you were able to fix the problem based on my suspicions.

I agree that the compiler should issue a more helpful message, but often programs (including compilers) are not given the capability to respond to exceptions, and therefore they give give cryptic messages, or fail with not even a cryptic message, when they unexpectedly find themselves facing an exception.

The response of a compiler to a program with errors is often unpredictable.
Quote:
But have you noticed that if there is no second subroutine in the fortran file which is clean the error does not appear?
I could not understand that sentence. Will you please rephrase it?
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

PostPosted: Thu Feb 19, 2015 8:34 pm    Post subject: Reply with quote

My wish always was that all recent generations of compilers must have to be way more verbose and suggestive. User-fiendly is another word. Specifically Fortran has to be very simple and user-friendly if we want it to be actively accepted by the new programmers

Returning to this error, hopefully that was right guess and not some other devilry Smile. The question i asked in previous messages was this: the R.FOR file has two subroutines. The second is fine and has no problems. All the problems are in the first one. But the compiler crash only happens if both are in the same file like it is now.
So if i separate two these subroutines there is no compiler crash. Do you see that too and can you explain this devilry?
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    forums.silverfrost.com Forum Index -> Support All times are GMT + 1 Hour
Goto page 1, 2, 3  Next
Page 1 of 3

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group