Topic: Win7/OpenGL/Listview/Nvidia crash in Support

PaulLaidler

Posts: 7977 Salford, UK

11 Apr 2014 7:15 #13950

First of all, it is quite likely that Sebastian and Dan are having to deal with different issues so what might satisfy one may not satisfy the other.

Second, I have no real understanding of what is going wrong in either case.

However, using masked underflow should not, in my opinion, significantly affect the numerical results. If it does then this suggests that the numerical calculation is unstable (i.e. significantly affected by round-off error).

For 32 bit REALs, the smallest magnitude is roughly 1E-38 for unmasked underflow and roughly 1E-45 for masked underflow (where denormals are allowed). If you test a computed REAL z against zero then in one case you will effectively be testing abs(z) < 1E-38 and in the other abs(z) < 1E-45. Clearly this is not good programming practice and the compiler warns against it. Not only will the outcome depend on the masking but (more importantly) on the KIND of the REAL.
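
As a minimal sketch of the point above (standard Fortran 95 intrinsics only; the tolerance tol is a hypothetical, problem-specific value and not a recommendation), the following compares an exact test against zero with a tolerance test. What the exact test prints may depend on the compiler, the optimisation level and how underflow is handled.

```fortran
program zero_test
   implicit none
   integer, parameter :: sp = kind(1.0)      ! default (32 bit) REAL
   integer, parameter :: dp = kind(1.0d0)    ! DOUBLE PRECISION
   real(sp), parameter :: tol = 1.0e-30_sp   ! hypothetical tolerance, choose per problem
   real(sp) :: z

   ! The smallest normalised magnitude depends on the KIND of the REAL
   print *, 'tiny, 32 bit REAL: ', tiny(1.0_sp)
   print *, 'tiny, 64 bit REAL: ', tiny(1.0_dp)

   z = 1.0e-20_sp
   z = z*z        ! mathematically 1E-40, below tiny(1.0_sp)

   ! Fragile: the outcome depends on how the underflow is handled
   if (z == 0.0_sp) then
      print *, 'exact test: z is zero'
   else
      print *, 'exact test: z is not zero'
   end if

   ! More robust: compare against an explicit tolerance
   if (abs(z) < tol) then
      print *, 'tolerance test: z treated as zero'
   end if
end program zero_test
```

The tolerance test gives the same answer whether or not the tiny product survives as a denormal, which is the reason for preferring it.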

On another track, if using masked underflow does not help, then I have one other suggestion to try. It is based on the possibility that some third party computation (from OpenGL and/or the graphics card) is leaving the coprocessor in an unclean state. The following code, placed before the crash point, will reset the coprocessor...

     CODE
       fclex;
       finit;
       fround;
     EDOC

Again, this is just shooting in the dark since I don't have anything of substance to work on, not being able to reproduce the issues in question.

Sebastian

Posts: 177

11 Apr 2014 10:20 #13951

Quoted from DanRRight Sebastian, Post the example demonstrating what specifically this fix negatively impacts.

Anything that expects floats to become zero at a boundary (see the smallest fpu sizes for non-denormals) will behave differently when reaching those numbers. Whether this is a problem for your code depends on whether you can rule out that somebody coded with these boundaries in mind.

Quoted from PaulLaidler Second, I have no real understanding of what is going wrong in either case.

So you cannot reproduce the issue? Get any Win7 PC with an nVidia consumer graphics card, run the program, click on the 2nd window's list box entries and see the app crash.

Quoted from PaulLaidler If it does then this suggests that the numerical calculation is unstable (i.e. significantly affected by round-off error).

Any numerical calculation, any gui-to-data conversion, anything that expects numbers to pass an 'equal to zero' test with a non-denormal boundary will not work as it did before. This is unacceptable.

Quoted from PaulLaidler The following code, placed before the crash point, will reset the coprocessor...

This is dangerous code (the finit leaves the fpu in a state that may differ from what the compiler expects). It does not fix the problem anyway. I've also already debugged the respective routines, and the fpu state (cw and sw) is not touched by the graphics card drivers. The problem is most likely in some routines that deal with exception handling and are affected by the ClearWin+ %og code.

Again, I'd like to point out that the callback works fine if it is called from a menu entry or from a button click, but NOT (as in the sample) from a listview callback. I was hoping that this alone would give some insight into what's going on in this case...

Thanks for any help!

PaulLaidler

Posts: 7977 Salford, UK

11 Apr 2014 6:10 #13954

I have reproduced a failure when using %ls under Windows 8. I am hopeful that this is the same issue as both of the issues raised here and that an 'acceptable' fix will be available shortly.

DanRRight

Posts: 2877 South Pole, Antarctica

11 Apr 2014 9:10 #13955

Paul, here is one more example where underflow crashes the code. This is your demo code for the recently introduced multithreading with FTN95:

https://forums.silverfrost.com/Forum/Topic/2239

I added only two lines to it which, after a while, cause a crash as soon as the value of the exponential function hits the underflow limit. You can play with the value 12.5 to make the crash happen earlier or not at all.

Adding call mask_underflow@() fixes this crash (uncomment the commented line).

module threadMod 
  c_external start_thread@    '__start_thread' (REF,REF):integer*4 
  c_external wait_for_thread@ '__wait_for_thread' (VAL) 
  c_external lock@            '__lock' (VAL) 
  c_external unlock@          '__unlock' (VAL) 
  integer,parameter::IO_LOCK = 42    !Any value. Your choice 
 contains 
  subroutine threadFunc(count) 
   integer count,start 
   if(count < 0) return  !Illustrates an abort 
   start = count 
   do while(count > 0) 
     call sleep1@(1.0) 
     call lock@(IO_LOCK) 

     print*, -12.5*count   !!!! added line
     A = exp(-12.5*count)  !!!! added line

     print*, 'threadFunc ', start, count 
     call unlock@(IO_LOCK) 
     count = count - 1 
   end do 
  end subroutine threadFunc 
 end module threadMod

 program Threads 
 use threadMod 
 integer hThread(3) 

!!! call mask_underflow@()  !  fixes underflow

 hThread(1) = start_thread@(threadFunc, 6) !Run for 6 seconds 
 hThread(2) = start_thread@(threadFunc, 3) !Run for 3 seconds 
 hThread(3) = start_thread@(threadFunc, 7) !Run for 7 seconds 
 call wait_for_thread@(hThread(2)) 
 call wait_for_thread@(0) 
 end

PaulLaidler

Posts: 7977 Salford, UK

12 Apr 2014 10:23 #13957

Dan

This must be an unrelated issue. I don't know why masking fixes the problem. For the moment I will simply log this for investigation.

PaulLaidler

Posts: 7977 Salford, UK

14 Apr 2014 9:16 #13961

I have uploaded a new version of salflibc.dll to www.silverfrost.com/beta/salflibc.exe

This provides a potential fix for the reported failure of %ls under Windows 7.

Sebastian: Does this fix the original issue on this thread?

Sebastian

Posts: 177

15 Apr 2014 12:54 #13962

Quoted from PaulLaidler This provides a potential fix for the reported failure of %ls under Windows 7.

Sebastian: Does this fix the original issue on this thread?

Yes it does, thanks for figuring this out!

Is the DLL safe to use in production code (v7.00 ftn95+the posted DLL)? I'd like to give it some testing with the application where the bug originally occurred.

Thanks again.

Sebastian

Posts: 177

5 Mar 2015 10:28 #15813

Quoted from PaulLaidler This provides a potential fix for the reported failure of %ls under Windows 7.

The fix seems to be specific to %ls; the following program (note that it uses a `ls-style list box) still crashes.

winapp
program listbox
   implicit none
   external callback
   integer :: v, i
   character (len=*), dimension(3), parameter ::lvdata = &
     (/'|Data1     |', '|Data2     |', '|Data3     |'/)
   v = 1
   i=winio@ ('%og&', 100,100)
   i=winio@('%^`20.5ls',lvdata, 3, v, callback)
end program listbox

integer function callback()
   real :: f1,f2,f3
   callback = 2
   f1 = 1.e-28
   f2 = 1.e-29
   f3 = f1*f2
   f3 = f3*f3
end function

It also seems that callbacks on radio buttons can trigger the crash in a similar way. Any idea whether the fix is somehow 'generally' applicable?

Thanks!

PaulLaidler

Posts: 7977 Salford, UK

5 Mar 2015 3:53 #15814

I currently am running under 64 bit Windows 8.1 with an nVidia Geforce graphics card.

If you have a test program that crashes in this environment then hopefully I can make progress.

Sebastian

Posts: 177

6 Mar 2015 6:57 #15816

The code I've posted is the test program; compile it with no special flags. Win7 is the only platform I have available for testing, though. On trying to choose a different list box entry, the test application hangs/crashes.

Thanks!

PaulLaidler

Posts: 7977 Salford, UK

6 Mar 2015 8:07 #15817

This problem is 'fixed' if you call mask_underflow@() as the first executable statement in the main program. In the light of all the discussion about masking underflows, I am not sure that this is a good fix but it may serve your purpose.
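
Applied to the test program posted above, the workaround would look roughly like the sketch below. It is FTN95/ClearWin+ specific, the call form is taken from the threading example earlier in this thread, and the only change to the posted program is the added call. It has not been run here on the failing configuration.

```fortran
winapp
program listbox
   implicit none
   external callback
   integer :: v, i
   character (len=*), dimension(3), parameter :: lvdata = &
     (/'|Data1     |', '|Data2     |', '|Data3     |'/)
   call mask_underflow@()   ! workaround: first executable statement
   v = 1
   i = winio@('%og&', 100, 100)
   i = winio@('%^`20.5ls', lvdata, 3, v, callback)
end program listbox

integer function callback()
   real :: f1, f2, f3
   callback = 2
   f1 = 1.e-28
   f2 = 1.e-29
   f3 = f1*f2      ! this product underflows for a 32 bit REAL
   f3 = f3*f3
end function
```

Note that the masking applies to the whole program, not just to the callback, so the numerical side effects discussed in this thread apply to all floating point code that runs afterwards.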

Basically, Windows API calls (from ClearWin+) sometimes require a clean floating point stack (when underflows are unmasked which is the default).

A better solution might be to do the masking only for the Windows API dlls but this would require some tricky programming (for us and may not be possible for you). However, in theory you can handle the underflows and denormals yourself by using TRAP_EXCEPTION@ and CLEAR_FLOAT_UNDERFLOW@.

Sebastian

Posts: 177

10 Mar 2015 2:12 #15864

mask_underflow is not an option since it affects numerical results (discussed in this thread or elsewhere in this forum).

Quoted from PaulLaidler Basically, Windows API calls (from ClearWin+) sometimes require a clean floating point stack (when underflows are unmasked which is the default).

We've tried various clears already (FCLEX...) inside the callback, but there does not seem to be a problem inside the callback routine except for the generated underflow triggering a 'bad' exception handler that crashes the program. It strangely seems related to the %og control (which doesn't have any real functionality in the test case).

Any chance this and similar problems are fixed for the next release?

Thanks for the help!

DanRRight

Posts: 2877 South Pole, Antarctica

10 Mar 2015 2:37 #15867

Quoted from Sebastian

Any numerical calculation, any gui-to-data conversion, anything that expects numbers to pass a 'equal to zero' test with a non-denormal boundary will not work as it did before. This is unacceptable.

I haven't quite grasped the real dangers. Can you elaborate with a real example?

PaulLaidler

Posts: 7977 Salford, UK

10 Mar 2015 6:19 #15869

Quoted from Sebastian Any chance this and similar problems are fixed for the next release?

I will log this for investigation but a fix before the next release seems unlikely.

PaulLaidler

Posts: 7977 Salford, UK

11 Mar 2015 7:55 #15872

There is an undocumented function that may be useful in this context. It resets the floating point stack (includes emitting finit and fround) and clears the exception record. It does not emit the fclex instruction.

This is how it is declared and called...

      c_external FPRESET@ '_fpreset'
      call FPRESET@()

Sebastian

Posts: 177

12 Mar 2015 10:37 #15882

Quoted from PaulLaidler There is an undocumented function that may be useful in this context.

Thanks, though I could not get it to avoid the crash. The fpu stack/cw/sw seems just fine in itself when looking at the registers and disassembler, it's the exception handler (which is outside of user code/availability) that crashes the application, and which seems to behave differently whether there's an %og control or not (the latter case doesn't crash at all...).

Quoted from DanRRight I haven't quite grasped the real dangers. Can you elaborate with a real example?

As I said, you can construct any example where multiplying two small numbers results in a different value depending on the mask setting. Changing the behaviour of tested code in that way may or may not be a 'danger' to you.
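
A minimal sketch of such a construction follows. The values are chosen for illustration only, so that the product lands in the denormal range of a 32 bit REAL. The commented-out line uses the FTN95-specific call already shown in this thread. What gets printed depends on the compiler, the optimisation level (intermediate results may be held in extended precision) and the underflow setting.

```fortran
program underflow_demo
   implicit none
   real :: a, b, p, r

   ! call mask_underflow@()   ! FTN95-specific; uncomment to compare both settings

   a = 1.0e-20
   b = 1.0e-20
   p = a*b            ! mathematically 1E-40: below tiny(p) for a 32 bit REAL
   r = p*1.0e20       ! scale back up towards 1E-20

   print *, 'p = ', p
   print *, 'r = ', r
   print *, 'p below tiny(p)? ', abs(p) < tiny(p)
   print *, 'p == 0? ', p == 0.0
end program underflow_demo
```

If the small product is kept as a denormal, r comes out near 1E-20 (with reduced precision); if it is replaced by zero, r is exactly zero. Any later logic that branches on p or r then takes a different path.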

DanRRight

Posts: 2877 South Pole, Antarctica

12 Mar 2015 12:04 #15884

Well, I cannot construct such an example that would be considered valid Fortran rather than the programmer's own fault for intentionally touching the gray area of denormals and asking for trouble. That is why I am asking for an example.

Sebastian

Posts: 177

13 Mar 2015 10:03 #15891

Gradual underflow is beneficial or crucial for some types of algorithms. Feel free to read on http://www.cs.berkeley.edu/~wkahan/ARITH_17U.pdf and similar. This may or may not be an obvious or hidden problem in your code when changing the underflow mechanics.

DanRRight

Posts: 2877 South Pole, Antarctica

16 Mar 2015 9:09 #15901

It just confirmed my point. All the examples supplied there had some (very small, completely negligible) value in the past, when single and double precision arithmetic differed greatly in CPU speed. The last remaining issue, that double precision values need twice the memory, will die soon too when Silverfrost makes a 64 bit compiler with its eventually 4 billion times larger memory limits.

Sebastian

Posts: 177

17 Mar 2015 11:19 #15908

The relevance is not speed or memory consumption but numerical precision. As I said, if it's not a problem in your code then that's fine; I can't say the same for huge amounts of legacy code that may rely, possibly without anyone ever noticing, on the precision properties of denormals. Either way, if you really want to continue the discussion please open a new thread and try not to derail this one, which is about a pretty obvious and reproducible bug with default compiler settings that we'd really like to get fixed rather than worked around, since it hits us pretty hard every now and then. Thanks.