Topic: Problems with debugger in Support

KL

Posts: 155

7 Feb 2011 4:18 #7719

Paul,

I have a severe problem with the Silverfrost Win32 Debugger. I am running a very large code with a specific data case. Using just the compiler with standard options gives correct results. However, when using

/checkmate /Full_Debug

the code crashes with the error message

“Error 429, Internal Error: stack pointer corrupt”

I have spent days looking for an error, but all I could establish is that, according to the debugger message, the error occurs in an internal subroutine consisting of a single line

dx (1:n) = - MatMul ( B_m1 (1:n, 1:n), F_new (1:n) )

(see the full subroutine below). The two arrays used as input to MatMul are declared in the host subroutine, where they are dummy arguments. In fact they are array sections defined by the calling subroutine. The bounds reported by the debugger are B_m1 (3919408,356) and F_new (1:92861985) instead of the correct bounds (1:40,1:40) and (1:40), respectively. Again according to the debugger, the elements are marked as “defined”, “undefined” or “Illegal pointer”. None of the numbers reported is correct, and the resulting dx is completely wrong.

I have tried to check this internal subprogram by additional output:

Subroutine SolutionA1_dx                                          
                                                                  
  Implicit None                                                   
                                                                  
  write (*,*) Allocated (B_m1)                                    
  write (*,*) Allocated (F_new)                                   
  write (*,*) 'lower bound index 1= ', lbound (B_m1,1)            
  write (*,*) 'lower bound index 2= ', lbound (B_m1,2)            
  write (*,*) 'upper bound index 1= ', ubound (B_m1,1)            
  write (*,*) 'upper bound index 2= ', ubound (B_m1,2)            
  write (*,*) 'lower bound index 1= ', lbound (F_new)             
  write (*,*) 'upper bound index 1= ', ubound (F_new)             
  write (*,*) B_m1 (1:n, 1:n)
  Write (*,*) F_new (1:n)

  ! --- Solution of equation dx = -B^(-1) * F  (Eq. 9.7.19, p. 382)
                                                                  
  dx (1:n) = - MatMul ( B_m1 (1:n, 1:n), F_new (1:n) )            
                                                                  
  Write (*,*)                                                     
  Write (*,*)                                                     
  Write (*,*) 'n  = ', n                                          
  Write (*,*)                                                     
  Write (*,*) 'dx = ', dx (1:n)                                   
  stop                                                            
                                                                  
End Subroutine SolutionA1_dx

The output gives completely correct results. The arrays exist, the bounds as well as the numbers are correct.

There is complete disagreement between the output and the debugger results.

I have also moved the statement from the internal subroutine to the host subroutine. Again I checked by output that the bounds and values of the two input arrays to MatMul, as well as the array dx, are correct. Everything is correct. The debugger reports the input arrays with correct values but a wrong upper bound: 41 instead of 40. The dx it reports is again completely wrong.

Has such behaviour been observed elsewhere, and what further steps could I take?

Best regards,

Lassmann

Robert

Posts: 450 Manchester

7 Feb 2011 5:44 #7721

Does your program crash in the same place without using the debugger? In other words does just using the debugger change the behaviour of your program?

KL

Posts: 155

7 Feb 2011 5:50 #7722

Robert, without the debugger the program does not crash and gives correct results. Klaus

PaulLaidler

Posts: 7977 Salford, UK

7 Feb 2011 6:48 #7724

I have noticed that the debugger sometimes shows incorrect array bounds. This may not be the fault of the debugger itself but the result of incorrect or insufficient data being passed to the debugger. Also, I think that some recent fixes may have reduced the occurrence of this type of error.

The problem here seems to relate to a change in the internal assembly code between the different debugging modes.

A workaround would be to compile the part that causes problems in a separate file or routine without the debugging options, and to use the debugger and debugging options only in those parts of the code that work OK.

I could investigate the specific problem given a short working program and data to go with it. That is, a short program that illustrates the fault.

Robert

Posts: 450 Manchester

7 Feb 2011 11:26 #7725

So, are you saying that “Error 429, Internal Error: stack pointer corrupt” is coming out of the debugger itself? sdbg does not change anything in the program being executed, at least not in any way that would make a difference.

KL

Posts: 155

8 Feb 2011 9:23 #7726

From the additional output it seems that the array bounds are correct but are not correctly reported by the debugger. This could indicate a problem with the debugger.

As I said already, I have looked for days for incorrect formulations, and especially for side effects, but I could not find anything. Normally I test sub-problems within a special test environment, and this has been done extensively here as well.

Over the last few days I have followed various routes to narrow down the source of the error. One was to carefully check all array sections and how they are passed to subroutines. So far I have been unable to construct any case where an array section passed to a subroutine was not treated correctly in the subroutine. Fortunately!

Paul, you say that the previous debugger version could give incorrect array bounds (probably only in very specific situations). Could the crash and the corresponding error message be the result of incorrect array bounds known to the debugger? I ask only to rule out a specific possibility; it does not mean that I exclude a complicated or even trivial error somewhere in my code. Of course I still consider that a likely possibility.

In this context I have another question: is it possible that overloading causes a problem for the debugger? In the present case I overload the intrinsic function matmul, for specific reasons, with three of my own double precision functions supplied as module procedures. Again, I could not manage to write a test case showing any problems.

I will be on holiday for about two weeks and will then continue with the analysis of this problem. One question to the specialists: is it possible to find out more by a detailed analysis of what the debugger reports in the case of a crash? I cannot make any sense of the cryptic output, but I could send a detailed screenshot as a *.doc file.

Paul, you write that you have managed to improve the debugger for some known problems. When could you make this version available?

Best regards,

Klaus

PaulLaidler

Posts: 7977 Salford, UK

8 Feb 2011 10:48 #7727

In short, the debugger does not always work even when the code is valid syntactically and logically. This is usually not the fault of the debugger itself but rather the result of a small number of remaining bugs in the internal debugging code provided by the compiler. These can only be found and fixed on the basis of customer reports and sample programs.

Incorrect array bounds reported to the debugger might cause it to crash in some situations.

The fixes in this context are already in the current release. To my knowledge no changes have been made in the meantime that would affect this.

JohnCampbell

Posts: 2526 Sydney

9 Feb 2011 11:27 #7730

Klaus,

Try removing '-' from the intrinsic function.

  dx(1:n) = MatMul ( B_m1(1:n, 1:n), F_new(1:n) )
  dx(1:n) = - dx(1:n)

There was a problem with a leading '-' before, and this may be a repeat of that problem. I don't recall if it was limited to the debugger.

If that doesn't work, replace MatMul with a DO loop and see if the problem disappears.

  do i = 1,n
    s = dot_product (B_m1(i, 1:n), F_new(1:n) )
!   s = dot_product (B_m1(1:n, i), F_new(1:n) )   ! alternative if B_m1 is symmetric
    dx(i) = -s
  end do

Hopefully you will remove the problem.
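
The three formulations discussed here (unary minus applied directly to MatMul, MatMul followed by a separate negation, and the explicit loop with dot_product) can be compared in a small standalone program. This is only a sketch: the program name, the kind parameter dp, the 3x3 size and the matrix values are made up for illustration and do not come from Klaus's code.

```fortran
program check_negated_matmul
  implicit none
  ! Assumed double precision kind; the original code takes dp from a module
  integer,  parameter :: dp  = selected_real_kind(15, 307)
  integer,  parameter :: n   = 3
  real(dp), parameter :: tol = 1.0e-12_dp
  real(dp) :: B_m1(n,n), F_new(n), dx1(n), dx2(n), dx3(n)
  integer  :: i

  ! Made-up example data, for illustration only
  B_m1 = reshape( (/ 2.0_dp, 0.0_dp, 1.0_dp,   &
                     1.0_dp, 3.0_dp, 0.0_dp,   &
                     0.0_dp, 1.0_dp, 4.0_dp /), (/ n, n /) )
  F_new = (/ 1.0_dp, 2.0_dp, 3.0_dp /)

  ! Form 1: unary minus applied directly to the intrinsic result
  dx1(1:n) = - matmul ( B_m1(1:n, 1:n), F_new(1:n) )

  ! Form 2: multiply first, then negate in a separate statement
  dx2(1:n) = matmul ( B_m1(1:n, 1:n), F_new(1:n) )
  dx2(1:n) = - dx2(1:n)

  ! Form 3: explicit loop over the rows using dot_product
  do i = 1, n
    dx3(i) = - dot_product ( B_m1(i, 1:n), F_new(1:n) )
  end do

  write (*,*) 'dx1 = ', dx1
  write (*,*) 'dx2 = ', dx2
  write (*,*) 'dx3 = ', dx3

  ! All three results should agree to within rounding
  if ( maxval (abs (dx1 - dx2)) > tol .or.   &
       maxval (abs (dx1 - dx3)) > tol ) then
    write (*,*) 'MISMATCH between the formulations'
  else
    write (*,*) 'All three formulations agree'
  end if
end program check_negated_matmul
```

If the three results differ only when the debugging options are switched on, that would point at the generated checking code rather than at the arithmetic.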

John

KL

Posts: 155

9 Feb 2011 12:35 #7731

Thank you John! I will try your first proposal after my holiday. Concerning your second proposal, I would argue that there are several other ways to circumvent the problem. However, the problem should not arise at all, and I would like to help find out why. Unfortunately I have so far been unable to reproduce the specific problem in a small code. The only thing I found is a strange (maybe meaningless) compiler warning for the following subroutine. The warning is

        NO ERRORS  [<SUB1> FTN95/Win32 v6.00.0]
0045)     Function MAVB (a, b )
WARNING - In a call to MAVB from another procedure, the first argument was of 
    type INTEGER(KIND=3), it is now REAL(KIND=2)
        NO ERRORS, 1 WARNING  [<MAVB> FTN95/Win32 v6.00.0]

The subroutine is:

  Subroutine Broyden ( Vec , Mat )

      Use      DataTypes
      Implicit None

      Real    (dp) , Dimension (:)   , Intent (inout) :: Vec
      Real    (dp) , Dimension (:,:) , Intent (inout) :: Mat

      Real    (dp) , Dimension ( size ( Vec) )        :: dx
      Integer (i4b)                                   :: m, n

    Interface
        Function MAVB (a, b )
          Use DataTypes
          Real (dp), Dimension (:,:), Intent (in)  :: a
          Real (dp), Dimension (:  ), Intent (in)  :: b
          Real (dp), Dimension ( Size ( a,1 )  )   :: MAVB
        End Function MAVB
    End Interface

!     --------------------------------------------------------------------

        m = size (Mat,1)
        n = size (Mat,2)

        Call Sub1

        Vec (1:n) = dx (1:n)


! ########
  Contains
! ########
 
 
!   --------------------------------------------------------------------

    Subroutine Sub1 
 
      dx (1:n) = - MAVB ( Mat (1:n, 1:n), Vec (1:n) )

    End Subroutine Sub1 

    Function MAVB (a, b )
 
!     Matrix A * Vector B
!     =      =   =      =

      Use DataTypes
      Implicit None
      Real (dp), Dimension (:,:), Intent (in)  :: a
      Real (dp), Dimension (:  ), Intent (in)  :: b
      Real (dp), Dimension ( Size ( a,1 )  )   :: MAVB
      Real (xp), Dimension ( Size ( a,1 )  )   :: c

      Integer                                  :: i, k

      Do i=1,m
        c(i) = 0.0d+00
        Do k=1,n
          c(i) = c(i) + a(i,k)*b(k)
        End Do
      End Do

      MAVB (1:m) = c (1:m)

    End Function MAVB

  End Subroutine Broyden

After my holiday I will check whether a wrong declaration is actually being used in subroutine Sub1 for the first argument Mat, which is host associated and should be REAL (KIND=2).

Many thanks again,

Klaus

JohnCampbell

Posts: 2526 Sydney

9 Feb 2011 11:29 #7733

Klaus,

I think I see where you are going with this. My preference is for a simpler approach, such as:

  Subroutine VecMat ( Vec , Mat )

      Real*8, Dimension (:)   , Intent (inout) :: Vec 
      Real*8, Dimension (:,:) , Intent (in)    :: Mat 
!
      Real*8, Dimension ( size (Vec) )         :: dx
      Real*10                                  :: s 
      Integer*4                                :: i, k
!
      if ( size (Mat,1) /= size (Vec) .or.  &
           size (Mat,2) /= size (Vec) )     &
        Write (*,*) 'Inconsistent matrix dimension for VecMat'
!
      Do i = 1,size (Vec)
        s = 0
        Do k = 1,size (Vec) 
          s = s - Mat(i,k)*Vec(k) 
        End Do 
        dx(i) = s
      End Do 

      Vec = dx

  End Subroutine VecMat

Or even:

  Subroutine Brief ( Vec, Mat, n )
!
      Integer*4 n
      Real*8    Vec(n), Mat(n,n)
!
      Real*8    dx(n)
      Real*10   s
      Integer*4 i, k
!
      Do i = 1,n
        s = 0
        Do k = 1,n
          s = s - Mat(i,k)*Vec(k)
        End Do
        dx(i) = s
      End Do

      Vec = dx

  End Subroutine Brief

John

KL

Posts: 155

19 Feb 2011 1:19 #7800

Paul,

I have had a further look at the problem and managed to produce a small test program that demonstrates the effect. However, before giving further details I must admit that the program I submitted on February 9th contains an error which explains the “strange” compiler message: there must be no interface block! I believed (wrongly) that an interface block, even for an internal procedure, would do no harm.
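
A minimal sketch of the February 9th routine with the interface block removed may make the point concrete. An internal procedure already has an explicit interface inside its host, so declaring it again in an interface block refers to a different, external MAVB. The module name broyden_sketch, the local definition of dp (standing in for the DataTypes module) and the demo program with its 2x2 data are assumptions for illustration only; Sub1 and the extended-precision work array are left out for brevity.

```fortran
module broyden_sketch
  implicit none
  ! Assumed stand-in for the kind parameter supplied by module DataTypes
  integer, parameter :: dp = selected_real_kind(15, 307)

contains

  subroutine Broyden ( Vec, Mat )
    real (dp), dimension (:),   intent (inout) :: Vec
    real (dp), dimension (:,:), intent (inout) :: Mat
    real (dp), dimension ( size (Vec) )        :: dx
    integer                                    :: n

    n = size (Mat,2)

    ! No interface block here: MAVB is an internal procedure, so its
    ! interface is already explicit within Broyden.
    dx  (1:n) = - MAVB ( Mat (1:n, 1:n), Vec (1:n) )
    Vec (1:n) = dx (1:n)

  contains

    function MAVB ( a, b )
      ! Matrix A * Vector B
      real (dp), dimension (:,:), intent (in) :: a
      real (dp), dimension (:  ), intent (in) :: b
      real (dp), dimension ( size (a,1) )     :: MAVB
      integer                                 :: i, k

      do i = 1, size (a,1)
        MAVB (i) = 0.0_dp
        do k = 1, size (a,2)
          MAVB (i) = MAVB (i) + a (i,k) * b (k)
        end do
      end do
    end function MAVB

  end subroutine Broyden

end module broyden_sketch

program broyden_demo
  use broyden_sketch
  implicit none
  real (dp) :: Mat (2,2), Vec (2)

  ! Made-up data: Mat = 2 * identity, so Vec becomes -2 * Vec
  Mat = reshape ( (/ 2.0_dp, 0.0_dp, 0.0_dp, 2.0_dp /), (/ 2, 2 /) )
  Vec = (/ 1.0_dp, 2.0_dp /)

  call Broyden ( Vec, Mat )
  write (*,*) 'Vec = ', Vec
end program broyden_demo
```

With this data the demo should print the vector (-2, -4). The bounds are taken from size (a,1) and size (a,2) inside the function rather than from the host variables m and n, which is a design choice of this sketch and not part of the original.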

After several more days of narrowing down the error, I came to the conclusion that it has nothing to do with overloading but simply with a function, which in my case performs a matrix multiplication. (A word to John Campbell: please do not send further suggestions for improvements. This is just a test program.)

Writing a subroutine instead of the function solves the problem, which clearly indicates that the error has something to do with the function.

Here are the programs:

Main Program:

! ##############
  Program Test66
! ##############

   Implicit None

    Interface
      Subroutine SubA ( MatA, MatB, MatC )
        Implicit None
        Real , Dimension (:,:) , Intent (in)  :: MatA, MatB
        Real , Dimension (:,:) , Intent (out) :: MatC
      End Subroutine SubA
    End Interface

!  ----------------------------------------------------------------------------

   Real , Dimension (10,10,5,5) :: MatA, MatB, MatC

   Integer                      :: M, n, r, nwrite

   nwrite = 12
   Open (Unit = nwrite, File = 'Test66.out' )

   m = 3
   n = 2
   r = 1

   MatA (1:m,1:n,1,1) = 0.1d+00
   MatB (1:n,1:r,1,1) = 0.1d+00

   Call SubA ( MatA (1:m,1:n,1,1), MatB (1:n,1:r,1,1), MatC (1:m,1:r,1,1) )

   write ( nwrite, * ) 
   write ( nwrite, * ) MatA (1:m,1:n,1,1)
   write ( nwrite, * ) MatB (1:n,1:r,1,1)
   write ( nwrite, * ) MatC (1:m,1:r,1,1)

 
! ##################
  End Program Test66
! ##################

Subroutine SubA:

  Subroutine SubA ( MatA, MatB, MatC )

      Implicit None
      Real , Dimension (:,:) , Intent (in)  :: MatA, MatB
      Real , Dimension (:,:) , Intent (out) :: MatC
      Integer                               :: m, n, r

      m = size (MatA,1)
      n = size (MatA,2)
      r = size (MatB,2)

      MatC (1:m,1:r) = - MAMB ( MatA (1:m,1:n), MatB (1:n,1:r) )

  Contains
 
      Function MAMB ( a, b )

!       Matrix A * Matrix B
!       =      =   =      =

        Implicit None
        Real , Dimension (:,:), Intent (in) :: a
        Real , Dimension (:,:), Intent (in) :: b
        Real , Dimension (m,r)              :: MAMB
        Real , Dimension (m,r)              :: c
        Integer                             :: i, j, k
 
!       ---------------------------------------------------------------------

        m = Size ( a,1 )
        n = size ( a,2 )
        r = Size ( b,2 )

        Do i=1,m
          Do j=1,r
            c(i,j) = 0.0
            Do k=1,n
              c(i,j) = c(i,j) + a(i,k)*b(k,j)
            End Do
          End Do
        End Do

        MAMB (1:m,1:r) = c (1:m,1:r)

        End Function MAMB

  End Subroutine SubA
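
For comparison, the subroutine form mentioned above, which returns the product through an argument instead of an array-valued function result, might look like the following sketch. The actual subroutine version was not posted, so the names suba_sub_sketch, SubA_Sub, MAMB_Sub and Test66_Sub are hypothetical, and placing the routine in a module (instead of an external subroutine with an interface block in the main program) is a simplification made here to keep the example self-contained.

```fortran
module suba_sub_sketch
  implicit none

contains

  ! Hypothetical subroutine variant of SubA: the product is returned
  ! through an intent(out) argument instead of a function result.
  subroutine SubA_Sub ( MatA, MatB, MatC )
    real, dimension (:,:), intent (in)  :: MatA, MatB
    real, dimension (:,:), intent (out) :: MatC

    call MAMB_Sub ( MatA, MatB, MatC )
    MatC = - MatC

  contains

    subroutine MAMB_Sub ( a, b, c )
      ! Matrix A * Matrix B, result stored in c
      real, dimension (:,:), intent (in)  :: a, b
      real, dimension (:,:), intent (out) :: c
      integer                             :: i, j, k

      do i = 1, size (a,1)
        do j = 1, size (b,2)
          c (i,j) = 0.0
          do k = 1, size (a,2)
            c (i,j) = c (i,j) + a (i,k) * b (k,j)
          end do
        end do
      end do
    end subroutine MAMB_Sub

  end subroutine SubA_Sub

end module suba_sub_sketch

program Test66_Sub
  use suba_sub_sketch
  implicit none
  real, dimension (10,10,5,5) :: MatA, MatB, MatC
  integer                     :: m, n, r

  ! Same sizes and values as in Test66
  m = 3
  n = 2
  r = 1

  MatA (1:m,1:n,1,1) = 0.1
  MatB (1:n,1:r,1,1) = 0.1

  call SubA_Sub ( MatA (1:m,1:n,1,1), MatB (1:n,1:r,1,1), MatC (1:m,1:r,1,1) )

  ! Each element should be close to -(0.1*0.1 + 0.1*0.1) = -0.02
  write (*,*) MatC (1:m,1:r,1,1)
end program Test66_Sub
```

Whether this particular form avoids Error 429 under /Checkmate /Full_Debug has not been checked here; it only illustrates the kind of restructuring described.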

Batch file for running:

del comp.lis
del Link.lis
del *.obj
del *.mod
del *.exe

ftn95 Test66.f95    /Checkmate /Full_Debug /dump /list >> comp.lis
ftn95 SubA.f95      /Checkmate /Full_Debug /dump /list >> comp.lis

slink LinkFile                                         >> Link.lis
  
sdbg Test66.exe

Link File:

LOAD Test66
LOAD SubA
file Test66.exe

Error message:

Error 429, Internal Error: Stack pointer corrupt

I have checked the program carefully. Nevertheless, I cannot exclude an error somewhere. The interesting aspect is that this program runs without problems and gives correct results when run without /Checkmate /Debug. However, I have not investigated the effect of other combinations of compiler options.

My main question is whether there is an error in my code. If not, could this test program help finding the source of the problem?

Best regards,

Klaus

KL

Posts: 155

19 Feb 2011 1:25 #7801

I checked my contribution and the preview looked fine, but part of it seemed to have been cut off, so I am resending the rest: the batch file, the link file, the error message and my closing questions, exactly as they appear at the end of my previous post.

KL

Posts: 155

22 Feb 2011 5:46 #7817

Paul,

did you find the time to reproduce the error message, or do you see something in my code that worries you?

Klaus

PaulLaidler

Posts: 7977 Salford, UK

23 Feb 2011 3:52 #7829

I have had a quick look at this and my first impression is that the compiler is not able to handle the matrix assignment when using /checkmate.

This means that the code the compiler generates for undefined-variable checking is wrong for this line (line 12 of subroutine SubA).

So if you get the correct results using /checkmate for the main program and /check for the subroutine then all well and good.

In the meantime I will note this as a bug to fix.

KL

Posts: 155

23 Feb 2011 4:56 #7830

Paul,

thank you very much. I can only continue this thread (if that is necessary at all) in mid-March.

Klaus

PaulLaidler

Posts: 7977 Salford, UK

3 Mar 2011 10:48 #7866

I have fixed the bug that causes the stack pointer corruption but I am not sure if I can include the fix in the new release that is expected soon.

The fix only relates to the stack pointer error. I have not fixed the fact that the debugger does not display the correct array bounds. As I mentioned before, this may relate to the code generated by FTN95 rather than a fault with the debugger itself.

KL

Posts: 155

14 Mar 2011 5:13 #7917

Thank you very much. However, correct array bounds are very important. Klaus