Silverfrost Forums

Situation where /check cannot be used

10 Oct 2022 8:00 #29418

In a medium-sized program (www.netlib.org/odepack) that I am attempting to modernize, I have reached what appears to be a stumbling block.

The program is over 31,000 lines of code, about 1.3 Mbytes in all, including nine test drivers and three files with the ODE solver routines, all Fortran 77 with lots of common blocks, etc. With the help of FTN95 I found a few instances of uninitialized variables, and fixed those issues.

With one of the test problems, I suspect that there are arrays being accessed out of bounds, but I am unable to use FTN95 8.91 with /check to analyse the problem, since the program stops early with a spurious error.

The short program that I give below contains the gist of the issue. A procedure (in this case, a subroutine) is passed as an argument to a subroutine, which passes it along to one or more other subroutines. No interfaces are provided and implicit typing is relied upon, since adding declarations and interfaces to the Odepack program would require considerable effort.

program impasse
   external fsub             ! fsub is intended to be a subroutine
   real x,y

   y = 25.0
   x = 4.0                   !trial value of root
   call solve(x,y,fsub)
   print *,'Root is ',x

end program

subroutine solve(x,y,fsub)   ! this subroutine is just an intermediary
   real x,y                  ! the arg FSUB is passed along
   external fsub             ! is fsub a function or a subroutine?

   call find_root(x,y,fsub)
   print *,'Solve found root as ',x
   return

end subroutine solve

subroutine find_root(x,y,fsub)
   real x,y
   real x1,x2,y1,y2
   integer iter

   iter = 0
   do
      x1 = x
      call fsub(x1,y1)
      x2 = x*1.01
      call fsub(x2,y2)
      der = (y2-y1)/(x2-x1)
      x = x1 - (y1-y)/der
      print *,x,y1,y2
      if(abs(y1-y) < 1e-4)exit
      iter = iter+1
   end do
   return

end subroutine

subroutine fsub(x,y)
   real x,y

   y = x*x
   return

end

This program runs without error when built with these compilers: Compaq, Gfortran, Lahey, Intel and NAG. The small test program is, I believe, error-free.
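For readers weighing the effort of adding interfaces, here is a minimal sketch of the same test program with Fortran 90 interface blocks for the procedure argument, so that each routine states explicitly that fsub is a subroutine. The program name impasse_iface, the intent attributes and the iteration cap are illustrative additions, not part of the posted program or of Odepack. Whether this changes what /check reports has not been tested here.

```fortran
! Sketch only: the interface blocks, intents and iteration cap are
! assumed additions, not part of the posted test program.
program impasse_iface
   implicit none
   interface
      subroutine fsub(x, y)
         real, intent(in)  :: x
         real, intent(out) :: y
      end subroutine fsub
   end interface
   real :: x, y

   y = 25.0
   x = 4.0                      ! trial value of root
   call solve(x, y, fsub)
   print *, 'Root is ', x
end program impasse_iface

subroutine solve(x, y, fsub)    ! intermediary, passes fsub along
   implicit none
   real :: x, y
   interface
      subroutine fsub(x, y)
         real, intent(in)  :: x
         real, intent(out) :: y
      end subroutine fsub
   end interface

   call find_root(x, y, fsub)
end subroutine solve

subroutine find_root(x, y, fsub)
   implicit none
   real :: x, y
   interface
      subroutine fsub(x, y)
         real, intent(in)  :: x
         real, intent(out) :: y
      end subroutine fsub
   end interface
   real :: x1, x2, y1, y2, der
   integer :: iter

   do iter = 1, 50              ! cap added for the sketch only
      x1 = x
      call fsub(x1, y1)
      x2 = x*1.01
      call fsub(x2, y2)
      der = (y2 - y1)/(x2 - x1) ! finite-difference slope
      x = x1 - (y1 - y)/der
      if (abs(y1 - y) < 1e-4) exit
   end do
end subroutine find_root

subroutine fsub(x, y)
   implicit none
   real, intent(in)  :: x
   real, intent(out) :: y

   y = x*x
end subroutine fsub
```

The interface body has to be repeated in every routine that receives fsub, which is the kind of effort that makes this approach costly for a code base the size of Odepack.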

When built and run with FTN95 8.91 with /check, the program stops early:

Runtime error from program:s:\lang\ftn95\extbug.exe
Run-time Error
Attempt to call a routine with argument number three as a procedure when a real(kind=1) was required

 SOLVE -  in file extbug.f90 at line 10 [+005c]
 main -  in file extbug.f90 at line 7 [+006c]

10 Oct 2022 8:59 #29420

Thank you for this bug report.

I can confirm that the runtime checking mechanism gives a false error report in this instance and that this needs to be fixed. In the meantime a failure in this context can be avoided by not using /check or by adding /inhibit_check 5 (or 6) to the command line arguments.

For other readers (not mecej4), it is possible that the other compilers do not provide such detailed runtime checking. FTN95 is generally very good at spotting runtime failures that other compilers fail to check, but it may occasionally give a false report, and any such case should be reported.

10 Oct 2022 12:10 #29426

Thanks for the suggestion, Paul, and it may be noted that the current version of the FTN95 help file documents what /inhibit_check <nn> signifies for nn up to 20.

Either 5 or 6 as the argument to /inhibit_check works on the small test program that I posted above, allowing more of the code to be examined in the debugger.

However, the same technique did not work for the Odepack program, understandably so because that program is very complex. For the 'demo6' case, building and running with /inhibit_check 5 allowed progress to beyond where I had to stop earlier, but the program terminated with

Integer arithmetic overflow at address 469afb

Within file prb6.exe
in DSTOKA in line 7845, at address 3a80
in DLSODKR in line 10530, at address 5500
in MAIN in line 111, at address bc6

That line 7845 is just

      IF (NQ .EQ. 1) GO TO 560

and in the debugger NQ is shown to have the value 2 (in the Variables pane or by hovering the mouse over the variable in the Code pane).

Retrying with /inhibit_check 6, I obtained a puzzling traceback after the program stopped:

Attempt to call a routine with one thousand four hundred and fourteen arguments when five were required at address 1a0093e9

Within file prb6.exe
in DCOPY at address 26c
in JACBD in line 303, at address 853
in DSETPK in line 7970, at address 72f
in DSTOKA in line 7657, at address 1c42
in DLSODKR in line 10531, at address 45d6
in MAIN in line 111, at address 9b4

There are no subprograms with more than 27 arguments anywhere in the body of the code. DCOPY is a standard BLAS routine, and it has only five arguments and itself calls no other routines.


For the record, I wish to state that FTN95 continues to be the best tool for my work in updating and testing Fortran code for Numerical Analysis applications from Netlib and elsewhere. Some of the codes are in F95/F90, but most are F77 and some are as old as F64. Silverfrost Fortran is, for my purposes, the most capable and convenient package bar none. Other compilers may provide capabilities that FTN95 does not (e.g., OpenMP debugging) or may accept code that conforms to Fortran 2003 and later standards. When it comes to debugging, if FTN95/SDBG fail, for me it is time to post a bug report in this forum or to give up for now if I cannot create a small test program to reproduce the issue.

10 Oct 2022 2:42 #29427

mecej4

Have you presented enough information for us to reproduce these other false runtime errors?

10 Oct 2022 3:20 #29428

Quoted from PaulLaidler

mecej4

Have you presented enough information for us to reproduce these other false runtime errors?

Not at all so far, Paul. I hesitated to send you a 30,000 LOC program. When the program has bugs, some known and others suspected, and the compiler does not diagnose these errors correctly, how do I assign blame and remain credible?

I decided tentatively to have a go at cutting down the program while preserving the run time misdiagnosis by the compiler, before sending it to you.

It will take me some time to do that.

10 Oct 2022 7:27 (Edited: 12 Oct 2022 11:21) #29429

Paul, the following links refer to two Zip files with which you can reproduce the two errors that I described earlier. Both contain source files that can be traced back to the Netlib Odepack sources, but with major revamping and paring.

https://www.dropbox.com/s/f8lus7t6whzt4p0/cull5.zip?dl=0

https://www.dropbox.com/s/g1rxd8ee94yy5fz/cull6.zip?dl=0

Please extract each zip in its own directory. In each directory, running the batch file bld.bat will produce an EXE. Running that EXE should lead to an error stop that matches the contents of the file error.txt in the directory.

Cull5 demonstrates the error that results with /inhibit_check 5. Cull6 works similarly for /inhibit_check 6.

These cut-down programs run to about 2000-3000 lines of Fortran code.

Thanks.

11 Oct 2022 6:14 #29431

mecej4

Many thanks for the valuable feedback.

13 Oct 2022 10:44 #29437

The original issue on this thread has now been fixed for the next release of FTN95 and the associated DLLs.

14 Oct 2022 1:05 #29442

Thanks for the rapid bug fix, Paul.

Is it problematic to specify more than one /inhibit_check clause?

When I build the program given below with

ftn95 /check pgm.f /inhibit_check 5 /inhibit_check 6 /link

and run, the program aborts, saying 'attempt to access undefined argument to routine'.

The test program:

program abc
integer a,b,c
a=2
b=3
call sub(a,b,c)
print *,a,b,c
end

subroutine sub(a,b,c)
integer a,b,c
c=a+b
end

14 Oct 2022 6:26 #29443

mecej4

From your example it looks like using 5 and 6 together can lead to false runtime error reporting.

These two checks are similar and apparently can overlap. Hopefully this is the only overlap that could cause a problem.

It ought to be a simple matter to get FTN95 to filter out this combination.

25 Oct 2022 8:57 #29493

mecej4

I have looked at cull5.

I think that the problem is in the code rather than the checking mechanism.

dstoka is called many times and the checking works correctly up to the 112th call at which point I assume that the checking 'strip' has been over-written in some way by the program.

On the 112th call to dstoka the program goes into a loop and (in check mode) I get an integer overflow on line 743. yh is an assumed-size array and it appears that its bounds are not computed correctly.

The lower case L in yh(1, l) might be a programming error.

I could explore further but the main point is that I think that FTN95 is behaving correctly.

25 Oct 2022 10:36 #29495

mecej4

I have looked at cull6.

The program runs without failure when using /check on its own, but false runtime errors are reported when /inhibit_check 5 (and/or 6) is added to /check.

We have already noted that there can be problems when using 5 with 6 but now it appears that there are occasions when using either or both can give false runtime errors.

In the short term please note that 5 and 6 can give false runtime errors in some contexts.

25 Oct 2022 2:39 #29496

mecej4

The failure for cull6 has now been fixed for the next release of FTN95. Also, the earlier problem encountered when using 5 with 6 has been revisited.

In future, when using /check with /inhibit_check, 5 and 6 can be used together, and either on its own implies the other.

25 Oct 2022 5:26 #29497

Thanks, Paul.

I noticed just now that new DLLs and the 8.92 version of the compiler have been made available.

After I download the new versions and rerun the test problems, I hope to find that the problems that I reported have been fixed. I also realise that when a one-dimensional array section is passed as an actual argument to a subroutine that expects a 2-D assumed-size array argument, it may not be possible to check for subscript errors in the subroutine.
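The limitation just mentioned can be illustrated with a minimal sketch. The names seqassoc, show, work and nyh are made up for illustration and do not come from Odepack; the sizes are arbitrary.

```fortran
! Sketch only: hypothetical names and sizes, chosen for illustration.
program seqassoc
   implicit none
   real :: work(12)
   integer :: i

   do i = 1, 12
      work(i) = real(i)
   end do
   ! Pass a 10-element 1-D section; the callee views it as a 2-D array
   ! with leading dimension 2 (sequence association).
   call show(work(3:12), 2)
end program seqassoc

subroutine show(yh, nyh)
   implicit none
   integer :: nyh
   real :: yh(nyh, *)     ! assumed-size: extent of the last dimension is unknown here

   ! yh(1,3) is element (3-1)*nyh + 1 = 5 of the section, i.e. work(7).
   print *, yh(1, 3)
   ! yh(1,6) would be element 11 of the 10-element section, which is out
   ! of bounds, but the declaration above gives the callee no upper bound
   ! for the second subscript to check against.
end subroutine show
```

Inside show only the first subscript has a declared range, so a subscript check confined to the callee can at best validate that one.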

25 Oct 2022 7:45 #29498

mecej4

Just to be clear, the fix for cull6 was made after the release of v8.92.

25 Oct 2022 11:25 #29500

I was beginning to ask myself if that was the case (fixes made post 8.92 release), thanks.
