Silverfrost Forums

Severe slowdown if pointers passed as actual arguments

9 Feb 2024 3:32 #31076

The software package HST3D, developed by the USGS, is used to model groundwater flows with coupled heat and salt transport. It has been widely used for decades and is written in standard Fortran 95 with no dependence on external libraries. See https://wwwbrr.cr.usgs.gov/projects/GW_Solute/hst/index.shtml .

The HST3D source code contains several instances where a pointer to an array section is passed as an actual argument to a subprogram where the corresponding dummy argument is an assumed shape array. Such subprograms have explicit interfaces available to their callers, as required.

Here is an example of such usage.

INTEGER, PARAMETER :: kdp = SELECTED_REAL_KIND(14,60)
REAL(KIND=kdp), DIMENSION(:), ALLOCATABLE, TARGET :: rhs
REAL(KIND=kdp), DIMENSION(:), ALLOCATABLE :: envlra
REAL(KIND=kdp), DIMENSION(:), POINTER :: rhs_b
INTEGER, DIMENSION(:), ALLOCATABLE :: ipenv
...
  rhs_b => rhs(nrn+1:nrn+nbn)         !create pointer to array section
  CALL el1slv(nbn,ipenv,envlra,rhs_b) !pass pointer as actual argument
...
SUBROUTINE el1slv(neqn,ipenv,env,rhs)
  INTEGER, INTENT(IN) :: neqn
  INTEGER, DIMENSION(:), INTENT(IN) :: ipenv
  REAL(KIND=kdp), DIMENSION(:), INTENT(IN) :: env
  REAL(KIND=kdp), DIMENSION(:), INTENT(INOUT) :: rhs

The authors could have used the array section itself as the actual argument, instead of creating a pointer to the array section and then passing the pointer:

  CALL el1slv(nbn,ipenv,envlra,rhs(nrn+1:nrn+nbn))

With the Intel, Gfortran, Absoft and Lahey compilers, the run time of the program is nearly identical whether the actual argument is a pointer or the array section itself. With FTN95, however, the pointer version takes 5 to 50 times longer on the four test cases that come with HST3D. The longest-running example, Hydrocoin, takes 28 seconds with the array section version but 828 seconds with the pointer version. With Gfortran, both versions take about 40 seconds for the Hydrocoin problem.
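A minimal sketch of how the two calling styles can be timed side by side. This is not HST3D code: the module demo_mod, the routine scale_in_place and the sizes nrn, nbn and nrep are made up for illustration, and there is no claim that this small case shows the same slowdown as the real program.

```fortran
! Hypothetical timing sketch (names and sizes are illustrative only).
MODULE demo_mod
  IMPLICIT NONE
  INTEGER, PARAMETER :: kdp = SELECTED_REAL_KIND(14,60)
CONTAINS
  ! Assumed-shape dummy argument, as in the HST3D solver routines.
  SUBROUTINE scale_in_place(n, x)
    INTEGER, INTENT(IN) :: n
    REAL(KIND=kdp), DIMENSION(:), INTENT(INOUT) :: x
    INTEGER :: i
    DO i = 1, n
      x(i) = 0.5_kdp*x(i) + 1.0_kdp
    END DO
  END SUBROUTINE scale_in_place
END MODULE demo_mod

PROGRAM ptr_vs_section
  USE demo_mod
  IMPLICIT NONE
  INTEGER, PARAMETER :: nrn = 1000, nbn = 100000, nrep = 2000
  REAL(KIND=kdp), DIMENSION(:), ALLOCATABLE, TARGET :: rhs
  REAL(KIND=kdp), DIMENSION(:), POINTER :: rhs_b
  REAL :: t0, t1, t2
  INTEGER :: k

  ALLOCATE(rhs(nrn+nbn))
  rhs = 1.0_kdp
  rhs_b => rhs(nrn+1:nrn+nbn)          ! pointer to a contiguous section

  CALL CPU_TIME(t0)
  DO k = 1, nrep
    CALL scale_in_place(nbn, rhs_b)    ! pointer as actual argument
  END DO
  CALL CPU_TIME(t1)
  DO k = 1, nrep
    CALL scale_in_place(nbn, rhs(nrn+1:nrn+nbn))   ! section as actual argument
  END DO
  CALL CPU_TIME(t2)

  PRINT *, 'pointer version (s): ', t1 - t0
  PRINT *, 'section version (s): ', t2 - t1
  PRINT *, 'check value: ', rhs(nrn+1) ! printed so the loops are not optimised away
END PROGRAM ptr_vs_section
```

If the two timings differ by a large factor with one compiler and not with others, that points to copy-in/copy-out of a temporary rather than to the arithmetic in the loop.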

I have been familiar with this issue for eight years, but held back from writing up a report since the program has over 20,000 lines of code. Recently, I created a modified version of HST3D in which I (a) fixed a number of minor bugs related to INTENT and SAVE, and (b) added array-section-as-actual-argument versions of the calls. At compile time, the user may define the preprocessor symbol USEPOINTER to select the old pointer version, or leave it undefined to use the new code.
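The compile-time switch described above might look roughly like the following. This is a sketch only: the program, the routine double_it and the file name switch_demo.F90 are hypothetical, and only the symbol name USEPOINTER comes from the post. A C-style preprocessor is assumed; the Gfortran commands in the comments are one example, and other compilers use different options.

```fortran
! Hypothetical sketch of selecting pointer vs. section calls at compile time.
! Example build with Gfortran (other compilers use different options):
!   gfortran -cpp -DUSEPOINTER switch_demo.F90   ! old pointer version
!   gfortran -cpp switch_demo.F90                ! array-section version
MODULE switch_mod
  IMPLICIT NONE
  INTEGER, PARAMETER :: kdp = SELECTED_REAL_KIND(14,60)
CONTAINS
  SUBROUTINE double_it(x)
    REAL(KIND=kdp), DIMENSION(:), INTENT(INOUT) :: x
    x = 2.0_kdp*x
  END SUBROUTINE double_it
END MODULE switch_mod

PROGRAM switch_demo
  USE switch_mod
  IMPLICIT NONE
  INTEGER, PARAMETER :: nrn = 3, nbn = 4
  REAL(KIND=kdp), DIMENSION(:), ALLOCATABLE, TARGET :: rhs
#ifdef USEPOINTER
  REAL(KIND=kdp), DIMENSION(:), POINTER :: rhs_b
#endif

  ALLOCATE(rhs(nrn+nbn))
  rhs = 1.0_kdp
#ifdef USEPOINTER
  rhs_b => rhs(nrn+1:nrn+nbn)          ! create pointer to array section
  CALL double_it(rhs_b)                ! pass pointer as actual argument
#else
  CALL double_it(rhs(nrn+1:nrn+nbn))   ! pass the section directly
#endif
  PRINT *, rhs                         ! last nbn elements are doubled either way
END PROGRAM switch_demo
```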

The source code, test input data and instructions to build and run are provided in a downloadable Zip file, https://www.dropbox.com/scl/fi/hgurzyl6zoo854xyr9xsy/hst3d.zip?rlkey=i0hit0gpdtnvpea5wd8ibjg9a&dl=0 .

I should appreciate your taking the trouble to run the test code, observing the large difference in run times for pointer vs. array-section arguments, and assessing the causes for the discrepancy.

Note: HST3D in its original version contains two sparse linear equation solvers: (a) Skyline/Profile Direct Sparse Solver, and (b) Generalized Conjugate Gradient Minimal Residual Method Iterative Solver. I have removed the second solver in my modified version. I have also removed the Huyakorn test problem, which uses the GCGMRES solver and is present in the USGS distribution.

Thank you.

9 Feb 2024 7:29 #31077

mecej4

Many thanks for the feedback. I have logged this for investigation.

9 Feb 2024 2:13 #31082

Paul,

Encouraged by your agreeing to look into this issue, I continued my attempts to produce a shorter reproducer, and this time I succeeded.

I diverted the arrays in HST3D to an unformatted file at a point where I felt that the performance bug might surface, and created a reproducer that started by reading that file rather than creating the arrays in question by going through a long calculation as HST3D does.

The reproducer is less than 200 lines long, in a single source file. The unformatted file is 'efact.bin', and I have provided a batch file to build two versions of the program, one with array pointers and the other with array sections as arguments. The three files are contained in the following zip file:

https://www.dropbox.com/scl/fi/ivxm461g4j8vlbzfgdbou/ArraySec.zip?rlkey=chaxi7d7432f17egfecl6la4k&dl=0

On my PC, the array section version runs in 0.3 seconds, and the pointer version takes 22.4 seconds.

It may be more convenient for you to use this new reproducer until a fix is found, and then try the fixed compiler/DLLs on the larger HST3D test program.

9 Feb 2024 4:14 #31083

mecej4

Thank you. That is very helpful.

15 Feb 2024 3:38 #31117

mecej4

An initial investigation indicates that the calls

      call el1slv(iband,ipenvv,envl,envutv)
      call elslv(iband,ipenvv,envut,diagv,envlv)

involve time-consuming 'copy in' and (probably) 'copy out' operations via temporary arrays. The related array sections are presumably contiguous, in which case the copying is not needed. Unfortunately, it is not possible to simply suppress this copying because doing so leads to invalid runtime code. Further investigation will be needed.

15 Feb 2024 4:20 #31118

Thanks, Paul.

Indeed, I tried adding the CONTIGUOUS attribute to the dummy argument declarations to which these array sections are being passed, and the program ran fine when compiled with other compilers that support CONTIGUOUS (Intel, NAG and Gfortran do; Lahey does not). If FTN95 can be helped to avoid copy-in/copy-out by being told that the argument in question is contiguous, that would suffice until the compiler becomes more sophisticated in deciding on its own whether copy-in/copy-out is needed.
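For readers who have not used the attribute, here is a small sketch of CONTIGUOUS on an assumed-shape dummy argument. It needs a compiler with Fortran 2008 support for CONTIGUOUS and IS_CONTIGUOUS; the module contig_demo and the routine add_one are hypothetical and are not taken from HST3D.

```fortran
! Hypothetical sketch: CONTIGUOUS on an assumed-shape dummy (Fortran 2008).
MODULE contig_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: kdp = SELECTED_REAL_KIND(14,60)
CONTAINS
  SUBROUTINE add_one(x)
    ! The attribute promises the compiler that x occupies contiguous storage.
    REAL(KIND=kdp), DIMENSION(:), CONTIGUOUS, INTENT(INOUT) :: x
    x = x + 1.0_kdp
  END SUBROUTINE add_one
END MODULE contig_demo

PROGRAM contig_main
  USE contig_demo
  IMPLICIT NONE
  REAL(KIND=kdp), DIMENSION(:), ALLOCATABLE, TARGET :: rhs
  REAL(KIND=kdp), DIMENSION(:), POINTER :: rhs_b

  ALLOCATE(rhs(10))
  rhs = 0.0_kdp
  rhs_b => rhs(4:8)                 ! unit-stride section, hence contiguous
  PRINT *, IS_CONTIGUOUS(rhs_b)     ! inquiry intrinsic from Fortran 2008
  CALL add_one(rhs_b)               ! elements 4..8 of rhs become 1.0
  PRINT *, rhs
END PROGRAM contig_main
```

Whether a given compiler actually skips the temporary copy when it sees CONTIGUOUS is an implementation matter; the attribute only makes it possible.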

Whether a solution is possible may depend on the compiler's array descriptor (dope vector) details. A fallback remedy is to pass the first array element as the actual argument, but that remedy has the drawback that /check will not permit such usage (unless there is a magical /inhibit_check number that will allow it). Is there such a check number corresponding to scalar actual/array dummy mismatch being tolerated, with other checks in place as usual?
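A sketch of the fallback mentioned above, passing the first element and relying on sequence association. Note the assumption: this is only legal when the dummy argument is explicit-shape or assumed-size, so the assumed-shape declarations in the HST3D routines would have to change. The module seq_demo and the routine add_one_n are hypothetical.

```fortran
! Hypothetical sketch: first array element as actual argument.
! Requires an explicit-shape (or assumed-size) dummy, not assumed-shape.
MODULE seq_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: kdp = SELECTED_REAL_KIND(14,60)
CONTAINS
  SUBROUTINE add_one_n(n, x)
    INTEGER, INTENT(IN) :: n
    REAL(KIND=kdp), DIMENSION(n), INTENT(INOUT) :: x   ! explicit shape
    x = x + 1.0_kdp
  END SUBROUTINE add_one_n
END MODULE seq_demo

PROGRAM seq_main
  USE seq_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: nrn = 3, nbn = 5
  REAL(KIND=kdp), DIMENSION(:), ALLOCATABLE :: rhs

  ALLOCATE(rhs(nrn+nbn))
  rhs = 0.0_kdp
  ! The element rhs(nrn+1) stands for the nbn elements starting there,
  ! so no array descriptor and no temporary copy is involved.
  CALL add_one_n(nbn, rhs(nrn+1))
  PRINT *, rhs                      ! elements nrn+1..nrn+nbn become 1.0
END PROGRAM seq_main
```

The cost of this style is that the callee can no longer query the true extent of the argument, which is why run-time checking tends to object to it.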

16 Feb 2024 9:35 #31120

mecej4

Sorry, there is no magical check number that I know of. Adding some support for the CONTIGUOUS attribute looks like a useful idea.

16 Feb 2024 3:43 #31122

Thanks, Paul; there are no surprises in what you found.

Implementing CONTIGUOUS could be done in two stages. In the first stage, the compiler would recognise CONTIGUOUS as a standard-approved attribute, but take no further action.

For users, this would be helpful by allowing them to use the same source code with FTN95 as with other compilers that recognise CONTIGUOUS (and, perhaps, those compilers may output faster machine code as a consequence).

At a later stage, FTN95 may be enhanced to generate faster machine code for passing contiguous actual arguments without copy-in/copy-out or temporary arrays. Users may then see a speed improvement without needing any source code changes.

19 Feb 2024 7:28 #31130

Support for the CONTIGUOUS attribute has now been added for the next release of FTN95. It will have the effect of suppressing some 'copy in' and 'copy out' processes for array sections when the compiler is not provided with enough information to be able to do this automatically.

On its own this will not fix the primary issue in this thread, which remains outstanding.

21 Feb 2024 2:45 #31134

A provisional fix has now been implemented for this issue. It is provisional in the sense that it works for the cut-down code, but I have not tested it against the original project. This fix requires /opt and works for both Win32 and x64. It will be in the next release of FTN95.

22 Feb 2024 12:03 #31135

Great news! I look forward to the release of the compiler with the fixes.
