forums.silverfrost.com Welcome to the Silverfrost forums
mecej4
Joined: 31 Oct 2006 Posts: 1897
Posted: Fri Feb 09, 2024 4:32 am Post subject: Severe slowdown if pointers passed as actual arguments
The software package HST3D developed by the USGS is used to model groundwater flows with coupled heat and salt transport. It has been widely used for decades, and is standard Fortran 95 with no dependence on external libraries. See https://wwwbrr.cr.usgs.gov/projects/GW_Solute/hst/index.shtml .
The HST3D source code contains several instances where a pointer to an array section is passed as an actual argument to a subprogram where the corresponding dummy argument is an assumed shape array. Such subprograms have explicit interfaces available to their callers, as required.
Here is an example of such usage.
Code:
INTEGER, PARAMETER :: kdp = SELECTED_REAL_KIND(14,60)
REAL(KIND=kdp), DIMENSION(:), ALLOCATABLE, TARGET :: rhs
REAL(KIND=kdp), DIMENSION(:), ALLOCATABLE :: envlra
REAL(KIND=kdp), DIMENSION(:), POINTER :: rhs_b
INTEGER, DIMENSION(:), ALLOCATABLE :: ipenv
...
rhs_b => rhs(nrn+1:nrn+nbn)            ! create pointer to array section
CALL el1slv(nbn,ipenv,envlra,rhs_b)    ! pass pointer as actual argument
...
SUBROUTINE el1slv(neqn,ipenv,env,rhs)
  INTEGER, INTENT(IN) :: neqn
  INTEGER, DIMENSION(:), INTENT(IN) :: ipenv
  REAL(KIND=kdp), DIMENSION(:), INTENT(IN) :: env
  REAL(KIND=kdp), DIMENSION(:), INTENT(INOUT) :: rhs
The authors could have used the array section itself as the actual argument, instead of creating a pointer to the array section and then passing the pointer as the actual argument:
Code:
CALL el1slv(nbn,ipenv,envlra,rhs(nrn+1:nrn+nbn))
With the Intel, Gfortran, Absoft and Lahey compilers, the run time of the program is nearly identical whether the actual argument is a pointer or the array section itself. With FTN95, however, the pointer version takes 5 to 50 times longer, for the four test cases that come with HST3D. The longest running example, Hydrocoin, takes 28 seconds for the array section version, but 828 seconds for the pointer version. Both versions take about 40 seconds with Gfortran for the Hydrocoin problem.
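A minimal timing sketch of the two calling styles follows. It is not taken from HST3D: the module `work_mod`, the routine `bump`, and the sizes `ntot`, `nrn`, `nbn` and `ncall` are made up for illustration, and whether it shows any slowdown depends on the compiler and options used.

```fortran
! Hypothetical timing harness; names and sizes are illustrative assumptions.
MODULE work_mod
  IMPLICIT NONE
  INTEGER, PARAMETER :: kdp = SELECTED_REAL_KIND(14,60)
CONTAINS
  SUBROUTINE bump(n, v)                 ! assumed-shape dummy, as in el1slv
    INTEGER, INTENT(IN) :: n
    REAL(KIND=kdp), DIMENSION(:), INTENT(INOUT) :: v
    v(1) = v(1) + 1.0_kdp               ! touch only two elements so that
    v(n) = v(n) + 1.0_kdp               ! argument-passing cost dominates
  END SUBROUTINE bump
END MODULE work_mod

PROGRAM time_args
  USE work_mod
  IMPLICIT NONE
  INTEGER, PARAMETER :: ntot = 200000, nrn = 1000, nbn = 100000
  INTEGER, PARAMETER :: ncall = 20000
  REAL(KIND=kdp), DIMENSION(:), ALLOCATABLE, TARGET :: rhs
  REAL(KIND=kdp), DIMENSION(:), POINTER :: rhs_b
  INTEGER :: i, c0, c1, c2, rate

  ALLOCATE(rhs(ntot))
  rhs = 0.0_kdp
  rhs_b => rhs(nrn+1:nrn+nbn)           ! pointer to a unit-stride section

  CALL SYSTEM_CLOCK(c0, rate)
  IF (rate <= 0) rate = 1               ! guard against a missing clock
  DO i = 1, ncall
    CALL bump(nbn, rhs_b)               ! pointer as actual argument
  END DO
  CALL SYSTEM_CLOCK(c1)
  DO i = 1, ncall
    CALL bump(nbn, rhs(nrn+1:nrn+nbn))  ! array section as actual argument
  END DO
  CALL SYSTEM_CLOCK(c2)

  ! Both loops update the same elements, so each was bumped 2*ncall times.
  IF (rhs(nrn+1) /= REAL(2*ncall, kdp)) STOP 'unexpected result'
  PRINT *, 'pointer argument, seconds:', REAL(c1-c0, kdp)/REAL(rate, kdp)
  PRINT *, 'section argument, seconds:', REAL(c2-c1, kdp)/REAL(rate, kdp)
END PROGRAM time_args
```

If a compiler copies the section into a temporary on every call, the first loop does work proportional to `nbn*ncall` rather than `ncall`, which is the kind of gap described above.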
I have been familiar with this issue for eight years, but held back from writing up a report since the program has over 20,000 lines of code. Recently, I created a modified version of HST3D in which I (a) fixed a number of minor bugs related to INTENT and SAVE, and (b) added array-section-as-actual-argument versions. At compile time, the user may define the preprocessor symbol USEPOINTER to select the old pointer version, or leave it undefined to use the new code.
The source code, test input data and instructions to build and run are provided in a downloadable Zip file, https://www.dropbox.com/scl/fi/hgurzyl6zoo854xyr9xsy/hst3d.zip?rlkey=i0hit0gpdtnvpea5wd8ibjg9a&dl=0 .
I should appreciate your taking the trouble to run the test code, observing the large difference in run times for pointer vs. array-section arguments, and assessing the causes for the discrepancy.
Note: HST3D in its original version contains two sparse linear equation solvers: (a) Skyline/Profile Direct Sparse Solver, and (b) Generalized Conjugate Gradient Minimal Residual Method Iterative Solver. I have removed the second solver in my modified version. I have also removed the Huyakorn test problem, which uses the GCGMRES solver and is present in the USGS distribution.
Thank you.
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8037 Location: Salford, UK
Posted: Fri Feb 09, 2024 8:29 am
mecej4
Many thanks for the feedback. I have logged this for investigation.
mecej4
Joined: 31 Oct 2006 Posts: 1897
Posted: Fri Feb 09, 2024 3:13 pm
Paul,
Encouraged by your agreeing to look into this issue, I continued my attempts to produce a shorter reproducer, and this time I succeeded.
I diverted the arrays in HST3D to an unformatted file at a point where I felt that the performance bug might surface, and created a reproducer that started by reading that file rather than creating the arrays in question by going through a long calculation as HST3D does.
The reproducer is less than 200 lines long, in a single source file. The unformatted file is "efact.bin" and I have provided a batch file to build two versions of the program, one with array pointers and the other with array sections as arguments. The three files are contained in the following zip file
https://www.dropbox.com/scl/fi/ivxm461g4j8vlbzfgdbou/ArraySec.zip?rlkey=chaxi7d7432f17egfecl6la4k&dl=0
On my PC, the array section version runs in 0.3 second, and the pointer version takes 22.4 seconds.
It may be more convenient for you to use this new reproducer until a fix is found, and then try the fixed compiler/DLLs on the larger HST3D test program.
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8037 Location: Salford, UK
Posted: Fri Feb 09, 2024 5:14 pm
mecej4
Thank you. That is very helpful.
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8037 Location: Salford, UK
Posted: Thu Feb 15, 2024 4:38 pm
mecej4
An initial investigation indicates that the calls
Code:
call el1slv(iband,ipenvv,envl,envutv)
call elslv(iband,ipenvv,envut,diagv,envlv)
involve time-consuming "copy in" and (probably) "copy out" operations via temporary arrays. The related array sections are presumably contiguous, in which case the copying is not needed. Unfortunately, it is not possible to simply suppress this copying because doing so leads to invalid runtime code. Further investigation will be needed.
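To make the cost concrete, here is a small sketch of what "copy in" and "copy out" amount to when written out by hand. It only illustrates the concept and does not describe FTN95's generated code; the names `demo_mod`, `first_plus_one`, `a`, `p` and `tmp` are invented for this example.

```fortran
! Conceptual illustration only; all names here are hypothetical.
MODULE demo_mod
  IMPLICIT NONE
  INTEGER, PARAMETER :: kdp = SELECTED_REAL_KIND(14,60)
CONTAINS
  SUBROUTINE first_plus_one(v)
    REAL(KIND=kdp), DIMENSION(:), INTENT(INOUT) :: v
    v(1) = v(1) + 1.0_kdp
  END SUBROUTINE first_plus_one
END MODULE demo_mod

PROGRAM copy_in_out
  USE demo_mod
  IMPLICIT NONE
  REAL(KIND=kdp), DIMENSION(10), TARGET :: a
  REAL(KIND=kdp), DIMENSION(:), POINTER :: p
  REAL(KIND=kdp), DIMENSION(:), ALLOCATABLE :: tmp

  a = 0.0_kdp
  p => a(3:7)

  ! Direct call: the callee can work on the storage of a(3:7) itself.
  CALL first_plus_one(p)

  ! Hand-written equivalent of a call with copy-in/copy-out:
  ALLOCATE(tmp(SIZE(p)))
  tmp = p                       ! "copy in":  O(n) work before the call
  CALL first_plus_one(tmp)      ! callee sees only the temporary
  p = tmp                       ! "copy out": O(n) work after the call
  DEALLOCATE(tmp)

  ! Both calls must have updated a(3).
  IF (a(3) /= 2.0_kdp) STOP 'unexpected result'
  PRINT *, 'a(3) =', a(3)
END PROGRAM copy_in_out
```

The callee here does constant work, yet the second call costs time proportional to the section length, which is why repeated calls on long sections can dominate the run time.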
mecej4
Joined: 31 Oct 2006 Posts: 1897
Posted: Thu Feb 15, 2024 5:20 pm
Thanks, Paul.
Indeed, I tried adding the CONTIGUOUS attribute to the dummy argument declarations to which these array sections are being passed, and the program ran fine when compiled with other compilers that support CONTIGUOUS (Intel, NAG, Gfortran do, but not Lahey). If FTN95 can be helped to avoid copy-in/copy-out by being told that the argument in question is contiguous, that would suffice until the compiler becomes more sophisticated in deciding on its own whether copy-in/copy-out is needed or not.
Whether a solution is possible may depend on the compiler's array descriptor (dope vector) details. A fallback remedy is to pass the first array element as the actual argument, but that remedy has the drawback that /check will not permit such usage (unless there is a magical /inhibit_check number that will allow it). Is there such a check number corresponding to scalar actual/array dummy mismatch being tolerated, with other checks in place as usual?
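The two remedies mentioned above can be sketched as follows, assuming a compiler that accepts the Fortran 2008 CONTIGUOUS attribute. The module and routine names (`remedy_sketch`, `add_one_contig`, `add_one_explicit`) are hypothetical, and the first-element variant assumes the dummy argument is changed from assumed shape to explicit shape, since an array element cannot be passed to an assumed-shape dummy.

```fortran
! Sketch under stated assumptions; not code from HST3D.
MODULE remedy_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: kdp = SELECTED_REAL_KIND(14,60)
CONTAINS
  ! Remedy 1: assumed-shape dummy declared CONTIGUOUS (Fortran 2008).
  SUBROUTINE add_one_contig(v)
    REAL(KIND=kdp), DIMENSION(:), CONTIGUOUS, INTENT(INOUT) :: v
    v = v + 1.0_kdp
  END SUBROUTINE add_one_contig

  ! Remedy 2: explicit-shape dummy, so that the caller may pass the
  ! first element of the section (sequence association).
  SUBROUTINE add_one_explicit(n, v)
    INTEGER, INTENT(IN) :: n
    REAL(KIND=kdp), DIMENSION(n), INTENT(INOUT) :: v
    v = v + 1.0_kdp
  END SUBROUTINE add_one_explicit
END MODULE remedy_sketch

PROGRAM try_remedies
  USE remedy_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: nrn = 2, nbn = 5
  REAL(KIND=kdp), DIMENSION(:), ALLOCATABLE, TARGET :: rhs
  REAL(KIND=kdp), DIMENSION(:), POINTER :: rhs_b

  ALLOCATE(rhs(10))
  rhs = 0.0_kdp
  rhs_b => rhs(nrn+1:nrn+nbn)

  CALL add_one_contig(rhs_b)               ! pointer to CONTIGUOUS dummy
  CALL add_one_explicit(nbn, rhs(nrn+1))   ! first element as actual argument

  ! Elements nrn+1..nrn+nbn were incremented twice; the rest are untouched.
  IF (ANY(rhs(nrn+1:nrn+nbn) /= 2.0_kdp)) STOP 'section not updated'
  IF (ANY(rhs(1:nrn) /= 0.0_kdp)) STOP 'elements before section changed'
  IF (ANY(rhs(nrn+nbn+1:) /= 0.0_kdp)) STOP 'elements after section changed'
  PRINT *, rhs
END PROGRAM try_remedies
```

The second variant gives up the shape information that an assumed-shape dummy carries, so the length `n` has to be passed correctly by the caller.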
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8037 Location: Salford, UK
Posted: Fri Feb 16, 2024 10:35 am
mecej4
Sorry, there is no magical check number that I know of. Adding some support for the CONTIGUOUS attribute looks like a useful idea.
mecej4
Joined: 31 Oct 2006 Posts: 1897
Posted: Fri Feb 16, 2024 4:43 pm
Thanks, Paul; there are no surprises in what you found.
Implementing CONTIGUOUS could be done in two stages. In the first stage, the compiler would recognise CONTIGUOUS as a standard-approved attribute, but take no further action.
For users, this would be helpful by allowing them to use the same source code with FTN95 as with other compilers that recognise CONTIGUOUS (and, perhaps, those compilers may output faster machine code as a consequence).
As a later stage, FTN95 may be enhanced to generate faster machine code for passing contiguous actual arguments without copy-in/copy-out or temporary arrays. Users may see speed improvement without any source code changes being needed.
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8037 Location: Salford, UK
Posted: Mon Feb 19, 2024 8:28 am
Support for the CONTIGUOUS attribute has now been added for the next release of FTN95. It will have the effect of suppressing some "copy in" and "copy out" processes for array sections when the compiler is not provided with enough information to be able to do this automatically.
On its own, this will not fix the primary issue in this thread, which remains outstanding.
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8037 Location: Salford, UK
Posted: Wed Feb 21, 2024 3:45 pm
A provisional fix has now been implemented for this issue. It is provisional in the sense that it works for the cut-down code, but I have not tested it against the original project. This fix requires /opt and works for both Win32 and x64. It will be in the next release of FTN95.
mecej4
Joined: 31 Oct 2006 Posts: 1897
Posted: Thu Feb 22, 2024 1:03 am
Great news! I look forward to the release of the compiler with the fixes.