forums.silverfrost.com Welcome to the Silverfrost forums
JohnCampbell
Joined: 16 Feb 2006 Posts: 2593 Location: Sydney
Posted: Thu Mar 14, 2024 1:42 pm Post subject: Re:
DanRRight wrote: | I am sure you've heard that no one optimizes code by hand anymore; compilers do that better than the average programmer. |
I don't see any compiler fixing your mistakes!
Try this changed code and see if you get a more accurate elapsed time.
Code: |
integer, parameter :: i=1000, j=1000, m=1000, n=1
real, allocatable :: DensitySpecies(:,:,:,:)
integer*8 :: idim, nnn, num, ni, two=2
integer :: k, nn, ierr
real :: t1, t2, GByte
integer*8 :: na, nb
logical :: dan = .false.

call delta_sec ('Start tests')
k = 1
do nn = 0,3
   nnn  = two**nn
   ni   = nnn*i
   idim = ni * j * m * n
   gbyte = 4.*idim/1.e9
   print *, '=====================', nn, nnn
   write (*,'(A, f0.3, A, 5i7)') 'Size ',gbyte,' GB, Size i,j,m,n= ', ni,j,m,n
   call cpu_time (t1)
   call delta_sec ('start loop')
   allocate ( DensitySpecies(ni,j,m,n), stat=ierr )
   if ( ierr /= 0 ) print *, '====ierr=', ierr
   call cpu_time (t2)
   call delta_sec ('Allocation time')
   print *, ' Allocation time= ', t2-t1
   DensitySpecies = 123
   call delta_sec ('Initialisation time')
   call cpu_time (t1)
   if ( dan ) then
      DensitySpecies(:,:,:,1) = DensitySpecies(:,:,:,1) + DensitySpecies(:,:,:,k)
   else
      na = ni
      nb = j*m
      call wrapper_add ( na, nb, DensitySpecies(1,1,1,1), DensitySpecies(1,1,1,k) )
   end if
   call cpu_time (t2)
   call delta_sec ('Calculation')
   print *, ' END section :::, time= ', t2-t1
   deallocate (DensitySpecies)
   call delta_sec ('Deallocation')
end do
end

subroutine delta_sec ( desc )
   character desc*(*)
   integer*8 :: clock, rate, last_clock = 0
   real*8 :: sec
   call system_clock ( clock, rate )
   sec = dble(clock-last_clock) / dble(rate)
   write (*,fmt='( f10.4,2x,a )') sec, desc
   last_clock = clock
end subroutine delta_sec

subroutine wrapper_add ( na, nb, accum, add )
   integer*8 :: na, nb
   real :: accum(na,nb), add(na,nb)
   integer*8 :: k
   write (*,fmt='(a,i0,a,i0,a)') 'Add arrays( ',na,', ',nb,' )'
   do k = 1,nb
      accum(:,k) = accum(:,k) + add(:,k)
   end do
end subroutine wrapper_add
|
Unfortunately, no compiler I used stripped out the bad code.
I will have to correct any errors by hand!
It appeared to run successfully in Plato with both FTN95 and Gfortran, up to 32 GBytes.
JohnCampbell
Joined: 16 Feb 2006 Posts: 2593 Location: Sydney
Posted: Thu Mar 14, 2024 2:11 pm Post subject:
You could try this alternative code, selecting axpy4@.
I ran this with FTN95 Release x64 on my Ryzen with 64 GBytes of physical memory. The test used 59 GBytes and ran faster than Gfortran.
Code: |
integer, parameter :: i=1000, j=900, m=1000, n=1
real, allocatable :: DensitySpecies(:,:,:,:)
integer*8 :: idim, nnn, num, ni, two=2
integer :: k, nn, ierr
real :: t1, t2, GByte
integer*8 :: na, nb
! logical :: dan = .true.  , john = .false. , use_avx = .true.
logical :: dan = .false. , john = .false. , use_avx = .true.

call delta_sec ('Start tests')
k = 1
do nn = 0,4
   nnn  = two**nn
   ni   = nnn*i
   idim = ni * j * m * n
   gbyte = 4.*idim/1.e9
   print *, '=====================', nn, nnn
   write (*,'(A, f0.3, A, 5i7)') 'Size ',gbyte,' GB, Size i,j,m,n= ', ni,j,m,n
   call delta_sec ('start loop')
   allocate ( DensitySpecies(ni,j,m,n), stat=ierr )
   if ( ierr /= 0 ) print *, '====ierr=', ierr
   call delta_sec ('Allocation time')
   DensitySpecies = 123
   call delta_sec ('Initialisation time')
   if ( dan ) then
      DensitySpecies(:,:,:,1) = DensitySpecies(:,:,:,1) + DensitySpecies(:,:,:,k)
   else if ( john ) then
      na = ni
      nb = j*m
      call wrapper_add ( na, nb, DensitySpecies(1,1,1,1), DensitySpecies(1,1,1,k) )
   else if ( use_avx ) then
      num = ni*j*m
      call axpy4@ ( DensitySpecies(1,1,1,1), DensitySpecies(1,1,1,k), num, 1.0 )
   end if
   call delta_sec ('Calculation')
   deallocate (DensitySpecies)
   call delta_sec ('Deallocation')
end do
end

subroutine delta_sec ( desc )
   character desc*(*)
   integer*8 :: clock, rate, last_clock = 0
   real*8 :: sec
   call system_clock ( clock, rate )
   sec = dble(clock-last_clock) / dble(rate)
   write (*,fmt='( f10.4,2x,a )') sec, desc
   last_clock = clock
end subroutine delta_sec

subroutine wrapper_add ( na, nb, accum, add )
   integer*8 :: na, nb
   real :: accum(na,nb), add(na,nb)
   integer*8 :: k
   write (*,fmt='(a,i0,a,i0,a)') 'Add arrays( ',na,', ',nb,' )'
   do k = 1,nb
      accum(:,k) = accum(:,k) + add(:,k)
   end do
end subroutine wrapper_add
|
I modified the array sizes to fit in my available memory, but the test demonstrates good performance using FTN95 with arrays up to 59 GBytes.
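From the call above, axpy4@ appears to perform a SAXPY-style update, accum = accum + scale*add, over num contiguous elements, presumably using AVX instructions. That semantics is an assumption; a portable plain-Fortran equivalent, useful for checking results against other compilers, might look like this:

```fortran
! Portable sketch of what axpy4@ appears to compute (an assumption based on
! the call site above): accum = accum + scale*add over num contiguous
! elements, i.e. a SAXPY-style update without the AVX acceleration.
subroutine axpy_portable ( accum, add, num, scale )
   integer*8 :: num
   real :: accum(num), add(num), scale
   integer*8 :: k
   do k = 1, num
      accum(k) = accum(k) + scale*add(k)
   end do
end subroutine axpy_portable
```

With scale = 1.0 this matches the wrapper_add result, so either path should give identical output for the test above.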
DanRRight
Joined: 10 Mar 2008 Posts: 2877 Location: South Pole, Antarctica
Posted: Thu Mar 14, 2024 2:30 pm Post subject:
All irrelevant to the subject.
JohnCampbell
Joined: 16 Feb 2006 Posts: 2593 Location: Sydney
Posted: Fri Mar 15, 2024 1:38 am Post subject:
No Dan, it is entirely relevant.
If you have a poor solution approach, the compiler can only go so far.
There is still a need to understand the preferred numerical approaches for large calculations.
With your large 3D mesh, perhaps you should consider sparse calculation techniques to eliminate unnecessary calculations, something even the best optimising compilers can't yet easily do for you.
I think my example showed that you can adapt your code to work around limitations in the compiler and utilise what is available to improve performance. This applies to all compilers, especially Gfortran and FTN95.
DanRRight
Joined: 10 Mar 2008 Posts: 2877 Location: South Pole, Antarctica
Posted: Fri Mar 15, 2024 8:18 am Post subject:
Take the flag and show your skills on the Polyhedron examples. They have been waiting for you for 25 years. Oops, you already tried... Also, MPI, OpenMP and CUDA have been begging for you for 15-20 years. You tried some of those too... so why not with this compiler?
I am not interested in 3% proprietary "improvements" on a single core which do not carry over anywhere else.
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8037 Location: Salford, UK
Posted: Fri Mar 15, 2024 9:00 am Post subject:
Like the Gilbert and Sullivan policeman, the compiler developer's "lot is not a happy one". They must "do what it says on the tin" while at the same time trying to make up for naive mistakes, all within the time and resources available.
I remember a case where a user was having problems inverting a very large matrix via determinants. Why was the compiler so slow!
The current thread reminds me that FTN95 could do more to compensate for the unnecessary use of array sections. It also reveals a stack limitation that has become out of date.
The feedback is useful and will hopefully make FTN95 even better in the future.
JohnCampbell
Joined: 16 Feb 2006 Posts: 2593 Location: Sydney
Posted: Sun Mar 17, 2024 2:42 am Post subject:
Paul,
Could you provide some more information on "Vstack"?
Is it a general replacement for the stack, enabling much larger local or automatic arrays without the need to redefine the stack size?
(Perhaps two stacks would work very well, with Vstack used for large local or automatic arrays, while subroutine argument references and local variables stay on a small near stack.)
As it is only a reserved memory address range, it could be an address offset greater than the physical memory installed (say 128 GBytes). If it needs a long address, it can be anywhere. This has no effect on available memory, as it only takes physical memory when required.
Ifort can place some very large memory strides between stack and heap addresses without any severe performance hit.
Admittedly, an array section temporary larger than half the physical memory (or the configured virtual memory) will always crash the program, so we may need a better test of whether the array section is actually non-contiguous before resorting to a temporary copy.
Unnecessary temporary copies of array sections are a major cause of FTN95's poor performance on the Polyhedron examples.
Please do not resort to the Ifort approach of supporting non-contiguous memory arrays, as it breaks the F77_wrapper approach.
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8037 Location: Salford, UK
Posted: Sun Mar 17, 2024 2:22 pm Post subject:
John,
The current virtual stack might be replaced, subject to the planned review of this issue, so it would probably be best to await the outcome before I provide further information. I will make sure that this has a high priority.
JohnCampbell
Joined: 16 Feb 2006 Posts: 2593 Location: Sydney
Posted: Mon Mar 18, 2024 12:43 am Post subject:
Paul,
The concept of a larger stack for automatic or large local arrays, plus temporary arrays, is very good.
The use of large virtual address strides also provides flexibility for a very large Vstack and heap. You could review Gfortran and Ifort load maps to identify the strides they provide with no appreciable performance problem.
This would leave the conventional stack small, managing subroutine argument lists and smaller local variables, and so able to use short addresses.
In my programs, most arrays are on the heap, which uses long addresses but still provides good performance.
I look forward to the review and hope that it can lead to fewer stack overflow errors!
John
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8037 Location: Salford, UK
Posted: Tue Apr 09, 2024 3:27 pm Post subject:
Here is the outcome of the promised review of what has been called the 64-bit "virtual stack".
The next release of FTN95 and its associated DLLs will be amended and the following instructions will apply (i.e. these will be the new instructions).
The compiler generates temporary blocks of data, for example, when passing non-contiguous array sections and when functions return array-valued results.
For 64-bit programs, by default this temporary data is allocated from a private heap that is separate from the global heap used for ALLOCATE statements in the user's program.
The compiler uses the global heap rather than this private heap when /ALLOCATE is added to the FTN95 command line, but code created with this option could run more slowly.
The default size of this private heap is 128 GB. This is the reserved size, not the committed size, so reducing this value should have no impact on performance.
It should not be necessary to increase this value: if it is too small, runtimes will probably be unacceptable because of the amount of data being copied.
However, the default can be set by using /VSTACK <size> on the FTN95 command line (<size> is the required number of GBs). Alternatively, the default can be changed by a call to HEAP_RESERVE...
SUBROUTINE HEAP_RESERVE(RESERVE)
INTEGER RESERVE
This routine must be called before calling other routines. It sets the reserve size of the private heap as the number of GBs required.
There is no known advantage in setting this value below its default value.
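The two configuration routes described above can be sketched as follows; the program name and the 32 GB figure are illustrative only:

```fortran
! Run-time route: reserve a 32 GB private heap for compiler temporaries.
! Per the instructions above, HEAP_RESERVE must be called before calling
! any other routines.
! (Compile-time alternative, per the same instructions:
!    FTN95 mytest.f95 /64 /VSTACK 32 )
program mytest
   call heap_reserve ( 32 )
   ! ... remainder of the program: allocations, array-section work, etc.
end program mytest
```

Since 128 GB is only a reserved (not committed) size, most programs should need neither route and can simply take the default.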
Powered by phpBB © 2001, 2005 phpBB Group