forums.silverfrost.com

JohnCampbell · Joined: 16 Feb 2006 Posts: 2615 Location: Sydney

Paul,

I have a program that uses allocatable arrays for a Gaussian linear equation solver.
Running with FTN95 Ver 6.1 or 6.3, it goes to sleep!!
I've sent an email with more details and the sample code.
Could you please review the email and let me know whether you can look at the generated assembler code for the inner loop?

John

PaulLaidler · Posted: Mon Feb 04, 2013 3:36 pm

Have you sent the email to Silverfrost?

JohnCampbell · Joined: 16 Feb 2006 Posts: 2615 Location: Sydney

Paul,
Yes, I sent the email to Silverfrost. It contained the code as a stand-alone program.
The main routine that is failing is

JohnCampbell · Joined: 16 Feb 2006 Posts: 2615 Location: Sydney

Paul,
Have you received the email?
If you compile and start the run, you should at least see that it is much slower than with other compilers.
The run uses 50 steps over 13,000 equations, and I have never reached the end! The FTN95 trace I get from the latest version I emailed to you starts like:

JohnCampbell · Joined: 16 Feb 2006 Posts: 2615 Location: Sydney

Paul,

I have posted further results. They show that the problem is in this loop:
do k = b,iband
   j = row_i(k)+ii
   sk(j,i) = sk(j,i) - c*row_f(k)
end do
My empirical analysis:
If j effectively increments as j = j+1 when k = k+1, everything works well, as there is forward-calculation in the CPU.
If j does not step uniformly but varies, then the forward-calculation in the CPU must be reset, destroying the efficiency. This reset takes a long time!!
FTN95 needs a better way to reset when this pre-calculation fails. Other compilers do not exhibit this problem. I have other calculations (back-substitution in linear equation solution) which exhibit this problem, but not as dramatically.
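For anyone who wants to try the comparison, here is a minimal sketch of the loop above run once with j stepping by 1 and once with an irregular step. The array sizes, the fill values and the "jump of 3 after every 8th entry" pattern are assumptions made for illustration, not taken from the emailed program, and there is no claim that this small case reproduces the slowdown.

```fortran
! Sketch only: sizes, fill values and both index patterns are assumptions.
program index_pattern_timing
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: iband = 2000, ncol = 500, nrow = 2*iband
   real(dp), allocatable :: sk(:,:), row_f(:)
   integer,  allocatable :: row_i(:)
   integer :: k

   allocate (sk(nrow,ncol), row_f(iband), row_i(iband))
   row_f = 1.0_dp

   ! Pattern 1: j increases by exactly 1 as k increases by 1.
   do k = 1, iband
      row_i(k) = k
   end do
   call time_pattern ('uniform step  ')

   ! Pattern 2: still increasing, but with a jump of 3 after every 8th entry.
   row_i(1) = 1
   do k = 2, iband
      if (mod(k-1, 8) == 0) then
         row_i(k) = row_i(k-1) + 3
      else
         row_i(k) = row_i(k-1) + 1
      end if
   end do
   call time_pattern ('irregular step')

contains

   subroutine time_pattern (label)
      character(len=*), intent(in) :: label
      integer  :: i, j, k, b, ii, t0, t1, rate
      real(dp) :: c

      sk = 0.0_dp
      b  = 1
      ii = 0
      c  = 0.5_dp
      call system_clock (t0, rate)
      do i = 1, ncol
         do k = b, iband               ! the inner loop quoted in the post
            j = row_i(k) + ii
            sk(j,i) = sk(j,i) - c*row_f(k)
         end do
      end do
      call system_clock (t1)
      ! The checksum keeps the optimiser from discarding the loop.
      print *, label, ': ticks =', t1 - t0, ' of', rate, '/s, checksum =', sum(sk)
   end subroutine time_pattern

end program index_pattern_timing
```

Both patterns do the same number of updates, so any difference in ticks comes from the access pattern alone.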
There is much more detail in the email. I hope you can receive it.

John

PaulLaidler · Posted: Wed Feb 06, 2013 8:38 am

John

I am puzzled by this. You seem to imply that the FTN95 optimisation has changed with some recent versions and I can think of no good reason for this.

If you have access to two versions of FTN95 (one good, the other bad) then I suggest that you look at the /EXPLIST assembly for each and/or put in some timing to localise the difference.

Realistically, it is very unlikely that I will have the time to identify the problem let alone provide a fix to the compiler.

However, I will track down your email and keep it to hand.

JohnCampbell · Joined: 16 Feb 2006 Posts: 2615 Location: Sydney

Paul,

This is not a recent change in FTN95; the problem has always been there.
What has changed is that I now have a small test example which demonstrates the problem.
The problem is demonstrated by comparing the results between different compilers.
I have also been able to identify how the operation of the inner loop changes when the problem does or does not occur with FTN95: the system_clock ticks go from 60 to 18,000 per iteration of the inner 2 loops.
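The kind of per-iteration measurement described here can be sketched as follows. The routine do_step is a hypothetical stand-in for the inner two loops, and the array size is an assumption; only the timing pattern around it is the point.

```fortran
! Sketch only: do_step is a placeholder for the real inner loops.
program step_ticks
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: nstep = 50, n = 200000
   real(dp) :: work(n)
   integer  :: step, t0, t1, rate, ticks
   integer  :: min_ticks, max_ticks, worst_step

   call system_clock (count_rate=rate)
   work       = 1.0_dp
   min_ticks  = huge(1)
   max_ticks  = -1
   worst_step = 0

   do step = 1, nstep
      call system_clock (t0)
      call do_step (work, step)
      call system_clock (t1)
      ticks = t1 - t0                  ! clock wrap-around is not handled
      if (ticks < min_ticks) min_ticks = ticks
      if (ticks > max_ticks) then
         max_ticks  = ticks
         worst_step = step
      end if
   end do

   print *, 'ticks per second :', rate
   print *, 'min ticks / step :', min_ticks
   print *, 'max ticks / step :', max_ticks, ' at step', worst_step
   print *, 'checksum         :', sum(work)

contains

   subroutine do_step (a, s)
      real(dp), intent(inout) :: a(:)
      integer,  intent(in)    :: s
      a = a + real(s, dp)*1.0e-3_dp    ! arbitrary work so there is something to time
   end subroutine do_step

end program step_ticks
```

Reporting the step number of the worst case makes it easier to dump the index and value vectors for exactly the iteration that stalls.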

John

JohnCampbell · Joined: 16 Feb 2006 Posts: 2615 Location: Sydney

Paul,

I understand you have a lot of work on at the moment, with the 64-bit version of ClearWin+ and Windows 8 compatibility.

The problem I am reporting has been around for years. What is different now is that the identification of the problem is more clearly demonstrated.
Having a do loop changing from 60 system_clock cycles to 18,000 cycles is fairly dramatic. Something is causing the process to wait for a long time. FTN95's recovery from an unexpected state is much different to the other compilers I have available. I'm not sure how this state is identified.

I might try to run the problem in sdbg and then provide more measures of performance for the condition when I know that it is failing.

John

jalih · Joined: 30 Jul 2012 Posts: 196

JohnCampbell · Joined: 16 Feb 2006 Posts: 2615 Location: Sydney

I don't think the problem is that simple.
I tried to write a smaller example using the values of ROW_I from a failing case, but it did not reproduce the delays.
I think the processor sees that ROW_I(k+1) = ROW_I(k)+1 typically, so when this does not hold after a lot of calculations, things get upset.
The latest test is basically

JohnCampbell · Joined: 16 Feb 2006 Posts: 2615 Location: Sydney

JohnCampbell · Joined: 16 Feb 2006 Posts: 2615 Location: Sydney

Paul,

I have been trying to produce good and bad versions that differ by a minimum change, and have finally succeeded.
I have tracked down the problem to the following routine

JohnCampbell · Joined: 16 Feb 2006 Posts: 2615 Location: Sydney

Paul,

I have continued to investigate this problem. The run times are dependent on how many zeros are in the vectors row_b and row_f.
For n=1700, when the number of zeros is between 50% and 80% there is a dramatic slow down.
Run times for my test vary from 70 seconds to 7,000 seconds, depending on the % zero, i.e. the values in the row_f vector. The zeros are typically grouped in blocks rather than scattered randomly.
I can only guess that the problem is related to the way the CPU handles lots of zero operations. Lots of delays?
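A minimal sketch of how the % zero and the number of zero blocks in a vector could be measured. The fill pattern (blocks of 30 zeros alternating with 20 non-zeros) and the use of n = 1700 as the vector length are assumptions for illustration, not the real row_f data.

```fortran
! Sketch only: the fill pattern is invented to give blocked zeros.
program zero_profile
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 1700
   real(dp) :: row_f(n)
   integer  :: k, nzero, nblock

   ! Illustrative fill: 30 zeros followed by 20 non-zeros, repeated.
   do k = 1, n
      if (mod(k-1, 50) < 30) then
         row_f(k) = 0.0_dp
      else
         row_f(k) = 1.0_dp
      end if
   end do

   nzero  = count(row_f == 0.0_dp)

   ! A block starts at a zero that is the first entry or follows a non-zero.
   nblock = 0
   do k = 1, n
      if (row_f(k) == 0.0_dp) then
         if (k == 1) then
            nblock = nblock + 1
         else if (row_f(k-1) /= 0.0_dp) then
            nblock = nblock + 1
         end if
      end if
   end do

   print *, 'zeros        :', nzero
   print *, 'percent zero :', 100.0_dp*real(nzero, dp)/real(n, dp)
   print *, 'zero blocks  :', nblock
end program zero_profile
```

With this particular fill the vector is 60% zero in 34 blocks; changing the two constants in the fill loop sweeps the % zero while keeping the blocked layout.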
The problem is significantly mitigated (although not totally eliminated) if I use Davidb's AVX instruction:
call fast_asm_dsaxpy (sk(1,i), row_f(ib), k, c)

Any ideas ?

John

PaulLaidler · Posted: Tue Feb 26, 2013 8:54 am

I don't have any ideas about this at the moment, other than that it is difficult to see what the compiler can do about this.

JohnCampbell · Joined: 16 Feb 2006 Posts: 2615 Location: Sydney

Paul,

I am wondering whether this could be a timing problem in the assembler instructions generated by FTN95. If a multiply by zero, and the (non) addition of the resulting product, responds in a different time from what the generated instructions expect, the result may already be available before the following instructions are ready for it, and that mismatch could generate an extended delay.
This could explain the delays.

There is certainly a problem when between 20% and 50% of the vector contents are zero. Below 20% or above 50% there is no problem, but in between the delay is significant.

John