forums.silverfrost.com

johannes · Joined: 21 Jan 2011 Posts: 65 Location: Leimen, Germany

Hi all,
could anyone give me a clue how to accelerate this primitive loop by means of advanced f90 capabilities?

LitusSaxonicum · Posted: Fri Sep 26, 2014 10:22 am

Just a guess, Johannes, but couldn't you do something of the kind with EQUIVALENCE? I suppose it depends on how big the maximum for k is before that gets tedious. (It would be handy if k=1.)

If n is small, you may get some benefit from loop unrolling or, if n is always a multiple of some other number, from partial unrolling.

Eddie

johannes · Joined: 21 Jan 2011 Posts: 65 Location: Leimen, Germany

Hi Eddie,
even back in the days of f70 I never touched EQUIVALENCE.

Let me try later. I'll come back.
johannes

johannes · Joined: 21 Jan 2011 Posts: 65 Location: Leimen, Germany

b2 and b3 arrays are allocatable. No EQUIVALENCE allowed.

johannes

PaulLaidler · Posted: Fri Sep 26, 2014 5:05 pm

LitusSaxonicum · Posted: Fri Sep 26, 2014 8:47 pm

Brilliant Paul!

The three nested loops aren't part of the answer, of course, but wouldn't it be better to have the i loop as the innermost one?

Eddie

PaulLaidler · Posted: Sat Sep 27, 2014 7:17 am

I don't know offhand. One could look at the /explist listing and also check whether /opt makes any difference.

JohnCampbell · Joined: 16 Feb 2006 Posts: 2629 Location: Sydney

Eddie,

The following change would certainly improve cache usage, especially when the array sizes increase.
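John's change itself is not reproduced here. As a minimal sketch of the index-order point only, assuming the loop in question copies b3(i,j,k) into b2(i,j) for a fixed k, the cache-friendly ordering puts the first index in the innermost loop. The program name and the array sizes n1, n2, n3 are hypothetical.

```fortran
program loop_order
  implicit none
  integer, parameter :: n1 = 200, n2 = 200, n3 = 4   ! hypothetical sizes
  real, allocatable :: b2(:,:), b3(:,:,:)
  integer :: i, j, k

  allocate (b2(n1,n2), b3(n1,n2,n3))
  call random_number(b3)
  k = 2

  ! Fortran stores arrays in column-major order: the first index varies
  ! fastest in memory. Putting the i loop innermost therefore walks
  ! through both arrays with stride 1.
  do j = 1, n2
    do i = 1, n1
      b2(i,j) = b3(i,j,k)
    end do
  end do

  ! Sanity check against the array-section form of the same copy.
  if (any(b2 /= b3(:,:,k))) stop 'copy mismatch'
  print *, 'copy ok'
end program loop_order
```

With the loops swapped (j innermost), consecutive accesses would be n1 elements apart, which is where the cache misses come from once the arrays get large.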

LitusSaxonicum · Posted: Tue Sep 30, 2014 7:06 am

Hi John,

I made it a question, not an assertion, so as not to offend anyone. This business of index order was something I first encountered in Kreitzberg and Shneiderman's book, c. early 1970s.

I had little idea that it was so costly in time, but I had forgotten about cache misses, and was thinking only about how big the arrays can be. Paul's example probably doesn't count for much delay.

Eddie

johannes · Joined: 21 Jan 2011 Posts: 65 Location: Leimen, Germany

Hi all,
accessing every array element costs time.

Isn't there a solution using pointers or something similar?
Like: ptr=>b3(1,1,k) ! pointing to the first element in the slice
and then assigning b2 (starting at b2(1,1)) through the pointer ptr?

best regards
johannes
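For reference, a minimal sketch of what a pointer to the slice looks like in Fortran 90, assuming b2 and b3 are allocatable as stated earlier in the thread. The program name and sizes are hypothetical. Note that b3 needs the TARGET attribute, and that the pointer is associated with the whole section b3(:,:,k) rather than with a single element.

```fortran
program slice_pointer
  implicit none
  integer, parameter :: n1 = 3, n2 = 4, n3 = 5   ! hypothetical sizes
  real, allocatable, target :: b3(:,:,:)         ! TARGET is required
  real, allocatable :: b2(:,:)
  real, pointer :: ptr(:,:)
  integer :: k

  allocate (b3(n1,n2,n3), b2(n1,n2))
  call random_number(b3)
  k = 3

  ptr => b3(:,:,k)   ! no data is moved; ptr is an alias for the slice
  b2 = ptr           ! this assignment still copies n1*n2 elements

  if (any(b2 /= b3(:,:,k))) stop 'copy mismatch'
  print *, 'copy ok'
end program slice_pointer
```

The pointer association itself is cheap, but it only saves time if the following code can work on ptr directly instead of on a separate copy in b2. As soon as b2 must hold its own data, every element has to be copied, whichever syntax is used.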

PaulLaidler · Posted: Mon Oct 06, 2014 7:01 pm

The essence of the answer is in my code

johannes · Joined: 21 Jan 2011 Posts: 65 Location: Leimen, Germany

Hi Paul,
did you mean that b2(:,:)=b3(:,:,k) does not store b2 element by element, but is instead more or less like shifting some address?
BR
johannes

PaulLaidler · Posted: Tue Oct 07, 2014 9:58 am

No, it does not move addresses. But it will significantly reduce the amount of address evaluation. It may lead to a "block copy" rather than copying element by element. This will depend on whether or not the elements are contiguous in memory and how clever the optimiser is. (Note that FTN95 does some optimisation even when /OPT is not switched on. Note also that Fortran uses "column major" ordering of array elements.)

The usual approach is to wrap a timer (e.g. "call system_clock(clock_count)") around the relevant code and try some experiments.
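A minimal sketch of such an experiment, assuming the two candidates are the explicit double loop and the array-section assignment b2(:,:)=b3(:,:,k) discussed above. The program name, the sizes, and the repeat count nrep are hypothetical, and no timings are claimed here since they depend on compiler, options, and machine.

```fortran
program time_copy
  implicit none
  integer, parameter :: n1 = 500, n2 = 500, n3 = 8   ! hypothetical sizes
  integer, parameter :: nrep = 200                   ! repeats per variant
  real, allocatable :: b2(:,:), b3(:,:,:)
  integer :: i, j, k, irep, c0, c1, c2, rate

  allocate (b2(n1,n2), b3(n1,n2,n3))
  call random_number(b3)
  k = 3

  call system_clock(c0, rate)
  if (rate == 0) stop 'no system clock available'

  ! Variant 1: explicit loops, first index innermost.
  do irep = 1, nrep
    do j = 1, n2
      do i = 1, n1
        b2(i,j) = b3(i,j,k)
      end do
    end do
  end do
  call system_clock(c1)

  ! Variant 2: array-section assignment.
  do irep = 1, nrep
    b2(:,:) = b3(:,:,k)
  end do
  call system_clock(c2)

  print *, 'explicit loops :', real(c1 - c0) / real(rate), ' s'
  print *, 'array section  :', real(c2 - c1) / real(rate), ' s'
  ! Use the result so that the copies cannot be discarded entirely.
  print *, 'check value    :', b2(1,1) - b3(1,1,k)
end program time_copy
```

An optimising compiler may notice that the repeated copies are redundant, so it is worth running the experiment both with and without /OPT and treating very small times with suspicion.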

JohnCampbell · Joined: 16 Feb 2006 Posts: 2629 Location: Sydney

You would certainly have to be careful, where you have different size arrays, such as