Topic: Bringing a slice of a rank 3 into a rank 2 array in General

johannes

Posts: 65 Leimen, Germany

26 Sep 2014 8:30 #14709

Hi all, could anyone give me a clue how to accelerate this primitive loop by means of advanced f90 capabilities?

       k=...
       FORALL (i=1:n,j=1:n) 
          b2(i,j)=b3(i,j,k) 
       end forall

Is it possible to use RESHAPE to map b3 to b2? BR johannes

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

26 Sep 2014 9:22 #14710

Just a guess, Johannes, but couldn't you do something of the kind with EQUIVALENCE? I suppose it depends how big the maximum for k is before that gets tedious. (It would be handy if k=1.)

If n is small, you may get some benefit from loop unrolling, or, if n is always a multiple of some other number, from partial unrolling.

Eddie

johannes

Posts: 65 Leimen, Germany

26 Sep 2014 9:34 #14711

Hi Eddie, even back in times of f70 I never touched EQUIVALENCE. :shock: Let me try later. I'll come back. johannes

johannes

Posts: 65 Leimen, Germany

26 Sep 2014 1:00 #14716

The b2 and b3 arrays are allocatable, so EQUIVALENCE is not allowed.

johannes

PaulLaidler

Posts: 7975 Salford, UK

26 Sep 2014 4:05 #14722

program main
real b2(4,5), b3(4,5,3)
do i = 1,4
  do j = 1,5
    do k = 1,3
      b3(i,j,k)= 100*i+10*j+k
    end do
  end do
end do      
k = 2
b2(:,:)=b3(:,:,k)
print*, b2(1,:)
print*, b2(2,:) 
print*, b2(3,:) 
print*, b2(4,:) 
end

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

26 Sep 2014 7:47 #14725

Brilliant Paul!

The three nested loops aren't part of the answer of course, but wouldn't it be better to have the i loop as the innermost one?

Eddie

PaulLaidler

Posts: 7975 Salford, UK

27 Sep 2014 6:17 #14727

I don't know offhand. One could look at the /explist listing and also check whether /opt makes any difference.

JohnCampbell

Posts: 2526 Sydney

30 Sep 2014 12:44 #14750

Eddie,

The following change would certainly improve cache usage, especially when the array sizes increase.

do k = 1,3 
  do j = 1,5 
    do i = 1,4 
      b3(i,j,k)= 100*i+10*j+k 
    end do 
  end do 
end do

This becomes much more significant for larger arrays. For example, the equation solver SYMSOL will take much longer to run when the stiffness matrix is stored as ST(equations,band) than when it is stored as ST(band,equations). The run time difference can be a factor of 10 to 100, which swamps any other attempt at coding efficiency. It occurs when the matrix is much larger than the processor cache.
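The effect of index order can be checked with a small timing experiment. The following is only a sketch: the program name, the array `a` and the size `n = 200` are assumptions made for illustration, and an optimising compiler may interchange the loops itself, in which case the two timings will be similar.

```fortran
program loop_order
  implicit none
  integer, parameter :: n = 200      ! assumed size, chosen only for illustration
  real, allocatable :: a(:,:,:)
  integer :: i, j, k
  integer :: t0, t1, t2, rate

  allocate (a(n,n,n))

  call system_clock(t0, rate)
  ! First index innermost: consecutive iterations touch adjacent memory
  do k = 1, n
    do j = 1, n
      do i = 1, n
        a(i,j,k) = real(i + j + k)
      end do
    end do
  end do
  call system_clock(t1)
  ! Last index innermost: consecutive iterations are n*n elements apart
  do i = 1, n
    do j = 1, n
      do k = 1, n
        a(i,j,k) = real(i + j + k)
      end do
    end do
  end do
  call system_clock(t2)

  print *, 'i innermost (s):', real(t1 - t0) / real(rate)
  print *, 'k innermost (s):', real(t2 - t1) / real(rate)
  print *, 'check value:', a(n,n,n)   ! keeps the stores from being optimised away
end program loop_order
```

Increasing `n` until the array no longer fits in the cache should make the difference more visible.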

John

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

30 Sep 2014 6:06 #14751

Hi John,

I made it a question not an assertion so as not to offend anyone. This business of index order was something I first encountered in Kreitzberg and Shneiderman's book, c. early 1970s.

I had little idea that it was so costly in time, but I had forgotten about cache misses, and was thinking only about how big the arrays can be. Paul's example probably doesn't account for much delay.

Eddie

johannes

Posts: 65 Leimen, Germany

6 Oct 2014 2:22 #14778

Hi all, accessing every array element costs time.

Isn't there any solution using pointers or so? Like:

       ptr=>b3(1,1,k) ! pointing to the first element in the slice

and b2(1,1) = using the pointer ptr???

best regards johannes
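For reference, the pointer idea in this question could be sketched in standard Fortran 90 with a rank-2 pointer associated with the whole slice rather than with its first element. This is a minimal sketch under assumptions not stated in the thread: `b3` is given the TARGET attribute, and the names `p` and `slice_pointer` are made up for illustration. Nothing is copied here; `p` is simply another view of the slice.

```fortran
program slice_pointer
  implicit none
  real, allocatable, target :: b3(:,:,:)   ! TARGET is needed for pointer association
  real, pointer :: p(:,:)                  ! hypothetical name for the slice view
  integer :: n, k

  n = 4
  k = 2
  allocate (b3(n,n,3))
  b3 = 0.0
  b3(:,:,k) = 1.0

  p => b3(:,:,k)      ! associate p with the slice; no elements are copied
  p(1,1) = 42.0       ! writes through to b3(1,1,k)
  print *, b3(1,1,k), sum(p)
end program slice_pointer
```

If an independent copy in b2 is required, an assignment is still needed; the pointer only helps when reading or updating the slice in place is enough.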

PaulLaidler

Posts: 7975 Salford, UK

6 Oct 2014 6:01 #14780

The essence of the answer is in my code

b2(:,:)=b3(:,:,k)

The rest is just for illustration.

johannes

Posts: 65 Leimen, Germany

7 Oct 2014 7:21 #14783

Hi Paul, do you mean that b2(:,:)=b3(:,:,k) does not store b2 element by element, but is instead more or less like shifting some address? BR johannes

PaulLaidler

Posts: 7975 Salford, UK

7 Oct 2014 8:58 #14784

No, it does not move addresses, but it will significantly reduce the amount of address evaluation. It may lead to a 'block copy' rather than copying element by element. This will depend on whether or not the elements are contiguous in memory and how clever the optimiser is. (Note that FTN95 does some optimisation even when /OPT is not switched on. Note also that Fortran uses 'column major' ordering of array elements.)

The usual approach is to wrap a timer (e.g. 'call system_clock(clock_count)') around the relevant code and try some experiments.
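Such an experiment might look like the sketch below, which times the array-section assignment against explicit loops. The sizes `n` and `nrep`, the variable names and the repeat loop are assumptions chosen for illustration; the results will depend on the compiler, its options and the machine.

```fortran
program time_slice_copy
  implicit none
  integer, parameter :: n = 1000, nrep = 100   ! assumed sizes for illustration
  real, allocatable :: b2(:,:), b3(:,:,:)
  integer :: i, j, k, irep
  integer :: c0, c1, c2, rate
  real :: s

  allocate (b2(n,n), b3(n,n,3))
  call random_number(b3)
  k = 2
  s = 0.0

  call system_clock(c0, rate)
  do irep = 1, nrep
    b2(:,:) = b3(:,:,k)              ! array-section assignment
    s = s + b2(irep,irep)            ! use the result so the copy is not removed
  end do
  call system_clock(c1)
  do irep = 1, nrep
    do j = 1, n                      ! explicit loops, first index innermost
      do i = 1, n
        b2(i,j) = b3(i,j,k)
      end do
    end do
    s = s + b2(irep,irep)
  end do
  call system_clock(c2)

  print *, 'section assignment (s):', real(c1 - c0) / real(rate)
  print *, 'explicit loops     (s):', real(c2 - c1) / real(rate)
  print *, 'check value:', s
end program time_slice_copy
```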

JohnCampbell

Posts: 2526 Sydney

8 Oct 2014 10:16 #14788

You would certainly have to be careful where the arrays have different sizes, such as:

       integer, parameter :: n=5
       integer b2(n,n)
       integer b3(10,10,3)
       k=...
       FORALL (i=1:n,j=1:n)
          b2(i,j)=b3(i,j,k)
       end forall

The forall would work here, but the use of (:,:) requires the same size for the first 2 dimensions. If you are looking for a faster approach, move@ might help, although (:,:) should work well. If you need array sections, or the arrays are not the same size, then you should consider an alternative, such as using move@ in a loop:

       do j = 1,n
          call move@ ( b3(1,j,k), b2(1,j), n*4 )
       end do

Another alternative, without any /check option, could be the following:

       do I = 1,n*n
          b2(I,1) = b3(I,1,k)
       end do

or

       call move@ ( b3(1,1,k), b2, n*n*4 )

John
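For the case of different array sizes, a portable alternative is an array section with explicit bounds, which makes the shapes on both sides conform. The sketch below reuses the declarations from the post above; the fill loop and the program name are assumptions added so that it runs on its own. The flattened loop and the single move@ call over n*n elements only give the same result when the slice is contiguous, that is, when the first two extents of b3 are also n.

```fortran
program sub_block_copy
  implicit none
  integer, parameter :: n = 5
  integer :: b2(n,n), b3(10,10,3)
  integer :: i, j, k

  ! Fill b3 with recognisable values (same pattern as in Paul's example)
  do k = 1, 3
    do j = 1, 10
      do i = 1, 10
        b3(i,j,k) = 100*i + 10*j + k
      end do
    end do
  end do

  k = 2
  b2 = b3(1:n,1:n,k)        ! both sides have shape (n,n), so they conform

  ! b2(1,1) is 112 and b2(n,n) is 552 with the fill pattern above
  print *, b2(1,1), b2(n,n)
end program sub_block_copy
```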