 |
forums.silverfrost.com Welcome to the Silverfrost forums
|
davidb
Joined: 17 Jul 2009 Posts: 560 Location: UK
|
Posted: Fri Jan 18, 2013 10:18 pm Post subject: Re: |
|
|
| LitusSaxonicum wrote: |
| Quote: |
| (It was different in Fortran 66) |
... the first Fortran-77 compiler I used was on a VAX
Eddie |
SNAP.
But I didn't learn Fortran 77 properly until I had it installed on an Acorn Archimedes and then an Acorn Cambridge Workstation (miss them days). _________________ Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl |
|
|
 |
JohnCampbell
Joined: 16 Feb 2006 Posts: 2623 Location: Sydney
|
Posted: Sun Jan 20, 2013 1:48 pm Post subject: |
|
|
I've been away, so it is interesting to read what has been discussed about FORALL.
There are two sides to this:
One is that David is right that FTN95 has a bug: it fails to diagnose the error.
The other is that I would disagree with David, when he states:
| Quote: |
It doesn't help thinking about FORALL as a loop.
...
It's not even close to being a loop. |
While the syntax is not a do loop, the compiler implements a do loop equivalent.
David's example where the loop should be run backwards identifies that:
To get the intended result, a DO loop should be run backwards.
For a FORALL, it must take a copy of the array, then act on this old array to produce the new array. I would consider that inefficient, similar to the temporary copy made for array sections of multi-dimensional arrays.
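A minimal sketch of that difference, using a hypothetical shift-by-one example rather than David's original code (the array names a, b and c are assumptions made for illustration):

```fortran
PROGRAM forall_vs_do
  ! Hypothetical illustration: shift the elements of an array up by one.
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 5
  INTEGER :: a(n), b(n), c(n), i

  a = (/ (i, i = 1, n) /)
  b = a
  c = a

  ! FORALL evaluates every right-hand side before any assignment is made,
  ! so each a(i) receives the OLD value of a(i-1).
  FORALL (i = 2:n) a(i) = a(i-1)

  ! A forward DO loop overwrites each value before it is read,
  ! so b(1) is propagated through the whole array.
  DO i = 2, n
    b(i) = b(i-1)
  END DO

  ! A backward DO loop reads each old value before overwriting it,
  ! so it matches the FORALL result without needing a temporary copy.
  DO i = n, 2, -1
    c(i) = c(i-1)
  END DO

  PRINT *, 'FORALL      ', a   ! 1 1 2 3 4
  PRINT *, 'forward DO  ', b   ! 1 1 1 1 1
  PRINT *, 'backward DO ', c   ! 1 1 2 3 4
END PROGRAM forall_vs_do
```

Whether a given compiler actually makes the temporary copy for the FORALL, or recognises that a backward loop would do, is an implementation detail that this sketch does not show.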
My impression is that FORALL is not an efficient construct, despite its original purpose of identifying computations that can run in parallel.
I'm old-school and only ever use it in example coding.
David's identification of errors in FORALL implementation is also helpful for those who choose to use it.
John |
|
|
 |
davidb
Joined: 17 Jul 2009 Posts: 560 Location: UK
|
Posted: Sun Jan 20, 2013 2:56 pm Post subject: Re: |
|
|
| JohnCampbell wrote: |
While the syntax is not a do loop, the compiler implements a do loop equivalent.
David's example where the loop should be run backwards identifies that:
To get the intended result, a DO loop should be run backwards.
For a FORALL, it must take a copy of the array, then act on this old array to produce the new array. I would consider that inefficient, similar to the temporary copy made for array sections of multi-dimensional arrays.
|
It depends on the compiler and the hardware.
Certainly Silverfrost's FTN95 implements the FORALL by making a temporary copy when necessary, and in such cases it isn't very efficient.
Other compilers with better optimization will be able to convert the FORALL to a DO loop which doesn't need the copy operation (like in the example where the DO loop goes backwards).
It is fair to say that most compilers are not very good at implementing FORALL efficiently, and in most cases a well-crafted DO loop will outperform FORALL. On the other hand, a small number of compilers (e.g. Intel, Portland) do a good job of optimizing FORALL and produce code that is on par with or better than the equivalent DO loop. Performance always depends on the underlying vectorisation hardware and whether the compiler can take advantage of it.
Now if Silverfrost decided to offer limited support in FTN95 for SSE and AVX instructions when FORALL is used, there might be a reason to switch, but as of now you should stick to DO loops in your applications if efficiency is paramount. |
|
|
 |
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8283 Location: Salford, UK
|
Posted: Wed Mar 27, 2013 6:22 pm Post subject: |
|
|
I have fixed the bug so that FORALL can now use the same index as an outer DO construct.
This fix will be included in the next release (after 6.35). |
|
|
 |
davidb
Joined: 17 Jul 2009 Posts: 560 Location: UK
|
Posted: Wed Mar 27, 2013 6:44 pm Post subject: |
|
|
Thank you Paul! |
|
|
 |
simon
Joined: 05 Jul 2006 Posts: 308
|
Posted: Mon Aug 26, 2013 6:08 pm Post subject: |
|
|
Paul - can you confirm whether FORALL is running any faster than it used to? As David suggests, on most compilers it does not necessarily improve efficiency.
If I run the following program, the various FORALL constructs don't seem to perform very well. I have tried compiling using the following options and the first FORALL seems to perform consistently badly, and always at least as badly as having nested FORALL statements.
/lgo
/lgo /optimise
/lgo /check
Somewhat intriguingly (to me anyway), if a single FORALL statement is used, it seems to make an important difference which index is listed first, and some compilers seem to prefer one ordering whereas others prefer the opposite. So I've avoided using FORALL, simply because it actually seems to make the program run slower! I can't even run these types of tests to see under what conditions FORALL does work faster because it is quite likely to go slower under a different compiler.
| Code: |
PROGRAM t
!
! Compile this program using the following options, and compare times
! FTN95 t.f95 /lgo
! FTN95 t.f95 /lgo /optimise
! FTN95 t.f95 /lgo /check
!
  INTEGER, PARAMETER :: m=9000,n=5000
  REAL :: a(m,n)
!
  CALL CPU_TIME (t1)
  DO i=1,n
    DO j=1,m
      a(j,i)=0.0
    END DO
  END DO
  CALL CPU_TIME (t2)
  PRINT*, 'DO loops in correct order ',t2-t1
!
  CALL CPU_TIME (t1)
  DO j=1,m
    DO i=1,n
      a(j,i)=0.0
    END DO
  END DO
  CALL CPU_TIME (t2)
  PRINT*, 'DO loops in incorrect order ',t2-t1
!
  CALL CPU_TIME (t1)
  a(1:m,1:n)=0.0
  CALL CPU_TIME (t2)
  PRINT*, 'a(1:m,1:n) ',t2-t1
!
  CALL CPU_TIME (t1)
  a(:,:)=0.0
  CALL CPU_TIME (t2)
  PRINT*, 'a(:,:) ',t2-t1
!
  CALL CPU_TIME (t1)
  a=0.0
  CALL CPU_TIME (t2)
  PRINT*, 'a ',t2-t1
!
  CALL CPU_TIME (t1)
  FORALL (j=1:m,i=1:n)
    a(j,i)=0.0
  END FORALL
  CALL CPU_TIME (t2)
  PRINT*, 'One FORALL statement ',t2-t1
!
  CALL CPU_TIME (t1)
  FORALL (i=1:n,j=1:m)
    a(j,i)=0.0
  END FORALL
  CALL CPU_TIME (t2)
  PRINT*, 'One FORALL statement, reversed ',t2-t1
!
  CALL CPU_TIME (t1)
  FORALL (j=1:m)
    FORALL (i=1:n)
      a(j,i)=0.0
    END FORALL
  END FORALL
  CALL CPU_TIME (t2)
  PRINT*, 'FORALL loops in incorrect order ',t2-t1
!
  CALL CPU_TIME (t1)
  FORALL (i=1:n)
    FORALL (j=1:m)
      a(j,i)=0.0
    END FORALL
  END FORALL
  CALL CPU_TIME (t2)
  PRINT*, 'FORALL loops in correct order ',t2-t1
END PROGRAM t |
|
|
|
 |
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8283 Location: Salford, UK
|
Posted: Tue Aug 27, 2013 6:53 am Post subject: |
|
|
No work has been done on FTN95 to make FORALL more efficient.
If you want to see what FTN95 does with a given FORALL statement then you can look at the output given using /EXPLIST. You don't need to know assembler coding to get an understanding of the ordering.
As a general rule you will get a faster run time when using /opt whilst /check will be slow and is intended for development and testing only. |
|
|
 |
LitusSaxonicum
Joined: 23 Aug 2005 Posts: 2428 Location: Yateley, Hants, UK
|
Posted: Tue Aug 27, 2013 11:42 am Post subject: |
|
|
Re:
| Quote: |
| the various FORALL constructs don't seem to perform very well |
I ran it on a quad core Phenom II system with Windows 7 64 bit, with the following results:
| Code: |
                                   Raw timings (s)      Ratio to correct order DO
                                   Base      Opt        Base       Opt
DO loops in correct order          0.20280   0.07800    1.00000    1.00000
DO loops in incorrect order        1.54441   1.52881    7.61540   19.60000
a(1:m,1:n)                         0.15600   0.04680    0.76923    0.60000
a(:,:)                             0.04680   0.03120    0.23077    0.40000
a                                  0.03120   0.04680    0.15385    0.60000
One FORALL statement               1.54441   1.49761    7.61540   19.20001
One FORALL statement reversed      0.15600   0.04680    0.76923    0.60000
FORALL loops in incorrect order    1.54441   1.51321    7.61540   19.40000
FORALL loops in correct order      0.17160   0.04680    0.84615    0.60000 |
With or without OPT, the correct-order DO loops are bettered by two of the FORALL constructs and also by the various array operations, although without OPT the overall winner is different and the relative timings are more spread out. The raw timing improved with OPT for a(:,:) but worsened for just plain a, which is probably worth Paul taking a look at.
I think the most striking point is that there is a certain amount you can do to make your code run faster, but if you get it wrong, you can slow it down dramatically!
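The reason the subscript order matters so much is that Fortran stores arrays with the first subscript varying fastest, so a loop nest whose inner loop runs over the first subscript walks through memory contiguously. A small sketch of that storage order (the names flat and a are assumptions made for illustration, not part of the benchmark above):

```fortran
PROGRAM storage_order
  ! Illustrative sketch: show which subscript varies fastest in memory.
  IMPLICIT NONE
  INTEGER :: flat(6), a(2,3)
  INTEGER :: i, j

  flat = (/ 1, 2, 3, 4, 5, 6 /)

  ! RESHAPE fills the result in array element order, i.e. the order in
  ! which the elements are laid out in storage.
  a = RESHAPE(flat, (/ 2, 3 /))

  ! Print the array row by row.
  DO j = 1, 2
    PRINT *, (a(j,i), i = 1, 3)
  END DO
  ! Row 1 prints 1 3 5 and row 2 prints 2 4 6, so the consecutive
  ! storage positions 1 and 2 are a(1,1) and a(2,1).
END PROGRAM storage_order
```

In the benchmark, a(j,i) with j in the inner loop therefore touches neighbouring memory locations, whereas with i in the inner loop each access jumps m elements ahead.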
Eddie |
|
|
 |
simon
Joined: 05 Jul 2006 Posts: 308
|
Posted: Tue Aug 27, 2013 2:50 pm Post subject: |
|
|
Based on Eddie's helpful summary of timings, it seems to me that FTN95 works reasonably well under certain circumstances. However, it is worth noting that whereas
| Code: |
| FORALL (i=1:n,j=1:m) |
works more slowly than
| Code: |
| FORALL (j=1:m,i=1:n) |
with FTN95, I have seen the opposite on other compilers.
I have also compared the following syntaxes:
| Code: |
| FORALL (i=1:n) a(j,i)=0.0 |
and
| Code: |
FORALL (i=1:n)
a(j,i)=0.0
END FORALL |
In some cases the former works faster than the latter, but in others the opposite is the case. However, the differences in timing are typically small.
Based on these results (and a few others not shown), I rather hesitantly conclude (for now) that it makes sense to implement FORALL, but with the following qualifications:
1. Use array operations of the form a= or a(:,:)= where possible;
2. It is worth implementing FORALL statements in place of DO where there is only one loop;
3. Where there are multiple loops, FORALL statements should be nested explicitly in the appropriate order, i.e.:
| Code: |
FORALL (i=1:n)
FORALL (j=1:m)
a(j,i)=0.0
END FORALL
END FORALL |
in preference to
| Code: |
FORALL (j=1:m,i=1:n)
a(j,i)=0.0
END FORALL |
and in preference to
| Code: |
FORALL (i=1:n, j=1:m)
a(j,i)=0.0
END FORALL |
|
|
|
 |
LitusSaxonicum
Joined: 23 Aug 2005 Posts: 2428 Location: Yateley, Hants, UK
|
Posted: Tue Aug 27, 2013 5:27 pm Post subject: |
|
|
Simon,
I'll go further. OPT sometimes seems to make me trip over, and as an old dog, FORALL is too new a trick. It needs nibbling at!
However, as a non-OPT user, I can just about get round to using the whole array name = constant instead of nested DO loops if I want to zero all of it, but if say, a is dimensioned 100,100, and one only wants to zero 10x10, the nested DOs are much quicker. Nested DOs seem to me to be more straightforward than FORALL if one wants to use the loop variable itself inside the loop, and the provision for that is probably what slows down the single DO.
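For the 10x10 case, an array section is another way of writing the same partial zeroing. The sketch below only checks that the two forms give the same result; the names a and b are assumptions made for illustration, and which form is quicker under a given compiler and set of options would have to be timed:

```fortran
PROGRAM zero_corner
  ! Illustrative sketch: zero a 10x10 corner of a 100x100 array two ways.
  IMPLICIT NONE
  REAL :: a(100,100), b(100,100)
  INTEGER :: i, j

  a = 1.0
  b = 1.0

  ! Nested DO loops, first subscript in the inner loop.
  DO i = 1, 10
    DO j = 1, 10
      a(j,i) = 0.0
    END DO
  END DO

  ! Equivalent array-section assignment.
  b(1:10,1:10) = 0.0

  PRINT *, 'Same result:     ', ALL(a == b)      ! T
  PRINT *, 'Zeroed elements: ', COUNT(b == 0.0)  ! 100
END PROGRAM zero_corner
```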
Getting the right subscript order was recommended in Kreitzberg & Shneiderman's "The Elements of FORTRAN Style" in 1972 - plus ça change.
Eddie
(As evidence of the old-doggedness, I can't make head nor tail of even installing some other compilers, let alone using them!). |
|
|
 |
|
|
|