forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 

Parallelization with FTN95
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Thu Mar 13, 2008 7:06 pm    Post subject: Parallelization with FTN95

I just got a 4-core Q6600 and found that my parallel libraries from Equation dot com do not give me the speedup I had with previous Intel processors. In fact, the maximum speedup is less than a factor of 2: just 1.3-1.4 with 2 cores deployed, and with 3 and 4 cores the speedup even starts to decrease!

Can anyone who has Intel/AMD dual processors or the latest AMD native quad-core processor test the same simple Fortran code and check whether you get proportional speedup on systems of linear equations?

Send me your email.


Last edited by DanRRight on Mon May 19, 2008 4:53 pm; edited 1 time in total
Robert



Joined: 29 Nov 2006
Posts: 445
Location: Manchester

Posted: Fri Mar 14, 2008 11:48 am

Out of interest, what sort of speed increases did you see on other systems?
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Fri Mar 14, 2008 10:17 pm

Other cases can sometimes be pretty good, even with the Q6600.
For example:

Number of equations: 2000000
Half bandwidth: 8

Processor: 1
Elapsed Time (Seconds): 7.14
Processors: 2
Elapsed Time (Seconds): 3.53
Processors: 3
Elapsed Time (Seconds): 2.47
Processors: 4
Elapsed Time (Seconds): 2.00

It is also fun to play with parallelization itself.
It definitely has a future.
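
For reference, the speedup and parallel efficiency implied by the timings above can be worked out with a few lines of standard Fortran. This is only a sketch: the program and variable names are illustrative, and the only data used are the four elapsed times quoted in this post.

```fortran
program speedup_table
  implicit none
  ! Elapsed times (seconds) quoted above for 1..4 processors on the Q6600
  real, parameter :: t(4) = (/ 7.14, 3.53, 2.47, 2.00 /)
  integer :: p
  real :: s, e

  print '(a)', ' procs   time(s)  speedup  efficiency'
  do p = 1, size(t)
     s = t(1) / t(p)        ! speedup relative to the one-processor run
     e = s / real(p)        ! parallel efficiency, 1.0 = ideal scaling
     print '(i6, f10.2, f9.2, f12.2)', p, t(p), s, e
  end do
end program speedup_table
```

For this banded case the speedups come out at roughly 2.0, 2.9 and 3.6 on 2, 3 and 4 processors, which is close to proportional scaling.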
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Sat Mar 15, 2008 11:05 am

Have you obtained these improvements using Salford FTN95?
I would be very interested in seeing this work with ftn95.

John
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Mon Mar 17, 2008 6:26 pm

All the improvements are in the libraries linked to FTN95.
You just call a subroutine and link it with any compiler.

There is also the MTASK library (which, for some reason, I do not see the author of Equation dot com advertising). It provides a simple parallel language in which you divide the code into pieces. For example, you divide a DO loop into N independent ranges, one for each of N processors, and if the division is done correctly all of them finish N times faster; processors that finish earlier wait for the others. Everything works the same way as ClearWin, Winteracter, any graphics, or any other library external to Fortran: you just call subroutines/functions in the Fortran source and link them with SLINK. A nice toy to play with.
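
The idea of dividing a DO loop into N independent ranges can be illustrated without any parallel library at all. The sketch below only computes the index range each processor would get; it is not the MTASK API, and the names `n`, `nproc`, `base` and `extra` are illustrative assumptions. It spreads any remainder over the first chunks so that no processor gets more than one extra iteration.

```fortran
program split_loop
  implicit none
  integer, parameter :: n = 10      ! loop length (illustrative value)
  integer, parameter :: nproc = 4   ! number of processors (illustrative value)
  integer :: k, lo, hi, base, extra

  base  = n / nproc        ! minimum number of iterations per chunk
  extra = mod(n, nproc)    ! the first 'extra' chunks get one more iteration

  do k = 1, nproc
     lo = (k - 1) * base + min(k - 1, extra) + 1
     hi = lo + base - 1
     if (k <= extra) hi = hi + 1
     ! Processor k would execute: do i = lo, hi ... enddo
     print '(a, i2, a, i4, a, i4)', ' chunk ', k, ': i = ', lo, ' ..', hi
  end do
end program split_loop
```

With the values shown, the four chunks are 1..3, 4..6, 7..8 and 9..10. The speedup approaches N only when the iterations are truly independent and cost about the same.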
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Sat Mar 29, 2008 12:34 am

Here is an interesting comparison between AMD and Intel processors.
Above you have seen the benchmark of the 2.4GHz Intel quad-core
processor Q6600.

Here is the score for the AMD Phenom 9600 at almost the same clock, 2.31GHz
(thanks to John Horspool):

Number of equations: 2000000
Half bandwidth: 8

Processor: 1
Elapsed Time (Seconds): 3.05
Processors: 2
Elapsed Time (Seconds): 1.52
Processors: 3
Elapsed Time (Seconds): 1.06
Processors: 4
Elapsed Time (Seconds): 0.84

Surprise, surprise! AMD is about 2.3-2.4 (!!!) times faster!
We would have to overclock the Intel to 5.5-6.0 GHz
to get this result, which is not really easy to achieve.
In fact, I have only succeeded in overclocking
the new 45nm processors like the E8400 to 4.5GHz
on air, where it is not perfectly stable
(4.2 GHz is OK), and besides, that is just a
dual-core processor.
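
The ratio between the two sets of timings, and the Intel clock that would be needed to match the AMD result, can be checked directly. This is a rough sketch that assumes run time scales inversely with clock frequency, which is only an approximation; the names are illustrative and the data are the timings quoted in this thread.

```fortran
program compare_timings
  implicit none
  ! Elapsed times (seconds) for 1..4 processors, as quoted in the thread
  real, parameter :: t_intel(4) = (/ 7.14, 3.53, 2.47, 2.00 /)   ! Q6600, 2.4 GHz
  real, parameter :: t_amd(4)   = (/ 3.05, 1.52, 1.06, 0.84 /)   ! Phenom 9600
  real, parameter :: intel_ghz  = 2.4
  integer :: p
  real :: ratio

  print '(a)', ' procs  Intel/AMD  equivalent Intel clock (GHz)'
  do p = 1, 4
     ratio = t_intel(p) / t_amd(p)
     ! Assumes time is inversely proportional to clock speed
     print '(i6, f11.2, f16.2)', p, ratio, intel_ghz * ratio
  end do
end program compare_timings
```

The ratios fall between about 2.3 and 2.4, which corresponds to an equivalent Intel clock of roughly 5.6 to 5.7 GHz under that assumption.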

This is consistent with SPEC, which also shows
higher scores for AMD (by 60% or so).
AMD loses only in games (by 10-15%, who cares?)
due to slower integer performance and a couple of floating-point
multimedia extension routines, but it has
no cache coherency problems like those in the tests
I described in my first post above for Intel.

By the way, dual-core Intel processors,
despite being slower than AMD, nevertheless
scale perfectly with the number of cores, because
the cores are on the same die and there is no
slowdown from bus transfers and cache incoherency.
So hopefully the Intel Nehalem processors will
be better at parallelization (I mean they
will scale better for a broader variety of tasks).

The other good news is that the company promises
to make parallel libraries for the Salford compilers.
Right now I use ones built for other compilers,
compile with the /IMPORT_LIB switch and then link with SLINK
as usual.
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Mon May 19, 2008 4:49 pm

I think 99.9% of the work needed to make Salford FTN95 a parallel language is already done; let me know if I'm wrong.

First, multithreading is already done with winio@. To support simple parallel functions, it is only necessary to implement thread-safe output, such as print*, into separate screen units, as is done right now when you define OUTunit1, OUTunit2

i = winio@('%pv%120.10cw[hscroll,vscroll]&', OUTunit1)
i = winio@('%pv%120.10cw[hscroll,vscroll]&', OUTunit2)

and to do a little coding for two or three more winio@ functions. Suppose you need to parallelize a do loop on a dual-core CPU.

You split the loop
do i=1,N
.............
enddo

into two functions: loop1 with
do i=1,N/2
................
enddo

and loop2 with
do i=N/2+1,N
................
enddo

and would call (the %xx names are arbitrary, just for demonstration)

i=winio@('%np&',n_processors) ! find the number of processors
i=winio@('%em&',2) ! employ just two of them if you have more than 2
i=winio@('%lt&',1,loop1) ! launch first task on first processor
i=winio@('%lt&',2,loop2) ! launch second task on second processor
i=winio@('%we') ! wait for both tasks to finish

<do your other job here>

Both threads will print on screen in separate text windows OUTunit1, OUTunit2. That's all we need. This is basically how the simple MTASK language of www.equation.com works.
Very simple and effective.
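
The launch-two-tasks-and-wait pattern proposed above can be tried today with OpenMP sections on a compiler that supports OpenMP (for example gfortran with -fopenmp). To be clear, this is a swapped-in technique: it is neither the hypothetical winio@ codes nor the MTASK API, and the routine `partial_sum` is an illustrative stand-in for the bodies of loop1 and loop2.

```fortran
program two_tasks
  implicit none
  integer, parameter :: n = 1000000
  real(kind=kind(1.0d0)) :: s1, s2

  s1 = 0.0d0
  s2 = 0.0d0

  ! Each section may run on its own thread; at most two threads are requested
  !$omp parallel sections num_threads(2)
  !$omp section
  call partial_sum(1, n/2, s1)          ! plays the role of loop1
  !$omp section
  call partial_sum(n/2 + 1, n, s2)      ! plays the role of loop2
  !$omp end parallel sections
  ! Implicit barrier: execution continues only when both sections are done

  print *, 'sum of 1/i**2 for i = 1..n :', s1 + s2

contains

  subroutine partial_sum(lo, hi, s)
    integer, intent(in) :: lo, hi
    real(kind=kind(1.0d0)), intent(out) :: s
    integer :: i
    s = 0.0d0
    do i = lo, hi
       ! Convert to real before squaring to avoid integer overflow
       s = s + 1.0d0 / (real(i, kind(1.0d0)) ** 2)
    end do
  end subroutine partial_sum
end program two_tasks
```

Compiled without OpenMP, the directives are treated as comments and the two halves simply run one after the other, so the same source serves as the serial reference.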
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7925
Location: Salford, UK

Posted: Tue May 20, 2008 8:03 am

winio@ processes the Windows message queue on a single thread. There is no built-in multi-threading under Win32.

.NET does have its own multi-threading but under Win32 you will need to access the Windows API threading functions directly.

In some respects, with a single processor, multi-threading is not much different from multi-processing, because different processes take over the CPU for intervals of time. However, the threading functions provide ways of synchronising the various threads and of sharing and locking the common data.
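
As an illustration of what locking the common data means in practice, here is a minimal sketch using an OpenMP critical section. This is an assumption made for the sake of a short example: it requires a compiler with OpenMP support and is not the Windows API threading interface mentioned above, although the principle (only one thread at a time may update shared data) is the same.

```fortran
program shared_counter
  implicit none
  integer, parameter :: n = 100000
  integer :: i, hits

  hits = 0                   ! shared by all threads
  !$omp parallel do
  do i = 1, n
     if (mod(i, 3) == 0) then
        ! Lock: only one thread at a time may update the shared counter
        !$omp critical
        hits = hits + 1
        !$omp end critical
     end if
  end do
  !$omp end parallel do

  print *, 'multiples of 3 up to', n, ':', hits
end program shared_counter
```

Without the critical section two threads could read and write `hits` at the same moment and lose updates. For a simple sum like this, a `reduction(+:hits)` clause on the loop is usually the faster alternative.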
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Sun May 25, 2008 5:02 pm

I ran your Threads.f95 example under Win32, and I can tell you that I'd like to have even that kind of parallelism if you would implement it in a form like the one above (winio@ or a similarly clear and simple language). Of course, the freedom to employ a specific number of processors instead of all of them would be better. The ultimate wish would be the ability to assign specific threads to specific cores of the CPU. This matters for tasks and threads that need access to the same cache for coherency and ultimate speed (as in linear algebra).

"Salford Fortran. Built-in multithreaded parallelism for modern multicore processors"

or something similar. Would sound good to me :)
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Sat Jun 21, 2008 7:02 pm

Three months ago I wrote to the company mentioned above and convinced their programmers to take a look at Salford FTN95, to make a native parallel library for this compiler the same way they make libraries for GFortran, Lahey, Intel, Absoft, you name it. They have been working on it all this time. And guess what? They failed! It seems that, unlike all the other compilers (most of which are faster than Salford, sometimes just a bit, sometimes very substantially), Salford FTN95 reorders statements when its optimization is switched on. As you can understand, this is deadly for parallelization. Without optimization Salford is 3 times slower, slower than Microsoft Fortran, which was discontinued ten years ago. Fortran is not just logic, simplicity, reliability, rich libraries and development speed; Fortran is also ultimate execution speed.

Now what is left is the fast new Intel Fortran library (faster by a factor of 1.5 than my old Microsoft Fortran one, which is compatible with Salford). It sometimes works, though it complains about some missing symbols (__intel_f2int, _fltused ...), but mostly it fails. Let's look at the problem from another point of view. If changing the compiler is not a viable option, or is difficult for some reason, is there any way to make it work with Salford using compatible wrappers, DLLs, etc.?
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7925
Location: Salford, UK

Posted: Sat Jun 21, 2008 8:12 pm

You are obviously not a fan of FTN95 but I wonder why you are making unsubstantiated statements like "the Salford FTN95 reorders statements when its optimization is switched on". If you can produce any evidence that FTN95 does this (when it is not appropriate) then please let us have the details so that we can fix the problem.

This forum is provided and maintained by Silverfrost for the benefit of FTN95 users. You are welcome not to use FTN95 if it does not serve your purposes but it would be better if you left your critical remarks in another place.
JohnHorspool



Joined: 26 Sep 2005
Posts: 270
Location: Gloucestershire UK

Posted: Sun Jun 22, 2008 12:31 am

Working on a 64-bit XP OS with a pure number-crunching source code (no graphics and no ClearWin), I found that a default compile with 32-bit FTN95 produced an exe that ran substantially faster than one produced using a 64-bit version of the gfortran compiler!
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Sun Jun 22, 2008 2:59 am

http://www.polyhedron.com/benchamdwin

Of course, your mileage may vary. By tuning, or by using external libraries (as in this case with parallel algebra libraries; parallelization is our unavoidable future), you can get the best of the best.

Here are the strengths of Salford: developer's debugging time, compile time, cleaner code

http://www.polyhedron.com/pb05-win32-diagnose0html

And of course Clearwin, Virtual Common, .NET etc...


Last edited by DanRRight on Sun Jun 22, 2008 3:34 am; edited 4 times in total
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Sun Jun 22, 2008 3:08 am

PaulLaidler wrote:
You are obviously not a fan of FTN95 but I wonder why you are making unsubstantiated statements like "the Salford FTN95 reorders statements when its optimization is switched on"


Paul, in front of this sentence I wrote one important word: "Seems...", and then what you wrote is correct. That means it is my guess, or our guess.

In return, I have to point out the unsubstantiated statement "You are obviously not a fan of FTN95...". I am really sorry if that is how you understood me. I have used ***only*** Salford/Silverfrost since probably 1988, two decades, and I like it more than any other compiler. I used the great DOS/DBOS version FTN77, went through hell with the buggy FTN90, and have been mostly happy with FTN95, recommending it to anyone. But I would like it to be even better, which means pointing out not only its strengths but also its weaknesses. That helps to find substantial workarounds and to be 100% happy with FTN95.
LitusSaxonicum



Joined: 23 Aug 2005
Posts: 2388
Location: Yateley, Hants, UK

Posted: Sun Jun 22, 2008 11:05 pm

I'm confused. Doesn't optimisation reorder statements? I thought that you needed an Assembler to have exact translation, so even non-optimised compilation must reorder statements to a degree.

I think Dan is after a Holy Grail - ClearWin+ and the clear advantages of fast compilation, excellent diagnostics, and some of the excellent add-ons of FTN95 (which it has), together with "best of the pack" execution speed (with benchmarks), 64-bit code (to allow the use of >3Gb RAM) and making use of all cpu cores through multi-threading - which it doesn't. Multi-threading, I will remind us, was present in the DBOS FTN77 - although it was a fat lot of good (i.e. to translate into US English: not much use) with a single core cpu, and DBOS FTN77 was certainly one of the compilers at that time that produced the fastest runtime.

Eddie