forums.silverfrost.com :: View topic - FTN95 Version 8.3 - Some Initial Observations
FTN95 Version 8.3 - Some Initial Observations
mecej4



Joined: 31 Oct 2006
Posts: 1899

Posted: Fri Mar 30, 2018 11:47 am

It is to be expected that /checkmate would force allocation of memory at the outset. Uninitialized variables, including some big arrays, have to be filled with special values so that, when the same variables are used later, their values can be compared with the special value to detect whether they have been initialized.
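A minimal sketch of the sentinel idea described above, written in Python for illustration only. It is not FTN95's implementation: the byte pattern `SENTINEL_BYTE` and the helper names are hypothetical, and the real "undefined" pattern is not given in this thread.

```python
import struct

# Hypothetical sentinel byte; the pattern FTN95 really uses is not stated here.
SENTINEL_BYTE = 0x80
INT32_SIZE = 4


def allocate_checked(n_bytes):
    """Allocate a buffer and fill every byte with the sentinel at the outset."""
    return bytearray([SENTINEL_BYTE]) * n_bytes


def write_int32(buf, index, value):
    """Store a 4-byte integer, overwriting the sentinel pattern."""
    start = INT32_SIZE * index
    buf[start:start + INT32_SIZE] = struct.pack("<i", value)


def read_int32(buf, index):
    """Read a 4-byte integer, refusing if it still holds the sentinel pattern."""
    start = INT32_SIZE * index
    raw = bytes(buf[start:start + INT32_SIZE])
    if raw == bytes([SENTINEL_BYTE]) * INT32_SIZE:
        raise ValueError(f"element {index} used before being defined")
    return struct.unpack("<i", raw)[0]
```

The whole buffer has to be written once before the program touches it, which is why the memory is committed at the outset. One known weakness of any such scheme is that a legitimately stored value matching the sentinel pattern would be reported as undefined.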
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 8210
Location: Salford, UK

Posted: Fri Mar 30, 2018 12:40 pm

ALLOCATE for 32-bit /CHECK uses its own memory allocator, built on existing blocks of VirtualAlloc memory, and sets the allocated memory to the "undefined" state when called.

ALLOCATE for 64-bit /CHECK uses GlobalAlloc/HeapAlloc and sets the allocated memory to the "undefined" state when called.
wahorger



Joined: 13 Oct 2014
Posts: 1257
Location: Morrison, CO, USA

Posted: Sun Apr 01, 2018 12:13 am

Thanks for the explanation, Paul.
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 8210
Location: Salford, UK

Posted: Mon Apr 02, 2018 12:28 pm

Please go to the following post regarding new DLLs...

http://forums.silverfrost.com/viewtopic.php?p=24394#24394
dpannhorst



Joined: 29 Aug 2005
Posts: 165
Location: Berlin, Germany

Posted: Fri Apr 06, 2018 7:03 pm

The Dropbox link to the new DLLs leads to an error.

Detlef Pannhorst
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 8210
Location: Salford, UK

Posted: Fri Apr 06, 2018 7:08 pm

Yes. The above link explains why the download has been removed.
DanRRight



Joined: 10 Mar 2008
Posts: 2923
Location: South Pole, Antarctica

Posted: Mon Apr 09, 2018 7:38 am

wahorger wrote:
I am observing that V8.30.0 is much faster at 32-bit compiling than 8.20.0.


All versions have always been superfast, like no other compiler; I have automatically kept compilation speed results since 1999. Where other compilers spend 3 minutes, FTN95 compiles in 3 seconds. That happens many times per day. And since compilation and debugging speed are key during program development (which in my case is the vast majority of the time spent), they are far more important than run time. People usually say that they chose Fortran for its run speed. But without parallelisation and supercomputers that is absurd. In reality, if they use a PC, they lose most of their time on development. Lose just 3 seconds per day and by the end of your life you will have lost 24 hours. Actually we lose many, many hours per day; how much that adds up to over a lifetime is scary even to say.
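The 24-hour figure can be checked with a few lines of arithmetic. The sketch below assumes a span of about 80 years, which is an assumption of this example and is not stated in the post.

```python
DAYS_PER_YEAR = 365.25
SECONDS_PER_HOUR = 3600


def hours_lost(seconds_per_day, years):
    """Total hours lost when a fixed number of seconds is lost every day."""
    return seconds_per_day * DAYS_PER_YEAR * years / SECONDS_PER_HOUR


# Assumed span: 80 years (not stated in the post).
lifetime_hours = hours_lost(3, 80)
```

Under that assumption the result is a little over 24 hours, so the claim holds as an order-of-magnitude statement.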
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 8210
Location: Salford, UK

Posted: Mon Apr 09, 2018 10:59 am

Please go to the following post regarding new DLLs.

http://forums.silverfrost.com/viewtopic.php?p=24467#24467
JohnCampbell



Joined: 16 Feb 2006
Posts: 2615
Location: Sydney

Posted: Mon Apr 09, 2018 1:29 pm

Dan,

Ver 8.3 provides more multi-threading options.
I am looking to see what I can achieve and will update shortly.

John
DanRRight



Joined: 10 Mar 2008
Posts: 2923
Location: South Pole, Antarctica

Posted: Mon Apr 09, 2018 10:09 pm

Interesting. I would like to look, but I'm too busy to experiment right now. In the meantime, for you, John, Paul and those who have already started, I have a few questions about this parallel method:

1) What's new here compared to the previous method, which already allowed starting parallel threads?

2) Was the LOCK mechanism implemented as in FTN95 for .NET, allowing printing without the danger of threads crashing? This is the big problem during debugging, because a lot of I/O happens at that time.

3) How fast is this method compared to the parallel example for .NET I posted a few years back (see the link below, use my last demo), which showed an amazing and still unexplained speedup of more than 6.2x on typical 4-core 8-thread processors?

4) Has anyone already bought one of the new cheap 8-, 16- or even 32-core AMD processors? How fast is the method on AMD vs Intel?

Here is the URL for the FTN95 for .NET case.
http://forums.silverfrost.com/viewtopic.php?t=2534&highlight=net+multithreading
JohnCampbell



Joined: 16 Feb 2006
Posts: 2615
Location: Sydney

Posted: Tue Apr 10, 2018 3:47 am

Dan,

Interesting questions, but I will try to answer a few of my own first.

Why try AMD when Intel is so cheap?
I just bought an i7-8700K, which has 6 cores and 12 threads. The important feature is that it supports 2666 MHz memory, which provides greater memory transfer bandwidth. It gives a noticeable improvement over the i7-4790K for a multi-threaded equation solution of a 300 Mb skyline matrix with 12 threads. The 4790K (4 cores, 8 threads) loses efficiency above 4 threads when hyper-threading, which I attribute to its slower 1600 MHz memory.

My use of multi-threading is fairly basic. The FTN95 approach does require some care when managing private variables. My approach is to immediately call a routine, which then allocates local variables for all private variables, while shared arrays are allocated before thread initiation to provide thread-based accumulators (even the thread ID must be private!). I am now trying to emulate SCHEDULE(DYNAMIC) and CRITICAL.
FTN95 threading could offer a lot of potential, as opening an OMP PARALLEL region can take 30,000 processor cycles on other compilers, which kills threads with small loads.
I still have some work to do to complete this approach.
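The structure described above can be sketched in Python as follows. This is only an illustration of the pattern, not FTN95's threading API: `run_parallel`, `worker`, the chunk size and the thread count are all hypothetical names and values. Shared arrays are created before any thread starts, each thread immediately enters a routine whose local variables are private to it, a lock stands in for CRITICAL, and a shared work counter emulates SCHEDULE(DYNAMIC).

```python
import threading


def run_parallel(data, num_threads=4, chunk=8):
    """Sum `data` with per-thread accumulators and dynamic chunk scheduling."""
    # Shared state, allocated before thread initiation.
    partial_sums = [0.0] * num_threads  # one accumulator slot per thread
    next_index = [0]                    # shared counter of unclaimed work
    critical = threading.Lock()         # stands in for a CRITICAL section

    def worker(thread_id):
        # Names assigned inside this routine are local to the call,
        # so they are private to the thread (including thread_id).
        local_sum = 0.0
        while True:
            with critical:              # one thread at a time claims a chunk
                start = next_index[0]
                next_index[0] = start + chunk
            if start >= len(data):
                break
            for value in data[start:start + chunk]:
                local_sum += value
        partial_sums[thread_id] = local_sum  # each thread writes only its slot

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial_sums)
```

Python threads will not show a speedup for this kind of loop, so the sketch demonstrates only the division into private and shared data. The lock is held just long enough to claim the next chunk, which keeps the serial part small.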

John
DanRRight



Joined: 10 Mar 2008
Posts: 2923
Location: South Pole, Antarctica

Posted: Thu Apr 12, 2018 12:59 am

With computers, the minimal unit of measurement is a factor of 2. Two computers within a factor of 2 in performance are essentially equal. Otherwise, if you think a 20% difference is a lot, you should buy a new computer with each and every 20% increase (which translates to every few months). This explains my questions below.

It would be interesting to test and find out what is better for large-scale linear algebra:

- double the number of cores, or
- double the speed of RAM, or
- quad-channel vs dual-channel memory architecture, or
- double the cache size, or
- double the hard drive speed?

Assuming RAM size is not a problem, the last item is not a problem either. But there exist 4300 MHz Corsair DDR4 RAM modules, which are almost a factor of 2 faster than typical 1600-2400 MHz ones. There exist 20-30 MB caches versus the typical 9-12 MB. There exist quad-channel memory transfer speeds, etc. What is performance mostly bound by when the matrix size is very large?
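For the RAM speed and channel count items, a rough upper bound can be computed directly. The sketch below assumes that the quoted "MHz" figures are transfer rates in millions of transfers per second and that each channel is 64 bits (8 bytes) wide, as on standard DDR3/DDR4 modules; the function name is hypothetical.

```python
BYTES_PER_TRANSFER = 8  # one 64-bit DDR channel moves 8 bytes per transfer


def peak_bandwidth_gb_s(mega_transfers_per_s, channels):
    """Theoretical peak memory bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_s = mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels
    return bytes_per_s / 1e9


dual_1600 = peak_bandwidth_gb_s(1600, 2)
dual_2666 = peak_bandwidth_gb_s(2666, 2)
quad_2666 = peak_bandwidth_gb_s(2666, 4)
```

With these assumptions, dual-channel 1600 gives about 25.6 GB/s and dual-channel 2666 about 42.7 GB/s, while going from two to four channels doubles the figure again. These are theoretical peaks; sustained bandwidth in a real solver will be lower.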
JohnCampbell



Joined: 16 Feb 2006
Posts: 2615
Location: Sydney

Posted: Thu Apr 12, 2018 1:40 am

Dan,

All these are significant, as they are related.
I find the bottleneck is with transfers between memory and cache.
So speed of RAM and cache size are the most significant.

I am not familiar with "quad channel vs dual channel memory architecture" so if it affects transfer rates then that would be related.

"double amount of cores" would change the number of threads (?) so would be significant.

The other significant factor is modifying the calculation to minimise memory-to-cache transfers, i.e. using a cache-smart algorithm.

What is interesting is that performance is less affected by the processor clock rate, as the bottleneck is memory <> cache transfers.

What I am still trying to understand is how to use separate memory pages for each thread, as sharing pages between threads can affect memory coherence.
("Memory Coherence" is my latest unknown. The difficulty is that if you don't understand how this affects performance, it is difficult to construct a test that identifies the problem, especially demonstrating how to run without the problem.)

Has anyone experienced the improvement in MATMUL performance in gfortran version 7+ for large matrices? They have changed the algorithm so that it works on 4x4 sub-matrices, and on a single thread it achieves the performance that I achieve using 4 threads! Their approach is cache-smart plus vector instructions, and it achieves surprising single-thread performance, demonstrating that there is much to learn about managing the multi-level cache architecture.
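The general technique behind a cache-smart matrix multiply is blocking (tiling): the product is computed one small sub-matrix at a time, so that each tile is reused while it is still in cache. The Python sketch below shows only the loop structure. It is a generic illustration, not gfortran's MATMUL implementation, and the default block size of 4 is merely borrowed from the 4x4 figure mentioned above.

```python
def blocked_matmul(a, b, block=4):
    """Multiply matrices a (n x m) and b (m x p), given as lists of rows."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # The three outer loops walk over tiles; the three inner loops
    # do an ordinary multiply restricted to one tile of each matrix.
    for i0 in range(0, n, block):
        for k0 in range(0, m, block):
            for j0 in range(0, p, block):
                for i in range(i0, min(i0 + block, n)):
                    row_c = c[i]
                    for k in range(k0, min(k0 + block, m)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        for j in range(j0, min(j0 + block, p)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

In a compiled language the innermost loop over `j` is the natural candidate for vector instructions, and the block size would be tuned to the cache sizes of the target processor.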

Still much to learn!
mecej4



Joined: 31 Oct 2006
Posts: 1899

Posted: Thu Apr 12, 2018 2:33 am

There was an interesting contribution by "Repeat Offender" in the Intel Fortran forum, in which he showed that doing arithmetic with AVX instructions instead of a straight table lookup enabled a program to run 400 times faster. The chosen task: converting the text of an e-bible, about 4.5 MB long, to upper case.

See https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/757222#comment-1918919 . You may have to sign in to make his post visible.
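The two approaches being compared can be sketched as follows, assuming plain ASCII text. This is not the code from the linked post; the function names are hypothetical, and Python cannot reproduce the speed difference. It only shows that the conversion can be written as branch-free arithmetic on each byte, which is the form that AVX instructions can apply to many bytes at once.

```python
# Table lookup: a 256-entry table mapping each byte to its upper-case form.
UPPER_TABLE = bytes(c - 32 if 97 <= c <= 122 else c for c in range(256))


def upper_by_table(data):
    """Convert ASCII lower-case letters to upper case via table lookup."""
    return data.translate(UPPER_TABLE)


def upper_by_arithmetic(data):
    """Same conversion as pure arithmetic on each byte.

    The comparison yields True (1) or False (0), so 32 is subtracted
    only when the byte lies in the range 'a'..'z' (97..122).
    """
    return bytes(b - 32 * (97 <= b <= 122) for b in data)
```

Both functions give identical results for every possible byte value, so the choice between them is purely a matter of performance on the target hardware.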
DanRRight



Joined: 10 Mar 2008
Posts: 2923
Location: South Pole, Antarctica

Posted: Sat Apr 14, 2018 8:56 am

No, the Intel forum does not require signing in. By the way, their forums allow posting much larger source code. The forum design also looks more modern.

If our linear algebra is actually memory-bandwidth bound, then AVX may not influence performance much. What would be good to check is whether memory architecture matters or not. Today AMD announced the second iteration of their 8-core, 4-memory-channel processors at an even cheaper price of $330. There are also rumors about 48- and 64-core AMD chips with 256 MB cache and an 8-channel memory architecture.

For memory-bound tasks, the optimum processor could have a low clock rate, just with as many cores and as many memory channels as possible.
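This reasoning matches the usual roofline-style estimate, in which attainable performance is the smaller of the compute peak and memory bandwidth multiplied by the number of floating-point operations done per byte moved. The sketch below uses made-up numbers purely for illustration; none of them are measurements from this thread.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline estimate: limited by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical machine: 100 GFLOP/s peak, 40 GB/s memory bandwidth,
# running a kernel that does 0.25 floating-point operations per byte.
baseline = attainable_gflops(100, 40, 0.25)
faster_clock = attainable_gflops(200, 40, 0.25)   # double the compute peak
more_channels = attainable_gflops(100, 80, 0.25)  # double the bandwidth
```

For this hypothetical memory-bound kernel, doubling the compute peak leaves the estimate unchanged, while doubling the bandwidth doubles it.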
All times are GMT + 1 Hour
Page 2 of 3

 