
FTN95 Version 8.3 - Some Initial Observations
DanRRight



Joined: 10 Mar 2008
Posts: 1872
Location: South Pole, Antarctica

Posted: Thu Apr 12, 2018 12:59 am

With computers, the minimal meaningful unit of measurement is a factor of 2. Two computers within a factor of 2 in performance are essentially equal. Otherwise, if one thinks a 20% difference is a lot, one would have to buy a new computer with each and every 20% increase (which translates to every few months). This explains my questions below.

It would be interesting to test and find out what is better for large-scale linear algebra:

- double the number of cores, or
- double the speed of RAM, or
- quad-channel vs dual-channel memory architecture, or
- double the cache size, or
- double the hard drive speed?

Assuming the RAM size is not a problem, the last question is also not a problem. But there exist 4300 MHz Corsair DDR4 RAM modules which are almost a factor of 2 faster than typical 1.6-2.4 GHz ones. There exist 20-30 MB caches versus the typical 9-12 MB. There exist quad-channel memory transfer speeds, etc. What is performance mostly bound by when the matrix size is very large?
JohnCampbell



Joined: 16 Feb 2006
Posts: 1979
Location: Sydney

Posted: Thu Apr 12, 2018 1:40 am

Dan,

All these are significant, as they are related.
I find the bottleneck is with transfers between memory and cache.
So speed of RAM and cache size are the most significant.

I am not familiar with "quad channel vs dual channel memory architecture" so if it affects transfer rates then that would be related.

"double amount of cores" would change the number of threads (?) so would be significant.

The other main factor is modifying the calculation to minimise memory-to-cache transfers, i.e. using a cache-smart algorithm.

What is interesting is that performance is less affected by the processor clock rate, as the bottleneck is memory <> cache transfers.

What I am still trying to understand is how to use separate memory pages for each thread, as sharing pages between threads can affect memory coherence.
("Memory Coherence" is my latest unknown. The difficulty is that if you don't understand how this affects performance, it is difficult to construct a test that identifies the problem, especially demonstrating how to run without the problem.)
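One way to read "separate memory pages for each thread" is to give each thread a contiguous block of the array whose boundaries fall on page boundaries, so that no two threads write into the same page (or cache line). A minimal sketch of that index arithmetic in Python, assuming 4096-byte pages, 8-byte reals and an array that itself starts on a page boundary. `page_aligned_blocks` is a hypothetical helper for illustration, not an FTN95 or OpenMP routine:

```python
def page_aligned_blocks(n_elems, n_threads, elem_bytes=8, page_bytes=4096):
    """Split indices 0..n_elems-1 into per-thread [start, stop) ranges
    whose starts are multiples of a whole page (sizes are assumptions)."""
    per_page = page_bytes // elem_bytes        # elements that fit in one page
    n_pages = -(-n_elems // per_page)          # ceiling division
    pages_each = -(-n_pages // n_threads)      # pages per thread, rounded up
    blocks = []
    for t in range(n_threads):
        start = min(t * pages_each * per_page, n_elems)
        stop = min((t + 1) * pages_each * per_page, n_elems)
        blocks.append((start, stop))
    return blocks


# Hypothetical sizes: 10,000 reals shared between 4 threads
blocks = page_aligned_blocks(n_elems=10_000, n_threads=4)
```

Whether this actually removes the slowdown depends on how the array is allocated and which thread first touches each page, so it is only a starting point for constructing a test.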

Has anyone experienced the improvement in MATMUL performance in gfortran Ver 7+ for large matrices? They have changed the algorithm so that it works on 4x4 sub-matrices, and it achieves on a single thread the performance that I achieve using 4 threads! Their approach is cache smart + vector instructions, achieving surprising single-thread performance and demonstrating there is much to learn about managing the multi-level cache architecture.
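The general idea of working on small sub-matrices can be sketched as a blocked (tiled) matrix multiply: each tile is reused many times while it is still in cache, instead of streaming whole rows and columns from memory. The Python below is only an illustration of the loop structure under that assumption. It is not gfortran's implementation, the name `matmul_blocked` and the default tile size `bs=4` are chosen for this example, and Python itself will not show the speed-up:

```python
def matmul_blocked(a, b, n, bs=4):
    """C = A*B for n x n matrices stored as lists of rows, computed tile by tile."""
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, bs):
        for k0 in range(0, n, bs):
            for j0 in range(0, n, bs):
                # Work on one bs x bs tile of A, B and C at a time
                for i in range(i0, min(i0 + bs, n)):
                    ci = c[i]
                    ai = a[i]
                    for k in range(k0, min(k0 + bs, n)):
                        aik = ai[k]
                        bk = b[k]
                        for j in range(j0, min(j0 + bs, n)):
                            ci[j] += aik * bk[j]
    return c


# Small check: [[1, 2], [3, 4]] * [[5, 6], [7, 8]] = [[19, 22], [43, 50]]
c = matmul_blocked([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], 2)
```

In a compiled language the innermost tile loops are also where vector instructions would be applied, and the tile size would be tuned to the cache levels of the target processor.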

Still much to learn!
mecej4



Joined: 31 Oct 2006
Posts: 978

Posted: Thu Apr 12, 2018 2:33 am

There was an interesting contribution by "Repeat Offender" in the Intel Fortran forum, in which he showed that doing arithmetic using AVX instructions instead of a straight table lookup enabled a program to run 400X faster. The chosen task: converting the text of an e-bible, about 4.5 MB long, to upper case.

See https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/757222#comment-1918919 . You may have to sign in to make his post visible.
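To make the two formulations concrete, here is a minimal Python sketch of upper-casing ASCII text by table lookup and by arithmetic. It does not reproduce the code from the linked post, and Python will not show the AVX speed-up. It only illustrates why the arithmetic form suits vector instructions: every byte gets the same branch-free operation, with no per-character memory lookup. The function names are made up for this example:

```python
# Table lookup: a 256-entry table, one table access per character
TABLE = bytes(c - 32 if ord("a") <= c <= ord("z") else c for c in range(256))


def upper_lookup(text: bytes) -> bytes:
    return text.translate(TABLE)


def upper_arith(text: bytes) -> bytes:
    # Arithmetic: subtract 32 only where the byte lies in 'a'..'z'.
    # The comparison yields 0 or 1, so no branch is needed per character.
    return bytes(c - 32 * (97 <= c <= 122) for c in text)


sample = b"In the beginning, 42!"
```

Both functions return `b"IN THE BEGINNING, 42!"` for `sample`. In a compiled language the arithmetic version maps onto compare, mask and subtract on a whole vector register of characters at once.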
DanRRight



Joined: 10 Mar 2008
Posts: 1872
Location: South Pole, Antarctica

Posted: Sat Apr 14, 2018 8:56 am

No, Intel does not need registering. By the way, their forums allow posting much larger source code. And the forum design also looks more modern.

If our linear algebra is actually memory-bandwidth bound, then AVX may not influence performance much. What would be good to check is whether memory architecture matters or not. Today AMD announced their second iteration of 8-core, 4-memory-channel processors at an even cheaper price of $330. Also, rumors are flying about 48- and 64-core AMD chips with 256 MB cache and 8-channel memory architecture.

For memory-bound tasks the optimum processor could have any low clock rate, just as many cores and as many memory channels as possible.
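A back-of-envelope way to see how the number of channels would matter for a memory-bound loop is to compute the theoretical peak bandwidth and the operation rate it can feed. The sketch below assumes 64-bit (8-byte) DDR channels and a multiply-add that streams two 8-byte reals from RAM per operation with no cache reuse. The DDR4-2400 figure is only an example input, and real sustained bandwidth is lower than the peak:

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (8-byte channel width assumed)."""
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


def max_op_rate_gops(bandwidth_gbs, bytes_per_op=16):
    """Upper bound, in 1e9 ops/s, when every op must fetch bytes_per_op from RAM."""
    return bandwidth_gbs / bytes_per_op


dual = peak_bandwidth_gbs(2, 2400)    # about 38.4 GB/s
quad = peak_bandwidth_gbs(4, 2400)    # about 76.8 GB/s
dual_ops = max_op_rate_gops(dual)     # about 2.4e9 ops/s
quad_ops = max_op_rate_gops(quad)     # about 4.8e9 ops/s
```

Under these assumptions doubling the channels doubles the bound regardless of clock rate or core count, which is the argument above. Any rate well above the bound has to come from reuse of data in cache.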
LitusSaxonicum



Joined: 23 Aug 2005
Posts: 1844
Location: Yateley, Hants, UK

Posted: Wed Apr 18, 2018 6:54 pm

I had a go with 8.30pe and didn't think it was any faster than 8.10. Perhaps the /64 is.

What it does do is to put up the PE nag screen with a /LGO compilation, which 8.10 didn't.

In answer to John about why buy AMD: it makes sense if you are buying 1000, most of which aren't used for number-crunching. It also makes sense if you are using your own money. It makes sense if your apps work fine on a low-spec CPU. It equally makes sense if your apps depend heavily on the GPU rather than the CPU.

Eddie
DanRRight



Joined: 10 Mar 2008
Posts: 1872
Location: South Pole, Antarctica

Posted: Fri Apr 20, 2018 11:24 am

Adding to Eddie's comment about AMD: this specific batch of processors, Ryzen, became much faster for the same money precisely for multithreaded tasks.

Read today's Anandtech review of the new batch of Ryzen+ processors.

One potential huge win for AMD here could be linear algebra, due to the 4-channel memory architecture of Ryzen / Ryzen+ and even 8 channels for the 32-core Threadripper. And a factor of 2 difference in price is substantial and very favorable for AMD. Some Intel server processors probably have 50x the production cost of mobile processors. The latter are often made with even more advanced technology, with more transistors on chip, and also require building expensive multi-billion-dollar factories, but sell for $5 per core.

For single-threaded tasks (like compiling and running simple codes) Intel is faster, but this time the delta is so marginal that nobody should care. You can additionally win around 5% with Intel by overclocking its processors, while AMD, being at the heat dissipation limit (it is hard to fight with behemoths like Intel), does not overclock much.
JohnCampbell



Joined: 16 Feb 2006
Posts: 1979
Location: Sydney

Posted: Fri Apr 20, 2018 1:30 pm

Dan,

The performance is certainly interesting, although i7-8700k @ $359 vs Ryzen 7 2700X @ $329 is not "factor of 2". I'd check the precision of the processor you used for the X2 result.

Also the Ryzen review I found used 3,400MHz DDR4 ram, which may remove some price advantage.

I don't know the benchmark test you reference, but 2.541 billion ops per second might be low. I am getting 28 gflops for linear equation reduction on 8700k using 12 threads (s = s + a*x is 1 op).
(Erwin, I am now reducing your 23 GByte Atlanta matrix in under 20 minutes)
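For anyone comparing figures, the counting convention matters: here one `s = s + a*x` is counted as 1 op, not 2 flops. A minimal Python sketch of measuring a rate under that convention follows. `dot_with_rate` is a hypothetical helper, and an interpreted loop will report a rate orders of magnitude below compiled multi-threaded code, so only the convention carries over:

```python
import time


def dot_with_rate(a, x):
    """Return (s, ops_per_second), counting each s = s + a*x as one op."""
    s = 0.0
    t0 = time.perf_counter()
    for ai, xi in zip(a, x):
        s = s + ai * xi
    elapsed = time.perf_counter() - t0
    ops = len(a)                                   # one op per element pair
    rate = ops / elapsed if elapsed > 0 else float("inf")
    return s, rate


s, rate = dot_with_rate([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A benchmark that counts the multiply and the add separately would report twice the figure for the same run.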
The algorithms I use would struggle to utilise 32 threads, while controlling stack overflows could be an issue. There are more problems than threads !

Your chart certainly implies that AMD is now comparable to Intel, which is a good thing.

John
DanRRight



Joined: 10 Mar 2008
Posts: 1872
Location: South Pole, Antarctica

Posted: Fri Apr 20, 2018 3:56 pm

John,
Compare apples to apples, 8 cores to 8 cores: $600 vs $330. Don't forget, Ryzen can handle 4-channel RAM, giving twice the memory bandwidth of the Intel 8700.

For 16 cores: AMD Ryzen Threadripper 1950X vs Intel 7860, $950 vs $1550.
All times are GMT + 1 Hour