forums.silverfrost.com Forum Index
Welcome to the Silverfrost forums

Tablet almost beating overclocked desktop CPU

 
forums.silverfrost.com Forum Index -> General
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Tue Feb 10, 2015 8:28 am    Post subject: Tablet almost beating overclocked desktop CPU

The latest Intel processors developed for tablets and large phones have a TDP of 4.5 W and still almost beat desktop processors that draw 150 W when overclocked. Here are the results of the same linear algebra tests we wrote here a year ago.

The first table shows the times (in seconds) for the matrix algebra tests you have seen here before. They were run on what was almost the fastest processor on the planet a year ago, and it is probably still near the top because it is overclocked to 4.5 GHz (the fastest chips today, in terms of single-core performance, reach about 4.9-5.0 GHz with water cooling).

Code:

i7 4770k 4.5GHz overclocked, 4 cores/8 threads
 matrix size --> 1000    2000    3000    4000
 --------------------------------------------
 Dense/Block     2.22    30.4    127.    297.
 Dense/Block Tr. 0.20    2.06    7.36    17.5
 SSE             0.12    1.81    6.70    16.2
 LAIPE           0.09    0.75    2.44    5.90


And here is the Lenovo Yoga 3 Pro, the thinnest and lightest 13.3" convertible tablet-laptop:

Code:

Core M-5Y70 1.1-2.6 GHz (turbo) 2 cores/4 threads, Lenovo Yoga 3 Pro tablet
 matrix size --> 1000    2000    3000    4000    5000    6000
 --------------------------------------------------------------
 Dense/Block     1.9     23.5    112     335.    xxxx    xxxx
 Dense/Block Tr. 0.94    7.5     26.7    65.7    128.    xxxx
 SSE             0.27    2.9     8.8     20.7    42.9    73.7
 LAIPE           0.2     2.1     7.0     22.1    50.1    90.4
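To put "almost beating" into numbers, here is a small Python sketch that divides the tablet times by the desktop times at n = 4000 and, as a very rough energy proxy, multiplies each time by the power figures quoted above (150 W and 4.5 W). TDP is a rated thermal limit, not a measured power draw, so the energy ratio is an assumption-laden estimate only.

```python
# Timings in seconds at matrix size 4000, copied from the two tables above.
desktop_4770k = {"Dense/Block": 297.0, "Dense/Block Tr.": 17.5, "SSE": 16.2, "LAIPE": 5.90}
tablet_5y70 = {"Dense/Block": 335.0, "Dense/Block Tr.": 65.7, "SSE": 20.7, "LAIPE": 22.1}

# Power figures quoted in the post. Assumption: using TDP as if it were the
# actual power draw, so "energy" below is only a crude proxy (time x watts).
DESKTOP_WATTS = 150.0
TABLET_WATTS = 4.5


def compare(method: str) -> tuple[float, float]:
    """Return (how many times slower the tablet is, desktop/tablet energy-proxy ratio)."""
    slowdown = tablet_5y70[method] / desktop_4770k[method]
    energy_ratio = (desktop_4770k[method] * DESKTOP_WATTS) / (tablet_5y70[method] * TABLET_WATTS)
    return slowdown, energy_ratio


for m in desktop_4770k:
    s, e = compare(m)
    print(f"{m:16s} tablet {s:4.2f}x slower, desktop uses ~{e:4.1f}x the energy")
```

With the SSE method the tablet is only about 1.3x slower, while LAIPE is about 3.7x slower on the tablet.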


And look at that: DavidB's SSE method beats parallel LAIPE for matrix sizes of 4000 and above! Thermal throttling is a likely reason: two cores cannot run flat out in parallel for long. CPU-Z shows the multiplier gradually dropping from 26 to 20 in the SSE case (2.6 GHz to 2.0 GHz), while in the LAIPE case it drops from 26 to 16 (1.6 GHz).
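The multiplier-to-clock conversion above implies a 100 MHz base clock (26 x 100 MHz = 2.6 GHz). A minimal sketch of that arithmetic, assuming purely clock-bound code so that throughput scales with frequency (memory-bound code would lose less):

```python
BCLK_MHZ = 100  # assumed base clock, consistent with "26 -> 2.6 GHz" in the post


def core_clock_mhz(multiplier: int) -> int:
    """Core clock = multiplier x base clock."""
    return multiplier * BCLK_MHZ


def relative_speed(start_mult: int, end_mult: int) -> float:
    """Fraction of the starting clock left after throttling (clock-bound code only)."""
    return core_clock_mhz(end_mult) / core_clock_mhz(start_mult)


print("SSE:  ", core_clock_mhz(20), "MHz,", f"{relative_speed(26, 20):.0%} of the start clock")
print("LAIPE:", core_clock_mhz(16), "MHz,", f"{relative_speed(26, 16):.0%} of the start clock")
```

So the LAIPE run ends at roughly 62% of its starting clock versus roughly 77% for the SSE run.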

By the way, Intel has planned to add more instructions to its extended instruction sets, such as fused multiply-add (FMA).
davidb



Joined: 17 Jul 2009
Posts: 560
Location: UK

Posted: Fri Feb 13, 2015 7:22 pm

Some interesting results there. I think that fused multiply-add is already implemented in Intel's Haswell processors. Of course, whether it is used depends on the compiler.
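What FMA changes numerically is that a*b + c is rounded once instead of twice (once after the multiply, once after the add). Python floats have no portable FMA in older versions, so this sketch emulates one by forming the result exactly with `fractions.Fraction` and rounding only at the end; it illustrates the semantics, not the speed of the hardware instruction.

```python
from fractions import Fraction


def fma_emulated(a: float, b: float, c: float) -> float:
    """a*b + c with a single rounding, as a hardware FMA instruction does.

    Product and sum are formed exactly as rationals; only the final
    conversion back to float rounds (Python's int/int division is
    correctly rounded).
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a, b, c = 0.1, 10.0, -1.0
separate = a * b + c           # product is rounded to exactly 1.0 first, so the sum is 0.0
fused = fma_emulated(a, b, c)  # keeps the tiny residual of the exact product, 2**-54
print(separate, fused)
```

The separate multiply and add return 0.0, while the fused version keeps the residual 2^-54 that the intermediate rounding threw away.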
_________________
Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), Java, Python, Perl
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Sat Feb 14, 2015 3:21 am

David, can you evaluate whether adding FMA to your assembler library would benefit linear algebra? My first impression is that it might further improve the efficiency of the code.

And could Silverfrost comment on whether this will be added to the compiler?

SSE is really a big deal, as you can see from these tests. The LAIPE developer told me a year ago that he was working on this too, but I have not yet checked whether it was finished.

John Campbell was also researching AVX. Any progress, John?
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Sat Feb 14, 2015 5:03 am

Dan,

I thought that fused multiply-add was introduced in SSE2 (DOT_PRODUCT is basically multiply and accumulate).
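The multiply-and-accumulate view of DOT_PRODUCT is also what SIMD units exploit: a vectorised loop keeps one running sum per register lane (2 doubles in a 128-bit SSE2 register, 4 in a 256-bit AVX register) and adds the lanes together at the end. This Python sketch only models that access pattern; it assumes nothing about how FTN95 or DavidB's assembler library actually implement it, and because the additions are regrouped, the result can differ from a serial sum in the last bits.

```python
def dot_lanes(a: list[float], b: list[float], lanes: int = 2) -> float:
    """Dot product accumulated in `lanes` independent partial sums, the way a
    vectorised loop keeps one running sum per SIMD lane. A model, not fast code."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    partial = [0.0] * lanes
    main = len(a) - len(a) % lanes
    for i in range(0, main, lanes):
        for k in range(lanes):
            partial[k] += a[i + k] * b[i + k]  # one multiply-accumulate per lane
    total = sum(partial)                        # horizontal add of the lanes
    for i in range(main, len(a)):               # scalar remainder loop
        total += a[i] * b[i]
    return total


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 4.0, 3.0, 2.0, 1.0]
print(dot_lanes(x, y, lanes=2), dot_lanes(x, y, lanes=4))  # "SSE2-like" and "AVX-like" widths
```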

We should change the test program to also report Gflops, as that gives a useful comparison across problem sizes (we just need to define what counts as a floating-point "op").
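One possible definition, assuming each table row times the solution of one dense n x n system by Gaussian elimination: use the operation count of the classic LINPACK benchmark, 2n^3/3 for the factorisation plus 2n^2 for the triangular solves. A sketch applied to some of the n = 4000 timings above:

```python
def linpack_flops(n: int) -> float:
    """Operation count used by the classic LINPACK benchmark for a dense n x n
    solve: 2n^3/3 (LU factorisation) + 2n^2 (forward/back substitution)."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def gflops(n: int, seconds: float) -> float:
    """Achieved rate in Gflops for an n x n solve that took `seconds`."""
    return linpack_flops(n) / seconds / 1e9


# Timings (seconds) at n = 4000 taken from the first two tables.
print(f"i7-4770K LAIPE: {gflops(4000, 5.90):.2f} Gflops")
print(f"i7-4770K SSE:   {gflops(4000, 16.2):.2f} Gflops")
print(f"5Y70 LAIPE:     {gflops(4000, 22.1):.2f} Gflops")
```

This gives roughly 7.2 Gflops for LAIPE on the overclocked desktop and 1.9 Gflops on the tablet. For large n the 2n^2 term is negligible, so the choice of lower-order terms hardly matters.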

I have recently been trying to use AVX instructions and combine them with !$OMP (OpenMP) directives. I now have a good parallel skyline solver running on i5 and i7 CPUs.
I have not been very successful in showing a significant improvement from AVX compared with SSE2.
My latest tests are looking at the significance of cache size and trying to minimise how often the cache has to be refilled. If the vectors are not in the cache, AVX does not appear to work well. There are lots of possible reasons for this; I just need to be able to distinguish the causes from mere associations.
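A quick sense of scale for the cache question, assuming REAL*8 (8-byte) matrix elements since the thread does not state the precision, and using the published i7-4770K cache sizes (32 KB L1 data cache per core, 8 MB shared L3). Already at n = 2000 the full matrix no longer fits in L3, so the larger tests stream most of their data from RAM.

```python
# Assumption: 8-byte (double precision) elements.
# i7-4770K published cache sizes: 32 KB L1 data per core, 8 MB shared L3.
L1D_BYTES = 32 * 1024
L3_BYTES = 8 * 1024 * 1024


def matrix_bytes(n: int, bytes_per_element: int = 8) -> int:
    """Storage for a full dense n x n matrix."""
    return n * n * bytes_per_element


def column_bytes(n: int, bytes_per_element: int = 8) -> int:
    """Storage for one column (or row) of length n."""
    return n * bytes_per_element


for n in (1000, 2000, 4000, 6000):
    m = matrix_bytes(n)
    print(f"n={n}: matrix {m / 2**20:6.1f} MiB = {m / L3_BYTES:5.1f}x L3; "
          f"one column fits in L1D: {column_bytes(n) <= L1D_BYTES}")
```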

It would certainly be good if some of these instructions were available in FTN95, even if only through a restricted syntax: DOT_PRODUCT and a few other basic vector functions, i.e. an FTN95 vector library! This could give us the flexibility to improve our FTN95 performance while keeping the power of FTN95 error checking.

John
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Sun Feb 15, 2015 4:58 am

Yes, AMD has had it longer, but with Intel only the latest Haswell (22 nm) and Broadwell (14 nm) chips have it, starting from mid-2013. Some server chips were planned to get FMA only last year. On AnandTech, Ian Cutress discussed FMA3/FMA4 last year and wanted to test the speed-up, but I have not seen anything since then.

It would be good to find the original LINPACK routine that was used to measure flops in the past, just for comparison. It would be interesting to see how many 8-core Cray-2s our laptops, tablets and cellphones are worth.

Why haven't you tried parallel LAIPE? It showed an anomalous speed boost on AMD chips precisely with the skyline solver. You can download its developer's test for multiple compilers, and it uses a skyline solver.


Last edited by DanRRight on Mon Feb 16, 2015 5:20 am; edited 2 times in total
davidb



Joined: 17 Jul 2009
Posts: 560
Location: UK

Posted: Sun Feb 15, 2015 7:18 pm

DanRRight wrote:
David, can you evaluate if it is needed to add FMA to your assembler library to benefit linear algebra?


Yes, this would be helpful. But it would be difficult for Silverfrost to keep the assembler up to date with new instructions like these; it is already quite a bit behind what current chips can do. The last time Paul looked, adding new instructions seemed to be quite a lot of work.

I don't even know whether the assembler facility will be included in the new (32-bit/64-bit) compiler when it comes out. We will have to wait and see.
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Sat Mar 19, 2016 2:37 am

Small update.

I installed more DDR3 RAM in the desktop computer and upgraded it from 1600 to 2400 MHz (timings from 9-9-9-27 to 11-13-13-31). This had almost no effect on any program except the ones using SSE, which now run faster: the 4000-equation run used to take 16.2 s and now takes 12.97 s.
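Some back-of-the-envelope numbers for this upgrade, as a sketch under stated assumptions: "2400 MHz" DDR3 is really 2400 MT/s (the I/O clock is half that), each channel has a 64-bit (8-byte) bus, and the board runs the usual dual-channel configuration.

```python
def peak_bandwidth_gbs(transfers_mt_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth: transfers/s x bus width x channels.
    Assumes a dual-channel, 64-bit-per-channel setup by default."""
    return transfers_mt_s * 1e6 * bus_bytes * channels / 1e9


def cas_latency_ns(cas_cycles: int, transfers_mt_s: int) -> float:
    """CAS latency in ns; the DDR I/O clock runs at half the transfer rate."""
    io_clock_mhz = transfers_mt_s / 2
    return cas_cycles / io_clock_mhz * 1000


old_bw, new_bw = peak_bandwidth_gbs(1600), peak_bandwidth_gbs(2400)
print(f"peak bandwidth: {old_bw:.1f} -> {new_bw:.1f} GB/s ({new_bw / old_bw:.2f}x)")
print(f"CAS latency:    {cas_latency_ns(9, 1600):.2f} -> {cas_latency_ns(11, 2400):.2f} ns")
print(f"observed SSE speed-up at 4000 equations: {16.2 / 12.97:.2f}x")
```

Peak bandwidth rises 1.5x while the SSE run speeds up about 1.25x, which is consistent with (though does not prove) the SSE solver being partly memory-bound, whereas the RAM-insensitive LAIPE timings suggest it mostly works out of cache.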

For some reason, the new versions of the code made the transposed-matrix solver run much slower: it used to take 17.5 s and now takes 32.58 s. Changing the RAM did not affect it. The same happened on the Lenovo tablet, where the transposed-matrix solver was also much slower with the new software. We don't use this solver anyway; we use the parallel LAIPE solver, and it is insensitive to RAM speed.

Code:
 i7 4770k 4.5GHz overclocked, 4 cores/8 threads, 2400MHz SDRAM
 matrix size --> 1000    2000    3000    4000    5000    6000
 ------------------------------------------------------------
 Gauss Regular   2.23   30.09  126.11  294.60   xxxxx  xxxxxx
 Gauss Transp    0.51    4.11   13.78   32.58   63.32  109.29
 Gauss SSE       0.11    1.41    5.28   12.97   25.29   43.46
 LAIPE           0.09    0.73    2.37    5.85   11.03   19.44


Does anyone have 3200 MHz or faster DDR4 RAM on a similar type of processor?
All times are GMT + 1 Hour