Silverfrost Forums

AMD vs Intel. Fight!

7 Jan 2020 1:56 #24828

Ian Cutress from Anandtech is preparing a test of the new AMD 64-core processor, see

'AMD’s 64-Core Threadripper 3990X, only $3990! Coming February 7th'

Let's prepare for him the test we tried here a few years back, where we solved a dense linear system AX=B as an example (using the MKL/Pardiso and LAIPE parallel libraries).

From ver. 19 on, I think, the Intel MKL/Pardiso libraries support AVX512, and if AMD beats Intel processors using Intel's own software, the game is over for Intel with its 10x overpriced processors. AMD has a big chance of doing that because of its large caches, even though it supports only 256-bit AVX2 and not AVX512. And EPYC 7002, another 64-core AMD processor that is cheaper than Intel's and has 8-channel memory, has a chance to beat Intel by even more.

Here is an example mecej4 wrote for the MKL library back then. We just need to compile it with the new MKL library on any compiler that supports AVX512 and send it to Ian.

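 ! Times LAPACK DGESV (dense A*x = b) for growing sizes; link against MKL or any LAPACK
 implicit none
 integer :: neq, nrhs=1, lda, ldb, info
 real*8, allocatable :: A(:,:), b(:)
 integer, allocatable :: piv(:)
 integer :: count_0, count_1, count_rate, count_max

 do neq=1000,21000,5000
    lda=neq; ldb=neq
    allocate(A(neq,neq), b(neq), piv(neq))
    call random_number(A)      ! random dense matrix and right-hand side
    call random_number(b)
    call system_clock(count_0, count_rate, count_max)
    call dgesv(neq, nrhs, A, lda, piv, b, ldb, info)
    call system_clock(count_1, count_rate, count_max)
    ! info /= 0 means the factorization failed, so the timing is not meaningful
    if (info /= 0) write (*, '(1x,A,i8)') 'dgesv failed, info = ', info
    write (*, '(1x,A,i6,A,2x,F8.3,A)') 'nEqu = ', neq, ' ', &
         dble(count_1-count_0)/count_rate, ' s'
    deallocate(A, b, piv)
 end do
 end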
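
To compare different processors, the elapsed times printed by the test above can be converted to a rate. Here is a minimal sketch, assuming the usual operation count for an LU-based solve with one right-hand side, about (2/3)n^3 + 2n^2 floating point operations. The module and function names (solve_rate, dgesv_flops, dgesv_gflops) are made up for illustration and are not part of MKL.

```fortran
module solve_rate
   implicit none
contains
   ! Approximate flop count of an LU solve (factorization plus one RHS)
   pure function dgesv_flops(n) result(flops)
      integer, intent(in) :: n
      real(8) :: flops
      flops = 2.0d0/3.0d0*dble(n)**3 + 2.0d0*dble(n)**2
   end function dgesv_flops

   ! Rate in Gflop/s for a solve of size n that took "seconds" of wall time
   pure function dgesv_gflops(n, seconds) result(rate)
      integer, intent(in) :: n
      real(8), intent(in) :: seconds
      real(8) :: rate
      rate = dgesv_flops(n)/seconds/1.0d9
   end function dgesv_gflops
end module solve_rate

program show_work
   use solve_rate
   implicit none
   integer :: neq
   ! Same sizes as the timing loop; prints the work per solve in Gflop
   do neq = 1000, 21000, 5000
      write (*, '(1x,A,i6,A,F10.2,A)') 'nEqu = ', neq, '  work = ', &
           dgesv_flops(neq)/1.0d9, ' Gflop'
   end do
end program show_work
```

Dividing the printed work by the measured seconds for the same nEqu (or calling dgesv_gflops with the measured time) gives a Gflop/s figure that can be compared between machines.
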
21 Apr 2020 7:10 #25263

Dan,

I am looking at the Ryzen 9 3900X as a comparison to the i9-10920X or i9-9900KS. Hopefully I might be able to test them to see how single-thread and multi-thread performance compare. Yet to find a test PC.

22 Apr 2020 10:25 #25289

Yes, it would be nice if you played with this. Please use Intel MKL ver. 19 or later. It has optimizations for AVX512.

My MKL is older. I do not know whether Intel is distributing MKL for free now; for some period of time it was free.

After you compile the code with the new MKL, with and without AVX512, please send it to me. I will try to ask the Anandtech dot com guys to run it on different processors which have AVX512.

I do not know if it is possible to activate AVX in AMD processors. They do not have AVX512, but some have 256-bit AVX2, I think. Not all Intel processors have AVX512 either, but AVX could be a major improvement point for linear algebra.

12 Nov 2020 2:08 #26593

Dan,

AMD appear to have hit the lead for desktop PCs. I will be trying to buy a Ryzen 5900X, possibly a 5950X, when it is generally available. Threadripper looks to be a price point too high, although the extra memory channels may be my next step after this.

OpenMP with 24 threads with large arrays (GB's) looks to be a practical limitation for me at the moment. I am struggling to get AVX2 to work effectively and it is annoying me that I can't approach MKL quoted performance for large arrays. Either marketing spin or my ignorance ! (I would expect AVX512 will only work when arrays are in L1 cache, but again probably my ignorance !)

22 Nov 2020 9:48 #26629

I do not know why, coronavirus or something else, but lately you cannot buy anything from AMD or NVIDIA. For example, two months after NVIDIA released its new 3080 and 3090 graphics cards, which will be particularly good with OpenGL and are the only ones that work with 8K monitors, they are still shown as 'sold out' in all shops. You can buy them on eBay, but at 2x the price. I hope they will appear at least on Black Friday this week. Do you have Black Fridays in Australia?

22 Nov 2020 10:43 #26630

Dan,

Isn't the answer in your post - people have bought all the stock knowing that they can sell it on for 100% profit. Who can afford to? Someone with a lot of money. What's the betting that the funding comes from a rival with an inferior product?

Eddie

23 Nov 2020 12:32 #26631

The good news is that AMD just made a graphics card superior to NVIDIA's. And last week Intel showed samples of a new processor, scheduled for the first quarter of 2021, which will beat AMD. Maybe this will cool the craze for reselling everything on eBay.

24 Nov 2020 2:35 #26632

Dan,

I am hoping to get a Zen 3 Ryzen when stocks become available. Hopefully it will perform as well as claimed for large vector calculations. The big problem is availability ! I have read some of the Intel 11th Gen Rocket Lake info, but initially they appear to be limited to 8 cores. They may not be able to achieve 'cool' as they are 14nm. Not sure I believe all their claims.

24 Nov 2020 4:29 #26633

How many cores does the Zen 3 have?

24 Nov 2020 6:05 #26634

Robert,

Zen 3 is a family. The 5950X has 16 cores and 32 threads, the 5900X has 12 cores and 24 threads, and so on down to the 5600X with 6 cores and 12 threads.

It seems that we are more likely to see more cores than faster clockspeeds.

Eddie

24 Nov 2020 6:39 #26635

And which one are you hoping to get?

25 Nov 2020 12:58 #26636

Robert,

I now have an i7-8700K (6 core, 12 thread) and am hoping to get a 5900X (12 core, 24 thread). At 10 threads, I am seeing memory bottlenecking for large arrays, with reduced AVX efficiency. It will be interesting to see if more cache helps. Need to get a new CPU first, as they are not generally available !

25 Nov 2020 8:12 #26637

64MB of L3 cache - more memory than the whole of Salford Software when I joined!

25 Nov 2020 10:44 #26638

Interesting. You were lucky. (Cue Monty Python sketch here.) My first experience of a 'decent' mainframe was an Elliott/NCR 4120, which had 24k of 24-bit words. I later used an IBM 1130, which had less, and some CDC machines, which had 64k of 60-bit words. As time went by, I had access to larger machines, culminating in a terminal-access VAX with 512k bytes. After that, I used PCs, starting with an Apricot that began at 256k (of which about half was usable) and that I ramped up to 896k (no 640k limit), and then a 286 with 960k usable. When I got to a 386, DBOS screwed up my machine because it was incompatible with Apricot's own DOS extender. By the time I had a 486 machine with 16MB of RAM, nothing needed to be overlaid, and that was the first sweet spot, giving more memory than many mainframes of the time.

Currently, I have 32GB, 16GB, and 8GB machines, all of which are overkill for 32-bit FTN95, which runs happily on a 2GB laptop, although only with 32-bit Windows 10. The second sweet spot occurred initially at 1GB and Windows XP/7 because around then, say nearly 20 years ago, ClearWin+ became easy to use, when it no longer required DBOS as well. Frankly, while using multiple cores in an FTN95 program is difficult, it's only power users like JohnC and Dan who need those multicore machines. I prefer the better roundoff precision of the x87 to the speed improvements with SSE (if there are any for what I do), and once software runs with no perceptible lag, further speed improvements are a bit pointless (JC and DR excepted, of course).

Eddie

25 Nov 2020 6:49 #26640

64MB of L3 cache - more memory than the whole of Salford Software when I joined!

I was referring to a year after FTN77/386 had shipped!

14 Jan 2021 6:19 #26896

Dan,

I bought a Ryzen 5900X in Dec 2020 and it is much faster than my previous i7-8700K. Actually it’s between 50% and 100% faster for my FE analysis, depending on the type of calculation. I first got a 5900X + 64GB of 3600MHz memory, but it kept crashing on multi-thread calcs. I changed to 3200MHz memory and it now doesn't crash. Presumably the quality of the silicon in the 3600MHz memory was a problem. I am not sure of the silicon quality of the 5900X !!

For my large-array calcs using more threads, having only 2 memory channels is a significant bottleneck. I don't get much better performance above threads = cores (which is similar with the 8700K).

I tried to find an easy problem to define and apply OpenMP to! I have been testing large matrix multiply using my own code: C[15000,12000] = A[15000,11000] x B[11000,12000] (see equation.com), where partitioning is essential to reduce the memory<>cache bottleneck. (Vectors must be in cache for AVX to work efficiently, and there are 3 levels of cache!) My main measure of performance is the number of floating point multiplies per second, as Gflop/s (10^9 flop/s). My partitioning approaches produce 50 Gflop/s on the i7-8700 and 100 Gflop/s on the 5900X. These are significantly slower than the claimed MKL DGEMM performance (250+ Gflop/s for similar i5 processors), which I cannot approach, even allowing for MKL benchmarks counting additions as well. (Equation.com reports 22 Gflop/s for Opteron and 9.7 Gflop/s for Xeon, which is slow.)

Interesting that the Ryzen shows significant variability in Gflop/s vs threads for my coding approaches, especially as threads exceed cores. The i7-8700 similarly stalls as threads exceed cores. This is an area I need to investigate further. My next processor will have more memory channels. OpenMP with large arrays is not an easy coding problem (large meaning array size >> cache size). I will try to post some results when I can better describe the problem. You can't just buy a different processor and use it. There is lots of tuning to do.
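
To make the partitioning idea concrete, here is a minimal sketch of a tiled (blocked) matrix multiply with OpenMP. It is not the code described in the post above, only one common way to partition: the loops walk over nb x nb tiles so that data is reused while still in cache, and each thread owns a separate block of columns of C. The names blocked_mm, matmul_blocked and multiply_gflops, the tile size 64 and the small demo sizes are all assumptions chosen for illustration. The rate counts multiplies only, as in the post.

```fortran
module blocked_mm
   implicit none
   integer, parameter :: dp = kind(1.0d0)
contains
   ! C = A*B computed tile by tile; nb is the tile size (a tuning parameter)
   subroutine matmul_blocked(A, B, C, nb)
      real(dp), intent(in)  :: A(:,:), B(:,:)
      real(dp), intent(out) :: C(:,:)
      integer,  intent(in)  :: nb
      integer :: m, l, n, i0, i1, j0, k0, j, k
      m = size(A,1); l = size(A,2); n = size(B,2)
      C = 0.0_dp
      ! Threads work on disjoint column blocks of C, so no two threads write the same element
      !$omp parallel do private(i0, i1, k0, j, k) schedule(dynamic)
      do j0 = 1, n, nb
         do k0 = 1, l, nb
            do i0 = 1, m, nb
               i1 = min(i0+nb-1, m)
               do j = j0, min(j0+nb-1, n)
                  do k = k0, min(k0+nb-1, l)
                     ! unit-stride column update, a candidate for AVX vectorization
                     C(i0:i1,j) = C(i0:i1,j) + A(i0:i1,k)*B(k,j)
                  end do
               end do
            end do
         end do
      end do
      !$omp end parallel do
   end subroutine matmul_blocked

   ! Multiplies per second in Gflop/s for an (m x l) times (l x n) product
   pure function multiply_gflops(m, l, n, seconds) result(rate)
      integer,  intent(in) :: m, l, n
      real(dp), intent(in) :: seconds
      real(dp) :: rate
      rate = dble(m)*dble(l)*dble(n)/seconds/1.0d9
   end function multiply_gflops
end module blocked_mm

program demo_blocked
   use blocked_mm
   implicit none
   integer, parameter :: m = 300, l = 200, n = 250   ! small demo sizes only
   real(dp), allocatable :: A(:,:), B(:,:), C(:,:)
   integer(8) :: t0, t1, rate
   real(dp) :: secs
   allocate(A(m,l), B(l,n), C(m,n))
   call random_number(A)
   call random_number(B)
   call system_clock(t0, rate)
   call matmul_blocked(A, B, C, 64)
   call system_clock(t1)
   secs = dble(t1 - t0)/dble(rate)
   write (*, '(1x,A,ES10.2)') 'max diff vs intrinsic matmul: ', maxval(abs(C - matmul(A, B)))
   if (secs > 0.0_dp) write (*, '(1x,A,F8.2)') 'Gflop/s (multiplies only): ', &
        multiply_gflops(m, l, n, secs)
end program demo_blocked
```

The OpenMP directives only take effect when the compiler's OpenMP option is enabled; without it the same code runs on one thread. A tuned library such as MKL goes much further (packing tiles, register blocking, hand-written kernels), so a simple tiling like this should not be expected to reach DGEMM rates.
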
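
One way to look at the threads vs performance behaviour in isolation is to sweep the thread count over a purely memory-bound loop. The sketch below is an assumption-laden illustration, not the test described above: it times a simple triad a = b + s*c on arrays assumed to be much larger than any cache and reports an approximate bandwidth per thread count. The names triad_bench and triad_gbps and the array length are made up; the estimate counts 24 bytes per element and ignores any extra write-allocate traffic.

```fortran
module triad_bench
   !$ use omp_lib
   implicit none
   integer, parameter :: dp = kind(1.0d0)
contains
   ! Approximate memory bandwidth (GB/s) of a = b + s*c using nt threads
   function triad_gbps(n, nt, reps) result(gbps)
      integer, intent(in) :: n, nt, reps
      real(dp) :: gbps, secs
      real(dp), allocatable :: a(:), b(:), c(:)
      integer :: i, rep
      integer(8) :: t0, t1, rate
      allocate(a(n), b(n), c(n))
      a = 0.0_dp; b = 1.0_dp; c = 2.0_dp
      !$ call omp_set_num_threads(nt)
      call system_clock(t0, rate)
      do rep = 1, reps
         !$omp parallel do
         do i = 1, n
            a(i) = b(i) + dble(rep)*c(i)
         end do
         !$omp end parallel do
      end do
      call system_clock(t1)
      secs = max(dble(t1 - t0)/dble(rate), 1.0d-9)   ! guard against a zero interval
      if (a(n) /= 1.0_dp + 2.0_dp*reps) error stop 'triad result wrong'
      gbps = 24.0_dp*n*reps/secs/1.0d9               ! 3 arrays x 8 bytes per element
   end function triad_gbps
end module triad_bench

program thread_sweep
   use triad_bench
   implicit none
   integer, parameter :: n = 20000000   ! 3 arrays x 160 MB, assumed to exceed cache
   integer :: nt, max_nt
   max_nt = 1
   !$ max_nt = omp_get_max_threads()
   do nt = 1, max_nt
      write (*, '(1x,A,I3,A,F8.2,A)') 'threads =', nt, '  bandwidth ~', &
           triad_gbps(n, nt, 5), ' GB/s'
   end do
end program thread_sweep
```

The lines starting with `!$` are only compiled when OpenMP is enabled, so the program also builds without it and then reports a single-thread figure. If the bandwidth stops growing after a few threads, adding more threads to a large-array calculation is unlikely to help unless the work is reorganised to reuse data in cache.
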

17 Jan 2021 5:53 #26919

John, so in summary you have got twice as many cores in the Ryzen and a 50 to 100% increase vs Intel? Does this mean that the Ryzen single core performance is around the same as with Intel?

Unfortunately I do not have anyone nearby with PCs that have more memory channels. I have access to a 10000-core Linux supercomputer, which uses older 12-core Intel Xeon processors that would not be so interesting to test, and the code we use is written in C. Fortran version 19 with AVX should be there too, but there is no one to ask how to use it; the good sysadmin left the team.

The only person I know (I contacted him a few years ago) who has broad access to all the processors in the world and who is also interested in testing them is Ian Cutress from Anandtech. A UK guy, by the way, a former scientist, and a nice, easygoing person, at least he was before he started interviewing all the top CEOs in the IT industry. Try to convince him to run the test on 4-, 6- and 8-memory-channel computers. His own 3D particle movement code got huge benefits from AVX512. Plus, he knew a former Intel engineer who tuned his code with AVX to get a 3-4x, even 5x, increase in performance vs no AVX. If he finds that some processors significantly favor cache size, memory channels or AVX for such an important task as linear algebra, I am sure there will be a huge buzz in the industry. For the last few years he has touted his AVX speed increase on Intel processors vs AMD, which do not have AVX512, and Intel clearly liked this. When we implemented AVX512 in our codes, though, the increase in performance was just 20% or less.

19 Jan 2021 4:29 #26928

Quoted from DanRRight: Does this mean that the Ryzen single core performance is around the same as with Intel?

I think that is too general a question. Ryzen is probably better, but I am comparing to Intel 8th gen.

I am finding Ryzen 5900X to be significantly faster than i7-8700K for the test cases I am considering. However there is considerable variability in the Ryzen performance.

My test cases involve large arrays/vectors, 100MB to 3.5GB. They appear to be too big to show a benefit from the 2x cache size (which I was hoping would be a plus). At present (still in the learning phase), the variability in Ryzen performance appears to be due to a combination of variability in boost frequency and higher temperature with many threads (high-Gflop/s matrix multiply is a compute-intensive calculation). I have selected a Noctua D15 air cooler, while a higher-capacity water cooler might mitigate this. (I did not expect this to be as significant a problem with 7nm silicon.)

My other test case with an actual FEA calculation does show at least 50% improvement vs 8700, which is a plus for Ryzen.

19 Jan 2021 7:20 #26929

Noctua makes a good air cooler, one of the best, but I still recommend using a water cooler from a good, reliable company.

21 Jan 2021 10:32 #26949

JC,

Is that a self-build, or a commercial pre-built system? If you built it, what case and fans did you use? A system built into a tower case shouldn't have thermal throttling.

Eddie
