forums.silverfrost.com: Welcome to the Silverfrost forums
DanRRight
Joined: 10 Mar 2008 Posts: 2867 Location: South Pole, Antarctica
Posted: Sun Sep 29, 2024 8:46 pm
This year brings three pieces of good news for building powerful workstations or PCs for parallel simulations:
- The NVIDIA RTX 5090. It is a gaming GPU and has some restrictions, but it still works with CUDA too. Rumors say it will have 32 GB of RAM. There is no NVLink on modern consumer cards anymore, but it uses PCIe 5.0, which is 2-4x faster than on previous-generation GPUs. The PCIe 6 spec is already finished, and PCIe 7 will be perfect even without NVLink.
- The 128-core Intel Xeon 6980P, and
- the 128- and 192-core AMD "Turin" processors.
Intel's monster processor already arrived last week, and its spec is really impressive (I do not know the price tag, but I am sure it will cost an arm and a leg. And a kidney). And 500 W power consumption.
AMD should be much cheaper, but also 500 W. Buy two (or dig them out of a California city dump) and you get 1 kW. This is the trend.
The RTX 5090 will consume even more, 600 W.
Plan to buy a good 2-2.5 kW power supply running on a 240 V grid; a 120 V grid, like ours here in Antarctica, will no longer fit their power envelope.
DanRRight
Joined: 10 Mar 2008 Posts: 2867 Location: South Pole, Antarctica
Posted: Thu Oct 03, 2024 12:22 am
Here is a good picture that explains why this compiler must support GPUs. The diagram shows how a typical modern supercomputer works; it is from the most powerful one, called Frontier, but in essence things are almost the same for other servers and even HPC workstations.
On their baseboards they have one or two server CPU chips connected to eight GPUs via fast links. The CPU is essentially just the input/output controller for the GPUs. There are around 10,000 such baseboards in supercomputers like Frontier, and typically a few, or even just one, in high-performance workstations. Even an ordinary PC with a regular graphics card basically works the same way. To save on the cost of fast networking inside the baseboard, on workstations it is often enough to use the standard PCIe bus, the same way as with regular graphics cards. Slightly more complex motherboards can use NVIDIA's NVLink, which runs at roughly 300 GB/s, and more modern ones have special faster switches that let more GPUs work in parallel.
You can see why supercomputers with GPUs are faster than CPU-based ones: first, each GPU is faster than a CPU, and second, there are 8x more of them than there are CPUs.
So essentially all modern supercomputers are GPU supercomputers, and this will be the trend for some time because GPUs are also specifically good for AI. The reason is that besides 64-bit arithmetic, GPUs also have 32-bit, 16-bit and 8-bit arithmetic, and each time you halve the number of bits you roughly double the speed; AI often needs only 8-bit, and sometimes even 4-bit is enough.
With CPUs, though, something unthinkable happened a decade or two ago: Intel dropped native support for single-precision 32-bit arithmetic and started doing everything in double-precision 64-bit, truncating the final result to 32 bits. I used so-called mixed precision before, where most operations were in 32-bit and only a small fraction in 64-bit or even higher precision. Admittedly, 64-bit is easier to use, with fewer problems from overflows or denormal numbers, but the price for that is a factor-of-2 speed decline.
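For anyone who wants to see what mixed precision looks like in Fortran, here is a minimal sketch (generic standard Fortran, nothing FTN95- or GPU-specific, and the names are just made up for the example): the bulk of the data stays in 32-bit and only the accumulation is done in 64-bit.
Code:
! Mixed-precision sketch: 32-bit data, 64-bit accumulation (illustrative only)
program mixed_precision_sketch
  implicit none
  integer, parameter :: sp = selected_real_kind(6)    ! 32-bit single precision
  integer, parameter :: dp = selected_real_kind(15)   ! 64-bit double precision
  integer, parameter :: n = 1000000
  real(sp) :: x(n)        ! bulk data kept in single precision (half the memory traffic)
  real(dp) :: total       ! sum accumulated in double precision to limit round-off
  integer  :: i

  call random_number(x)
  total = 0.0_dp
  do i = 1, n
     total = total + real(x(i), dp)   ! promote each term to 64-bit only for the sum
  end do
  print *, 'sum accumulated in double precision =', total
end program mixed_precision_sketch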
What is good about the new processors is that, by adding more cores, they become as fast as all these incarnations of previous-generation GPUs, which are still hellishly expensive. So hopefully all of them will drop in price. Why would we still need a GPU if multi-core CPUs are becoming as powerful as GPUs? Because besides its near-monopoly on GPUs, NVIDIA has also leapfrogged everyone with its fast interconnect. It is easy to add 2, 3, ... 8 ... 16 GPUs to an existing system and improve performance accordingly, while you will hardly find a manufacturer making more than dual-CPU motherboards, and you will not find a way to connect even two motherboards into one parallel system. Hence with CPUs you are stuck if you try to get more performance.
One thing is specifically good for this compiler: if it supported CUDA and GPUs, the speed of the CPU would make no difference. Everything would depend on the GPU, not on whether the CPU is old, slow, or lacks multithreading. A win-win for FTN95!
Last edited by DanRRight on Thu Oct 03, 2024 6:17 pm; edited 4 times in total
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8019 Location: Salford, UK
Posted: Thu Oct 03, 2024 7:34 am
There are various ways in which we (Silverfrost) can deploy our resources:
1) Bug fixing.
2) Implementing new features in the Fortran Standards.
3) Implementing on operating systems other than Windows.
4) Making use of new technology such as parallel processors.
We depend largely on user feedback in order to prioritise these tasks.
Almost all development work is hugely expensive in time and resources.
At some point I plan to investigate the potential for further work on parallel processing.
In the meantime users should note that parallel processing is not simply a task for the compiler. The programmer must create and use parallel algorithms. Such algorithms provide for parallel action whilst selectively locking memory that is used on more than one stream.
The compiler already allows for multiple threads and selective locking and this represents one kind of parallel processing.
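As a purely illustrative sketch of that pattern (parallel work on shared data with selective locking), this is roughly what it looks like in standard Fortran with OpenMP directives. It is not FTN95 syntax or an FTN95 library call, and the names are invented for the example.
Code:
! Illustrative only: shared-memory threads with a selectively locked update.
! Build with any OpenMP-capable compiler, e.g.  gfortran -fopenmp lock_demo.f90
program lock_demo
  implicit none
  integer, parameter :: n = 1000000
  real :: x(n), total
  integer :: i

  call random_number(x)
  total = 0.0
!$omp parallel do shared(x, total)
  do i = 1, n                  ! iterations run on several threads; i is private to each
!$omp critical
     total = total + x(i)      ! only this update of shared memory is locked
!$omp end critical
  end do
!$omp end parallel do
  print *, 'total =', total
end program lock_demo
In real code a REDUCTION clause would be faster than a critical section, but the critical section shows the selective-locking idea most directly.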
Another approach is available with 64 bit FTN95 and is described in the document notes_on_parallel_processing.txt that is typically located in C:\Program Files (x86)\Silverfrost\FTN95\doc.
In any case, you can assume that a parallel processing approach will require a significant investment of time and resources from the programmer.
DanRRight
Joined: 10 Mar 2008 Posts: 2867 Location: South Pole, Antarctica
Posted: Sat Oct 05, 2024 11:10 pm
I think asking the users of this compiler at this point to prioritize the direction of future development is like falling into a closed loop. Over time, the users who remain or are attracted here have become those who do not need parallel simulations or speed, mostly programmers of the older generation. Asking them about their priorities is a slow path into decline and a dead end. We have discussed this many times already: even the name of this compiler attracts only retirees, and it should be changed since it already supports many features from the newer Standards.
I cannot propose any magical solution for where to find the funds and resources for that. I would try, for example, to investigate how gFortran, which was like a dumb kindergarten toy in the 1990s while Salford/Silverfrost was super-pro-ultra, with even FTN77 doing multithreading, became so prolific. What is their business model? Here is the list of gFortran packages in my Linux Mint third-party Software Manager.
All of these are different flavors and versions of gFortran for different processors. That means hundreds of independent people have worked on them. In 30 years FTN77/FTN95 has drifted too far from the mainstream in parallel simulations and run speed. All the other Fortran compilers support MPI and OpenMP, and some support CUDA; modern, older and even already-dead Fortran compilers have made these a de facto standard. Besides that, they have all converged on many other features, so that, for example, this popular supercomputer PIC code
https://epochpic.github.io/documentation/basic_usage/compiling.html
can be compiled and run without any changes with these compilers:
COMPILER=gfortran - GNU Fortran
COMPILER=intel - Intel ifort
COMPILER=pgi - Portland group compiler
COMPILER=g95 - G95 compiler
COMPILER=ibm - IBM AIX Fortran compiler for BlueGene
COMPILER=hector - Cray compiler as used on hector and archer
and, with some minor changes, with six more. FTN95 is not on the list.
Maybe in time FTN95 will converge with the other Fortran compilers on a common parallelization approach offered by the Fortran 2018 Standard, I do not know. But that may take too long.
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8019 Location: Salford, UK
Posted: Sun Oct 06, 2024 8:21 am
Dan
If you are already doing parallel processing, but not with FTN95, perhaps you would send me the project so that I can get an idea of what you are asking for.
In the meantime FTN95 with ClearWin+ still provides viable and competitive solutions for all users (valetudinarian or not). Please see this video for further details...
https://www.youtube.com/watch?v=50BY9gyNY2o&t=16s
JohnCampbell
Joined: 16 Feb 2006 Posts: 2587 Location: Sydney
Posted: Mon Oct 07, 2024 12:47 pm
Recent Fortran versions of DO CONCURRENT now include locality-specs for multi-threading.
You can now (Fortran 2023) specify LOCAL, LOCAL_INIT, REDUCE, SHARED and DEFAULT(NONE) in the same DO CONCURRENT statement.
(I am not sure about selecting the number of threads.)
This provides most of the functionality of !$OMP PARALLEL DO and could be an interesting introduction to multi-threading.
Provision for SHARED and LOCAL/PRIVATE is an essential part of multi-threading.
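For concreteness, here is a minimal sketch of those locality-specs in standard Fortran 2023 syntax (illustrative only; the variable names are invented, and FTN95 does not accept this yet):
Code:
! Fortran 2023 DO CONCURRENT with locality-specs (sketch, not FTN95 syntax)
program dc_locality_demo
  implicit none
  integer, parameter :: n = 1000000
  real :: a(n), t, s
  integer :: i

  call random_number(a)
  s = 0.0
  do concurrent (i = 1:n) local(t) shared(a) reduce(+:s) default(none)
     t = 2.0 * a(i)     ! t gets a private copy in every iteration
     s = s + t          ! s is combined across iterations with +
  end do
  print *, 's =', s
end program dc_locality_demo
How many threads (if any) are used is still left to the compiler, which matches the point about thread selection above.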
DanRRight
Joined: 10 Mar 2008 Posts: 2867 Location: South Pole, Antarctica
Posted: Mon Oct 07, 2024 11:18 pm
Paul,
This is the code I was writing about. The authors claim it is Fortran 95. Somebody also mentioned 2003, but I am not sure whether that was for EPOCH itself or for some third-party software for visualizing the code's results.
To install, for example, gFortran and MPI, all (!!!) that is needed is to type one command in the terminal and, after it finishes, another (remember how I spent two weeks installing Intel Fortran 22 on Windows with the "help" of the entire Intel Fortran community and failed?):
Code:
sudo apt-get install gfortran
sudo apt-get install openmpi-bin openmpi-common libopenmpi-dev libgtk2.0-dev
The PIC code has a nice one-page getting-started introduction, assuming you are using Linux:
https://epochpic.github.io/quickstart.html
Do you have Linux installed? If not, you can just download the code and try to compile and run it on Windows. I use Linux Mint with WINE to run Windows programs, so my operating system is literally the OS of the 22nd century, the best of both worlds: I run the code on Linux and view its results on the Windows side with FTN95. Linux Mint and the Windows programs run simultaneously without any limits or restrictions, better than getting Windows onto Linux via VirtualBox, KVM or anything else (because there Linux and Windows are kept separate). Highly recommended, but I warn you that the WINE installation is a bit tricky.
I will essentially just repeat their introduction the way I preferred to do it. If you are on Linux, open a terminal.
-- On Linux you can also download EPOCH from GitHub and unzip it wherever you like, but I just use one simple git command and the code is downloaded into the folder you are in, preserving the whole directory structure:
Code:
git clone --recursive https://github.com/Warwick-Plasma/epoch.git
If git is not installed, Linux will ask you to install it; just answer yes.
-- Then go into the directory "epoch2d" or "epoch3d", depending on whether you prefer 2D or 3D modeling. You will do all further actions in this directory. To compile you can use just one command:
Code:
make COMPILER=gfortran -j4
but it is better to first edit the first line of the "Makefile" there to set your preferred compiler (gFortran, Intel, etc.), like this:
Code:
COMPILER=gfortran
# or
COMPILER=intel
Then, to compile the code, just type make -j4 in the terminal; the compiler is now taken from the Makefile.
Good about "make" is that 1) it can do that in parallel on all your cores (that j4 above to run on 4 cores) and 2) next time you will compile after you edited something it automatically compiles only the file you changed. To start re-compiling again whole code from the beginning use command "make clean".
The "bin" folder will be created with the "epoch3d" executable file inside it (if you chose 3D). In the same directory as your Makefile just add for example any name folder, say, "Data" and put there EPOCH initial settings file called "input.deck" with the demo variants you can find in the directory "example_decks" in the same folder you are in. Take "cone.deck" example there as the simplest. Rename it to "input.deck" and put into "Data" folder. This example runs really fast even in 3D on usual PC with 8-16 cores. In 2D it runs almost in no time 100 times faster of course.
-- To run the code return to the directory "epoch3d" and just type
Code:
echo Data | mpirun -np 16 ./bin/epoch3d
where 16 is the desired number of cores. The results will appear in the "Data" directory.
I wrote all this in such detail for anyone here who is not familiar with Linux but wants to try modern MPI multiprocessing and demystify for themselves the word "supercomputers". The same Linux superc
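If anyone wants to see what MPI looks like from the Fortran side before diving into a full PIC code, here is a minimal sketch (standard MPI Fortran bindings, nothing EPOCH-specific; the file name is just an example):
Code:
! hello_mpi.f90: minimal MPI sketch (illustrative only, not EPOCH code)
! compile:  mpif90 hello_mpi.f90 -o hello_mpi
! run:      mpirun -np 4 ./hello_mpi
program hello_mpi
  use mpi
  implicit none
  integer :: ierr, rank, nprocs

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)   ! this process's number
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr) ! total number of processes
  print *, 'Hello from rank', rank, 'of', nprocs
  call MPI_Finalize(ierr)
end program hello_mpi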
Last edited by DanRRight on Thu Oct 10, 2024 8:57 am; edited 1 time in total
JohnCampbell
Joined: 16 Feb 2006 Posts: 2587 Location: Sydney
Posted: Tue Oct 08, 2024 4:33 am
Dan,
Coarrays, the standard's own route to distributed processing (the role MPI plays as an external library), were introduced in Fortran 2008.
DO CONCURRENT (DC) was also introduced in Fortran 2008, but locality-specs weren't introduced until Fortran 2018, and REDUCE not until Fortran 2023. The gfortran versions I have tested still do not support multi-threading of DO CONCURRENT, although it is now being introduced.
There is a considerable gap in functionality between Fortran 2023 and multi-threaded OpenMP 5.2.
It is interesting that Intel's DO CONCURRENT is not strictly standard-conforming, having come from its earlier auto-parallel extensions.
A number of (hardware-linked) Fortran compilers now offer GPU off-loading for DO CONCURRENT. This shows a lot of potential.
However, I have not seen significant benefits in real calculations, as DO CONCURRENT limits the kinds of calculation allowed: only pure functions can be called (see the small sketch after the link below), and thread management is not easily available.
DO CONCURRENT offers a possible pathway for FTN95.
I know very little about COARRAY / MPI distributed processing, but these are not trivial skill sets.
The following link gives some interesting discussion of recent DC use:
https://arxiv.org/abs/2408.07843
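To illustrate the pure-function restriction mentioned above, here is a small sketch in plain standard Fortran (names invented for the example): only a PURE procedure may be referenced inside the DO CONCURRENT block.
Code:
! Sketch of the PURE-function restriction in DO CONCURRENT (illustrative only)
module kernels
  implicit none
contains
  pure real function smooth(x) result(y)   ! PURE: no side effects, so callable inside DO CONCURRENT
    real, intent(in) :: x
    y = x / (1.0 + abs(x))
  end function smooth
end module kernels

program dc_pure_demo
  use kernels
  implicit none
  integer, parameter :: n = 100000
  real :: a(n), b(n)
  integer :: i

  call random_number(a)
  do concurrent (i = 1:n)
     b(i) = smooth(a(i))   ! a non-pure procedure here would be rejected by the compiler
  end do
  print *, 'b(1) =', b(1)
end program dc_pure_demo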
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8019 Location: Salford, UK
Posted: Tue Oct 08, 2024 8:11 am
Dan
I don't have Linux installed but I will take a look at EPOCH when I can.
John
I will aim to add LOCAL etc. so that FTN95 does not raise an error condition when these locality-specs appear on DO CONCURRENT.