forums.silverfrost.com

DanRRight · Posted: Mon May 20, 2024 9:16 am

John,
I will look at the idea of allocating separate subtypes, but I am not sure right now how to start.
So far I have only managed to make my last example above (with multiple types) work when just one of the 4 variables was allocatable; the others were set in the source code as constant values (integer parameters).

That allocatable variable was the number of fish in the depth levels of the aquarium in the narrative I used above. This alone already saved me hundreds of GB. If I were able to add at least one more allocatable variable, say, the number of depth levels in the aquarium, all would be almost perfect.

Another method, very different from the current one and requiring an even more substantial rebuild, would be to use linked lists with pointers. I am not sure how fast and optimized that is, because very few people have used it. I am also not sure how fast loading, saving and copying such data structures would be. Even with the current method with types I got a substantial speed reduction when I just copied one array to another. With usual arrays I was getting loading speeds of 2.6 GB/s; just adding a copy of that array to a TYPEd array and one more simple operation reduced the total speed to 0.55 GB/s. Standard hardware with PCIe 5.0 and the latest NVMe drives potentially allows 15 GB/s and is freely available even to penguins in Antarctica.
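For reference, a minimal sketch of what the linked-list variant could look like in standard Fortran. The names (node_t, push, destroy, fish) are hypothetical and only follow the aquarium narrative; this is not code from the actual program, and says nothing about how fast it would be.

```fortran
module fish_list
  implicit none

  ! Hypothetical node: one depth level holding its own fish array.
  type :: node_t
    real, allocatable :: fish(:)
    type(node_t), pointer :: next => null()
  end type node_t

contains

  ! Add a new level with n_fish entries at the head of the list.
  subroutine push(head, n_fish)
    type(node_t), pointer, intent(inout) :: head
    integer, intent(in) :: n_fish
    type(node_t), pointer :: p
    allocate(p)
    allocate(p%fish(n_fish))
    p%fish = 0.0
    p%next => head
    head => p
  end subroutine push

  ! Walk the list and free every node (allocatable components go with it).
  subroutine destroy(head)
    type(node_t), pointer, intent(inout) :: head
    type(node_t), pointer :: p
    do while (associated(head))
      p => head
      head => head%next
      deallocate(p)
    end do
  end subroutine destroy

end module fish_list

program demo_list
  use fish_list
  implicit none
  type(node_t), pointer :: head, p
  integer :: i, total

  head => null()
  do i = 1, 3
    call push(head, 100*i)        ! levels with 100, 200 and 300 fish
  end do

  ! Traversal is pointer chasing: one node at a time.
  total = 0
  p => head
  do while (associated(p))
    total = total + size(p%fish)
    p => p%next
  end do
  print *, 'total fish slots:', total

  call destroy(head)
end program demo_list
```

Each node is allocated separately, so the list as a whole is not one contiguous block. Saving or loading it means walking the nodes and writing or reading each one in turn, rather than a single unformatted WRITE of one big array.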

But honestly, it looks like single-core programming is dead. Everyone needs to forget single cores and move to parallel programming. This compiler, with a debugger like SDBG and Plato able to debug parallel code, would be killer software.

JohnCampbell · Joined: 16 Feb 2006 Posts: 2615 Location: Sydney

Dan,

For derived types, I have a derived type of multiple allocatable arrays, and store their dimensions in the derived type.
I then have an allocatable array of this derived type.
In this way I don't include derived types in another derived type.
I found this a very useful approach to shift my disk based database into memory.
Derived types appear to be very robust, although I rarely re-allocate these data structures.
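A minimal sketch of the layout described above, assuming hypothetical names (level_t, init_level, n_fish, weight, species) borrowed from the aquarium narrative rather than from any real database code. Both the number of levels and the number of fish per level are set at run time.

```fortran
module level_store
  implicit none

  ! Hypothetical record: one depth level. No derived type is nested inside it.
  type :: level_t
    integer :: n_fish = 0                 ! stored dimension of the arrays below
    real, allocatable :: weight(:)        ! one entry per fish
    integer, allocatable :: species(:)    ! one entry per fish
  end type level_t

contains

  ! Size and zero the arrays of one level, recording the dimension in the type.
  subroutine init_level(lev, n_fish)
    type(level_t), intent(out) :: lev
    integer, intent(in) :: n_fish
    lev%n_fish = n_fish
    allocate(lev%weight(n_fish), lev%species(n_fish))
    lev%weight = 0.0
    lev%species = 0
  end subroutine init_level

end module level_store

program demo
  use level_store
  implicit none
  type(level_t), allocatable :: levels(:)  ! allocatable array of the derived type
  integer :: i, n_levels

  n_levels = 3                             ! assumed value; could be read at run time
  allocate(levels(n_levels))
  do i = 1, n_levels
    call init_level(levels(i), 10*i)       ! each level gets its own size
  end do

  do i = 1, n_levels
    print *, 'level', i, ' n_fish', levels(i)%n_fish, ' size', size(levels(i)%weight)
  end do
end program demo
```

Allocatable components of derived types are standard Fortran 2003, and each levels(i) can have a different array length, which a plain rectangular array cannot do.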

However, your multi-level derived type example appears to work OK in the revised code. It may work very well.
Is it legal Fortran? I don't know.

As for your multi-thread comments, this is not plain sailing.
I have been using OpenMP with some very good results for suitable calculations. I am achieving up to 100 GFLOPS performance, which impresses me, but that is with a well suited problem.

The problem I have is that the overhead of starting an OpenMP multi-thread region is about 10 to 20 microseconds, which is about 50,000 processor cycles. You can do a lot of calculation in 50,000 cycles, so small computation tasks can't use multi-threads in current OpenMP.

The other problem I have, when distributing threads across large sets of data, is that the memory-to-cache transfer bandwidth demand is proportional to the thread count, but on cheap hardware the memory transfer bandwidth does not scale up with core count. If you can keep your data in cache you get great results, but with gigabytes of data, threads often stall.

A good example of this is DOT_PRODUCT, which rarely suits multi-threading, although it is often used as a multi-thread coding example! With too small a vector the start-up overhead defeats you; with too large a vector the data does not reach the cache quickly enough.
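A small test along these lines can show the effect. This is only a sketch: the vector length n is an arbitrary assumption to be varied, and it needs a compiler with OpenMP enabled (for example gfortran with -fopenmp). No timings are quoted here because they depend entirely on the machine.

```fortran
program dot_omp
  use omp_lib                            ! used here only for omp_get_wtime
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000000      ! assumed vector length; try small and huge
  real(dp), allocatable :: a(:), b(:)
  real(dp) :: s_serial, s_omp, t0, t1, t2
  integer :: i

  allocate(a(n), b(n))
  a = 1.0_dp
  b = 2.0_dp

  ! Serial intrinsic
  t0 = omp_get_wtime()
  s_serial = dot_product(a, b)
  t1 = omp_get_wtime()

  ! Multi-thread version: each thread sums its share, then the partial sums are combined
  s_omp = 0.0_dp
  !$omp parallel do reduction(+:s_omp)
  do i = 1, n
    s_omp = s_omp + a(i)*b(i)
  end do
  !$omp end parallel do
  t2 = omp_get_wtime()

  print *, 'serial :', s_serial, ' seconds', t1 - t0
  print *, 'OpenMP :', s_omp, ' seconds', t2 - t1
end program dot_omp
```

There is only one multiply and one add per pair of values loaded, so the loop is limited by how fast memory can feed the cores rather than by the arithmetic. Running it with n from a few hundred up to hundreds of millions covers both ends: region start-up cost for short vectors and memory bandwidth for long ones.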

Who knows where these problems may be addressed, but probably not on my budget.