forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 

Can Fortran merge files?

 
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Sun Apr 23, 2023 9:47 am    Post subject: Can Fortran merge files?

When parallel code saves its data, each core typically generates its own file, and each file holds just a small portion of one large array. With 10-30 files per output this is still bearable, but with 100 or 1000 files per output, repeated 50 times, the mess quickly gets annoying. Does Fortran have an easy way to merge the files? The obvious workaround would of course be to read the small files one by one and dump the data into one file, which I suspect would not be very fast. Maybe there is some command that simply attaches one file to the end of another without reading and writing?

I will note that access=APPEND during program output will not work with parallel programs, because each core writes its output at the same time as all the others. And I do not know how to line the cores up in a queue so that they write one by one, serially, finish the output and then continue the simulation in parallel again. For reasons unknown, doing this with BARRIERS is not easy (in short, it simply does not work no matter what you try).
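The read-and-dump workaround described above can be sketched in standard Fortran with stream access. This is only an illustration, assuming a compiler that supports Fortran 2003/2008 features (ACCESS='STREAM', NEWUNIT=); the names concat_mod, concat_files and the 1 MiB buffer size are hypothetical choices, not anything from FTN95.

```fortran
module concat_mod
  use iso_fortran_env, only: int64
  implicit none
contains
  ! Append the raw bytes of each file in parts(:) to outfile, in the given order.
  subroutine concat_files(parts, outfile)
    character(len=*), intent(in) :: parts(:)
    character(len=*), intent(in) :: outfile
    integer, parameter :: chunk = 1048576      ! copy buffer size in bytes (arbitrary)
    character(len=:), allocatable :: buf
    integer :: uin, uout, i
    integer(int64) :: fsize, done, n

    allocate(character(len=chunk) :: buf)
    open(newunit=uout, file=outfile, access='stream', form='unformatted', &
         status='replace', action='write')
    do i = 1, size(parts)
      ! File size in bytes tells us how much to copy.
      inquire(file=trim(parts(i)), size=fsize)
      open(newunit=uin, file=trim(parts(i)), access='stream', form='unformatted', &
           status='old', action='read')
      done = 0
      do while (done < fsize)
        n = min(int(chunk, int64), fsize - done)
        read(uin) buf(1:n)       ! read the next n bytes
        write(uout) buf(1:n)     ! and append them unchanged
        done = done + n
      end do
      close(uin)
    end do
    close(uout)
  end subroutine concat_files
end module concat_mod
```

The copy is done in large chunks, so the cost should be dominated by disk throughput rather than by Fortran record handling. The caller has to supply the file names in the order the pieces belong in.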
mecej4



Joined: 31 Oct 2006
Posts: 1885

Posted: Sun Apr 23, 2023 10:04 pm

Dan, you say "merge", but your description pertains only to the special case "concatenate". Presumably, the thousands of files are named in such a way that the order of each piece is completely known from some substring of its name, without having to read the file. If so, you can use a shell command that concatenates files, such as cat.

Another way would be to open a single output file for direct access output with a sufficiently large block size. Each processor then writes to its designated blocks of the single output file. The file is kept open until all the parallel processors have finished, and then the main thread/process closes the file.
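A minimal sketch of the direct-access idea, under some assumptions: every worker writes a block of the same length, and the worker index rank is 0-based. Unlike the description above, each call here opens and closes the file itself, which is the simpler variant when the workers are separate processes. Whether several processes can safely write disjoint records of one file at the same time depends on the operating system and file system, so treat that as something to verify. The names block_output_mod and write_block are hypothetical.

```fortran
module block_output_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Write one worker's slice as record number rank+1 of a shared direct-access file.
  subroutine write_block(fname, rank, block)
    character(len=*), intent(in) :: fname
    integer, intent(in) :: rank            ! 0-based worker index (hypothetical)
    real(dp), intent(in) :: block(:)       ! same size for every worker
    integer :: u, reclen

    ! Ask the compiler for the record length in its own RECL units.
    inquire(iolength=reclen) block
    open(newunit=u, file=fname, access='direct', form='unformatted', &
         recl=reclen, status='unknown', action='readwrite')
    write(u, rec=rank+1) block             ! lands at a fixed offset in the file
    close(u)
  end subroutine write_block
end module block_output_mod
```

Because each record has a fixed position, the order in which the workers finish does not matter, and the resulting file holds the whole array in rank order.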

Third option. Do you need a single output file at all?
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Mon Apr 24, 2023 6:41 am

Dan,

In OpenMP, I write single-line records from all threads to one .csv format file. The write is inside an OMP CRITICAL section.

Each single line record starts with the thread id and other identifying fields.
I can then open the file in Excel and sort by thread or other field to better understand and analyse the output. Filter also helps in this regard.

A single-line record is easiest, but as long as you provide key identifying fields, followed by record type identifiers for further post-processing, this works well.

I also use this file for multiple runs, which helps for post processing.

Even if you have multiple files, you must have record-identifying fields, so that you can merge and sort the results in a .csv file rather than a .txt file.
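The CRITICAL-section approach can be sketched roughly as follows. The record layout (thread, step, value), the value written, and the names csv_log_mod and log_records are illustrative assumptions only. The lines starting with `!$` are OpenMP conditional compilation, so the code also builds as a plain serial program when OpenMP is switched off.

```fortran
module csv_log_mod
  implicit none
contains
  ! All threads write one-line records "thread,step,value" to a single CSV file.
  subroutine log_records(fname, nsteps)
    !$ use omp_lib, only: omp_get_thread_num
    character(len=*), intent(in) :: fname
    integer, intent(in) :: nsteps
    integer :: u, i, tid

    open(newunit=u, file=fname, status='replace', action='write')
    write(u, '(a)') 'thread,step,value'          ! header row for the spreadsheet
    !$omp parallel do private(i, tid)
    do i = 1, nsteps
      tid = 0                                     ! serial fallback
      !$ tid = omp_get_thread_num()
      ! Only one thread at a time may write, so records never interleave.
      !$omp critical (csv_out)
      write(u, '(i0,",",i0,",",es14.6)') tid, i, sqrt(real(i))
      !$omp end critical (csv_out)
    end do
    !$omp end parallel do
    close(u)
  end subroutine log_records
end module csv_log_mod
```

The records arrive in whatever order the threads reach the critical section, which is why the identifying fields at the start of each line matter for sorting afterwards.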
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Mon Apr 24, 2023 9:52 pm

Thanks Mecej4 and John,
I thought I had missed some tricky Fortran I/O function that can concatenate files. (You are right, Mecej4, that is what I need, and yes, it is worth doing; otherwise you get 50k files per typical run on 1000 cores. The load speed of one larger file versus many smaller ones does not differ much, because it is already close to the limit of the hardware.)

The cat command would probably be the easiest way to achieve that, by calling the command prompt from the Fortran source. It will be hilarious if FTN95 runs this operating system command even though it will be under Linux. (Linux hacks mean that you barely notice any difference when you use Windows software; it works on both systems as if everything were native. You click on a Windows file and it magically works. Linux Mint specifically, by the way; Ubuntu out of the box is cr#p and needs so many adjustments that you will break and reinstall it several times before it is usable with FTN95.)
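Calling cat from Fortran source might look like the sketch below. It assumes a Fortran 2008 compiler that provides the standard EXECUTE_COMMAND_LINE intrinsic and a Unix-like shell; whether FTN95 offers this intrinsic is not confirmed here. The names shell_concat_mod and cat_files are hypothetical.

```fortran
module shell_concat_mod
  implicit none
contains
  ! Concatenate all files matching a shell pattern into outfile using cat.
  subroutine cat_files(pattern, outfile, ok)
    character(len=*), intent(in) :: pattern    ! shell glob for the pieces
    character(len=*), intent(in) :: outfile    ! must not match the pattern itself
    logical, intent(out) :: ok
    integer :: estat, cstat

    estat = -1   ! stays -1 if the command could not be run at all
    call execute_command_line('cat ' // pattern // ' > ' // outfile, &
                              wait=.true., exitstat=estat, cmdstat=cstat)
    ok = (cstat == 0 .and. estat == 0)
  end subroutine cat_files
end module shell_concat_mod
```

The shell expands the pattern in lexicographic order, so numbered pieces need zero-padded numbers (001, 002, ...) to end up in the right sequence.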

As to direct access files - that would be a good idea to try, if it works with many threads simultaneously; it may not like that. If, for example, you try to use one large global array in RAM and write to it in parallel, even into completely separate address ranges, this will not work, or at least I did not succeed no matter what I tried.

John, I mark each file with a distinctive name containing keywords and numbering. The code that reads them then works out by itself what is what. Nowadays I can rarely get any information from a single line for a named thread, or from the massive data in each thread's file; I need to load it all and view it graphically.

Life becomes crazier every year: terabytes are now common, petabytes are next... Has anyone here thought of adopting an AI like ChatGPT, so that it understands what you need and quickly does routinely repeated tasks? You just say: "Merge these files and prepare the figures like last time" and it is all done the same second. Or "Create me a prototype of a Property Sheet with 13 empty tabs with the same design as in the ABC code" and it is all done instantly, instead of 3 hours later as it would be if you did it manually :)
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Wed Apr 26, 2023 3:54 am

Dan,

If you are generating lots of files and you have read access to the network file system, you could use FILES@ to scan for all the available files and merge the contents in an appropriate way. Perhaps by specifying a list of disk/directories to scan.

You could then write your rules to read and transfer those records you wish to use.

I haven't yet got to use ChatGPT, but the scanning rules can be written in Fortran.

I have for many years used FILES8@ to scan disks to get lists of possible files, especially based on creation date, file extension or size to filter file names.

John
All times are GMT + 1 Hour