Silverfrost Forums


Fails to save arrays > 4GB

19 Oct 2022 4:15 #29467

Dan : Do your tests need the new DLL with the latest fix for the 4GB issue?

Ans: The test I am doing uses version 8.91.1. This version has a fix for integer*8 addresses. I have not yet tested 4GB+ files.

The main purpose of my tests is to confirm that READ ( ..., pos=address, ...) and WRITE ( ..., pos=address, ...) are working for stream I/O.

What I am finding is that 'WRITE ( lu, pos=address, ...) IO_list' appears to fail when rewriting to the file. I have found that 'READ ( lu, pos=address, ...) IO_list' appears to work in the test program I attached. This test also uses integer*8, so FTN95 Ver 8.91.1 does address the integer*8 problems, but fails on re-writing data to the file. The 'pos=address' capability provides for an addressable file data structure.
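For reference, the pattern under test is roughly the following (a minimal sketch, not the attached program; the unit number, file name and record sizes are illustrative):

```fortran
! Sketch of the stream I/O rewrite test (illustrative sizes; not the attached program)
program rewrite_test
  implicit none
  integer*8 :: address
  integer*4 :: lu
  real*4    :: buf(1000), chk(10)
  lu  = 11
  buf = 1.0
  open (lu, file='stream.bin', form='unformatted', access='stream')
  write (lu) buf                        ! initial sequential write (4000 bytes)
  address = 1 + 4*100                   ! stream pos is 1-based and in bytes: element 101
  buf(1:10) = 2.0
  write (lu, pos=address) buf(1:10)     ! overwrite 10 values in place (the failing case)
  read  (lu, pos=address) chk           ! read back to confirm the rewrite
  if (any (chk /= 2.0)) print *, 'rewrite failed'
  close (lu)
end program rewrite_test
```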

Dan : And what are the speeds of READ / WRITE ( pos=address ) in comparison with Method 2 above? I suspect it might not be worth using anymore.

Ans : The stream read/write ( pos=address ) should be supported by Windows buffered I/O, giving performance similar to Fortran unformatted, fixed-length record I/O. It should be a good solution. Stream I/O could be a much better solution than 'Method 2', which appears to be Fortran unformatted sequential I/O for records larger than 2 GBytes. I still have not seen the header/footer syntax for this FTN95 solution, but I suspect it is not compatible with ifort/gfortran. I have read that ifort and gfortran are compatible for Fortran unformatted sequential I/O, including records larger than 2 GBytes, although the header/footer syntax is a bit of a mess, using 2 GByte sub-records.

I would expect that stream I/O is a better solution, although it is probably safer to partition the data into records smaller than 2 GBytes.

It is probably better to have a file dump something like:

    write(11) nB, size(Arr4(:,1))
    do i = 1,nB
      write(11) i, Arr4(:,i)
    enddo
    write(11) -1
19 Oct 2022 8:29 #29470

Depending on array dimension sizes, this kind of stream I/O will be up to 10x slower than fast Method 2. I don't remember if the files made by fast Method 2 are compatible with other compilers and Linux. I currently temporarily do not use Method 2. Method 1 is compatible. Using READF@ beats them all.

20 Oct 2022 3:14 #29472

Quoted from DanRRight: 'Depending on array dimension sizes this way stream I/O will be up to 10x slower than by fast Method2'

A bit of over-reach with that 10x !

I just think it is better to write 'records' that are smaller than 4 GBytes, although I do not recall what size nB and size(Arr4(:,1)) are. Splitting the binary dump into bits does give the option of inspecting the data as it is read, rather than waiting for many gigabytes to be loaded.

It is hard to keep up with the latest available PCIe 4.0 M.2 SSDs, with ver 5.0 to be available shortly. My understanding is that these 4.0 drives are over 5,000 MBps, which I think is 5 gigabytes per second ! At these rates, you have to look carefully at whether the resulting Fortran code is able to process data this fast.

20 Oct 2022 7:47 #29473
  1. There is no good reason to chop the files into pieces < 4GB and complicate the code

  2. Put 6 into my test above instead of 11 and you will get speeds of 0.28 GB/sec --> 10x slower than Method 2 and 20x slower than with READF@.

21 Oct 2022 5:34 #29478

Dan,

I updated your program and tested with FTN95 and gfortran. Method 1 produces 0.4 GBytes/sec, while Method 2 is up to 2.9 GBytes/sec.

This is on a Samsung 970 EVO Plus NVMe M.2 drive which I think is PCIe Gen 3.0

The program is:

    ! compilation: ftn95 aaa.f90 /link /64 >z
    use iso_fortran_env

    integer*4, parameter :: million = 1000000
    real*4, dimension(:,:), allocatable :: Arr4
    integer*4 :: nA, nB, ierr, i, pass
    real*4    :: SpeedGBps, t0, t1, dt
    real*4, external :: delta_sec

    write (*,*) 'Compiler Version :',compiler_version ()
    write (*,*) 'Compiler Options :',compiler_options ()

    dt = delta_sec ()
    nA = 6 ! 11
    nB = 200 * million

!...Allocating array

    Print*, 'Trying to allocate GB of RAM :', 1.d-9 * 4. * nA * nB
    allocate ( Arr4 (nA, nB), stat = ierr)

    if (ierr.eq.0) then
       Print*, 'Allocation success'
    else
       Print*, 'Fail to allocate'
       goto 1000
    endif 

!...Filling the array with some data
    do i=1,nB
      Arr4(:,i) = real(i)   ! scalar broadcast; the original 11-element constructor does not conform when nA = 6
    enddo
    dt = delta_sec ()
      SpeedGBps = 4. * nA * nB / 1024.**3 /(dt+1.e-10)
      print*,' Speed of initialising array =', SpeedGBps,' GB/sec',dt,' sec'   !   typically  ~0.5 GB/s

    do pass = 1,2     ! do 2 passes for file not/already exists
   
       Print*,'Trying to save the data Method 1 '
       call cpu_time(t0)
       dt = delta_sec ()
       open (11, file='LargeFile.dat', FORM='UNFORMATTED', access='STREAM', err=900)
       do i=1,nB
         write(11) Arr4(:,i)
       enddo
       close(11)   
       call cpu_time(t1) 
       dt = delta_sec ()
   
    !...Speed of writing Method 1
         SpeedGBps = 4. * nA * nB / 1024.**3 /(dt+1.e-10)
         print*,' Speed of write Method 1 =', SpeedGBps,' GB/sec',dt,' sec',(t1-t0)   !   typically  ~0.5 GB/s
   
       Print*,'Trying to save the data Method 2'
       call cpu_time(t0)
       dt = delta_sec ()
       open (11, file='LargeFile.dat', FORM='UNFORMATTED', access='STREAM', err=900)
       write(11) Arr4
       close(11)   
       call cpu_time(t1) 
       dt = delta_sec ()
   
    !...Speed of writing Method 2
         SpeedGBps = 4. * nA * nB / 1024.**3 /(dt+1.e-10)
         print*,' Speed of write  Method 2=', SpeedGBps,' GB/sec',dt,' sec',(t1-t0)   !   typically  ~2.6 GB/s
   
    end do !  pass

    write (*,*) 'File LargeFile.dat test completed OK'
      goto 1000

!...............
!...Errors
900 Print*,'Can not open file LargeFile.dat'
    goto 1000
910 Print*,'Can not save file LargeFile.dat'


1000 Continue

    End
  
    real*4 function delta_sec ()
      integer*8 :: tick, rate
      integer*8, save :: last = 0    ! initialised locals are SAVEd implicitly; made explicit here
      call system_clock ( tick, rate )
      delta_sec = dble (tick-last) / dble (rate)
      last = tick
    end function delta_sec
21 Oct 2022 5:38 #29479

The .bat file to test is:

set program=dan_stream

now  > %program%.log

del LargeFile.dat
del %program%.exe
del %program%.obj
del %program%.o

ftn95 %program%.f90 /64 /debug /link >>%program%.log
%program% >>%program%.log

del LargeFile.dat
del %program%.exe
del %program%.obj
del %program%.o

gfortran %program%.f90 -g -fimplicit-none -O2 -o %program%.exe >>%program%.log
%program% >>%program%.log

notepad %program%.log

The results are:

It is now Friday, 21 October 2022 at 16:18:53.036
[FTN95/x64 Ver. 8.91.1.0 Copyright (c) Silverfrost Ltd 1993-2022]
Licensed to: John Campbell   Organisation: John Campbell

[Current options] 64;DEBUG;ERROR_NUMBERS;IMPLICIT_NONE;INTL;LINK;LOGL;

0089) 910 Print*,'Can not save file LargeFile.dat'
WARNING - 21: Label 910 is declared, but not used
    NO ERRORS, 1 WARNING  [<main program> FTN95 v8.91.1.0]
    NO ERRORS  [<DELTA_SEC> FTN95 v8.91.1.0]
[SLINK64 v3.04, Copyright (c) Silverfrost Ltd. 2015-2022]
Loading C:\temp\forum\stream_io\lgotemp@.obj
Creating executable file dan_stream.exe
 Compiler Version :FTN95 v8.91.1
 Compiler Options :64;DEBUG;ECHO_OPTIONS;ERROR_NUMBERS;IMPLICIT_NONE;INTL;LINK;LOGL;
 Trying to allocate GB of RAM :          4.80000000000    
 Allocation success
  Speed of initialising array =     1.85777     GB/sec     2.40630     sec
 Trying to save the data Method 1 
  Speed of write Method 1 =    0.387883     GB/sec     11.5250     sec     11.4062    
 Trying to save the data Method 2
  Speed of write  Method 2=     2.69314     GB/sec     1.65990     sec     1.56250    
 Trying to save the data Method 1 
  Speed of write Method 1 =    0.386933     GB/sec     11.5533     sec     11.5312    
 Trying to save the data Method 2
  Speed of write  Method 2=     2.87482     GB/sec     1.55500     sec     1.56250    
 File LargeFile.dat test completed OK
 Compiler Version :GCC version 11.1.0
 Compiler Options :-mtune=generic -march=x86-64 -g -O2 -fimplicit-none
 Trying to allocate GB of RAM :   4.8000000000000007     
 Allocation success
  Speed of initialising array =   4.88828278      GB/sec  0.914502800      sec
 Trying to save the data Method 1 
  Speed of write Method 1 =  0.441231251      GB/sec   10.1315317      sec   10.0937500    
 Trying to save the data Method 2
  Speed of write  Method 2=   1.66670907      GB/sec   2.68214083      sec   2.68750000    
 Trying to save the data Method 1 
  Speed of write Method 1 =  0.448078960      GB/sec   9.97669792      sec   9.96875000    
 Trying to save the data Method 2
  Speed of write  Method 2=   1.75272882      GB/sec   2.55050778      sec   2.53125000    
 File LargeFile.dat test completed OK

I prefer elapsed-time testing, but CPU_TIME is similar in this case. Paul's implementation of 4GB+ stream I/O is very efficient !!

21 Oct 2022 9:05 #29481

John, You confirmed my numbers. How about your method? Is it indeed slow?

22 Oct 2022 6:06 #29483

Dan,

You are comparing the overhead of writing 200 million 24-byte records vs one gigantic 4.8 GByte record. I have tested a middle ground of writing 240-byte or 2,400-byte records, where the performance is not so different.

My preference would be to target 'records' of about 1 kbyte to 100 kbytes, although smaller records can be easier to manage in post-processing.
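A middle-ground version of Method 1 along these lines would write blocks of columns instead of one column at a time (a sketch only; the block size of 1,000 columns, giving 24 kbyte records for nA = 6, is an illustrative choice, not a tested optimum):

```fortran
! Sketch: a "Method 1.5" - block writes of ~24 kbyte records (nblk is illustrative)
program method1p5
  implicit none
  integer*4, parameter :: nA = 6, nblk = 1000
  integer*4 :: nB, i
  real*4, allocatable :: Arr4(:,:)
  nB = 200 * 1000000
  allocate (Arr4(nA, nB))
  do i = 1, nB
    Arr4(:,i) = real(i)
  end do
  open (11, file='LargeFile.dat', form='unformatted', access='stream')
  do i = 1, nB, nblk                    ! each write moves nA*nblk*4 = 24 kbytes
    write (11) Arr4(:, i:min(i+nblk-1, nB))
  end do
  close (11)
end program method1p5
```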

It is good we have seen the extremes of method 1 vs method 2.

Stream I/O is certainly more portable, as there is no clash between different header/footer formats. It is also easy with stream I/O to replicate or read ifort or gfortran sequential binary file formats.

Another option with STREAM I/O is to read a sequential binary file format and construct a table of record addresses. This can be later used to randomly access the records using 'read ( lu, pos=rec_address(rec_id) ) IO_list', which opens up a much more flexible way of accessing the data.
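That indexing idea can be sketched as follows (the record layout here, a 4-byte count n followed by n real*4 values, is a hypothetical example, not an FTN95 or library format):

```fortran
! Sketch: build a table of record addresses from a stream file of
! variable-length records, then read one record at random.
! Assumed (hypothetical) layout: 4-byte count n, then n real*4 values.
program index_records
  implicit none
  integer*4, parameter :: max_rec = 100000
  integer*8 :: rec_address(max_rec), pos
  integer*4 :: lu, n, nrec, iostat, rec_id
  real*4    :: rec_data(1000)
  lu = 11
  open (lu, file='records.bin', form='unformatted', access='stream')
  nrec = 0
  pos  = 1                                  ! stream positions are 1-based bytes
  do
    read (lu, pos=pos, iostat=iostat) n     ! record header: value count
    if (iostat /= 0) exit                   ! end of file
    nrec = nrec + 1
    rec_address(nrec) = pos
    pos = pos + 4 + 4_8*n                   ! header + payload -> next record
  end do
  if (nrec > 0) then                        ! random access via the table
    rec_id = nrec/2 + 1
    read (lu, pos=rec_address(rec_id)) n
    read (lu) rec_data(1:n)
  end if
  close (lu)
end program index_records
```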

Stream I/O is a very useful addition to Fortran

John

23 Oct 2022 2:43 #29490

Dan,

I have increased the test options for record size (ie number of records) vs I/O performance. ( Do pass = 1,4 )

However, depending on the amount of memory installed, the test appears to be testing Microsoft disk buffering more than actual disk performance. This is because the test program generates the file in memory and does not enforce that the data is transferred to the disk.

I have an HP i7-6800 notebook with a SATA SSD, but only 8 GBytes of memory. Its performance is nothing like the NVMe M.2 drives we have been reporting, mainly because there is not sufficient memory for both storing the 6 GB array and buffering the 6 GB file. I think the estimated disk I/O performance of 2 to 4 GBytes/sec being reported is due mainly to memory buffering and not the SSD drives (which also have memory buffers).

I thought the background to your tests was reading large data sets from disk into memory. Again, these tests cannot be repeated, as once the data is read from disk into the memory buffers, the reported transfer rates no longer represent disk I/O performance.

I think if you want to understand the I/O performance, you should test reading a terabyte data file (ie much larger than installed memory).

Also, while you are claiming 2 to 6 GBytes/sec disk transfer rates (really memory buffer transfer rates), the performance when processing real data can be much less, probably under 0.5 GBytes/sec.

In real data tests, the NVMe M.2 drive performance rates for I/O will decline significantly once the PC memory buffering and SSD drive memory buffering capacities are exhausted.

( Hopefully, my next PC will have 128 gbytes of DDR5 memory !! )

23 Oct 2022 4:47 #29491

I assure you that 128GB, like 640K in the past, will not be 'enough for everyone'. :lol:

25 Oct 2022 6:13 #29492

Paul,

I have downloaded FTN95 Ver 8.92 and tested the latest stream_readc.f95

This version now appears to work correctly for write ( lu, pos=address) ... when overwriting the file.

All 3 files that are generated from the 3 tests now appear to be the same.

Thanks very much for this update.

I will now try to perform a test for overwrite for a file larger than 4 gbytes ( using integer*8 :: address )

27 Oct 2022 4:41 #29503

The write was fixed. Now there is the same problem with READ: it does not work if the array is > 4 GB...

27 Oct 2022 4:45 #29504
! compilation: ftn95 aaa.f90 /link /64 >z
!
    real*4, dimension(:,:), allocatable :: Arr4
    integer*4 :: nA, nB, ierr, i
    real*4    :: SpeedGBps, t0, t1

    nA = 11
    nB = 1.e8

!...Allocating array
    Print*, 'Trying to allocate GB of RAM :', 1.d-9 * 4. * nA * nB
    allocate ( Arr4 (nA, nB), stat = ierr)

    if (ierr.eq.0) then
       Print*,'Allocation success'
    else
       Pause 'Fail to allocate'
       goto 1000
    endif 

!...Filling the array with some data
    do i=1,nB
      Arr4(:,i) = [1,2,3,4,5,6,7,8,9,10,11]
    enddo

    Print*,'Trying to save the data Method 1 '
    call cpu_time(t0)
    open (11, file='LargeFile.dat', FORM='UNFORMATTED', access='STREAM', err=900)
    do i=1,nB
      write(11,err=910) Arr4(:,i)
    enddo
    close(11)   
    call cpu_time(t1) 

!...Speed of writing Method 1
      SpeedGBps = 1.d-9 * 4. * nA * nB / (t1-t0+1.e-10)
      print*,'Write OK. Speed of write Method 1 =', SpeedGBps   !   typically  ~0.5 GB/s

            print*,'================ N O W    R E A D ===================='
    call cpu_time(t0)
    open (11, file='LargeFile.dat', FORM='UNFORMATTED', access='STREAM', err=900)
    do i=1,nB
      read(11,err=912,end=914) Arr4(:,i)
    enddo
    close(11)   
    call cpu_time(t1) 

!...Speed of reading Method 1
      SpeedGBps = 1.d-9 * 4. * nA * nB / (t1-t0+1.e-10)
      print*,'READ OK. Speed of read Method 1 =', SpeedGBps   

          Print*,'Trying to save the data Method 2'
    call cpu_time(t0)
    open (11, file='LargeFile.dat', FORM='UNFORMATTED', access='STREAM', err=900)
      write(11,err=920) Arr4
    close(11)   
    call cpu_time(t1) 

!...Speed of writing Method 2
      SpeedGBps = 1.d-9 * 4. * nA * nB / (t1-t0+1.e-10)
      print*,'Write OK.  Speed of write  Method 2=', SpeedGBps   !   typically  ~2.6 GB/s
            print*,'================ N O W    R E A D =================='
    call cpu_time(t0)
    open (11, file='LargeFile.dat', FORM='UNFORMATTED', access='STREAM', err=900)
      read(11,err=930,end=932) Arr4
    close(11)   
    call cpu_time(t1) 

!...Speed of reading Method 2
      SpeedGBps = 1.d-9 * 4. * nA * nB / (t1-t0+1.e-10)
      print*,'READ OK. Speed of read   Method 2 =', SpeedGBps   
      
      pause 'File LargeFile.dat created OK'
      goto 1000

!...............
!...Errors
900 Print*,'Can not open file LargeFile.dat'
    goto 1000

910 Print*,'Error. Can not save file LargeFile.dat Method 1'
    goto 1000
912 Print*,'Error. Can not load file LargeFile.dat Method 1'
    goto 1000
914 Print*,'Abrupt end of file LargeFile.dat Method 1'
    goto 1000


920 Print*,'Error. Can not write file LargeFile.dat Method 2'
    goto 1000
930 Print*,'Error. Can not read file LargeFile.dat Method 2'
    goto 1000
932 Print*,'Abrupt end of file LargeFile.dat Method 2'
    pause

1000 Continue
    End
27 Oct 2022 6:08 #29505

I have made a note of this issue.

27 Oct 2022 12:28 #29506

Paul,

I am trying to go to the next step with using stream I/O to generate a random access binary file library for variable length records.

I am finding a problem with the returned file position after a random access read, ie

    read ( unit=lu, pos=address, iostat=iostat ) (rec_data(i),i=1,n)
    inquire ( unit=lu, pos=new_address, iostat=iostat )

Inquire appears to be returning the new_address after the last write, rather than the address after this last read. For this read test, the returned address is always the end of the file (after having previously written all records sequentially).
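A minimal sketch of the expected behaviour (file name and sizes are illustrative): after a positioned read of n values, INQUIRE should report the position just past that read, not the end-of-file position left by the last write.

```fortran
! Sketch: inquire(pos=) after read(pos=) is expected to return
! address + bytes read, not the end-of-file position from the last write.
program inquire_pos
  implicit none
  integer*8 :: address, new_address
  integer*4 :: lu, iostat
  real*4    :: rec_data(100)
  lu = 11
  rec_data = 0.0
  open (lu, file='records.bin', form='unformatted', access='stream')
  write (lu) rec_data                       ! file is now 400 bytes
  address = 41                              ! 1-based byte position of element 11
  read (lu, pos=address, iostat=iostat) rec_data(1:10)
  inquire (unit=lu, pos=new_address, iostat=iostat)
  print *, new_address                      ! expected 81 (= 41 + 10*4), not 401
  close (lu)
end program inquire_pos
```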

I could send the test program, if that helps, although it is very much an alpha version.

John

27 Oct 2022 10:30 #29507

John, are these tricks you are developing just to make an interface between gFortran, which you now use for simulations, and Clearwin to plot the results and control the runs?

28 Oct 2022 12:18 #29509

Dan,

At present I use a Fortran fixed-length record, random access binary file format for my binary data. Fortunately, it has been compiler independent, as I have used many Fortran compilers.

(I wrote this library in the 1970's to emulate the CDC random access file structure, using 'standard' Fortran. I say 'F77 standard' as the approach is to provide the 'record' address in memory and the length in 4-byte words. I can get a lot of mixed-type errors!)

This file format is based on 4-byte addressing and is limited to an 8 GByte file size. It includes a record buffering approach, as the program's variable-size records are transferred to 64 kbyte fixed-length file records.

I have always wanted to update the approach by:

  1. extend the file size limit by converting to 8-byte addressing and,
  2. remove the duplicate buffering, as Windows also provides very effective file buffering using free memory,
  3. test different logical record header/footer formats to overcome the long-standing portability problems that FTN95 (1 or 5 bytes), gfortran and ifort have ( the mess with records larger than 2 GBytes ),
  4. expand the header/footer format to include data type and kind ?

The other feature I am looking to address is to include the flexibility of a Fortran I/O list for the record, rather than a record memory address. This could possibly include a derived type record. However, I am very used to the F77 approach of building a record using EQUIVALENCE or TRANSFER, so a modern Fortran approach to this can wait.
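The TRANSFER approach mentioned above can be sketched like this (the record layout and names are hypothetical, purely for illustration):

```fortran
! Sketch: pack a mixed-type record into a 4-byte-word buffer with TRANSFER,
! the way an EQUIVALENCE-based F77 library lays records out (layout is illustrative).
program pack_record
  implicit none
  integer*4 :: buffer(3)                    ! record held as 4-byte words
  integer*4 :: id
  real*8    :: value, back
  id    = 42
  value = 3.5d0
  buffer(1)   = id
  buffer(2:3) = transfer (value, buffer, 2) ! real*8 occupies two 4-byte words
  ! the record could then be written as one stream record:
  ! write (lu) size(buffer), buffer
  back = transfer (buffer(2:3), back)       ! unpacking recovers the real*8 value
  print *, id, back
end program pack_record
```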

I am not sure about the need for header/footer labeling for records, as they were initially provided for BACKSPACE. Does anyone still use this ?

All this has become a bit academic with 64-bit, as I have transferred most of my binary file data to an in-memory derived type data structure using allocatable components. This data can now be transferred to and from file at the start/end of a run as a sequential dump, simplifying the indexing structure.

Defining my own binary file data structure using stream I/O removes many of the portability, performance and capacity issues.

28 Oct 2022 12:54 #29510

John, but I am curious: what do you need all that for? It does not resonate with any of my needs, no matter how hard I try to imagine future trends and progress of computing 😃

28 Oct 2022 1:42 #29511

Dan,

Very simply: my MSLIB ( random access file library ) from the 1970's uses 4-byte addressing, which is limited to 8 GByte file sizes. So I need a new library with 8-byte addressing for larger files (and 8-byte address tables in the program). I must use new routine names so I don't miss updating some old routine usage.

Why not change to Stream I/O at the same time, to see if it offers any other functional advantages ?

Hence my interest in confirming that stream I/O works in FTN95 for large files, and confirming it works in the way I expect.

( Happy to learn if it is different from what I expect, as I don't always RTFS. Has anyone actually tried to read the latest Fortran standard? It is approaching 1,000 pages !! )

28 Oct 2022 5:41 #29512

I opened it for a few seconds recently. Honestly, I did not RTFS even the one for Fortran 95. Like probably all of us here.

Please read it, you are our only hope. You were the only one who correctly saw the missing features years ago and brought this to our attention here. For example, 5 years ago on Sep 12 you wrote:

**Using gFortran provides a number of advantages, which FTN95 is not expected to address in the near future.

Inclusion of F03 and F08 standard features.

!$OMP support with -fopenmp

Extensive vector instruction support, eg -O3 -mavx -march=native -ffast-math**

At that time I was deaf and was not listening, but you were right to say it. Today I suffer hard from the missing features. Tomorrow others will.
