Silverfrost Forums

Welcome to our forums

Bug in SCC 3.88

22 Dec 2016 5:33 #18590

Mecej4, did you change writef@ to readf@ or readfa@ when reading the file? With the first one my code crashes, and with the second I get 3x lower performance than writef@. Post your code; I couldn't find what is wrong.

I still hope to find a much simpler mechanism for loading binary data directly into RAM, bypassing all kinds of deciphering. 300 MB/s is not 2, 3 or 5 but 40x smaller than the peak I/O speed. Are we living in the times of Big Data or not?

The only difference with readf@/readfa@ (which hopefully can reach the same GB/s as writef@, as you claim in your case) is that they read data into a 1D array Arr(ix), while I need to read it into a 2D or 3D one like Arr(ix,iy,iz). There could be tricks and workarounds for that problem; a couple I'd like to test (EQUIVALENCE, if it is not yet totally obsolete, or cutting a structured 1D array into pieces).


22 Dec 2016 12:23 (Edited: 22 Dec 2016 12:27) #18591

Quoted from DanRRight Did you change writef to readf or readfa when reading file?

Readf@. Readfa@ is for text files, and only reads one line with each call.

The only difference between what we have with readf@/readfa@ is that they read data into 1D array Arr(ix) while i need to read it into 2D or 3D ones like Arr(ix,iy,iz).

At the machine code level (or even assembler level) there is no such thing as a 2D or 3D array. File I/O, whether binary or formatted, moves bytes between the file and a memory buffer designated by its base address. Fortran uses the column-major convention for multi-dimensional arrays, so given the declaration DIMENSION A(imax,jmax), the statement READ (iu) A is the same as READ (iu) ((A(i,j), i = 1, imax), j = 1, jmax). If you do I/O with only a section of A, the compiler has to emit extra code to break the incoming data into chunks and put them into discontiguous parts of memory (for READ), or to assemble the data from different blocks of memory and send them to the file (for WRITE). The fastest I/O is achieved by doing unformatted/binary transfers to/from whole arrays.
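The contiguity point above can be illustrated outside Fortran. Here is a minimal sketch, in Python rather than FTN95 Fortran so it stands alone, showing that a column-major 2D array of REAL*4 values is just one contiguous run of bytes: a single bulk transfer plus index arithmetic recovers the 2D view, with no per-element work.

```python
import struct

# A 3x2 "2D array" of REAL*4 values. At the byte level it is one
# contiguous buffer, so a single raw write/read moves the whole array.
imax, jmax = 3, 2
a = [[float(i + 10 * j) for j in range(jmax)] for i in range(imax)]

# Flatten in column-major order, as Fortran lays the array out in memory.
flat = [a[i][j] for j in range(jmax) for i in range(imax)]
raw = struct.pack(f"<{imax * jmax}f", *flat)   # the bytes a binary WRITE emits

# "Reading" is the inverse: one bulk transfer, then index arithmetic
# A(i,j) -> flat[i + imax*j] to rebuild the 2D view.
back = struct.unpack(f"<{imax * jmax}f", raw)
a2 = [[back[i + imax * j] for j in range(jmax)] for i in range(imax)]
assert a2 == a
```

The reshape costs nothing at I/O time; only the indexing convention changes.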

Here are the test source codes. First, the binary I/O code:

program FIOBIN
implicit none
!
! Test raw file I/O speeds to a 64 MiB file. Check for space before running!
!
integer, parameter :: I2 = selected_int_kind(4), I4 = selected_int_kind(9), &
                      I8 = selected_int_kind(18)
integer, parameter :: BSIZ = Z'4000000'   ! 64 MiB
character (Len=1) :: buf(BSIZ)
integer (I2) :: hndl, ecode
integer (I8) :: nbytes = BSIZ, nread
real :: t1,t2
character(len=7) :: fil='BIG.BIN'
!
call openw@(fil,hndl,ecode)
if(ecode /= 0)stop 'Error opening file BIG.BIN for writing'
call cpu_time(t1)
call writef@(buf,hndl,nbytes,ecode)
call cpu_time(t2)
if(ecode /= 0)stop 'Error writing file BIG.BIN'
call closef@(hndl,ecode)
if(ecode /= 0)stop 'Error closing file'
write(*,'(A,2x,F7.3,A)')'Time for writing 64 MB file: ',t2-t1,' s'
write(*,'(A,6x,F6.0,A)')'Estimated write throughput = ',64.0/(t2-t1),' MiB/s'
!
call openr@(fil,hndl,ecode)
if(ecode /= 0)stop 'Error opening file BIG.BIN for reading'
call cpu_time(t1)
call readf@(buf,hndl,nbytes,nread,ecode)
call cpu_time(t2)
if(ecode /= 0)stop 'Error reading file BIG.BIN'
call closef@(hndl,ecode)
if(ecode /= 0)stop 'Error closing file'
write(*,'(A,2x,F7.3,A)')'Time for reading 64 MB file: ',t2-t1,' s'
write(*,'(A,6x,F6.0,A)')'Estimated read throughput  = ',64.0/(t2-t1),' MiB/s'
!
call erase@(fil,ecode)
call doserr@(ecode)
end program

On my laptop, the output:

s:\FTN95>fiobin
Time for writing 64 MB file:     0.063 s
Estimated write throughput =        1024. MiB/s
Time for reading 64 MB file:     0.031 s
Estimated read throughput  =        2048. MiB/s

Next, the ASCII I/O code: [HAVE TO BREAK UP THE POST -- FORUM line limit reached]

22 Dec 2016 12:26 (Edited: 22 Dec 2016 1:08) #18592

[CONTINUED] Next, the ASCII I/O test code:

program FIOASC
implicit none
!
! Test text file I/O speeds to a 64 MiB file. Check for space before running!
!
integer, parameter :: I2 = selected_int_kind(4), I4 = selected_int_kind(9), &
                      I8 = selected_int_kind(18)
integer, parameter :: FSIZ = Z'100000'   ! File size, 2^20 lines
character (Len=64) :: lbuf               ! Line buffer
integer (I2) :: hndl, ecode
integer (I8) :: nlines = FSIZ
real :: t1,t2
character(len=7) :: fil='BIG.TXT'
integer :: i,j,nread
character(len=1) :: c
!
lbuf = 'If at first you do not succeed, try, try, try again, she said.'
call openw@(fil,hndl,ecode)
if(ecode /= 0)stop 'Error opening file BIG.TXT for writing'
call cpu_time(t1)
do i=1,nlines
   call writefa@(lbuf,hndl,ecode)
   if(ecode /= 0)stop 'Error writing file BIG.TXT'
   j=mod(i,64)+1         ! swap characters to provide variation in lines written
   c=lbuf(j:j)
   lbuf(j:j)=lbuf(1:1)
   lbuf(1:1)=c
end do   
call cpu_time(t2)
call closef@(hndl,ecode)
if(ecode /= 0)stop 'Error closing file'
write(*,'(A,2x,F7.3,A)')'Time for writing 64 MB file: ',t2-t1,' s'
write(*,'(A,6x,F6.0,A)')'Estimated write throughput = ',64.0/(t2-t1),' MiB/s'
!
call openr@(fil,hndl,ecode)
if(ecode /= 0)stop 'Error opening file BIG.TXT for reading'
call cpu_time(t1)
do i=1,nlines
   call readfa@(lbuf,hndl,nread,ecode)
   if(ecode /= 0)stop 'Error reading file BIG.TXT'
end do   
call cpu_time(t2)
call closef@(hndl,ecode)
if(ecode /= 0)stop 'Error closing file'
write(*,'(A,2x,F7.3,A)')'Time for reading 64 MB file: ',t2-t1,' s'
write(*,'(A,6x,F6.0,A)')'Estimated read throughput  = ',64.0/(t2-t1),' MiB/s'
!
call erase@(fil,ecode)
call doserr@(ecode)
end program

My output from this:

s:\FTN95>fioasc
Time for writing 64 MB file:     0.141 s
Estimated write throughput =         455. MiB/s
Time for reading 64 MB file:     0.266 s
Estimated read throughput  =         241. MiB/s

Expect even slower formatted I/O if format conversions of floating point REALs or DOUBLE PRECISION values are to be done. We have seen some examples of this in a recent thread, https://forums.silverfrost.com/Forum/Topic/2970 .

22 Dec 2016 12:59 #18593

Thanks, I found my error; it was due to a missing parameter... damn, it seems that besides possible Alzheimer's I am also getting ADT 😉

By the way, I got the following result for the second code (I increased the file size to ~1 GB; 64 MB is too small to measure the time correctly):

Time for writing file of size MB:       819  1.141 s
Estimated throughput =         718. MB/s
Time for reading file of size MB:       819  1.953 s
Estimated throughput =         419. MB/s

But more interesting was the first test:

Time for writing file of size MB:      1024  0.516 s
Estimated throughput =        1986. MB/s
Time for reading file of size MB:      1024  0.234 s
Estimated throughput =        4369. MB/s

It shows a reading speed of 4.4 GB/s. At that speed it already becomes interesting to work. Now, after reading the data, if we fill the columns of another 3D array of the same size with such 1D data, preliminarily pre-formed and saved column by column (I am thinking about how, but do not yet know), we may get tremendous reading speed... In this case the loaded multi-GB array buf(BSIZ) will be lost, but who cares; memory becomes cheaper and cheaper, and the 64-bit compiler is hopefully close to complete.

What we need is a way of transferring real*4 and integer*4 numbers into bytes, saving them in a character array, saving that array on disk, and then reading the binary array back so that each group of 4 bytes represents exactly the same real*4 or integer*4 number as inside the computer. This way we avoid the burden of format processing.

22 Dec 2016 4:19 #18594

Quoted from DanRRight ... 64bit compiler is hopefully close to be complete.

If you try the FIOASC program with /64, you will see that there is a performance problem. The writing phase is about ten times slower than with FTN95-32 bit, although the reading speed is about the same.

22 Dec 2016 10:59 #18595

Then we don't have to use it; it is slow anyway, even if the excessive slowness is fixed in the future. Binary readf@ is OK. Or am I missing something?

23 Dec 2016 3:46 #18596

I am guessing: why did Salford make byte readf@ and character readfa@ but not real*4 and real*8 utilities? What is the best way to convert a real*4 number into 4 character*1 values and vice versa? Ideally in a way that is portable across all platforms and languages, like HDF5?

23 Dec 2016 5:40 #18597

That cannot be done.

Text files use LF (or CR+LF) to separate lines. These characters are not used for any other purpose or with any other meaning in a text file. The READFA@ subroutine reads one line for each call to it. The buffer that you provide is filled with all characters in the line up to, but not including, the LF.

Real numbers in their internal binary format cannot be placed in text files. Why? Consider, for example, the REAL*4 number 552.0. It has the IEEE representation Z'440A0000'. Look at the second most significant byte, Z'0A': that is LF. How do you tell that it is part of a number and not a record separator? How do you tell that the next byte, Z'44', is not the letter 'D'?
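The 552.0 example can be checked directly. A small Python sketch (Python used purely for illustration; the same bytes appear regardless of language):

```python
import struct

# REAL*4 552.0 in IEEE-754 single precision is Z'440A0000'.
raw = struct.pack(">f", 552.0)      # big-endian byte order, for readability
assert raw == bytes([0x44, 0x0A, 0x00, 0x00])

# The second byte is 0x0A, i.e. LF, the text-file line separator, so a
# line-oriented reader such as READFA@ would split this "number" in two.
assert raw[1] == 0x0A == ord("\n")
```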

Another reason is that such files cannot be printed or viewed by most people, who are not proficient at mental calculations using hexadecimal numbers.

23 Dec 2016 11:09 (Edited: 23 Dec 2016 11:40) #18598

I do not think you are right, mecej4; the data is perfectly converted. The problem is solved, unless you find any errors. Here is the bottom line: we have some big data in a real*4 array A(ix,iy,iz), save it on a ramdisk (at 2 GB/s), and then read and recover it into a real*4 array C(ix,iy,iz) with an unseen *** 4+ GB/s reading speed ***, not losing anything on format conversion. Binary I/O is used as a carrier. The same can be done for any other big data:


Time for reading file of size MB:       280  0.063 s
Estimated throughput =        4480. MB/s

integer, parameter :: ix = 10, iy=700000, iz=10  
integer, parameter :: BSIZ = 4 * ix * iy * iz  
character (Len=1) :: buf(BSIZ), bufRead(BSIZ) 
integer*2 :: hndl, ecode
integer*8 :: nbytes = BSIZ, nread
integer :: iix, iiy, iiz
real :: t1, t2

real*4 A(ix,iy,iz)
equivalence (A,Buf)
real*4 C(ix,iy,iz)
equivalence (C,bufRead)

do iiz=1,iz
 do iiy=1,iy
  do iix=1,ix
   A(iix,iiy,iiz) = iix
  enddo
 enddo
enddo

print*, 'A=', A(1,1,1), A(2,1,1)

call openw@('Y.bin',hndl,ecode) 
if(ecode /= 0)stop 'Error opening file Y.BIN for writing' 
call writef@(buf,hndl,nbytes,ecode) 
if(ecode /= 0)stop 'Error writing file Y.BIN' 
call closef@(hndl,ecode) 

! .............................

call openr@('Y.bin',hndl,ecode) 
if(ecode /= 0)stop 'Error opening file Y.BIN for reading' 
call cpu_time(t1) 
call readf@(bufRead,hndl,nbytes,nread,ecode) 
call cpu_time(t2) 
if(ecode /= 0)stop 'Error reading file Y.BIN' 
call closef@(hndl,ecode) 

write(*,'(A,2x,i7, F7.3,A)')'Time for reading file of size MB: ',BSIZ/1000000, t2-t1,' s' 
write(*,'(A,6x,F6.0,A)')'Estimated throughput = ',BSIZ/1000000/max(1.e-6,(t2-t1)),' MB/s' 

print*, 'Checking if C=A', C(1,1,1), C(2,1,1)

end
23 Dec 2016 11:37 (Edited: 23 Dec 2016 12:46) #18599

Dan, the problem is not solved; it is swept under the rug. Try reading the data file in a text editor.

You are still calling READF@ and WRITEF@. These subroutines simply read and write bytes with no awareness of what those bytes represent. The code that you just posted is the same as my binary I/O example with a few lines added to initialize the array A. You cannot view the file or print it and make sense of its contents. No conversions involved.

What I said you cannot do is to perform I/O of REAL variables to text files without format conversion, by calling WRITEFA@ and READFA@. You can certainly convert your terabyte-sized data files to binary files and then process the binary files. The conversion is an unavoidable and time-consuming process. The less often you need to do the conversion, the better.

If your data is coming from someone else, you can work with them to define a custom file exchange format or use HDF/NETCDF. If you receive text files from them, you cannot avoid slow format conversions.

If you end up using binary files, you had better add some safety features to protect and verify the integrity of the 'non-human-readable' data in them. For example, you can add check-sums after every MiB of data, a separate companion check-sum file, etc.
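A sketch of the per-chunk check-sum idea suggested above, in Python (the function names and the 1 MiB chunk size are illustrative choices, not an existing API):

```python
import zlib

CHUNK = 1 << 20   # check-sum granularity: 1 MiB, as suggested above

def chunk_checksums(data: bytes):
    """CRC-32 of each 1 MiB chunk; store these beside the binary file."""
    return [zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

def verify(data: bytes, sums) -> bool:
    """Recompute and compare; any flipped bit changes one chunk's CRC."""
    return chunk_checksums(data) == sums
```

The check-sum list could live in a companion file, so a corrupted chunk can be located without rereading the whole data set.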

23 Dec 2016 11:50 #18600

Mecej4, oh, c'mon, that is a different story. The binary-vs-text output question is closed in favor of binary in my case because text IS UNACCEPTABLY SLOW, by 10x, as we can see. Even binary at 4 GB/s is far from the peak 12 GB/s I'd like to see. Lose 3 seconds per day and you will have lost a 24-hour day by the end of your life. With our TBs of data we lose hours per day, or many years per life.

Everyone else can decide for themselves whether to keep their output text or binary. If speed and size are not an issue, text is of course preferable.

Now a question for Silverfrost: can we get 12 GB/s and be the leader of the Fortran market by light years, as it always was before in numerous important aspects?

23 Dec 2016 12:51 #18601

Now question to Silverfrost:

A very short question after a long discussion that I have not followed.

I am aware of some inefficient code in the FTN95 library for 64 bit formatted PRINT/WRITE of REAL values. We aim to fix this in due course but it is not an immediate priority.

23 Dec 2016 12:54 #18602

Quoted from DanRRight Binary vs text output question is closed in favor of binary in my case because the text IS UNACCEPTABLY SLOW.

is incompatible with

Guessing why Salford made byte readf@ and character readfa@ but did not make real4 and real8 utilities? How best way to convert real4 number into 4 character1 numbers and vice versa?

READF@ is type-agnostic. A byte is essentially typeless. Byte variables, in languages that allow them, are usually catch-all variables that hold data until you cast or convert to/from a desired type such as REAL or INTEGER. Thus READF@ is already capable of reading data that happen to be IEEE bit representations of REALs, so you do not need 'real4 and real8' utilities.
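In Fortran this reinterpretation is what EQUIVALENCE (as in Dan's program) or the TRANSFER intrinsic does. The same round trip sketched in Python, to show that raw bytes carry REAL*4 values exactly (the chosen values are exactly representable in single precision):

```python
import struct

# Round-trip: REAL*4 values -> raw bytes -> REAL*4 values.
values = [0.25, -552.0, 1024.0]
raw = struct.pack("<3f", *values)      # what a binary write stores on disk
assert len(raw) == 12                  # 4 bytes per REAL*4, no conversion cost

recovered = list(struct.unpack("<3f", raw))
assert recovered == values             # exact: the bytes ARE the numbers
```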

23 Dec 2016 1:08 #18603

Quoted from PaulLaidler

A very short question after a long discussion that I have not followed.

I am aware of some inefficient code in the FTN95 library for 64 bit formatted PRINT/WRITE of REAL values. We aim to fix this in due course but it is not an immediate priority.

Paul, in this thread I posted a test program (FIOASC) on 22 Dec. 2016 which simply writes a CHARACTER(len=64) variable to a file repeatedly (64 MiB total) using WRITEFA@. No format conversion is involved; it suffices to keep track of line feeds. The write phase of this program takes 10x longer with /64.

23 Dec 2016 1:43 #18604

mecej4

Thank you. I have looked at WRITEFA@ and made changes that will hopefully fix the problem.

23 Dec 2016 2:06 #18605

Paul,

My request was to find out what keeps readf@ limited to 4 GB/s instead of 12: is it parallelization or something else? When you test unformatted I/O, which is fast and should be even faster, please use larger data, 300 MB minimum, as the timer's resolution is not that high. To save your time, here is a BAT file for the last code, with the stack adjustments for the 32-bit version:

ftn95 equiv.f95  /set_error_level error 298 /no_truncate /nocheck /silent /opt >equiv_FTN895
slink equiv.obj  /stack:1200000000   >equivLink
equiv.exe >zz

Make sure you are running the code on a RAMdisk. Test the RAMdisk speed itself; it has to be in the 10+ GB/s territory.

Quoted from mecej4

'...aaa is incompatible with bbb... Thus, READF@ is already capable of reading data that happen to be IEEE bit representations of REAL'

Who knew that for sure before the test clearly showed it?

23 Dec 2016 3:34 #18607

Quoted from DanRRight

Quoted from mecej4 Thus, READF@ is already capable of reading data that happen to be IEEE bit representations of REAL

Who knew that for sure before the test clearly showed that ?

Anyone who has programmed in C or has used C programs. open(), close(), read() and write() are all standard system calls in Unix and have been with us since the 1970s. The FTN95 nonstandard subroutines give the Fortran programmer access to the same functionality. Perhaps they were added when FTN77 or FTN95 was ported to Linux in the 1990s.
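For reference, the same open/write/read/close pattern sketched via Python's os module, whose functions are thin wrappers over those Unix system calls (the file name is illustrative):

```python
import os
import struct
import tempfile

# Raw byte I/O with the open/write/read/close calls that the FTN95
# OPENW@/WRITEF@/READF@/CLOSEF@ routines expose to Fortran.
payload = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)   # 16 raw REAL*4 bytes

path = os.path.join(tempfile.mkdtemp(), "big.bin")
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
assert os.write(fd, payload) == len(payload)        # one bulk transfer
os.close(fd)

fd = os.open(path, os.O_RDONLY)
data = os.read(fd, len(payload))                    # one bulk transfer back
os.close(fd)
assert struct.unpack("<4f", data) == (1.0, 2.0, 3.0, 4.0)
os.remove(path)
```

The calls never inspect the bytes; type interpretation happens only when the buffer is reinterpreted, exactly as with READF@.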

23 Dec 2016 3:41 #18609

Dan

If you can simplify this issue into a small program with simple test data then I will log it for investigation.

With so many issues going on at one time it is not easy for us to get into a given problem quickly. Also, it's a long time since I used a RAMdisk. All of these things are time-consuming, and the simpler it is for us, the quicker we can respond.

23 Dec 2016 6:08 #18612

Paul,

The small codes you ask for are all posted above (for binary output), and they are super simple, as you will see if you already looked at what mecej4 posted for text output. You just have to find why readf@ has a 4 GB/s limit and whether it is possible to get 10-12 GB/s. If the speed is defined only by the Microsoft I/O DLLs, as mecej4 claims, then you should be able to get those 10-12 GB/s with some code optimization, because these top speeds should be compiler-independent.

Without a RAMdisk (they are also called RAMdrives) none of this will work, because the latency and bandwidth of even the fastest SSD are not enough to squeeze out crazy numbers like 12 GB/s. So you will need to install some free RAMdrive.

You will also need to install CrystalDiskMark or another popular disk speed tester to see if your RAMdrive is fast enough. RAMdrives are generally very good; they are good for fast compilation and fast loading of large files.

The benefit of all this, if you reach 10 GB/s speeds, is that you will be able to claim that the new 64-bit compiler is 'Big Data ready' with its tremendous, never-before-seen I/O speed numbers. Mecej4 and John will make comparisons with Intel, Lahey and gFortran, and people's jaws will drop under their tables seeing the read and write bars going sky-high versus, say, gFortran: two to three orders of magnitude higher!

23 Dec 2016 6:32 #18613

Dan

I can only repeat: if you post the code, the data and the details of the command line arguments, then I am happy to log it as worthy of attention. It is a lot easier for you to post again than for me to work out which code etc. you are referring to. I have not read this thread in detail, and it would take a significant amount of time to read and understand it.

Please login to reply.