forums.silverfrost.com Welcome to the Silverfrost forums
JohnCampbell
Joined: 16 Feb 2006 Posts: 2555 Location: Sydney
Posted: Sun May 16, 2021 6:20 am
Dan,
I know it is a long time since you asked
"the reading speed drops twice from 4 GB/s to 2GB/s. Any ideas why?"
I think it is because you are using CPU time, rather than elapsed time.
Anyway, I modified your code to:
also report elapsed time,
use ALLOCATE to remove the stack problem, and
report times for initialising and testing the arrays.
I would be interested to see whether you still get similar performance with the new NVMe PCIe 4 hard drives.
Code: | ! compilation for 32bit case :
!
! Revisions:
! use ALLOCATE arrays to use HEAP instead of STACK
! include Elapse_Time > SYSTEM_CLOCK to report actual run time
! scan C == A to show a minimal processing time
!
! ftn95 unfRead.f95 /opt /set_error_level error 298 /no_truncate /nocheck /silent >equiv_FTN95
! slink unfRead.obj /stack:1200000000
!
integer, parameter :: ix = 18, iy=800000, iz=20
!! integer, parameter :: BSIZ = 4 * ix * iy * iz
!zz real*4 A(ix,iy,iz), C(ix,iy,iz)
real*4, allocatable, dimension(:,:,:) :: A, C
! integer*2 :: hndl, ecode
! integer*8 :: nbytes = BSIZ, nread
integer :: iostat, iix,iiy,iiz, ne
real*4 :: t1,t2, e1,e2, Mb
! adding or removing this line changes nothing
! equivalence (A,Buf) ! converts initial real*4 data to character*1
! **** Removing this useless line decreases reading speed 2x ****
!z equivalence (C,bufRead) ! converts loaded character*1 data real*4
Mb = 4.*real(ix)*real(iy)*real(iz) / (1024.**2) ! MBytes
call elapse_time (e1)
allocate ( A(ix,iy,iz), stat=iostat )
write (*,10) 'A',iostat,size(A), Mb
allocate ( C(ix,iy,iz), stat=iostat )
write (*,10) 'C',iostat,size(C), Mb
10 format (a,' allocated : STAT= ',i5,' Size =',i10,' bytes', f8.2,' MBytes')
!............. setting up data ...............
!$OMP PARALLEL DO PRIVATE (iix,iiy,iiz) SHARED (A)
do iiz=1,iz
do iiy=1,iy
do iix=1,ix
A(iix,iiy,iiz) = iix
enddo
enddo
enddo
!$OMP END PARALLEL DO
call elapse_time (e2)
Write (*,11) 'A initialised : A=',A(1,1,1), A(2,1,1), (e2-e1)
write (*,13) Mb/max(1.e-6,(e2-e1)),' MB/sec'
11 format (/a,2f6.0, f8.3,' sec')
!............writing to disk..................
OPEN (UNIT=275,FILE='Y.bin',STATUS='unknown',FORM='UNFORMATTED',err=990)
call cpu_time(t1)
call elapse_time (e1)
write (275) A ! WRITE A to file
call cpu_time(t2)
call elapse_time (e2)
close (275)
write (*,12) 'Time for writing file of size MB: ',Mb, t2-t1, e2-e1,' sec'
write (*,13) Mb/max(1.e-6,(t2-t1)),' MB/sec'
write (*,13) Mb/max(1.e-6,(e2-e1)),' MB/sec'
12 format (/A,2x,f7.0, 2F7.3,A)
13 format ('Estimated throughput = ',6x,F6.0,A)
! ...........reading from disk................
OPEN (UNIT=257,FILE='Y.bin',STATUS='old',FORM='UNFORMATTED',err=995)
call cpu_time(t1)
call elapse_time (e1)
read (257) C ! READ C from file
call cpu_time(t2)
call elapse_time (e2)
close (257)
write (*,12) 'Time for reading file of size MB: ',Mb, t2-t1, e2-e1,' sec'
write (*,13) Mb/max(1.e-6,(t2-t1)),' MB/sec'
write (*,13) Mb/max(1.e-6,(e2-e1)),' MB/sec'
! ...........testing for errors................
|
Last edited by JohnCampbell on Sun May 16, 2021 6:42 am; edited 1 time in total
JohnCampbell
Joined: 16 Feb 2006 Posts: 2555 Location: Sydney
Posted: Sun May 16, 2021 6:21 am
Code: | ! ...........testing for errors................
call elapse_time (e1)
ne = 0
!$OMP PARALLEL DO PRIVATE (iix,iiy,iiz) SHARED (A,C) REDUCTION(+ : ne)
do iiz=1,iz
do iiy=1,iy
do iix=1,ix
if ( A(iix,iiy,iiz) /= C(iix,iiy,iiz) ) ne = ne+1
enddo
enddo
enddo
!$OMP END PARALLEL DO
call elapse_time (e2)
write (*,11) 'Checking if C == A', C(1,1,1), C(2,1,1), e2-e1
write (*,*) ne,' errors'
write (*,13) Mb/max(1.e-6,(e2-e1)),' MB/sec'
goto 10000
!................. errors ......................
990 Print*, 'Error opening file Y.BIN for writing'
goto 10000
995 Print*, 'Error opening file Y.BIN for read'
goto 10000
10000 continue
end
subroutine elapse_time (sec)
! returns wall-clock seconds since an arbitrary epoch;
! callers take differences, e.g. (e2-e1)
real*4 :: sec
integer*8 :: clock, rate
call system_clock (clock, rate)
sec = dble(clock) / dble(rate) ! note: real*4 result limits resolution for large clock counts
end subroutine elapse_time
|
Multi-threading is identified as an option to improve processing performance outside the I/O, but there are still limits on what can be gained from the high transfer rates of an SSD.
DanRRight
Joined: 10 Mar 2008 Posts: 2824 Location: South Pole, Antarctica
Posted: Sun May 16, 2021 8:28 am
1) Your program
RAMDRIVE
------------
A allocated : STAT= 0 Size = 288000000 bytes 1098.63 MBytes
C allocated : STAT= 0 Size = 288000000 bytes 1098.63 MBytes
A initialised : A= 1. 2. 0.797 sec
Estimated throughput = 1379. MB/sec
Time for writing file of size MB: 1099. 0.375 0.391 sec
Estimated throughput = 2930. MB/sec
Estimated throughput = 2813. MB/sec
Time for reading file of size MB: 1099. 0.328 0.344 sec
Estimated throughput = 3348. MB/sec
Estimated throughput = 3196. MB/sec
Checking if C == A 1. 2. 0.672 sec
0 errors
Estimated throughput = 1635. MB/sec
NVMe
-----------
A allocated : STAT= 0 Size = 288000000 bytes 1098.63 MBytes
C allocated : STAT= 0 Size = 288000000 bytes 1098.63 MBytes
A initialised : A= 1. 2. 0.813 sec
Estimated throughput = 1352. MB/sec
Time for writing file of size MB: 1099. 0.609 0.609 sec
Estimated throughput = 1803. MB/sec
Estimated throughput = 1803. MB/sec
Time for reading file of size MB: 1099. 0.344 0.344 sec
Estimated throughput = 3196. MB/sec
Estimated throughput = 3196. MB/sec
Checking if C == A 1. 2. 0.688 sec
0 errors
Estimated throughput = 1598. MB/sec
2) The previous program (with FTN95-specific READF@/WRITEF@ instead of standard READ/WRITE) shows 2x higher read speeds. Both timers, mine and yours, show the same times for the RAMDRIVE.
A= 1.00000 2.00000
Time for writing file of size MB : 320 0.109 s
Estimated throughput = 2926. MB/s
Time for writing file of size MB2: 320 0.109 s
Estimated throughput = 2926. MB/s
Time for reading file of size MB: 320 0.047 s
Estimated throughput = 6827. MB/s
Time for reading file of size MB: 320 0.047 s
Estimated throughput = 6827. MB/s
Checking if C=A 1.00000 2.00000
but very different write speed for NVMe
A= 1.00000 2.00000
Time for writing file of size MB : 320 0.078 s
Estimated throughput = 4096. MB/s
Time for writing file of size MB2: 320 0.188 s
Estimated throughput = 1707. MB/s
Time for reading file of size MB: 320 0.047 s
Estimated throughput = 6827. MB/s
Time for reading file of size MB: 320 0.047 s
Estimated throughput = 6827. MB/s
Checking if C=A 1.00000 2.00000
The RAMdrive speed can be overclocked by about 1.5x, though.
Last edited by DanRRight on Sun May 16, 2021 6:08 pm; edited 1 time in total
JohnCampbell
Joined: 16 Feb 2006 Posts: 2555 Location: Sydney
Posted: Sun May 16, 2021 1:04 pm
Dan,
You are achieving much higher throughput values than I am getting, even for SSD drives. The O/S file buffers and SSD buffers may be helping (which is good, but confusing).
Did you change the other tests to elapse_time?
I think the performance rate for the checking loop is important, as that is the data-handling rate for the most trivial test.
Most large files I read are text files of comma- or space-separated survey data. The first processing step is a data-validation phase, where I check whether each value lies outside an acceptable range, and omit it from the data set if invalid. (64-bit certainly helps, by allowing all the data to be stored in memory for validation and later use.)
I actually go through multiple data assessments, as I come to understand better what is rejected data versus what is a significant outlier, and refine the way I adjust and then use the data.
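A minimal sketch of that validation pass in standard Fortran. The file name survey.csv, the bounds VMIN/VMAX, and the three-column layout are all hypothetical, just to show the shape of the loop:

```fortran
! Sketch of a range-validation pass over comma- or space-separated
! survey data. File name, bounds and column count are hypothetical.
program validate
  implicit none
  real*8, parameter :: vmin = -999.0d0, vmax = 9999.0d0
  real*8  :: x, y, z
  integer :: iostat, nok, nbad
  nok = 0 ; nbad = 0
  open (unit=11, file='survey.csv', status='old', action='read', iostat=iostat)
  do
    read (11, *, iostat=iostat) x, y, z     ! list-directed: commas or spaces
    if ( iostat /= 0 ) exit                 ! end of file or bad record
    if ( x < vmin .or. x > vmax ) then      ! out-of-range: reject the row
      nbad = nbad + 1
    else
      nok = nok + 1                         ! a real program would store it here
    end if
  end do
  close (11)
  write (*,*) nok, ' accepted ', nbad, ' rejected'
end program validate
```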
If your main use is HDF5 files, which I gather are structured binary files, then you are dealing with much larger data sets, which is a problem I have not addressed. It certainly appears that your NVMe drive is good for your projects.
John
DanRRight
Joined: 10 Mar 2008 Posts: 2824 Location: South Pole, Antarctica
Posted: Sun May 16, 2021 9:19 pm
Yes, I added a second timer as you did. Both mostly show the same, except in one case (see the 1707 number?).
I cannot find anyone who has succeeded in reading HDF5 files with FTN95; there was one person, but he is now retired. HDF5 libraries exist for a couple of other Fortran compilers. I would also be happy if someone with any other compiler for Windows (preferably Fortran, for faster writing) created a small tool that reads an HDF5 file in its own language and transforms it into a plain binary file. The HDF Group supplies a handy tool, h5dump, which does that, but at 1/3000 of the speed, because it is written in C++. Our people made such a tool using Python, which shows the same super-slow speeds. I am screwed up so much that in some cases just the read goes the entire day. If FTN95 could read HDF5 files this would take 60 seconds.
I contacted the HDF Group and suggested they use Fortran for faster writing in this transformation tool, or better, make an HDF5 library for FTN95 specifically, so the tool would not be needed, but they are clueless and useless in these matters.
The good thing about HDF5 is that output from Fortran or any other language can be read by any other language. Plus, of course, the read/write speed is crazy fast, especially write, probably in the 20-30 GB/s range, as if you were writing from RAM directly into the same RAM instead of to any hard drives, RAMdrives or NVMes.
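For reference, the HDF Group's official Fortran bindings (available for compilers such as gfortran and Intel Fortran, though not for FTN95) read a dataset along these lines. This is a minimal, untested sketch; the file name Y.h5 and the dataset name /A are assumptions, and the shape matches the test arrays above:

```fortran
program read_h5
  use hdf5                        ! HDF Group's Fortran module (not available for FTN95)
  implicit none
  integer(hid_t)  :: file_id, dset_id
  integer         :: ierr
  integer(hsize_t), dimension(3) :: dims
  real(4), allocatable :: C(:,:,:)

  dims = (/ 18, 800000, 20 /)     ! same shape as the test arrays above
  allocate ( C(dims(1), dims(2), dims(3)) )

  call h5open_f(ierr)                                ! initialise the library
  call h5fopen_f('Y.h5', H5F_ACC_RDONLY_F, file_id, ierr)
  call h5dopen_f(file_id, '/A', dset_id, ierr)       ! '/A' is an assumed dataset name
  call h5dread_f(dset_id, H5T_NATIVE_REAL, C, dims, ierr)
  call h5dclose_f(dset_id, ierr)
  call h5fclose_f(file_id, ierr)
  call h5close_f(ierr)
end program read_h5
```

With a library like this, the extraction step disappears entirely: the dataset lands straight in a Fortran array.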
JohnCampbell
Joined: 16 Feb 2006 Posts: 2555 Location: Sydney
Posted: Mon May 17, 2021 2:30 am
DanRRight wrote: | I am screwed up so much that in some cases just the read goes the entire day. If FTN95 could read HDF5 files this would take 60 seconds |
Dan, you are known for exaggeration!!
How big is the dataset that you are reading? Does it span multiple multi-terabyte drives? Please answer!
Perhaps you need server hardware with better-tuned disk I/O ports. PCIe 4.0 x16, that is a Gen 4 expansion card or slot with a 16-lane configuration, plus 64 or 128 GBytes of memory, may do better, but this is hardware tuning. (What hardware/motherboard supports multiple PCIe 4.0 drives? My latest PC's SSD is only PCIe 3.0.)
My file I/O now takes seconds, or rarely minutes, so I am not familiar with your problem.
Most (all?) Fortran compilers rely on a C run-time interface for I/O, so I don't think Fortran or C would be much different.
Using Fortran stream/transparent I/O would appear to be a likely requirement for HDF5. (The software buffers would be tuned to the hardware technology and the motherboard memory installed.)
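A minimal sketch of the stream (transparent) I/O mentioned above, standard since Fortran 2003. With access='stream' the file is a raw image of the array, with no record markers, which is what a C library like HDF5 expects. The file name Ys.bin and the reduced array size are illustrative:

```fortran
! Stream I/O round-trip: write an array as raw bytes, read it back.
program stream_demo
  implicit none
  real*4, allocatable :: A(:,:,:), C(:,:,:)
  integer :: iostat
  allocate ( A(18,8000,20), C(18,8000,20) )   ! smaller than the tests above
  A = 1.0
  open (unit=275, file='Ys.bin', access='stream', form='unformatted', &
        status='replace', iostat=iostat)
  write (275) A                               ! file size = size(A)*4 bytes exactly
  close (275)
  open (unit=257, file='Ys.bin', access='stream', form='unformatted', &
        status='old', action='read', iostat=iostat)
  read (257) C
  close (257)
  if ( all(A == C) ) write (*,*) 'stream round-trip OK'
end program stream_demo
```

Note that Y.bin as written by the sequential-unformatted test program above carries record-length markers, so a stream read of that exact file would have to skip them; this sketch writes its own marker-free file.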
If it is seriously taking a long time to read the "dataset", then the likely cause is significant random I/O. The most likely improvement would be drives with larger memory buffers, or more memory for the Windows O/S file buffers. You could reconstruct the file indexes in memory, but that is a significant change to the software.
DanRRight
Joined: 10 Mar 2008 Posts: 2824 Location: South Pole, Antarctica
Posted: Mon May 17, 2021 3:35 am
Exaggeration?
10-20 TB of output in a single run, sometimes going for a week on a supercomputer. I reduce it, decreasing to 0.5-1 TB.
The speed of data extraction from HDF5 into binary is around 10-20 MB/s (into ASCII, around 3 MB/s). The speed of reading the extracted data is 10x higher, but not yet as high as the NVMe allows, because some minor calculations, not often needed, run at the same time.
Do the math: up to a day for extraction and reading. You go to sleep, and in the morning it may or may not have finished.
If I could read HDF5 directly, without extraction, I would read it at ~7 GB/s.
1-2 minutes.
JohnCampbell
Joined: 16 Feb 2006 Posts: 2555 Location: Sydney
Posted: Tue May 18, 2021 11:26 am
Dan,
You are quoting a significant range of performance.
For 20 TB of information, if we use an NVMe drive and achieve 3 GB/sec, that equates to about 1.85 hours to read the data.
However, if you are only getting 20 MB/sec with HDF5, that would take 11.6 days to read.
What does this HDF5 offer, apart from a long lunch?
I would expect that stream/transparent writing from the supercomputer would be a likely data source; reading that on NVMe would then be much simpler.
You would need some error recovery and possibly multiple files, but in my (very limited) experience, large datasets have had very simple data structures.
Multiple files could also allow a multi-process reduction, as a simplified form of multi-threading.
If "data extraction from HDF5 into binary is around 10-20MB/sec", HDF5 does not look like a good option!!
(Is this due to the HDF5 data-management overhead?)
"is doing that with 1/3000 speed because is written in C++" does not look like a valid reason, as C++ and Fortran should perform similarly.
The closest experience I have had to HDF5 must be KDF9 Algol in 1973, which was the slowest computer I ever used. At least, that is how I remember it.
John
DanRRight
Joined: 10 Mar 2008 Posts: 2824 Location: South Pole, Antarctica
Posted: Tue May 18, 2021 2:27 pm
JohnCampbell wrote: |
If "data extraction from HDF5 into binary is around 10-20MB/sec", HDF5 does not look like a good option !!
( Is this due to the HDF5 data management overhead ? )
"is doing that with 1/3000 speed because is written in C++" does not look like a valid reason, as C++ and Fortran should perform similarly |
This is an additional tool supplied with HDF5; it is slow and written in C++, as I was told by a customer-service rep. I am forced to use it to transform HDF5 files into binary, because I cannot read them directly: we have no FTN95-compatible library. Some features from the Fortran 2008 standard would need to be there. Maybe they are already implemented in FTN95, but this needs time to investigate.