forums.silverfrost.com Welcome to the Silverfrost forums
DanRRight
Joined: 10 Mar 2008 Posts: 2867 Location: South Pole, Antarctica
Posted: Wed Jul 26, 2023 9:15 pm Post subject: Re:
JohnCampbell wrote:
Write OK. Speed of write Method 1 = 0.360 0.359 GB/sec
READ OK. Speed of read Method 1 = 0.462 0.455 GB/sec
Huge improvement of your new method vs the old method, 0.360 vs 0.359, right?
Which numbers was I interested in seeing when asking Paul to repeat the test? The Method 1 ones, because they show a 2x improvement versus mine.
Why did I ask this? Because there are a couple of major differences between his PC and mine: the processor type, and DDR4 vs DDR5 memory. I have never seen 2x differences between processors of the same generation, and no matter how fast the memory I used, I never got more than a 10-20% difference before. If some specific processor/memory combination makes such a huge difference, it could potentially speed up my numerical simulations 2x if I switch. So with Method 1 I was actually not interested in the speed of the hard drive (I will use Method 2 for that) but in the speed of some other computer subsystem that this test abruptly revealed.
Why was I not interested in your "improvements"? Because my Method 1 runs for a long time, so there will be no difference between any of the timers. Method 2 is much faster, but it will be limited by the speed of the PCIe bus/drive itself anyway, and I also need it only for reading/writing huge files, where the test runs for a long time.
If you are interested in a general timer improvement, try to convince Silverfrost to create a new timer. Though FTN95 needs other, much more important things improved to be a compiler of the 21st century. But when something here needs improvement, you quietly vote with your feet for the other compilers where all those things are already implemented.
JohnCampbell
Joined: 16 Feb 2006 Posts: 2587 Location: Sydney
Posted: Thu Jul 27, 2023 4:19 am Post subject:
Dan,
I don't think it is too difficult to understand the difference between elapsed time and CPU allocated time.
Clearly the difference is more significant with Method 2, where there are more disk wait delays.
In my previous results, linked in my post of Fri Jul 21, 2023 8:42 pm, I tried to better identify this problem by also reporting % CPU usage as a proportion of elapsed time.
Code:
============ Trying to save the data Method 2 ============
Method 2 write 6.7683 el 2.9687 cp 43.86 %
Write OK. Speed of write Method 2= 2.964 1.300 GB/sec
============ Now read Method 2 ============
Method 2 read 2.0316 el 1.6719 cp 82.29 %
READ OK. Speed of read Method 2 = 5.264 4.332 GB/sec
In my most recent post I tried to explain the many reasons why there are problems with testing I/O performance, and also how the use of CPU_TIME significantly overestimates the Method 2 performance.
Because there is a lot more CPU usage for Method 1, the alternative rate estimates are not so inconsistent, which I expected you would have understood.
The only improvement for FTN95 would be to provide an integer*8 function rdtsc_ticks() (and an integer*8 function rdtsc_tick_rate()), although this is not required for these tests. This should be provided by all compilers!
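For anyone wanting to reproduce the elapsed vs CPU comparison above, a minimal sketch of the timing approach follows. The file name and transfer size are placeholders, and the integer*8 form of SYSTEM_CLOCK is assumed to be available for the high-resolution count, as it is in gFortran and iFort:
Code:
program time_io
  implicit none
  integer, parameter :: i8 = selected_int_kind(15)
  integer(i8), parameter :: n = 250000000_i8   ! 1 GByte of default reals (placeholder size)
  integer(i8) :: tick0, tick1, rate
  real :: cpu0, cpu1, elapsed, cpu_sec
  real, allocatable :: vector(:)
  integer :: lu, iostat

  allocate ( vector(n) )
  vector = 1.0
  lu = 11
  open (unit=lu, file='test.dat', form='unformatted', access='stream', status='replace')

  call system_clock (tick0, rate)   ! 64-bit arguments give the high-resolution count
  call cpu_time (cpu0)
  write (lu, iostat=iostat) vector
  call cpu_time (cpu1)
  call system_clock (tick1)

  elapsed = real (tick1 - tick0) / real (rate)   ! wall-clock seconds
  cpu_sec = cpu1 - cpu0                          ! CPU-allocated seconds
  write (*,'(a,f8.4,a,f8.4,a,f7.2,a)') ' write ', elapsed, ' el ', cpu_sec, &
                                       ' cp ', 100.0*cpu_sec/elapsed, ' %'
  write (*,'(a,f8.3,a)') ' rate  ', 4.0*real(n)/elapsed/1.0e9, ' GBytes/sec'
  close (lu)
end program time_io

Dividing the byte count by the elapsed seconds gives the GBytes/sec rates quoted in these posts; using the CPU seconds instead is what overstates the Method 2 rate.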
JohnCampbell
Joined: 16 Feb 2006 Posts: 2587 Location: Sydney
Posted: Sat Feb 10, 2024 9:29 am Post subject:
I have tidied up a program that uses stream I/O to replicate FTN95 unformatted sequential access files for larger records.
This approach takes FTN95's 1-byte header (for records up to 240 bytes long), else -1 + 4 bytes, and extends it to a -1 -1 + 8 byte header/footer for records larger than 2^31 - 9 bytes.
The test creates a file of 12 GBytes (maximum record size about 3 GBytes).
I ran it on a 500 GByte NVMe SSD, so it may be slower on a slower drive.
Hopefully it demonstrates the 1-4-8 byte record header approach that could be a possible extension for FTN95.
Alternatively, this example might be adapted to the Gfortran / Ifort approach of 2^31 - 9 byte sub-records.
Basically, stream I/O is a very flexible platform for creating file record structures.
I thought this example may be of interest to those wanting to manage larger file records.
Let me know whether you like or dislike this approach.
https://www.dropbox.com/scl/fi/n3d61rw15occbvs9vl1ik/read_large_record_v3.f90?rlkey=5h9d5kip1oaj6wzt7364xhg5w&dl=0
https://www.dropbox.com/scl/fi/r25m5a99k90950847rrxe/read_large.log?rlkey=l5kuymezc1lfbezwsln9zl0go&dl=0
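For readers who do not want to download the program, the header/footer logic amounts to something like the sketch below. The write_record name and the exact marker values are illustrative only; treat it as an outline of the technique rather than the linked code:
Code:
! Sketch of the 1 / 1+4 / 1+1+8 byte record header/footer described
! above, written via stream I/O.  The unit lu must be opened with
! form='unformatted', access='stream'.  Marker values and limits are
! illustrative, not guaranteed to match the exact FTN95 layout.
subroutine write_record (lu, payload, nbytes)
  implicit none
  integer, intent(in)    :: lu
  integer(8), intent(in) :: nbytes
  integer(1), intent(in) :: payload(nbytes)

  call write_marker          ! header
  write (lu) payload         ! record data
  call write_marker          ! footer, so the file can also be scanned backwards

contains

  subroutine write_marker
    if (nbytes <= 240_8) then
       write (lu) int (nbytes, 1)                    ! 1-byte length
    else if (nbytes <= 2147483639_8) then            ! 2^31 - 9
       write (lu) int (-1, 1), int (nbytes, 4)       ! -1 marker + 4-byte length
    else
       write (lu) int (-1, 1), int (-1, 1), nbytes   ! -1 -1 marker + 8-byte length
    end if
  end subroutine write_marker

end subroutine write_record

The footer repeats the header so a reader can also step backwards through the file, which is what BACKSPACE relies on in unformatted sequential files.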
JohnCampbell
Joined: 16 Feb 2006 Posts: 2587 Location: Sydney
Posted: Sat Mar 09, 2024 5:57 am Post subject:
I have made progress on testing the write/read of large records, now up to 25 GBytes. These now work for unformatted sequential read/write and also for stream read/write.
They should work in the next release of FTN95, Ver 9.0?
For records larger than 2 GBytes, the unformatted sequential header/trailer is now 9 bytes, where byte 1 = -2 flags an 8-byte size value.
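Reading such a record back means inspecting the first marker byte before deciding how many further length bytes to consume. A sketch of that decode, assuming the -1 / -2 markers just described (the unit opened for stream access; this is an outline, not the actual FTN95 runtime code):
Code:
! Sketch: decode one record header from a stream-access unit,
! assuming byte 1 holds the length for short records, -1 flags a
! 4-byte size, and -2 flags an 8-byte size, per the description above.
subroutine read_record_length (lu, nbytes, iostat)
  implicit none
  integer, intent(in)     :: lu
  integer(8), intent(out) :: nbytes
  integer, intent(out)    :: iostat
  integer(1) :: marker
  integer(4) :: len4

  nbytes = 0
  read (lu, iostat=iostat) marker
  if (iostat /= 0) return
  select case (marker)
  case (-2)                   ! 8-byte size follows (records > 2 GBytes)
     read (lu, iostat=iostat) nbytes
  case (-1)                   ! 4-byte size follows
     read (lu, iostat=iostat) len4
     nbytes = len4
  case default                ! short record: the marker byte is the length
     nbytes = marker
  end select
end subroutine read_record_length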
I have also been looking at the I/O speeds of PCIe SSDs (I think mine is a Ver 3). The rates they quote are a bit misleading.
On my PC, which has 64 GBytes of physical memory, if I do a write-then-read test on a 32 GByte file, I can get write speeds up to 2.8 GBytes/sec, unformatted read up to 4.0 GBytes/sec, and stream read over 7.5 GBytes/sec.
These high read speeds are basically because the file is still stored in the memory disk buffers.
If I split this test program into 3 separate programs, or increase the file size to more than 64 GBytes, the performance declines:
For write, speed reduces as the file size increases, from 3.0 GBytes/sec for the first record to 0.8 GBytes/sec for most others. This is due to overflowing the available memory buffers and also the SSD's internal buffers.
For sequential read, speed starts at 0.28 GBytes/sec for the first record, then rises to 0.9 GBytes/sec. This is due to the limited available memory buffers.
For stream read, speed starts at 2.15 GBytes/sec for the first record, then declines gradually to 1.5 GBytes/sec.
These read speeds are much lower than the case where the reads followed the writes and the file size was much less than physical memory; again this is due to the limited available memory buffers.
On the plus side: if the file is buffered in memory by the OS, a stream re-read can achieve great rates moving the data from OS memory to program memory (over 8 GBytes/sec in some cases).
On the negative side: if the SSD buffers are full or the OS memory buffers are not pre-loaded, transfer rates are far lower than the disk technology quotes.
Buffering is great when it works!
In these tests the records ranged from 1 GByte to 25 GBytes, so large records do now work.
However, in these tests the I/O lists I am using are very basic:
"write (lu, iostat=iostat ) vector" or "read (lu,iostat=iostat) vector"
so more complex I/O lists may be different.
Even "read (lu,iostat=iostat) vector(1:nn)" crashes with a Vstack error, while "read (lu,iostat=iostat) (vector(k),k=1,nn)" is much slower for a well buffered case.
We can now save arrays much larger than 4 GBytes.
I will post the program that successfully tests up to 25 GByte records and 66 GByte files.
If using stream read, it is also relatively easy to read FTN95, Gfortran or Ifort unformatted sequential records, providing the I/O list is not too complex. Files created with Stream I/O should be much more portable.
The Vstack error at 8 GBytes was a surprise, so there could be more surprises at greater sizes?