forums.silverfrost.com Forum Index
Welcome to the Silverfrost forums
 

Bug in SCC 3.88
mecej4
Joined: 31 Oct 2006
Posts: 732
Posted: Fri Nov 25, 2016 2:47 am    Post subject: Bug in SCC 3.88

For the following program, SCC /64 generates two false warnings.
Code:
#include <stdio.h>
#include <stdlib.h>
#define MMASK 0x7FFFFF
#define SMASK 0x0800000
#define OMASK 0x7000000

int main(){
int ival; unsigned mant;
int expo2,expo8,nshft;
int n=3;

ival=0x38C8EB83;
mant= (ival & MMASK) | SMASK;
expo2=((ival >> 23) & 0x0FF) - 0x07F - 2;
switch(expo2%3){
   case -1 : mant <<= 2; expo2-=2; break;
   case -2 : mant <<= 1; expo2--; break;
   case 1 : mant <<=1; expo2--; break;
   case 2 : mant <<=2; expo2-=2; break;
   }
expo8=expo2/3;
if(mant & OMASK){
   nshft=n-8; expo8++;
   }
else nshft=n-7;
printf("nshft = %d\n",nshft);
return 0;
}

The messages:
Code:
   0021   expo8=expo2/3;
WARNING - This statement will never be executed
   0025   else nshft=n-7;
WARNING - This statement will never be executed
    NO ERRORS, 2 WARNINGS  [<BUG> SCC/WIN32 Ver 3.88]


P.S. Sorry, I should have posted this in the Support section.
DanRRight
Joined: 10 Mar 2008
Posts: 1544
Location: South Pole, Antarctica
Posted: Tue Dec 13, 2016 12:27 am

Mecej4, since you are familiar with SCC, I have a suggestion/request if you have some free time. Can CrystalDiskMark be compiled successfully with C, to show how it works and how it checks I/O speed? That way we would know how the read/write speed test works, where the bottlenecks are, and whether there is potential for improvement.

Question #1: the test shows read and write speeds of 10 GB per second on RAM drives. That means that the read/write itself (excluding overhead) must run at even faster speeds! Is this true with C?

http://crystalmark.info/software/CrystalDiskMark/index-e.html
mecej4
Joined: 31 Oct 2006
Posts: 732
Posted: Thu Dec 15, 2016 1:49 pm

That is a full-fledged Windows GUI program, and I do not think that SCC can compile the thing from source code without a lot of pampering. Besides, why on earth do you want to compile it from source?

Frankly, I do not understand your fixation on I/O benchmarks when there are so many other aspects of your programs that are more worthy of your attention.

A disk I/O benchmark program is justified in shoving random data to and fro and timing the movement. You cannot do the same, however, in any real application that does something useful. Real programs tend to consume and/or produce buckets of data. If you want to assess how fast your application would perform with almost infinite I/O speed, simply set the output file to NUL: (on Windows; /dev/null on Unix/Linux) and time a simulation run of your program. If the time of the run is not drastically less than it was with output to a real file, you will have proved that you are barking up the wrong tree.

You can also try this MS Technet command-line utility to time I/O to a specific file of your choice:

https://gallery.technet.microsoft.com/DiskSpd-a-robust-storage-6cd2f223
DanRRight
Joined: 10 Mar 2008
Posts: 1544
Location: South Pole, Antarctica
Posted: Fri Dec 16, 2016 8:16 am

I more or less know how my app would behave with infinite I/O speed: it would go at least ~2-3x faster. I could probably stretch an additional factor of 2 by switching off some extra, not-always-needed calculations during the load. And that is the reason for my interest. My Fortran loading speed, even with *unformatted* reads, is annoyingly slow, around 300KB/s.

This is a program which visualizes existing data, and there are many TBs of data. When you try to find something in this forest, the visualization ideally has to be instant, because a lot of the data you click on is simply not what you are looking for. As a result you don't even want to touch the data, the loading process is so sickeningly boring. Once the data is loaded, the OpenGL visualization is almost instant, thanks to a very good OpenGL implementation and fast hardware (courtesy of realistic 3D games).

With a few Fortran compilers we do not see speeds faster than those mentioned above, with any settings. C code like this benchmark, though, somehow shows speeds 30x faster. The question remains: how does C reach those speeds, and why can't Fortran?
mecej4
Joined: 31 Oct 2006
Posts: 732
Posted: Fri Dec 16, 2016 4:22 pm

That note clears up some questions. It also clarifies that by using I/O devices and software that are "30X faster", your effective overall speed gain may be about 2X. And, because the I/O is mostly input of massive amounts of data, you cannot use the NUL device to test the best achievable speed.

Other speed-ups such as those coming from avoiding or delaying calculations are not relevant at this point of the discussion. You can implement them or not, independently of the I/O problem and solution.

This kind of situation is standard when searching a database. The usual solution is to build an index into the data tables. The indices are much smaller than the main tables, so one can search the index quickly and, when an exact or partial match is found, read the corresponding portion of the main table into memory for further processing.

The indices do take time to build, but they need to be rebuilt/refreshed only when the new data is loaded or old data is deleted. Therefore, for the "create once, use many times" scenario, they are definitely worthwhile.

To define and create an effective index, you have to know your data intimately, and you must have a good idea of the access patterns of your users (including yourself). You have probably used an old dictionary that had thumb indices cut into the edge of the pages. So, if you want to look up 'Dan', you put your thumb on the 'D' notch and open the book. The same idea should be tried on your data.

Your reaction?
DanRRight
Joined: 10 Mar 2008
Posts: 1544
Location: South Pole, Antarctica
Posted: Sat Dec 17, 2016 4:34 am

I still hope to get 5x from the software I/O speed bump alone. Because if C really can read GBs per second, Fortran literally MUST do it even faster; that is what users expect from Fortran - to beat everything else in speed in science and engineering.

If this fails, the only other way I see to speed up the navigation would be to make small thumbnail images of all parameters, just like with photography. I cannot imagine how else indexing for fast search could be done.
mecej4
Joined: 31 Oct 2006
Posts: 732
Posted: Sat Dec 17, 2016 2:19 pm

DanRRight wrote:
...if C really can read GBs per second Fortran literally MUST do that even faster, this is what users expect from Fortran...


That is a wish stated in the form of an assertion that happens not to be true.

Fortran code can be marginally faster than C for some types of work (numerical calculations, for example) and can be substantially slower than C for other types (character processing, for example). These days, Fortran and C compilers on microprocessor systems almost always use the same back-end for code generation and optimization, and most compiler systems use substantially the same RTL (the Microsoft DLLs).

In general, in my experience, the speed of compiled Fortran code is the same as that of compiled C code.

Starting out with expectations of the improbable or, worse, the impossible, is not a recipe for success.
JohnCampbell
Joined: 16 Feb 2006
Posts: 1772
Location: Sydney
Posted: Sun Dec 18, 2016 2:14 am

Dan,

I would like to agree with mecej4.

In the benchmarking I did for you, I showed that even basic numerical conversion of text, with no file I/O, processes about 100 million bytes per second. (Some gFortran versions are very poor and convert F and ES formats at 4 MB/sec, while FTN95 /64 and FTN95 /32 do much better.)
With a processor clock rate of 3 GHz, I don't see how you could achieve multiple gigabytes per second. (The C rate claims don't look realistic, or if they are real they can't be utilised by even basic processing of the info. The multiple GB/s you quote are not feasible, as you cannot process the data at that speed.)
When quoting transmission rates, there is always the difference between MB (megabytes) and Mb (megabits) or Gb (gigabits), so there is always uncertainty about what speed is really being quoted.

My impression was that you were struggling with 1 MB/s (megabyte per second) read and processing, which could be increased to 50-100 MB/s with stream I/O on an HDD, or 200-500 MB/s with an SSD.
BUT, as you can only process the characters at about 100 MB/s, does it matter?

Also, you have not identified the source of this data. How do you get it?
If it is via the internet, the transmission rate for receiving the files is much slower than you can read them from disk.

In summary, you need to identify where the bottleneck is, and I doubt if it is with SSD or HDD transmission rates. It will probably be with processing or receiving the files.

It sounds to me that you need to have multiple PC's to process all the different files into summary or indexed forms.

John
DanRRight
Joined: 10 Mar 2008
Posts: 1544
Location: South Pole, Antarctica
Posted: Sun Dec 18, 2016 3:23 am

OK, mecej4, and agreeing with you, John:

Please show me read and write speeds of at least half what CrystalDiskMark measures, i.e. 5-6 GBytes per second in my case on ramdrives (yes, bytes, not bits per second, like all the C tests show), with any of your methods using Fortran, and then we will continue the conversation about Fortran delivering almost the same speeds as C.

I don't even read and process characters, John; I use unformatted reads. You are welcome to use them too, to make your tests easier. Processing speed after the data is loaded is an entirely different topic and is not discussed here.
PM me your address and I will send you 12 beers for the effort. Smile
mecej4
Joined: 31 Oct 2006
Posts: 732
Posted: Sun Dec 18, 2016 1:49 pm

Dan, I think that you are still tilting at windmills, as you can see with these tiny example programs. Both write a 64 MByte "binary" file. I ran the programs on a laptop with an i5-4200U CPU and a 128 MB ramdisk.

The Fortran code:
Code:
program writebinbuf
integer, parameter :: I2 = selected_int_kind(4), I4 = selected_int_kind(9), &
                      I8 = selected_int_kind(18)
integer, parameter :: BSIZ = Z'4000000'   ! 64 megabytes
character (Len=1) :: buf(BSIZ)
integer (I2) :: hndl, ecode
integer (I8) :: nbytes = BSIZ
real :: t1,t2
!
call openw@('big.bin',hndl,ecode)
if(ecode /= 0)stop 'Error opening file BIG.BIN for writing'
call cpu_time(t1)
call writef@(buf,hndl,nbytes,ecode)
call cpu_time(t2)
if(ecode /= 0)stop 'Error writing file BIG.BIN'
call closef@(hndl,ecode)
if(ecode /= 0)stop 'Error closing file'
write(*,'(A,2x,F7.3,A)')'Time for writing 64 MB file: ',t2-t1,' s'
write(*,'(A,6x,F6.0,A)')'Estimated throughput = ',64.0/(t2-t1),' MB/s'
end program

The equivalent C program:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <io.h>
#include <fcntl.h>
#include <time.h>
#include <sys/stat.h>

#define BSIZ 0x4000000

int main(){
char *buf; int fid,bsiz=BSIZ; clock_t t1,t2;
float te;

buf=(char *)malloc(bsiz);
fid=open("BIG.BIN", O_CREAT | O_WRONLY | O_BINARY, S_IWRITE | S_IREAD); /* mode bits are required with O_CREAT */
t1=clock();
write(fid,buf,bsiz);
t2=clock(); te=(t2-t1)/(float)CLOCKS_PER_SEC;
printf("Time for writing 64 MB to file: %6.3f s\nEstimated throughput = %.1f MB/s\n",
   te,64.0/te);
close(fid);
return 0;
}

We run the first with FTN95:
Code:
s:\FTN95>ftn95 /no_banner fwrfil.f90 & slink fwrfil.obj & fwrfil
Creating executable: s:\FTN95\fwrfil.exe
Time for writing 64 MB file:     0.047 s
Estimated throughput =        1365. MB/s

We run the second with SCC:
Code:
s:\FTN95>scc /no_banner cwrfil.c & slink cwrfil.obj & cwrfil
Creating executable: s:\FTN95\cwrfil.exe
Time for writing 64 MB to file:  0.047 s
Estimated throughput = 1361.7 MB/s

Vive la non-différence! And, please drink those 12 beers on my behalf.

It would be interesting to see what numbers you get on your terabyte cruncher of a machine with these small test programs.

Once you run one of these two programs you will have a 64 MB file that you can use with similar read tests. Change writef@ to readf@, and so on. I see more or less the same speeds for reads as I did for writes.

Having done that, compare the read throughput (1.36 GB/s on my laptop) with the value that you gave in #3 (counting from 0 for the initial post), 0.0003 GB/s. Investigate the difference, and you will find it explained by the throughput of your I/O devices and by the rate and complexity of the data processing in your application, not by any difference between C and Fortran, because the grunt work of the I/O is done in the MS system DLLs.
mecej4
Joined: 31 Oct 2006
Posts: 732
Posted: Mon Dec 19, 2016 1:17 am

Dan, there is something else that I don't understand about your "problem statement". You said in #3 that your reading speed with unformatted Fortran files was 300 KB/s, and that you needed to process "terabytes of data". If so, you will have to run your computer nonstop 24/7 for over five weeks for one run. Really? Are you doing this now?
DanRRight
Joined: 10 Mar 2008
Posts: 1544
Location: South Pole, Antarctica
Posted: Mon Dec 19, 2016 4:51 am

LOL. Now I know why you did not want my present. Clearly, like me at my North Pole, you have no lack of booze at your place. Smile Well, it is the holiday season, anyway.

The KB/s was of course a typo. I wrote MB/s many times before, just not this once.

Please keep going. Still 1.8 GB/s on my PC, not 5-6, let alone 10-12, but the steps are encouraging.
mecej4
Joined: 31 Oct 2006
Posts: 732
Posted: Mon Dec 19, 2016 1:22 pm

DanRRight wrote:
Please keep going. Still 1.8GB/s on my PC, not 5-6 let alone 10-12 but the steps are encouraging.

Was that run with the current directory on a RAMdisk? If so, you can only hope to get throughput that is less than that, unless you can introduce parallelism into your application.

As we found in our earlier thread, where we compared formatted internal reads with direct conversion of input strings to numbers, the best that we could do without any error checking of the input, i.e., with zero disk latency, was about 300 MB/s. In this test, we have found that raw file I/O, i.e., with zero decoding latency, can be done at about 1-2 GB/s. If you combine these latency estimates (similarly to two resistors in series), you can estimate an effective speed of less than 230 MB/s for formatted reads from disk (less, because a 4-byte real stored as a decimal number on disk takes about 12 bytes). If that is not good enough, I don't see how you can overcome these latencies without resorting to parallel processing.

Please go back and look at some of John Campbell's comments about what realistic I/O speeds you can aim for.


Last edited by mecej4 on Mon Dec 19, 2016 2:33 pm; edited 1 time in total
DanRRight
Joined: 10 Mar 2008
Posts: 1544
Location: South Pole, Antarctica
Posted: Mon Dec 19, 2016 1:53 pm

Something is deeply wrong here.

1) Does the CrystalDiskMark test use parallelism, to leave all the test results here in shameful misery?

2) How in the world is it possible to write 12 GB of data to a disk drive in a single second, while it is not possible to load the same 12 GB into the computer's RAM in a second? The funny thing is that a RAMdrive is made out of the same RAM, and in practice I/O has never been faster than RAM bandwidth.

Also, as a note: ReadF@ and ReadFA@ may be fast for reading a big chunk of data, but they are still very slow when reading line by line (10 numbers, or ~160 characters, per line).
PaulLaidler
Site Admin
Joined: 21 Feb 2005
Posts: 4915
Location: Salford, UK
Posted: Mon Dec 19, 2016 3:26 pm

mecej4

Thanks for the feedback. I have made a note of your original post.