forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums

Adding /check causes EXE to run 800 times slower
mecej4



Joined: 31 Oct 2006
Posts: 933

Posted: Mon Apr 02, 2018 1:24 am

Thanks; please provide the link.
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5437
Location: Salford, UK

Posted: Mon Apr 02, 2018 3:57 pm

This has been fixed (after a fashion) for the next release of clearwin64.dll (not the one released today via this forum).

The failure was in __pcheck which was doing far too much validating.

__pcheck checks that ALLOCATEd store is not accessed after it has been DEALLOCATEd.

In the longer term we will need a better strategy for this particular check.
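
As a minimal illustration of the kind of error such a check is aimed at, here is a sketch of a pointer that outlives its storage. It is not taken from the test program, the names are hypothetical, and it says nothing about how __pcheck is implemented.

```fortran
program dangling_pointer_demo
  implicit none
  real, pointer :: p(:), q(:)

  allocate (p(10))      ! p is associated with freshly allocated storage
  p = 1.0
  q => p                ! q is a second name for the same storage

  deallocate (p)        ! storage released; p becomes disassociated
  print *, 'associated(p) = ', associated(p)

  ! q still refers to the released storage and its association status
  ! is now undefined. A reference such as q(1) is invalid from here on,
  ! which is the sort of access a run-time pointer check tries to trap.
  ! print *, q(1)       ! deliberately left commented out

  nullify (q)           ! give q a defined (disassociated) status again
  print *, 'associated(q) = ', associated(q)
end program dangling_pointer_demo
```

The invalid reference is commented out so that the program stays standard-conforming; uncommenting it is the case of interest when experimenting with /check.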
mecej4



Joined: 31 Oct 2006
Posts: 933

Posted: Mon Apr 02, 2018 4:48 pm

Thanks for the information, Paul.

In the short term, with 8.10 and earlier, would it hurt to put /inhibit_check 4 in FTN95.cfg, if I am sure that my codes have no variables that have the pointer attribute?

Many of the allocatable variables in the test program are local subprogram variables, so these should get automatically deallocated at subprogram exit. For those variables, the checking (for incorrect use after they are DEALLOCATEd) should be stopped at subprogram exit, because those variables do not exist thereafter.
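
The automatic deallocation referred to here can be seen in a small sketch (hypothetical names, not from the test program). The local array has no SAVE attribute and is never explicitly DEALLOCATEd, yet the routine can be called repeatedly:

```fortran
module work_mod
  implicit none
contains
  subroutine fill_and_sum (n, total)
    integer, intent(in) :: n
    real, intent(out)   :: total
    real, allocatable   :: tmp(:)   ! local, no SAVE attribute

    allocate (tmp(n))               ! would fail if tmp were still allocated
    tmp = 1.0
    total = sum(tmp)
    ! No DEALLOCATE here: an unsaved local allocatable array is
    ! deallocated automatically when the subprogram returns.
  end subroutine fill_and_sum
end module work_mod

program auto_dealloc_demo
  use work_mod
  implicit none
  real    :: t
  integer :: k

  do k = 1, 3
    call fill_and_sum (1000, t)     ! each call allocates tmp afresh
    print *, 'call', k, ' total =', t
  end do
end program auto_dealloc_demo
```

If tmp survived the return, the second ALLOCATE would be an error, so the repeated calls only work because the array ceases to be allocated at subprogram exit.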
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5437
Location: Salford, UK

Posted: Mon Apr 02, 2018 9:19 pm

Just to be clear, it's not just the POINTER attribute but also the ALLOCATABLE attribute that is involved - so anything that is ALLOCATEd.

If /config does the trick then all well and good.

We may be close to a decent fix, which will be in clearwin64.dll and can be released very soon. If you are not up to date with v8.30 then using the latest DLLs may work - it would just be a case of testing after taking a careful backup of your existing DLLs.
JohnCampbell



Joined: 16 Feb 2006
Posts: 1968
Location: Sydney

Posted: Tue Apr 03, 2018 10:57 am

The following link is the test directory that I have been using to test the program.

https://www.dropbox.com/s/m504de68txw8br0/swmo_test.zip?dl=0

There are a "few" tests in the directory, but the latest test to look at is:

swmol3.f90 is my latest modified version, which includes calls to my spot_timer_64.f90 code which uses rdtsc_val@ to monitor ticks of the code.
swmol3_orig.f90 is the original code
cmp_changes.tce lists the differences between the two.

I did 3 other changes to the code:
changed the KABS array to be allocatable. It is a 224 MB local array
changed fort.2 to fort.12 to remove write (*,*) problem
changed "real (kind=wp), dimension (nsiz+1, nndim), save :: tab" to be in COMMON

For testing I have used /timing and also included my routine spot_time, which allows timing of selected code inside a routine. (change the code for calls)

The most glaring slow down is at lines 2601:2605
      call spot_time (24, '<conlor> set f 24')
      do i = 1, 3*nbb
         a1nbb(i) = zero
      end do
      call spot_time (43, '<conlor> zero f 24')
I have calculated that each iteration of the do loop can take up to 150,000 processor cycles !!!

I hope this helps

The latest .bat file to do the test is "build_64.bat"

John
mecej4



Joined: 31 Oct 2006
Posts: 933

Posted: Tue Apr 03, 2018 1:10 pm

JohnCampbell wrote:
...The most glaring slow down is at lines 2601:2605
      call spot_time (24, '<conlor> set f 24')
      do i = 1, 3*nbb
         a1nbb(i) = zero
      end do
      call spot_time (43, '<conlor> zero f 24')
I have calculated that each iteration of the do loop can take up to 150,000 processor cycles !!!


Thanks a lot for your investigative work!

Perhaps I should look into why this simple loop (whose purpose is to zero out an array) can take up so much time. Perhaps, /check causes the allocation status to be checked during every iteration of the DO loop.
JohnCampbell



Joined: 16 Feb 2006
Posts: 1968
Location: Sydney

Posted: Wed Apr 04, 2018 7:27 am

/timing is a very effective FTN95 option for detecting slow performance, as it identifies the routines that have long run times and that call other routines. It requires no change to the code, only to the compile options.
My approach is to first use /timing on everything, then identify utility routines that may be called millions of times or more. These can be moved to another file that is not compiled with /timing.

The use of spot_time (which is based on the RDTSC timer) is also useful for timing parts of the routine. It requires only a set of suitable calls to be placed in the code. It reports both seconds and processor cycles, which can be a useful measure to compare to expected operations counts.
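
For readers without spot_timer_64.f90 to hand, the general idea can be sketched in standard Fortran. This is not the spot_time code: it swaps the RDTSC counter for the SYSTEM_CLOCK intrinsic, the names spot_tick and spot_report are hypothetical, and it assumes that the interval since the previous call is charged to the id passed in the current call.

```fortran
module spot_timer_sketch
  implicit none
  private
  public :: spot_tick, spot_report
  integer, parameter :: max_spots = 50
  integer            :: ticks(max_spots) = 0     ! accumulated clock counts
  character(len=32)  :: labels(max_spots) = ' '
  integer            :: last_count = 0
  logical            :: started = .false.
contains
  subroutine spot_tick (id, label)
    ! Charge the counts elapsed since the previous call to spot "id".
    integer, intent(in)          :: id
    character(len=*), intent(in) :: label
    integer :: now, rate, cmax, delta

    call system_clock (now, rate, cmax)
    if (started) then
      delta = now - last_count
      if (delta < 0) delta = delta + cmax + 1    ! clock wrapped round once
      ticks(id) = ticks(id) + delta
    end if
    labels(id) = label
    last_count = now
    started = .true.
  end subroutine spot_tick

  subroutine spot_report ()
    ! Print label, accumulated counts and the equivalent seconds.
    integer :: id, now, rate

    call system_clock (now, rate)
    if (rate <= 0) return                        ! no usable clock
    do id = 1, max_spots
      if (ticks(id) > 0) then
        print '(a32, i12, f12.6)', labels(id), ticks(id), &
              real(ticks(id)) / real(rate)
      end if
    end do
  end subroutine spot_report
end module spot_timer_sketch

program spot_timer_demo
  use spot_timer_sketch
  implicit none
  integer, parameter :: n = 3000000
  real, allocatable  :: a(:)
  integer :: i

  allocate (a(n))
  call spot_tick (1, 'start')        ! first call only starts the clock
  do i = 1, n
    a(i) = 0.0
  end do
  call spot_tick (2, 'zero loop')    ! loop time is charged to spot 2
  print *, 'checksum =', sum(a)
  call spot_report ()
end program spot_timer_demo
```

SYSTEM_CLOCK counts are much coarser than processor cycles, so this sketch reports elapsed time only and cannot reproduce the cycle counts quoted in this thread.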

This was my first use of /64 /check /timing, so it was good to see that this combination has been ported to 64-bit.

(Not sure why the 10 second calibration is still required, as you can use the program run time for that. I also posted test_timer_64.f90, which demonstrates that reasonable calibration can be achieved in 0.0001 seconds by comparing RDTSC_val@ to 256 QueryPerformanceCounter "ticks")

I hope this demonstrates a way of easily identifying where programs use the most time, so where the best gains can be made.

John
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5437
Location: Salford, UK

Posted: Wed Apr 04, 2018 8:44 am

Some further thoughts...

It is beginning to look like the calls to __pcheck may not be useful for ALLOCATABLE objects, in other words (as the documentation says) it should be for POINTERs only. This being the case, there are two things that need fixing: a) FTN95 where it plants calls to __pcheck and b) the extremely poor performance of __pcheck in clearwin64.dll.

The coding for /timing was provided by a third party (not Salford Software/Silverfrost) so the motivation behind the calibration loop can only be guessed. It may be that the loop is useful because of a) other background processes that are running and b) because of the unknown effects of multiple CPU cores. So I am not sure that simple clock timing would be appropriate. Then there is always the risk that we might break something that already works quite satisfactorily.
mecej4



Joined: 31 Oct 2006
Posts: 933

Posted: Wed Apr 04, 2018 9:33 am

PaulLaidler wrote:
... there are two things that need fixing: a) FTN95 where it plants calls to __pcheck and b) the extremely poor performance of __pcheck in clearwin64.dll.


With regard to Item a): FTN95 /check is doing fine in 32-bit mode; could the same strategy be followed with /64, assuming 64-bit versions of the necessary support routines are already present in clearwin64.dll?

Thanks.
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5437
Location: Salford, UK

Posted: Wed Apr 04, 2018 11:29 am

At the moment FTN95 uses the same "frontend" code in this context, independent of whether it is 32 bits or 64 bits.

The 64 bit implementation of __pcheck can be fixed to make it perform well. For safety, the FTN95 frontend will not be changed for 32 bits but it might be changed for 64 bits if the check is considered to be redundant (it might, for example, relate to an earlier Fortran standard where automatic deallocation was not in the standard).

An ALLOCATABLE object that is referenced after deallocating is trapped in a different way (as a result of it being nullified). So it may be that this check is redundant and only needed for Fortran POINTERs.
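
The difference from a POINTER can be shown with a short sketch (hypothetical names; it does not show how FTN95 implements the trap). The allocation status of an ALLOCATABLE array is always defined, so it can be queried safely after DEALLOCATE:

```fortran
program allocated_status_demo
  implicit none
  real, allocatable :: a(:)

  allocate (a(5))
  a = 0.0
  deallocate (a)          ! a is now "not currently allocated"

  ! Unlike a dangling pointer, the status of a is well defined here,
  ! so ALLOCATED gives a reliable answer before any reference is made.
  if (allocated(a)) then
    print *, a(1)
  else
    print *, 'a is not allocated, so it must not be referenced'
  end if
end program allocated_status_demo
```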

It's one of those decisions that could go either way but since it is early days for 64 bit FTN95 it might be wiser to leave it out until we can find a context where it would be useful.
JohnCampbell



Joined: 16 Feb 2006
Posts: 1968
Location: Sydney

Posted: Wed Apr 04, 2018 1:15 pm

PaulLaidler wrote:
and b) because of the unknown effects of multiple CPU cores. So I am not sure that simple clock timing would be appropriate.

Paul,

I am not sure what you mean by this comment.
Are you implying that /timing is thread-safe?
Or is it the uncertainty of which core is used for RDTSC ticks?
My impression is that /timing is using RDTSC ticks as an accurate elapsed time allocation and it is not thread-safe.
I thought /timing had two timing approaches. One works in a similar way to my spot_time, which allocates accumulated ticks to each routine (and would not be thread-safe). The other accumulates total time from entry to exit, including the clock time of other routines being called. This could be thread-safe.

John
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5437
Location: Salford, UK

Posted: Wed Apr 04, 2018 1:47 pm

John

I can't claim to understand why the calibration loop is applied rather than a simple clock test. The coding was written by someone like you who has a better understanding than I have of these issues.

At the moment it does not seem that it warrants committing resources to researching this and making changes.
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5437
Location: Salford, UK

Posted: Wed Apr 04, 2018 2:39 pm

mecej4

I notice that this program does not run with /UNDEF. I am impressed with the way that SDBG64 highlights the problem. 64 bit FTN95 seems to be doing better than 32 bit FTN95 in this case.
mecej4



Joined: 31 Oct 2006
Posts: 933

Posted: Wed Apr 04, 2018 7:42 pm

Paul, I agree, and I look forward to the release of 8.30 Personal for that reason.

However, in the actual application (rather than the abbreviated version that I created for this post), one of the subprograms uses ENTRY, and that made it impossible for me to make much progress using /undef, because of the bug that I separately reported in http://forums.silverfrost.com/viewtopic.php?t=3743 .
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5437
Location: Salford, UK

Posted: Thu Apr 05, 2018 9:26 am

This issue has now been fixed for the next release of clearwin64.dll.
All times are GMT + 1 Hour