/timing is a very effective FTN95 option for detecting slow performance, as it identifies the routines that have long run times and the routines they call. It requires no change to the code, only to the compile options.
My approach is to first compile everything with /timing, then identify utility routines that may be called millions of times or more. These can be moved to a separate file that is not compiled with /timing, so that the timing overhead on these small routines does not distort the results. The first pass involves no change to the code.
spot_time (which is based on the RDTSC timer) is also useful for timing parts of a routine. It requires only a set of suitable calls to be placed in the code. It reports both seconds and processor cycles; the cycle count is a useful measure to compare against expected operation counts.
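spot_time itself is not shown here. As a rough illustration of the idea, the C sketch below brackets a code region with start/stop calls and reports both seconds and cycles. It assumes an x86 CPU, the GCC/Clang `__rdtsc()` intrinsic, and POSIX `clock_gettime` for seconds; the names `spot_timer`, `spot_start`, `spot_stop` and `spot_report` are hypothetical and are not the actual spot_time interface.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the processor time-stamp counter (RDTSC). */
static uint64_t read_cycles(void) { return __rdtsc(); }
#else
/* Non-x86 fallback (assumption): count nanoseconds instead of cycles. */
static uint64_t read_cycles(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Wall-clock seconds from the POSIX monotonic clock. */
static double read_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical spot timer: accumulates cycles and seconds for one region. */
typedef struct {
    const char *label;
    uint64_t start_cycles;
    double start_seconds;
    uint64_t cycles;   /* accumulated cycles over all start/stop pairs */
    double seconds;    /* accumulated wall seconds over all pairs */
} spot_timer;

static void spot_start(spot_timer *t) {
    t->start_seconds = read_seconds();
    t->start_cycles = read_cycles();
}

static void spot_stop(spot_timer *t) {
    uint64_t c = read_cycles();
    double s = read_seconds();
    /* Sanity check: the counter should not run backwards between the calls. */
    assert(c >= t->start_cycles);
    t->cycles += c - t->start_cycles;
    t->seconds += s - t->start_seconds;
}

/* Print totals; if expected_ops > 0, also print cycles per operation. */
static void spot_report(const spot_timer *t, double expected_ops) {
    printf("%-12s %10.6f s %14llu cycles", t->label, t->seconds,
           (unsigned long long)t->cycles);
    if (expected_ops > 0.0)
        printf("  %.2f cycles/op", (double)t->cycles / expected_ops);
    printf("\n");
}
```

For example, wrapping a loop of n multiply-adds with `spot_start(&t)` / `spot_stop(&t)` and then calling `spot_report(&t, 2.0 * n)` prints cycles per operation, which can be compared with what the hardware should achieve. RDTSC is not a serializing instruction, so very short regions may be slightly misreported.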
This was my first use of /64 /check /timing together, so it was good to see that this combination has been ported to 64-bit.
(I'm not sure why the 10-second calibration is still required, as the program's own run time could be used for that. I also posted test_timer_64.f90, which demonstrates that reasonable calibration can be achieved in 0.0001 seconds by comparing RDTSC_val@ against 256 QueryPerformanceCounter ticks.)
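As a sketch of the short-calibration idea (this is not the code in test_timer_64.f90, which is Fortran and uses RDTSC_val@ and QueryPerformanceCounter), the C function below counts TSC ticks across roughly 0.0001 s of a reference clock. It assumes an x86 CPU with an invariant TSC, the GCC/Clang `__rdtsc()` intrinsic, and POSIX `clock_gettime(CLOCK_MONOTONIC)` as a stand-in for QueryPerformanceCounter; `calibrate_tsc_hz` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Reference clock in nanoseconds; the POSIX monotonic clock stands in
   for Windows' QueryPerformanceCounter here (assumption). */
static uint64_t ref_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the processor time-stamp counter (RDTSC). */
static uint64_t read_tsc(void) { return __rdtsc(); }
#else
/* Non-x86 fallback (assumption): reuse the reference clock. */
static uint64_t read_tsc(void) { return ref_ns(); }
#endif

/* Estimate the TSC rate in ticks per second by counting TSC ticks over a
   short interval of the reference clock, instead of a long fixed wait. */
static double calibrate_tsc_hz(double interval_s) {
    assert(interval_s > 0.0);
    const uint64_t interval_ns = (uint64_t)(interval_s * 1e9);
    uint64_t n0 = ref_ns();
    uint64_t c0 = read_tsc();
    uint64_t n1, c1;
    do { /* spin until the reference clock has advanced far enough */
        n1 = ref_ns();
        c1 = read_tsc();
    } while (n1 - n0 < interval_ns || n1 == n0);
    return (double)(c1 - c0) * 1e9 / (double)(n1 - n0);
}
```

Calling `calibrate_tsc_hz(0.0001)` returns an estimate in ticks per second; dividing a cycle count by it converts cycles to seconds. Because the window is so short, an interrupt landing inside it can skew a single estimate, so repeating the calibration a few times and taking the median is a cheap safeguard.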
I hope this demonstrates an easy way of identifying where programs spend the most time, and therefore where the best gains can be made.
John