forums.silverfrost.com

DrTip · Joined: 01 Aug 2006 Posts: 74 Location: Manchester

OK, this isn't Fortran as such, but it might be the sort of thing that numerics people have some sensible insight into.

I am currently doing some Monte Carlo runs of a model; the results are stored in a SQL Server database.

We have some stability issues, which I am addressing. However, here is the question.

I have the following subset of answers from the model, which should be constant (not varying from run to run):

351697468.6
351697468.6
351697468.6
351697468.6
351697468.6
351697468.6
351697468.6
351697468.6
351697468.6
351697468.6
351697468.6
351697468.6
351697468.6

Looks OK, yes? Now, if I use the Excel STDEVP function to work out the standard deviation of these, I get an answer of

4.923076923

which is obviously incorrect, since by definition the answer should be smaller than the variation in the least significant figure, i.e. less than 0.1 for the set above.

Similarly, if I use the SQL Server built-in function I get a similar answer. I could write my own program, but since I would naively just do what I am sure Excel and SQL Server are doing (this isn't hard-core numerics, after all), I was wondering if anyone has experience of handling data sets like this, where a small deviation in a large number is the interesting thing (as is the case here).
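For reference, here is a minimal Python sketch of the textbook one-pass formula, sqrt(mean(x^2) - mean(x)^2), which is what the naive approach amounts to. Whether Excel's STDEVP or SQL Server's built-in function actually work this way is an assumption, not something established here, and the function name is made up for illustration. With values around 3.5e8 the squares are around 1.2e17, so the final subtraction of two nearly equal large numbers can leave nothing but rounding noise.

```python
import math

def naive_stdevp(xs):
    """Population standard deviation via sqrt(mean(x^2) - mean(x)^2).

    Prone to catastrophic cancellation when the mean is large compared
    with the spread of the data.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_x2 = sum(x * x for x in xs) / n
    var = mean_x2 - mean_x * mean_x
    # Round-off can make var slightly negative; clamp before taking the root.
    return math.sqrt(max(var, 0.0))

data = [351697468.6] * 13
print(naive_stdevp(data))  # rounding noise, not necessarily 0.0
```

On small, well-scaled data such as [1.0, 2.0, 3.0, 4.0] the same function gives the expected sqrt(1.25); the trouble only appears when the mean dwarfs the spread.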

The wider issue is that, once things have settled down, I will be trying to compute statistics of sets of differences between two large numbers, and I am currently not fully up to date on all the pitfalls this can entail with floating-point arithmetic.
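A short Python illustration of the resolution problem (math.ulp needs Python 3.9 or later). Near 3.5e8, adjacent doubles are about 6e-8 apart; once such values are squared they are 16 apart, which is why any formula that squares the raw values struggles. The difference of two large numbers this close together is itself computed exactly, but it can only be as good as its inputs: a value like 351697468.6 is already rounded when it is stored as a double.

```python
import math

x = 351697468.6

# Gap between x and the next representable double (2**-24, roughly 6e-8).
print(math.ulp(x))
# Squaring moves the value to ~1.2e17, where adjacent doubles are 16 apart.
print(math.ulp(x * x))  # 16.0

a = 351697468.6
b = 351697468.7
# The subtraction itself is exact, but a and b were rounded on input,
# so the difference is close to, not exactly, 0.1.
print(b - a)
```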

Carl

Now, I guess this is a floating-point error from calculating the mean square of these large numbers and then subtracting the squared mean from it (as

JohnCampbell · Joined: 16 Feb 2006 Posts: 2618 Location: Sydney

Carl,

You must have pasted the wrong set of numbers, since they are all the same. Two comments on what you may have been looking at:

First, why not calculate the mean of the sample first, then find the standard deviation of (value - mean)? This would largely overcome the floating-point round-off error.
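A minimal Python sketch of the two-pass approach John suggests: compute the mean first, then average the squared deviations from it. math.fsum is used here for an accurately rounded sum (plain sum would also do), and the function name is illustrative only.

```python
import math

def two_pass_stdevp(xs):
    """Population standard deviation in two passes over stored data:
    pass 1 computes the mean, pass 2 averages squared deviations from it."""
    n = len(xs)
    mean = math.fsum(xs) / n
    return math.sqrt(math.fsum((x - mean) ** 2 for x in xs) / n)

print(two_pass_stdevp([351697468.6] * 13))  # rounding-level, far below 0.1
print(two_pass_stdevp([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```

Because the deviations are small before they are squared, the large magnitude of the values never enters the squaring step.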

Second, I see a problem with the design of your "experiment" if you require accuracy to more than 10 significant figures. Following on from the previous point, you should probably be reporting the difference between the result and an expected result, rather than an apparent gross measure.
It is hard to believe that differences at this level of accuracy could indicate a significant change in the process being studied.

John

DrTip · Joined: 01 Aug 2006 Posts: 74 Location: Manchester

John,
thank you for your comments. I pasted the correct set of numbers; that was the point! I knew that for this test the standard deviation should be 0 (by inspection; I was just cutting and pasting the values into Excel).

I was trying to avoid a two-pass standard deviation (i.e. precalculating the mean), since I think the data set might start getting out of hand in a few weeks' time...
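If a second pass really has to be avoided, one standard option (not one raised in this thread) is Welford's online algorithm, which updates the running mean and the sum of squared deviations as each value arrives, so nothing needs to be stored. A minimal Python sketch; the class and method names are hypothetical.

```python
import math

class RunningStats:
    """Welford's online algorithm: one pass, constant memory, and far
    better behaved numerically than accumulating sum(x*x)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the old delta together with the updated mean.
        self.m2 += delta * (x - self.mean)

    def stdevp(self):
        """Population standard deviation of the values pushed so far."""
        if self.n == 0:
            raise ValueError("no data")
        return math.sqrt(self.m2 / self.n)

rs = RunningStats()
for v in [351697468.6] * 13:
    rs.push(v)
print(rs.stdevp())  # 0.0
```

For identical values, the first update sets the mean to exactly that value and every later update adds exactly zero, so the result is exactly 0.0.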

I have no idea a priori what the expected result should be (probably not even a

What I have come up with is to subtract the first value in the set from every value in the data set. Since the standard deviation should be invariant under this transformation, I then get the correct result (or at least acceptable accuracy). A similar method can be used for the mean as well.
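Carl's trick is the "shifted data" variant of the sum-of-squares formula: subtract a constant K (here the first value) before accumulating, then add K back for the mean. A minimal Python sketch, assuming the values are available as a list; it works best when K is close to the mean, and the function names are illustrative.

```python
import math

def shifted_stdevp(xs):
    """Population standard deviation using the first value K as a shift.
    Var(x) == Var(x - K), and the shifted values are small when K is near
    the mean, so the sum-of-squares formula loses far less precision."""
    k = xs[0]
    n = 0
    s = 0.0   # sum of (x - k)
    ss = 0.0  # sum of (x - k)**2
    for x in xs:
        d = x - k
        n += 1
        s += d
        ss += d * d
    var = (ss - s * s / n) / n
    return math.sqrt(max(var, 0.0))

def shifted_mean(xs):
    """Mean computed as K + mean(x - K)."""
    k = xs[0]
    return k + sum(x - k for x in xs) / len(xs)

data = [351697468.6] * 13
print(shifted_stdevp(data), shifted_mean(data))  # 0.0 351697468.6
```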

Anyway, thanks again for your comments.

Carl

JohnCampbell · Joined: 16 Feb 2006 Posts: 2618 Location: Sydney

Carl,

I don't mind having a two-pass loop, although it requires storing all the sample values. If they are stored, it is easy to sort them and calculate the median and percentile values of the distribution. Accumulating the sum of x^2 can seriously reduce the precision of the standard deviation when the mean is large compared with the spread.
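Once the samples are stored, order statistics come almost for free. A minimal Python sketch using the standard library's statistics.median and a nearest-rank percentile; nearest-rank is just one of several percentile conventions, and the helper names are hypothetical.

```python
import math
import statistics

def nearest_rank(sorted_xs, p):
    """Nearest-rank percentile of an already sorted list, 0 < p <= 100."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    rank = math.ceil(p * len(sorted_xs) / 100)
    return sorted_xs[rank - 1]

def summarize(xs):
    """Sort the stored samples once, then read off order statistics."""
    s = sorted(xs)
    return {
        "median": statistics.median(s),
        "p05": nearest_rank(s, 5),
        "p95": nearest_rank(s, 95),
    }

print(summarize([float(v) for v in range(100, 0, -1)]))
```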

John

DrTip · Joined: 01 Aug 2006 Posts: 74 Location: Manchester

Thanks, John.

I think it is becoming apparent that storing the values may not be a bad thing after all; it looks like we are going to be interested in the skewness of the distributions as well, not to mention cross-variances with other distributions.
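With the values stored, skewness and covariance can be computed the same cancellation-safe way: form deviations from the mean first, then take powers or products. A minimal Python sketch of population (biased) versions; the function names are illustrative, and treating "cross-variance" as ordinary covariance between paired samples is an assumption.

```python
import math

def _mean(xs):
    return math.fsum(xs) / len(xs)

def population_skewness(xs):
    """Population skewness E[(x-m)^3] / E[(x-m)^2]^(3/2), computed from
    deviations about the mean to avoid catastrophic cancellation."""
    m = _mean(xs)
    d = [x - m for x in xs]
    m2 = math.fsum(v * v for v in d) / len(d)
    if m2 == 0.0:
        raise ValueError("zero variance: skewness undefined")
    m3 = math.fsum(v ** 3 for v in d) / len(d)
    return m3 / m2 ** 1.5

def population_covariance(xs, ys):
    """Population covariance of two equal-length paired samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same length")
    mx, my = _mean(xs), _mean(ys)
    return math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
```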

All good stuff, though I must say I am working in the dark a fair bit. I come from a background where I would be doing things based on time series, and this Monte Carlo stuff seems similar, but not so close that I don't keep asking stupid questions!

Anyway, thanks again for your comments.

Carl