
There is a myth, or possibly a legend (there being something to it), that if you have a set of data reported to a particular precision, say whole degrees, then any average of those data should always be reported to that same precision and that precision only.

So, for example, it has been said that a monthly average temperature calculated from daily max and min temperatures that were measured to the nearest whole degree should be reported in whole degrees. As I said before, this is a myth (maybe a legend).

To bust (an hilariously dynamic word for what’s about to happen, or rather, barely happen) this myth, I’ve written a snippet of Python code. It generates 100 random numbers drawn from a Gaussian distribution (mean of zero, standard deviation of one). It then rounds the same numbers to the nearest whole degree. It calculates the average of the full-precision numbers and then the average of the rounded numbers. The difference between the two (the rounding error) is recorded. I do this ten thousand times.

We can predict what the standard deviation of these ten thousand trials will be using a simple uncertainty calculation. Rounding errors can be represented as a uniform distribution centred on the rounded value with a width of plus or minus 0.5 (if we are rounding to whole numbers, which I assume we are here). The Guide to the Expression of Uncertainty in Measurement (Section 4.3.7) states that the variance of such a distribution of errors (which is a measure of the uncertainty associated with such errors) is equal to the width of the interval squared divided by twelve.
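As a quick sanity check on that result (this snippet is mine, not part of the original post, and just assumes NumPy is available): round a large batch of uniformly distributed values and compare the variance of the rounding errors to 1/12.

import numpy as np

# Rough check that rounding errors behave like a uniform distribution of
# full width 1: round a large batch of values and look at the error variance.
values = np.random.uniform(-50.0, 50.0, 1000000)
errors = np.round(values) - values        # each error lies in [-0.5, 0.5]
print(np.var(errors), 1.0 / 12.0)         # both should be close to 0.0833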

For rounding to whole numbers this is just 1/12, i.e. the uncertainty in a single rounded number is 0.288675 (the square root of one twelfth). Because the rounding errors will be random and uncorrelated, the uncertainty in the average of 100 rounded numbers will therefore be the square root of 1/12 divided by the square root of the number of values we’re averaging, 100 in this case. That gives us an uncertainty of 0.0288675134595.
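Spelled out as a small sketch (the same arithmetic the script further down performs, just restated on its own):

import numpy as np

single_value_uncertainty = np.sqrt(1.0 / 12.0)            # about 0.288675 per rounded value
average_uncertainty = single_value_uncertainty / np.sqrt(100.0)
print(single_value_uncertainty, average_uncertainty)      # about 0.288675 and 0.0288675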

The standard deviation of the 10,000 trials is, last time I ran the code, 0.0287377673007. The predicted uncertainty matches the standard deviation of the error distribution well, and we would therefore be justified in quoting the average to a higher precision than the individual values.

The question is, will this always be the case? No. There are some circumstances where it won’t, for example when the variation between values is similar to, or much smaller than, the rounding increment, so your measurements do this: 22, 22, 22, 22, 22, 22… ad infinitum. However, measured temperatures very rarely do this: diurnal temperature ranges over land are of the order of a few degrees to a few tens of degrees.
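To make that failure mode concrete, here is a minimal sketch (the 21.7 degree reading is a made-up value for illustration): when every measurement is identical, every value picks up the same rounding error, so averaging does nothing to reduce it.

import numpy as np

samples = np.full(100, 21.7)                 # a hundred identical hypothetical readings
rounded = np.round(samples)                  # every one rounds to 22.0
print(np.mean(samples), np.mean(rounded))    # ~21.7 versus 22.0: the 0.3 error never averages away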

There are, no doubt, other situations where the simple case shown here doesn’t hold (if you have any, let me know in the comments), but the point is simply to demonstrate that there are situations in which it is acceptable to report an average to a higher, potentially much higher, precision than the original measurements.


import numpy as np
import random

trials = []

# Calculate the estimated uncertainty for the average of 100 rounded values:
# a uniform error distribution of full width 1 has variance (width**2)/12.
var = (2.0 * 0.5) ** 2 / 12.0
estimated_uncertainty = np.sqrt(var / 100.0)

# Do ten thousand trials.
for j in range(10000):

    # Generate 100 Gaussian random numbers and round each to the nearest whole number.
    samples = []
    intsamples = []
    for i in range(100):
        rn = random.gauss(0, 1)
        samples.append(rn)
        intsamples.append(float(round(rn)))

    full_precision_average = np.mean(samples)
    rounded_average = np.mean(intsamples)

    # Record the rounding error in the average for this trial.
    trials.append(full_precision_average - rounded_average)

print(np.std(trials), estimated_uncertainty)
