Uncertainty Estimates for Routine Temperature Data Sets Part Two.

Geoff Sherrington

Part One opened with 3 assertions.

“It is generally agreed that the usefulness of measurement results, and thus much of the information that we provide as an institution, is to a large extent determined by the quality of the statements of uncertainty that accompany them.”

“The uncertainty in the result of a measurement generally consists of several components which may be grouped into two categories according to the way in which their numerical value is estimated: A. those which are evaluated by statistical methods, B. those which are evaluated by other means.”

“Dissent. Science benefits from dissent within the scientific community to sharpen ideas and thinking. Scientists’ ability to freely voice the legitimate disagreement that improves science should not be constrained. Transparency in sharing science. Transparency underpins the robust generation of knowledge and promotes accountability to the American public. Federal scientists should be able to speak freely, if they wish, about their unclassified research, including to members of the press.”

They led to a question that Australia’s Bureau of Meteorology, BOM, has been answering in stages for some years.

If a person seeks to know the separation of two daily temperatures in degrees C that allows a confident claim that the two temperatures are different statistically, by how much would the two values be separated?

Part Two now addresses the more mathematical topics of the first two assertions.

In short, what is the proper magnitude of the uncertainty associated with such routine daily temperature measurements? (From here, the scope has widened from a single observation at a single station, to multiple years of observations at many stations globally.)

We start with where Part One ceased.

Dr David Jones emailed me on June 9, 2009 with this sentence:

“Your analogy between a 0.1C difference and a 0.1C/decade trend makes no sense either – the law of large numbers or central limit theorem tells you that random errors have a tiny effect on aggregated values.”

The Law of Large Numbers (LOLN) and the Central Limit Theorem (CLT) are often used to justify small estimates of measurement uncertainty. The argument can be summarised as: “the uncertainty of a single measurement might be +/-0.5 °C, but if we take many measurements and average them, the uncertainty can become smaller.”
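The sketch below shows the arithmetic behind that summary. It uses the usual textbook model. A random component varies independently from reading to reading. A systematic component is shared by every reading. The 0.5 °C random figure comes from the summary above. The 0.1 °C shared bias is a hypothetical value chosen only for illustration.

```python
import math


def uncertainty_of_mean(u_random: float, n: int, u_systematic: float = 0.0) -> float:
    """Standard uncertainty of the mean of n readings.

    The random component is assumed independent from reading to reading, so its
    contribution shrinks as 1/sqrt(n). The systematic component is assumed to be
    shared by every reading, so averaging does not reduce it at all.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.sqrt(u_random ** 2 / n + u_systematic ** 2)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        random_only = uncertainty_of_mean(0.5, n)
        with_bias = uncertainty_of_mean(0.5, n, u_systematic=0.1)  # 0.1 C is hypothetical
        print(f"n={n:5d}  random only: {random_only:.3f} C  with shared bias: {with_bias:.3f} C")
```

Under these assumptions, the random-only column keeps falling as 1/√n. The column with a shared bias levels off near 0.1 °C, however many readings are averaged. The 1/√n reduction depends on the random errors being independent. If errors are correlated between readings, it does not hold.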

This thinking bears on the BOM table of uncertainty estimates shown in Part One and discussed further below.

If the uncertainty of a single reading is indeed +/-0.5 °C, what mechanism reduces the uncertainty of multiple observations to lower figures such as +/-0.19 °C? It appears to rely almost entirely on the CLT. If so, is this reliance justified?
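Here is a rough check on that question. Suppose, hypothetically, that the whole reduction from +/-0.5 °C to +/-0.19 °C came from averaging n independent readings of equal uncertainty u(x). The standard uncertainty of the mean would then be u(x)/√n, and the implied number of readings is:

```latex
u(\bar{x}) = \frac{u(x)}{\sqrt{n}}
\quad\Longrightarrow\quad
n = \left(\frac{u(x)}{u(\bar{x})}\right)^{2}
  = \left(\frac{0.5}{0.19}\right)^{2} \approx 6.9
```

In this idealised case, about seven fully independent readings would be enough. This is not a reconstruction of how BOM derived its figures. It only shows how little averaging the 1/√n rule needs, and so how much weight rests on the independence assumption.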


Australia’s BOM authors have written a 38-page report that describes some of their relevant procedures. It is named “ITR 716 Temperature Measurement Uncertainty – Version 1.4_E” (issued 30 March 2022). It may not yet be readily available in the public literature.


The lengthy tables in that report need to be understood before proceeding.


(Start quote).

Sources of Uncertainty Information.

The process of identifying sources of uncertainty for near surface atmospheric temperature measurements was carried out in accordance with the International Vocabulary of Metrology [JCGM 200:2008]. This analysis of the measurement process established seven root causes and numerous contributing sources. These are described in Table 3 below. These sources of uncertainty correlate with categories used in the uncertainty budget provided in Appendix D.

Table 3 – Ordinary Dry Bulb Thermometer and Air Temperature PRT Probe Uncertainty Contributors. Definitions in accordance with the International Vocabulary of Metrology [JCGM 200:2008]


Uncertainty Estimates

The overall uncertainty of the mercury in glass ordinary dry bulb thermometer and PRT probe to measure atmospheric temperature is given in Table 4. This table is a summary of the full measurement uncertainty budget given in Appendix D.

Table 4 – Summary table of uncertainties and degrees of freedom (DoF) [JCGM 100:2008] for ordinary dry bulb thermometer and electronic air temperature probes also referred to as PRT probes.

A detailed assessment of the estimate of least uncertainty for the ordinary dry bulb thermometer and air temperature probes is provided in Appendix D. This details the uncertainty contributors mentioned above in Table 3.

(End quote).


There are some known sources of uncertainty that are not covered, or perhaps not covered adequately, in these BOM tables. One of the largest arises when the screen is moved to a new site. Over time the screen has proved sensitive to disturbances such as site moves. The BOM, like many other keepers of temperature records, has carried out homogenization exercises. The public has seen the results successively as the “High Quality” data set (since discontinued), then ACORN-SAT versions 1, 2, 2.1 and 2.2.

The homogenization procedures are described in several reports, including:


The changes due to site shifts are large compared with the effects listed in the tables above. They are also widespread: few, if any, of the 112 or so official ACORN-SAT stations have escaped adjustment for this effect.

The table below shows some daily adjustments to Alice Springs Tmin, expressed as differences between raw and ACORN-SAT version 2.2 values, all in °C. Data are taken from


Date        Tmin v2.2  Tmin raw  Raw minus v2.2
1944-07-20  -6.6       1.1       7.7
1943-12-15  18.5       26.1      7.6
1942-04-05  5.8        13.3      7.5
1942-08-22  8.1        15.6      7.5
1942-09-02  3.1        10.6      7.5
1942-09-18  13.4       20.6      7.2
1942-07-05  -2.8       4.4       7.2
1942-08-23  4.1        11.1      7.0
1943-12-12  19.8       26.7      6.9
1942-05-08  7.7        14.4      6.7
1942-02-15  18.4       25.0      6.6
1942-10-18  11.7       18.3      6.6
1943-07-05  2.3        8.9       6.6
1942-05-10  4.0        10.6      6.6
1942-09-24  5.1        11.7      6.6

These differences add to the uncertainty estimates being sought. There are different ways to incorporate them, but BOM seems not to include them in the overall uncertainty. They are not measured differences, so they are arguably not part of measurement uncertainty; they are estimates made by expert staff. Nevertheless, they need to find a place in the overall uncertainty.
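One conventional way to give such an estimate a place in the budget is the GUM law of propagation of uncertainty. For uncorrelated inputs with unit sensitivity coefficients, standard uncertainties add in quadrature. This sketch treats the homogenisation adjustment as a Type B component with a rectangular distribution of half-width a, whose standard uncertainty is a/√3. That treatment is an assumption of the sketch, not BOM practice. Both numeric values below are hypothetical.

```python
import math


def rectangular_standard_uncertainty(half_width: float) -> float:
    """Type B standard uncertainty for a rectangular distribution of half-width a: a / sqrt(3)."""
    return half_width / math.sqrt(3)


def combined_standard_uncertainty(components: dict[str, float]) -> float:
    """Root-sum-of-squares of independent standard uncertainties (unit sensitivity coefficients)."""
    return math.sqrt(sum(u ** 2 for u in components.values()))


if __name__ == "__main__":
    # Hypothetical budget, in degrees C; neither value is taken from ITR 716.
    budget = {
        "instrument_and_reading": 0.2,
        "site_move_adjustment": rectangular_standard_uncertainty(1.0),
    }
    u_c = combined_standard_uncertainty(budget)
    # Expanded uncertainty with coverage factor k = 2 (roughly 95 % for a normal distribution).
    print(f"u_c = {u_c:.2f} C, U (k=2) = {2 * u_c:.2f} C")
```

Because the components add in quadrature, the largest one dominates. In this hypothetical budget, the adjustment term contributes far more than the instrument term.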

We question whether all, or even enough, sources of uncertainty have been considered. If the uncertainty were as small as indicated, why would adjustments be needed to produce data sets like ACORN-SAT? I raised this with BOM in the following letter.


(Sent to Arla Duncan, BOM, Monday, 2 May 2022, 5:42 PM)

Thank you for your letter and copy of BOM Instrument Test Report 716, “Near Surface Air Temperature Measurement Uncertainty V1.4_E.” via your email of 1st April, 2022.

This reply is in the spirit of seeking further clarification of the question I asked some years ago:

If a person seeks to know the separation of two daily temperatures in degrees C that allows a confident claim that the two temperatures are different statistically, by how much would the two values be separated?

Your response has led me to the centre of the table in your letter, which suggests that many typical historic daily temperatures would have uncertainties of the order of ±0.23 °C or ±0.18 °C.

I conducted the following exercise. I chose Alice Springs Airport (BOM station 15590) because of its importance to large regions of central Australia. I chose the year 1953 more or less at random, used daily data for granularity, and examined temperature minima. The records available for that year include “Raw” data from CDO, ACORN-SAT versions 1, 2, 2.1 and 2.2, and the older BOM High Quality data set. By subtracting day by day, I graphed the divergence of each from RAW. Here are the results.

It can be argued that I have chosen a particular example to show a particular effect, but this is not so. Many stations could be shown to have a similar daily range of temperatures.

At any given date, this vertical spread of temperatures can reach some 4 degrees C. In a rough sense, that can be equated to an uncertainty of +/- 2 deg C or more. This figure is an order of magnitude greater than your uncertainty estimates noted above.

Similarly, I chose another site, Bourke NSW (station 48245, an AWS site), for the year 1999.

This example also shows a large daily range of temperatures, here roughly 2 deg C.

As a practical example, one can ask: “What was the hottest day recorded in Bourke in 1993?”

The answers are:

  • 43.2 +/- 0.13 °C from the High Quality and RAW data sets
  • 44.4 +/- 0.13 °C from ACORN-SAT versions 2.1 and 2.2
  • 44.7 +/- 0.13 °C from ACORN-SAT version 1

The results depend on the chosen data set. The +/- figures use the BOM estimates of accuracy for a liquid-in-glass thermometer and a data set of 100 years' duration, from BOM Instrument Test Report 716, as quoted in the table in your email.

This example represents a measurement absurdity.

There appears to be a mismatch of estimates of uncertainty. I have chosen to use past versions of ACORN-SAT and the old High Quality data set because each was created by experts attempting to reduce uncertainty.

How would BOM resolve this difference in uncertainty estimates?

(End of letter to BOM)
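The day-by-day comparison described in the letter can be sketched as follows. The sketch assumes each data set has already been loaded as a mapping from date to daily Tmin in °C. The loading step is omitted. The data-set labels are hypothetical, and the values are invented placeholders, not BOM data. The half-spread printed at the end follows the letter's rough reading of a spread as a +/- figure; it is not a GUM standard uncertainty.

```python
from datetime import date

# Invented placeholder Tmin values (degrees C); NOT real BOM or ACORN-SAT data.
series = {
    "raw": {date(1953, 1, 1): 20.0, date(1953, 1, 2): 22.0},
    "high_quality": {date(1953, 1, 1): 20.5, date(1953, 1, 2): 21.0},
    "acorn_v1": {date(1953, 1, 1): 18.0, date(1953, 1, 2): 21.5},
    "acorn_v2_2": {date(1953, 1, 1): 19.0, date(1953, 1, 2): 23.0},
}


def divergence_from_reference(series, reference="raw"):
    """For every non-reference data set, return {date: value - reference value} on shared dates."""
    ref = series[reference]
    return {
        name: {d: values[d] - ref[d] for d in ref if d in values}
        for name, values in series.items()
        if name != reference
    }


def daily_spread(series):
    """Max minus min across all data sets, for dates present in every data set."""
    common = set.intersection(*(set(values) for values in series.values()))
    spread = {}
    for d in sorted(common):
        day_values = [values[d] for values in series.values()]
        spread[d] = max(day_values) - min(day_values)
    return spread


if __name__ == "__main__":
    for d, s in daily_spread(series).items():
        # Half the spread, read loosely as a +/- figure, as in the letter above.
        print(d, f"spread {s:.1f} C, roughly +/- {s / 2:.1f} C")
```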


BOM replied on 12 July 2022; an extract follows:

“In response to your specific queries regarding temperature measurement uncertainty, it appears they arise from a misapplication of the ITR 716 Near Surface Air Temperature Measurement Uncertainty V1.4_E to your analysis. The measurement uncertainties in ITR 716 are a measurement error associated with the raw temperature data. In contrast, your analysis is a comparison between time series of raw temperature data and the different versions of ACORN-SAT data at particular sites in specific years. Irrespective of the station and year chosen, your analysis results from differences in methodology between the different ACORN-SAT datasets and as such cannot be compared with the published measurement errors in ITR 716. We recommend that you consider submitting the results of any further analysis to a scientific journal for peer review.”

(End of extract from BOM reply.)

BOM appears not to accept that site move effects should be included in uncertainty estimates. Maybe they should be. For example, at the start of the Australian summer there are often news reports claiming that a new record high temperature has been reached on a particular day at some place. Verifying such a claim is no easy task. See this material on the “hottest day ever in Australia.”

The point is that BOM encourages use of the ACORN-SAT data set as the official record, while often also directing inquiries to the raw data on one of its web sites. A modern temperature can therefore be compared with an early one. The modern reading might be taken today by an Automatic Weather Station with a Platinum Resistance Thermometer. The early reading, from after 1910 (when ACORN-SAT commences), might have been taken by a Liquid-in-Glass Thermometer in a screen of different dimensions, shifted from its original site and adjusted by homogenisation.

Thus the comparison is made between a raw value (today) and a historic value (early, homogenised and moved).

Surely that procedure is valid only if the effects of site moves are included in the uncertainty.


We return now to the topic of the Central Limit Theorem and the Law of Large Numbers, LOLN.

The CLT is partly described in the BIPM GUM (Guide to the Expression of Uncertainty in Measurement), excerpted here as an image to preserve the equations:

These words are critical – “even if the distributions of the Xi are not normal, the distribution of Y may often be approximated by a normal distribution because of the Central Limit Theorem.” They might be the main basis for justifying uncertainty reductions statistically. What is the actual distribution of a group of temperatures from a time series?

The four histograms below test for a signature of change when the Alice Springs weather station switched from mercury-in-glass to platinum resistance thermometry. They compare the two years before the change with the two years after it. The upper pair (Tmax and Tmin) are from before the change on 1 November 1996; the lower pair are from after it. The X and Y axes are scaled comparably across all four.

In gross appearance, only one of these histograms visually approaches a normal distribution.
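One simple way to put a number on what the histograms show by eye is to compute sample skewness and excess kurtosis. Both are zero for a normal distribution. This is a sketch, not the method used to produce the figures. The function uses plain moment estimators. Formal tests such as Shapiro-Wilk (available as `scipy.stats.shapiro`) are an alternative. The two samples in the usage section are synthetic.

```python
import random
import statistics


def skewness_and_excess_kurtosis(values):
    """Moment-based sample skewness and excess kurtosis (both 0 for a normal distribution)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    rng = random.Random(1)
    normal_like = [rng.gauss(20.0, 5.0) for _ in range(10_000)]
    # A made-up two-humped sample, standing in for a histogram with two regimes.
    two_humped = [rng.gauss(8.0, 2.0) for _ in range(5_000)] + [rng.gauss(22.0, 2.0) for _ in range(5_000)]
    for label, sample in (("normal-like", normal_like), ("two-humped", two_humped)):
        skew, kurt = skewness_and_excess_kurtosis(sample)
        print(f"{label}: skewness {skew:+.2f}, excess kurtosis {kurt:+.2f}")
```

A clearly two-humped sample shows strongly negative excess kurtosis. That is one numerical signature of the kind of departure from normality discussed here.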

The principal question that arises is: “Does this disqualify the use of the CLT in the way that BOM seems to use it?”

Known mathematics can readily decompose these graphs into sub-sets of normal distributions (or near enough). But that is an academic exercise, one that becomes useful only when the cause of each sub-set is identified and quantified. This quantification step, linked to a sub-distribution, is not the same as the categories of error sources listed in the BOM tables above. I suspect it is not proper to assume that the sub-distributions of those categories will match the sub-distributions derived from the histograms.

In other words, it cannot be assumed carte blanche that the CLT can be used unless the dominant factors contributing to uncertainty are measured.
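The decomposition into sub-sets of normal distributions mentioned above is commonly done by fitting a Gaussian mixture model with the expectation-maximisation (EM) algorithm. The sketch below assumes exactly two components and uses only the Python standard library. Choosing the number of components is itself a modelling decision. As the text stresses, the fitted components say nothing by themselves about physical causes. The test data are synthetic.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(data, iterations=100, min_sigma=1e-3):
    """Fit a two-component 1-D Gaussian mixture by EM; returns (weights, means, sigmas)."""
    xs = sorted(data)
    n = len(xs)
    # Start the two means at the lower and upper quartiles, with a shared spread.
    mu = [xs[n // 4], xs[(3 * n) // 4]]
    sigma = [statistics.pstdev(xs)] * 2
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: probability (responsibility) that each point belongs to component 0.
        resp = []
        for x in xs:
            p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
            total = p0 + p1
            resp.append(p0 / total if total > 0 else 0.5)
        # M-step: re-estimate weights, means and spreads from the responsibilities.
        r0 = sum(resp)
        r1 = n - r0
        mu = [
            sum(r * x for r, x in zip(resp, xs)) / r0,
            sum((1 - r) * x for r, x in zip(resp, xs)) / r1,
        ]
        sigma = [
            max(min_sigma, math.sqrt(sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / r0)),
            max(min_sigma, math.sqrt(sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / r1)),
        ]
        weight = [r0 / n, r1 / n]
    return weight, mu, sigma


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data with two known regimes; NOT real temperature data.
    data = [rng.gauss(10.0, 2.0) for _ in range(1000)] + [rng.gauss(25.0, 3.0) for _ in range(1000)]
    weights, means, sigmas = fit_two_gaussians(data)
    for w, m, s in zip(weights, means, sigmas):
        print(f"weight {w:.2f}, mean {m:.1f} C, sd {s:.1f} C")
```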

In seeking to conclude and summarise Part Two, the main conclusions for me are:

  1. There are doubts whether the Central Limit Theorem is applicable to temperatures of the type described.
  2. The BIPM Guide to Uncertainty might not be adequately applicable in the practical sense. It seems to be written more for controlled environments like national standards institutions where attempts are made to minimise and/or measure extraneous variables. It seems to have limits when the vagaries of the natural environment are thrust upon it. However, it is useful for discerning if authors of uncertainty estimates are following best practice. From the GUM:

The term “uncertainty”

The concept of uncertainty is discussed further in Clause 3 and Annex D.

The word “uncertainty” means doubt, and thus in its broadest sense “uncertainty of measurement” means doubt about the validity of the result of a measurement. Because of the lack of different words for this general concept of uncertainty and the specific quantities that provide quantitative measures of the concept, for example, the standard deviation, it is necessary to use the word “uncertainty” in these two different senses.

In this Guide, the word “uncertainty” without adjectives refers both to the general concept of uncertainty and to any or all quantitative measures of that concept.

  3. There are grounds for adopting an overall uncertainty of at least +/- 0.5 degrees C for all historic temperature measurements when they are being compared with one another. It does not immediately follow that other common uses, such as temperature time series, should use this uncertainty.
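The question posed at the start of this Part asks how far apart two daily temperatures must be before a confident claim of difference can be made. Here is a minimal sketch of one way to answer it, under stated assumptions. Each reading carries an expanded uncertainty U at coverage factor k_in; k = 2 is roughly 95 % for a normal distribution. The errors of the two readings are uncorrelated. The difference must exceed k_out times its own standard uncertainty. These are assumptions of the sketch, not BOM statements.

```python
import math


def minimum_separation(U1, U2=None, k_in=2.0, k_out=2.0):
    """Smallest difference between two readings that exceeds k_out times its standard uncertainty.

    U1 and U2 are expanded uncertainties quoted at coverage factor k_in; the two
    readings are assumed to have uncorrelated errors.
    """
    if U2 is None:
        U2 = U1
    u1, u2 = U1 / k_in, U2 / k_in          # back to standard uncertainties
    u_diff = math.sqrt(u1 ** 2 + u2 ** 2)  # standard uncertainty of the difference
    return k_out * u_diff


if __name__ == "__main__":
    for U in (0.18, 0.23, 0.5):
        print(f"+/-{U} C per reading -> separation of about {minimum_separation(U):.2f} C")
```

When k_in equals k_out, the result is simply √(U1² + U2²). So +/-0.5 °C per reading implies a required separation of about 0.71 °C. Correlated errors between the two readings would change this result.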

This article is long enough. I now plan a Part Three that will mainly compare traditional statistical estimates of uncertainty with newer methods like “bootstrapping” that might well be suited to the task.
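The simplest form of bootstrapping, the percentile bootstrap for a mean, is sketched below to fix ideas. It resamples individual values with replacement, which assumes they are independent. Daily temperatures are autocorrelated, so for time series a block bootstrap that resamples runs of consecutive days would generally be more appropriate. The function name, defaults and sample data are illustrative choices.

```python
import random
import statistics


def bootstrap_mean_interval(values, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of independent values."""
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)
    n = len(values)
    # Mean of each resample drawn with replacement, sorted for percentile lookup.
    means = sorted(statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples))
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - tail) * n_resamples))]
    return lower, upper


if __name__ == "__main__":
    rng = random.Random(3)
    # Synthetic daily values; NOT real temperature data.
    sample = [rng.gauss(20.0, 4.0) for _ in range(365)]
    low, high = bootstrap_mean_interval(sample)
    print(f"mean {statistics.fmean(sample):.2f} C, 95 % interval {low:.2f} to {high:.2f} C")
```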


Geoff Sherrington


Melbourne, Australia.

30th August, 2022

via Watts Up With That?


September 6, 2022 at 08:46AM
