Tendency, Convenient Mistakes, and the Importance of Physical Reasoning.

    by Pat Frank

    Last February 7, statistician Richard Booth, Ph.D. (hereinafter, Rich) posted a very long critique titled “What do you mean by ‘mean’: an essay on black boxes, emulators, and uncertainty,” which is very critical of the GCM air temperature projection emulator in my paper. He was also very critical of the notion of predictive uncertainty itself.

    This post critically assesses his criticism.

    An aside before the main topic. In his critique, Rich made many of the same mistakes in physical error analysis as do climate modelers. I have described the incompetence of that guild at WUWT here and here.

    Rich and climate modelers both describe the probability distribution of the output of a model of unknown physical competence and accuracy, as being identical to physical error and predictive reliability.

    Their view is wrong.

    Unknown physical competence and accuracy describes the current state of climate models (at least until recently; see also Anagnostopoulos et al. (2010), Lindzen & Choi (2011), Zanchettin et al. (2017), and Loehle (2018)).

    GCM climate hindcasts are not tests of accuracy, because GCMs are tuned to reproduce hindcast targets. For example, here, here, and here. Tests of GCMs against a past climate they were tuned to reproduce are no indication of physical competence.

    When a model is of unknown competence in physical accuracy, the statistical dispersion of its projective output cannot be a measure of physical error or of predictive reliability.

    Ignorance of this problem entails the very basic scientific mistake that climate modelers evidently strongly embrace and that appears repeatedly in Rich’s essay. It reduces both contemporary climate modeling and Rich’s essay to scientific vacancy.

    The correspondence of Rich’s work with that of climate modelers reiterates something I realized after much immersion in published climatology literature — that climate modeling is an exercise in statistical speculation. Papers on climate modeling are almost entirely statistical conjectures. Climate modeling plays with physical parameters but is not a branch of physics.

    I believe this circumstance refutes the American Statistical Association’s statement that more statisticians should enter climatology. Climatology doesn’t need more statisticians because it already has far too many: the climate modelers who pretend at science. Consensus climatologists play at scienceness and can’t discern the difference between that and the real thing.

    Climatology needs more scientists. Evidence suggests many of the good ones previously resident have been caused to flee.

    Rich’s essay ran to 16 typescript pages and nearly 7000 words. My reply is even longer — 28 pages and nearly 9000 words. Followed by an 1800-word Appendix.

    For those disinclined to go through the Full Tilt Boogie below, here is a short précis followed by a longer summary.

    The very short take-home message: Rich’s entire analysis has no critical force.

    A summary list of its problems:

    1. Rich’s analysis shows no evidence of physical reasoning.

    2. His proposed emulator is constitutively inapt and tendentious.

    3. Its derivation is mathematically incoherent.

    4. The derivation is dimensionally unsound, abuses operator algebra, and deploys unjustified assumptions.

    5. Offsetting calibration errors are incorrectly and invariably claimed to promote predictive reliability.

    6. The Stefan-Boltzmann equation is inverted.

    7. Operators are improperly treated as coefficients.

    8. Accuracy is repeatedly abused and ejected in favor of precision.

    9. The GCM air temperature projection emulator (paper eqn. 1) is fatally confused with the error propagator (paper eqn. 5.2).

    10. The analytical focus of my paper is fatally misconstrued to be model means.

    11. The GCM air temperature projection emulator is wrongly described as used to fit GCM air temperature means.

    12. The same emulator is falsely portrayed as unable to emulate GCM projection variability, despite 68 examples to the contrary.

    13. A double irony is that Rich touted a superior emulator without ever displaying a single successful emulation of a GCM air temperature projection.

    14. All the difficulties of measurement error and model error are assumed away (qualifying Rich to be a consensus climatologist).

    15. Uncertainty statistics are wrongly and invariably asserted to be physical error or an interval of physical error.

    16. Systematic error is falsely asserted as restricted to a fixed constant bias offset.

    17. Uncertainty in temperature is falsely and invariably construed to be an actual physical temperature.

    18. Error is invariably assumed to be a random variable, an empirically unjustified and ad hoc assumption.

    19. The JCGM description of standard uncertainty variance is self-advantageously misconstrued.

    20. The described use of rulers or thermometers is unrealistic.

    21. Readers are advised to record and accept false precision.

    A couple of preliminary instances that highlight the difference between statistical thinking and physical reasoning.

    Rich wrote that, “It may be objected that reality is not statistical, because it has a particular measured value. But that is only true after the fact, or as they say in the trade, a posteriori. Beforehand, a priori, reality is a statistical distribution of a random variable, whether the quantity be the landing face of the die I am about to throw or the global HadCRUT4 anomaly averaged across 2020.”

    Rich’s description of an a priori random variable status for some as-yet unmeasured state is wrong when the state of interest, though itself unknown, falls within a regime treated by physical theory, such as air temperature. Then the a priori meaning is not the statistical distribution of a random variable, but rather the unknown state of a deterministic system that includes uncontrolled but explicable physical effects.

    Rich’s comment implied that a new aspect of physical reality is approached inductively, without any prior explanatory context. Science approaches a new aspect of physical reality deductively from a pre-existent physical theory. The prior explanatory context is always present. This inductive/deductive distinction marks a fundamental departure in modes of thinking. The first neither recognizes nor employs physical reasoning. The second does both.

    Rich also wrote, “It may also be objected that many black boxes, for example Global Circulation Models, are not statistical, because they follow a time evolution with deterministic physical equations. Nevertheless, the evolution depends on the initial state, and because climate is famously “chaotic”, tiny perturbations to that state lead to sizeable divergence later. The chaotic system tends to revolve around a small number of attractors, and the breadth of orbits around each attractor can be studied by computer and matched to statistical distributions.”

    But this is not known to be true. On the one hand, an adequate physical theory of the climate is not available. This lack leaves GCMs as parameterized engineering models. They are capable only of statistical arrays of outputs. Arguing the centrality of statistics to climate models as a matter of principle begs the question of theory.

    On the other hand, supposing a small number of attractors flies in the face of the known large number of disparate climate states spanning the entire variation between “snowball Earth” and “hothouse Earth.” And supposing those states can be studied by computer and expressed as statistical distributions again begs the question of physical theory. Lots of hand-waving, in other words.

    Rich went on to write that the problem of climate could be approached as “a probability distribution of a continuous real variable.” But this assumes the behavior of the physical system is smoothly continuous. The many Dansgaard-Oeschger and Heinrich events are abrupt and discontinuous shifts of the terrestrial climate.

    None of Rich’s statistical conjectures are constrained by known physics or by the behavior of physical reality. In other words, they display no evidence of physical reasoning.

    The Full-Tilt Boogie.

    In his Section B, Rich set up his analysis by defining three sources of result:

    1. physical reality → X(t) (data)

    2. black box model → M(t) (simulation of the X(t)-producing physical reality)

    3. model emulator → W(t) (emulation of model M output)

    I. Problems with “Black Box and Emulator Theory” Section B:

    Rich’s model emulator W is composed to “estimate the past black box values and to predict the black box output.” That is, his emulator targets model output. It does not emulate the internal behavior or workings of the full model in some simpler way.

    Its formal structure is given by his first equation:

    W(t) = (1-a)W(t-1) + R₁(t) + R₂(t) + (-r)R₃(t), (1ʀ)

    where W(t-1) is some initial value and W(t) is the final value after integer time-step ‘t.’ The equation number subscript “ʀ” designates Rich as the source.

    As an aside here, it is not unfair to notice that despite its many manifestations and modalities, Rich’s superior GCM emulator is never once used to actually emulate an air temperature projection.

    The eqn. 1ʀ emulator manifests persistence, which the GCM projection emulator in my paper does not. Rich began his analysis, then, with an analogical inconformity.

    The factors in eqn. 1ʀ are described as: “R1(t) is to be the component which represents changes in major causal influences, such as the sun and carbon dioxide. R2(t) is to be a component which represents a strong contribution with observably high variance, for example the Longwave Cloud Forcing (LCF). … R3(t) is a putative component which is negatively correlated with R2(t) with coefficient -r, with the potential (dependent on exact parameters) to mitigate the high variance of R2(t).

    The coefficient multiplying R₃(t), -r, is always negative. The R₃(t) itself is negatively correlated with R₂(t), so that R₃(t) offsets (reduces) the magnitude of R₂(t), and 0 ≤ a ≤ 1. The Rn(t) are defined as time-dependent random variables that add into (1-a)W(t-1).

    The relative impact of each Rn on W(t-1) is R₁(t) > R₂(t) ≥ |rR₃(t)|.

    A problem with factor R₃(t):

    The R₃(t) is given to be “negatively correlated” with R₂(t), “to mitigate the high variance of R₂(t).” However, factor R₃(t) is also multiplied by coefficient -r.

    “Negatively correlated” refers to R₃(t). The ‘-r’ is an additional and separate conditional.

    There are three cases governing the meaning of ‘negative correlation’ for R₃(t).

    1) R₃(t) starts at zero and becomes increasingly negative as R₂(t) becomes increasingly positive.

    or

    2) R₃(t) starts positive and becomes smaller as R₂(t) becomes large, but remains greater than zero.

    or

    3) R₃(t) starts positive and becomes small as R₂(t) becomes large but can pass through zero into negative values.

    If 1), then -rR₃(t) is positive and has the invariable effect of increasing R₂(t) — the opposite of what was intended.

    If 2), then -rR₃(t) has a diminishing effect on R₂(t) as R₂(t) becomes larger — again opposite the desired effect.

    If 3), then -rR₃(t) diminishes R₂(t) at low but increasing values of R₂(t), but increases R₂(t) as R₂(t) becomes large and R₃(t) passes into negative values. This is because -r(-R₃(t)) = rR₃(t). That is, the effect of R₃(t) on R₂(t) is concave upwards around zero (∪-shaped).

    That is, none of the combinations of -r and negatively correlated R₃(t) has the desired effect on R₂(t). A consistently diminishing effect on R₂(t) is frustrated.

    With negative coefficient -r, the R₃(t) term must be greater than zero and positively correlated with R₂(t) to diminish the contribution of R₂(t) at high values.
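
    A quick numeric sketch in Python makes the point for case 1). The values are purely hypothetical (r = 0.5, and an R₃(t) taken as exactly anti-correlated with R₂(t)); none of these numbers come from Rich’s essay:

        # Toy check of the sign analysis above. All values are hypothetical.
        r = 0.5                      # positive, so the coefficient -r is negative
        R2 = [0.5, 1.0, 2.0, 4.0]    # R2(t) growing increasingly positive
        R3 = [-x for x in R2]        # case 1): R3(t) anti-correlated with R2(t)

        for R2_t, R3_t in zip(R2, R3):
            net = R2_t + (-r) * R3_t   # the R2(t) + (-r)R3(t) part of eqn. 1R
            print(f"R2 = {R2_t:4.1f}   -r*R3 = {-r * R3_t:+5.2f}   net = {net:5.2f}")

        # -r*R3 is positive throughout: it inflates R2 rather than mitigating it.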

    Curiously, Rich did not designate what X(t) actually is (perhaps air temperature?).

    Nor did he describe what process the model M(t) simulates, nor what the emulator W(t) emulates. Rich’s emulator equation (1ʀ) is therefore completely arbitrary. It’s merely a formal construct that he likes, but one lacking any topical relevance or analytical focus.

    In strict contrast, my interest in emulation of GCMs was roused when I discovered in 2006 that GCM air temperature projections are linear extrapolations of GHG forcing. In December 2006, John A publicly posted that finding at Steve McIntyre’s Climate Audit site, here.

    That is, I began my work after discovering evidence about the behavior of GCMs. Rich, on the other hand, launched his work after seeing my work and then inventing an emulator formalism without any empirical referent.

    Lack of focus or relevance makes Rich’s emulator irrelevant to the GCM air temperature emulator in my paper, which was derived with direct reference to the observed behavior of GCMs.

    I will show that the irrelevance remains true even after Rich, in his Section D, added my numbers to his invented emulator.

    A Diversion into Dimensional Analysis:

    Emulator 1ʀ is a sum. If, for example, W(t) represents one value of an emulated air temperature projection, then the units of W(t) must be, e.g., Celsius (C). Likewise, then, the dimensions of W(t-1), R₁(t), R₂(t), and -rR₃(t), must all be in units of C. Coefficients a and r must be dimensionless.

    In his exposition, Rich designated his system as a time series, with t = time. However, his usage of ‘t’ is not uniform, and most often designates the integer step of the series. For example, ‘t’ is an integer in W(t-1) in equation 1ʀ, where it represents the time step prior to W(t).

    Continuing:

    From eqn. (1ʀ), for a time series i = 1→t and when W(t-1) = W(0) = constant, Rich presented his emulator generalization as:

    W(t) = (1-a)ᵗW(0) + Σᵢ₌₀ᵗ⁻¹ (1-a)^(t-i) [R₁(tᵢ) + R₂(tᵢ) - rR₃(tᵢ)] (2ʀ)

    Let’s see if that is correct. From eqn. 1ʀ:

    W(t₁) = (1-a)W(0) + R₁(t₁) + R₂(t₁) - rR₃(t₁), (1ʀ.1)

    where the subscript on t indicates the integer step number.

    W(t₂) = (1-a)W(t₁) + R₁(t₂) + R₂(t₂) - rR₃(t₂) (1ʀ.2)

    Substituting W(t1) into W(t2),

    W(t₂) = (1-a)[(1-a)W(0) + R₁(t₁) + R₂(t₁) - rR₃(t₁)] + [R₁(t₂) + R₂(t₂) - rR₃(t₂)]

    = (1-a)²W(0) + (1-a)[R₁(t₁) + R₂(t₁) - rR₃(t₁)] + (1-a)⁰[R₁(t₂) + R₂(t₂) - rR₃(t₂)]

    (NB: (1-a)⁰ = 1, and is added for completeness.)

    Likewise, W(t₃) = (1-a){(1-a)²W(0) + (1-a)[R₁(t₁) + R₂(t₁) - rR₃(t₁)] + [R₁(t₂) + R₂(t₂) - rR₃(t₂)]} + (1-a)⁰[R₁(t₃) + R₂(t₃) - rR₃(t₃)]

    = (1-a)³W(0) + (1-a)²[R₁(t₁) + R₂(t₁) - rR₃(t₁)] + (1-a)[R₁(t₂) + R₂(t₂) - rR₃(t₂)] + (1-a)⁰[R₁(t₃) + R₂(t₃) - rR₃(t₃)]

    Generalizing:

    W(tₜ) = (1-a)ᵗW(0) + Σᵢ₌₁ᵗ (1-a)^(t-i) [R₁(tᵢ) + R₂(tᵢ) - rR₃(tᵢ)] (1)

    Compare eqn. (1) to eqn. (2ʀ). They are not identical.

    In generalized equation 1, when i = t = 1, W(tₜ) goes to W(t₁) = (1-a)W(0) + R₁(t₁) + R₂(t₁) - rR₃(t₁), as it should do.

    However, Rich’s equation 2ʀ does not go to W(t₁) in the limiting case i = t = 1.

    Instead 2ʀ becomes W(t₁) = (1-a)W(0) + (1-a)[R₁(0) + R₂(0) - rR₃(0)], which is not correct.

    The R-factors should have their t₁ values, but do not. There are no Rn(0)’s because W(0) is an initial value that has no perturbations. Also, coefficient (1-a) should not multiply the Rn’s (look at eqn. 1ʀ).

    So, equation 2ʀ is wrong. The 1ʀ→2ʀ transition is mathematically incoherent.
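
    The discrepancy is easy to check numerically. Here is a minimal Python sketch (hypothetical values for a, r, W(0), and the Rn) that iterates eqn. 1ʀ directly, then compares the result against generalized equation 1 and against eqn. 2ʀ as written above:

        import random

        # Hypothetical inputs, for illustration only.
        a, r, W0, t = 0.3, 0.5, 1.0, 6
        random.seed(1)
        R = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
             for _ in range(t + 1)]                # R[i] = (R1, R2, R3) at step i

        # Direct iteration of eqn. 1R.
        W = W0
        for i in range(1, t + 1):
            R1, R2, R3 = R[i]
            W = (1 - a) * W + R1 + R2 - r * R3

        # Generalized equation 1: the Rn carry (1-a)^(t-i), i = 1 to t.
        W_eqn1 = (1 - a)**t * W0 + sum((1 - a)**(t - i) * (R[i][0] + R[i][1] - r * R[i][2])
                                       for i in range(1, t + 1))

        # Eqn. 2R: the Rn carry (1-a)^(t-i) with i running 0 to t-1 instead.
        W_eqn2R = (1 - a)**t * W0 + sum((1 - a)**(t - i) * (R[i][0] + R[i][1] - r * R[i][2])
                                        for i in range(0, t))

        print(W, W_eqn1, W_eqn2R)   # equation 1 reproduces the iteration; 2R does not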

    There’s a further conundrum. Rich’s derivation, and mine, assume that coefficient ‘a’ is constant. If ‘a’ is constant, then (1-a) becomes raised to the power of the summation index, e.g., (1-a)ᵗW(0).

    But there is no reason to think that coefficient ‘a’ should be a constant across a time-varying system. Why should every new W(t-1) have a constant fractional influence on W(t)?

    Why should ‘a’ be constant? Apart from convenience.

    Rich then defined E[] = expectation value and V[] = variance = (standard deviation)², and assigned that:

    E[R₁(t)] = bt+c

    E[R₂(t)] = d

    E[R₃(t)] = 0.

    Following this, Rich allowed (leaving the derivation to the student) that, “Then a modicum of algebra derives

    “E[W(t)] = b(at + a - 1 + (1-a)^(t+1))/a² + (c+d)(1 - (1-a)ᵗ)/a + (1-a)ᵗW(0)” (3ʀ)

    Evidently 3ʀ was obtained by manipulating 2ʀ (can we see the work, please?). But as 2ʀ is incorrect, nothing worthwhile is learned. We’re told that eqn. 3ʀ → 4ʀ as coefficient ‘a’ → 0.

    E[W(t)] = bt(t+1)/2 + (c+d)t + W(0) (4ʀ)
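
    Whether 3ʀ at least approaches 4ʀ as a → 0 can be checked numerically, taking the quoted formula at face value. A Python sketch with made-up constants b, c, d, and W(0):

        # Numeric check that eqn. 3R -> eqn. 4R as a -> 0, using hypothetical constants.
        b, c, d, W0, t = 0.2, 0.1, 0.3, 1.0, 10

        def E_W_3R(a):
            return (b * (a * t + a - 1 + (1 - a)**(t + 1)) / a**2
                    + (c + d) * (1 - (1 - a)**t) / a
                    + (1 - a)**t * W0)

        E_W_4R = b * t * (t + 1) / 2 + (c + d) * t + W0

        for a in (0.1, 0.01, 0.0001):
            print(a, E_W_3R(a))
        print("4R:", E_W_4R)        # E_W_3R(a) converges on E_W_4R as a shrinks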

    A Second Diversion into Dimensional Analysis:

    Rich assigned E[R₁(t)] = bt+c. Up through eqn. 2ʀ, ‘t’ was integer time. In E[R₁(t)] it has become a coefficient. We know from eqn. 1ʀ that R₁(t) must have the identical dimensional unit carried by W(t), which is, e.g., Celsius.

    We also know R₁(t) is in Wm⁻², but W(t) is in Celsius (C). Factor “bt” must be in the same Celsius units as [W(t)]. Is the dimension of b, then, Celsius/time? How does that work? The dimension of ‘c’ must also be Celsius. What is the rationale of these assignments?

    The assigned E[R₁(t)] = bt+c has the formula of an ascending straight line of intercept c, slope b, and time the abscissa.

    How convenient it is, to assume a linear behavior for the black box M(t) and to assign that linearity before ever (supposedly) considering the appropriate form of a GCM air temperature emulator. What rationale determined that convenient form? Apart from opportunism?

    The definition of R₁(t) was, “…the component which represents changes in major causal influences, such as the sun and carbon dioxide.

    So, a straight line now represents the major causal influence of the sun or of CO2. How was that decided?

    Next, multiplying through term 1 in 4ʀ, we get bt(t+1)/2 = (bt²+bt)/2. How do both bt² and bt have the units of Celsius required by E[R₁(t)] and W(0)?

    Factor ‘t’ is in units of time. The internal dimensions of bt(t+1)/2 are incommensurate. The parenthetical sum is physically meaningless.

    Continuing:

    Rich’s final equation for the total variance of his emulator,

    Var[W(t)] = (s₁² + s₂² + s₃² - 2r·s₂s₃)(1 - (1-a)^(2t))/(2a - a²) (5ʀ)

    included all the Rn(t) terms and the assumed covariance of his R₂(t) and R₃(t).

    Compare his emulator 4ʀ with the GCM air temperature projection emulator in my paper:

    ΔTₜ(K) = fCO₂ × 33 K × [(F₀ + Σᵢ₌₁ᵗ ΔFᵢ)/F₀] + a (2)

    In contrast to Rich’s emulator, eqn. 2 has no offsetting covariances. Not only that, all the ΔT-determining coefficients in eqn. 2 except fCO₂ are givens. They have no uncertainty variance at all.

    In short, both Rich’s emulator itself and its dependent variances are utterly irrelevant to any evaluation of the GCM projection emulator (eqn. 2). Utterly irrelevant, even if they were correctly derived, which they were not.

    Parenthetical summary comments on Rich’s “Summary of section B:”

  • “A good emulator can mimic the output of the black box.”

    (Trivially true.)

  • “A fairly general iterative emulator model (1) is presented.”

    (Never once used to actually emulate anything, and of no focused relevance to GCM air temperature projections.)

  • “Formulae are given for expectation and variance of the emulator as a function of time t and various parameters.”

    (An emulator that is critically vacant and mathematically incoherent, and with an inapposite variance.)

  • “The 2 extra parameters, a, and R3(t), over and above those of Pat Frank’s emulator, can make a huge difference to the evolution.”

    (Extra parameters in an emulator that does not deploy the formal structure of the GCM emulator, and missing any analytically equivalent factors. The extra parameters are ad hoc, while ‘a’ is incorrectly specified in 3ʀ and 4ʀ. The emulator is critically irrelevant and its expansion in ‘a’ is wrong.)

  • “The “magic” component R3(t) with anti-correlation -r to R2(t) can greatly reduce model error variance whilst retaining linear growth in the absence of decay.”

    (Component R₃(t) is likewise ad hoc. It has no justified rationale. That R₃(t) has a variance at all requires its rejection (likewise rejection of R₁(t) and R₂(t)) because the coefficients in the emulator in the paper (eqn. 2 above) have no associated uncertainties.)

  • “Any decay rate a>0 completely changes the propagation of error variance from linear growth to convergence to a finite limit.”

(The behavior of a critically irrelevant emulator engenders a deserved, ‘so what?’

Further, a>0 causes general decay only by allowing the mistaken derivation that put the (1-a) coefficient into the Rn(t) factors in 2ʀ.)

Section I conclusion: The emulator construction itself is incongruous. It includes an unwarranted persistence. It has terms of convenience that do not map onto the target GCM projection emulator. The -rR₃(t) term cannot behave as described.

The transition from eqn. 2ʀ to eqn. 3ʀ is mathematically incoherent. The derivations following that employ eqn. 3ʀ are therefore wrong, including the variances.

The eqn. 1ʀ emulator itself is ad hoc. Its derivation is without reference to the behavior of climate models and of physical reasoning. Its ability to emulate a GCM air temperature projection is undemonstrated.

II. Problems with “New Parameters” Section C:

Rich rationalized his introduction of so-called decay parameter ‘a’ in the “Parameter” section C of his post. He introduced this equation:

M(t) = b + cF(t) + dH(t-1), (6ʀ)

where M = temperature, F = forcing, and H(t) is “heat content“.

The ‘b’ term might be the ‘b’ coefficient assigned to E[R₁(t)] above, but we are not told anything about it.

I’ll summarize the problem. Coefficients ‘c’ and ‘d’ are actually functions that transform forcing and heat flux (not heat content) in Wm⁻², into their respectively caused temperature, Celsius. They are not integers or real numbers.

However, Rich’s derivation treats them as real number coefficients. This is a fatal problem.

For example, in equation 6ʀ above, function ‘d’ transforms heat flux H(t-1) into its consequent temperature, Celsius. However, the final equation of Rich’s algebraic manipulation ends with ‘d’ inappropriately operating on M(0), the initial temperature. Thus, he wrote:

“M(t) = b + cF(t) + d(H(0) + e(M(t-1) - M(0))) = f + cF(t) + (1-a)M(t-1) (7ʀ)

where a = 1 - de, f = b + dH(0) - deM(0).” (my bold)

There is no physical justification for a “deM(0)” term; d cannot operate on M(0).

Rich also assigned “a = 1-de,” where ‘e’ is a numerical fraction, but again, ‘d’ is an operator function; ‘d’ cannot operate on ‘e’. The final (1-a)M(t-1) term is a cryptic version of deM(t-1), which contains the same fatal assault on physical meaning. Function ‘d’ cannot operate on temperature M.

Further, what is the meaning of an operator function standing alone with nothing on which to operate? How can “1-de” be said to have a discrete value, or even to mean anything at all?

Other conceptual problems are in evidence. We read, “Now by the Stefan-Boltzmann equation M [temperature – P] should be related to F^¼ …” Rather, S-B says that M should be related to H^¼ (H is here taken to be black body radiant flux). According to climate models M is linearly related to F.

We are also told, “Next, the heat changes by an amount dependent on the change in temperature: …” while instead, physics says the opposite: temperature changes by an amount dependent on the change in the heat (kinetic energy). That is, temperature is dependent on atomic/molecular kinetic energy.

Rich finished with, “Roy Spencer, who has serious scientific credentials, had written ‘CMIP5 models do NOT have significant global energy imbalances causing spurious temperature trends because any model systematic biases in (say) clouds are cancelled out by other model biases’.”

Roy’s comment was originally part of his attempted disproof of my uncertainty analysis. It completely missed the point, in part because it confused physical error with uncertainty.

Roy’s and Rich’s offsetting errors do nothing to remove uncertainty from the prediction of a physical model.

Rich went on, “This means that in order to maintain approximate Top Of Atmosphere (TOA) radiative balance, some approximate cancellation is forced, which is equivalent to there being an R3(t) with high anti-correlation to R2(t). The scientific implications of this are discussed further in Section I.”

The only, repeat only, scientific implication of offsetting errors is that they reveal areas requiring further research, that the theory is inadequate, and that the predictive capacity is poor.

Rich’s approving mention of Roy’s mistake evidences that Rich, too, apparently does not see the distinction between physical error and predictive uncertainty. Tim Gorman especially, and others, have repeatedly pointed out the distinction to Rich, e.g., here, here, here, here, and here, but to no obvious avail.

Conclusions regarding the Parameter section C: analytically impossible, physically disjointed, wrongly supposes offsetting errors increase predictive reliability, wrongly conflates physical error with predictive uncertainty.

And once again, no demonstration that the proposed emulator can emulate anything relevant.

III. Problems with “Emulator Parameters” Section D:

In Section I above, I promised to show that Rich’s emulator would remain irrelevant, even after he added my numbers to it.

In his “Emulator Parameters” section Rich started out with, “Dr. Pat Frank’s emulator falls within the general model above.” This view could not possibly be more wrong.

First, Rich composed his emulator with my GCM air temperature projection emulator in mind. He inverted significance to say the originating formalism falls within the limit of a derivative composition.

Again, the GCM projection emulator is:

ΔTₜ(K) = fCO₂ × 33 K × [(F₀ + Σᵢ₌₁ᵗ ΔFᵢ)/F₀] + a (2 again)

Rich’s emulator is W(t) = (1-a)W(t-1) + R₁(t) + R₂(t) + (-r)R₃(t) (1ʀ again)

(In II above, I showed that his alternative, M(t) = f + cF(t) + (1-a)M(t-1), is incoherent and therefore not worth considering further.)

In Rich’s emulator, temperature T₂ has some persistence from T₁. This dependence is nowhere in the GCM projection emulator.

Further, in the GCM emulator (eqn. 2-again), the temperature of time t-1 makes no appearance at all in the emulated air temperature at time t. Rich’s 1ʀ emulator is constitutionally distinct from the GCM projection emulator. Equating them is to make a category mistake.

Analyzing further, emulator R₁(t) is a “component which represents changes in major causal influences, such as the sun and carbon dioxide.”

Rich’s R₁(t) describes all of fCO₂ × 33 K × [(F₀ + Σᵢ ΔFᵢ)/F₀] in the GCM projection emulator.

Rich’s R₁(t) thus exhausts the entire GCM projection emulator. What then is the purpose of his R₂(t) and R₃(t)? They have no analogy in the GCM projection emulator. They have no role to transfer into meaning.

The R₂(t) is “a strong contribution with observably high variance, for example the Longwave Cloud Forcing (LCF).” The GCM projection emulator has no such term.

The R₃(t) is, “a putative component which is negatively correlated with R2(t)…” The GCM projection emulator has no such term. R₃(t) has no role to play in any analytical analogy.

Someone might insist that Rich’s emulator is like the GCM projection emulator after his (1-a)W(t-1), R₂(t), and (-r)R₃(t) terms are thrown out.

So, we’re left with this deep generalization: Rich’s emulator-emulator pared to its analogical essentials is M(tᵢ) = R(tᵢ),

where R(tᵢ) = fCO₂ × 33 K × [(F₀ + Σᵢ ΔFᵢ)/F₀].

Rich went on to specify the parameters of his emulator: “The constants from [Pat Frank’s] paper, 33K, 0.42, 33.3 Wm-2, and +/-4 Wm-2, the latter being from errors in LCF, combine to give 33*0.42/33.3 = 0.416 and 0.416*4 = 1.664 used here.

Does anyone see a ±4 Wm⁻² in the GCM projection emulator? There is no such term.

Rich has made the same mistake as did Roy Spencer (one of many). He supposed that the uncertainty propagator (the right-side term in paper eqn. 5.2) is the GCM projection emulator.

It isn’t.

Rich then presented the conversion of his general emulator into his view of the GCM projection emulator: “So we can choose a = 0, b = 0, c+d = 0.416 F(t) where F(t) is the new GHG forcing (Wm-2) in period t, s1=0, s2=1.664, s3=0., and then derive

W(t) = (c+d)t + W(0) +/- sqrt(t) s2” (8ʀ)

There are Rich’s mistakes made explicit: his emulator, eqn. 8ʀ, includes persistence in the W(0) term and a ±sqrt(t) s2 term, neither of which appear anywhere in the GCM projection emulator. How can eqn. 8ʀ possibly be an analogy for eqn. 2?

Further, including “+/- sqrt(t) s2” will cause his emulator to produce two values of W(t) at every time-step.

One value of W(t) stems from the positive sign of the ±sqrt(t)·s2 term and the other from the negative sign. A plot of the results will show two W(t) trends, one perhaps rising while the other falls.
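
A few lines of Python make the two branches explicit. The per-step forcing here is hypothetical, chosen only so that (c+d) is constant under Rich’s assignment c+d = 0.416 F(t):

    import math

    cd, W0, s2 = 0.416 * 0.5, 0.0, 1.664   # hypothetical constant forcing, 0.5 Wm-2 per step

    for t in (1, 20, 40, 80):
        trend = cd * t + W0
        W_plus = trend + math.sqrt(t) * s2    # positive branch of the +/- term
        W_minus = trend - math.sqrt(t) * s2   # negative branch
        print(f"t = {t:3d}   W+ = {W_plus:7.2f}   W- = {W_minus:7.2f}")

    # Eqn. 8R yields two W(t) values at every step: an uncertainty envelope,
    # not a single emulated temperature.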

To see this mistake in action, see the first Figure in Roy Spencer’s critique.

The “+/-” term in Rich’s emulator makes it not an emulator.

However, a ‘±’ term does appear in the error propagator:

±uᵢ(T) = [fCO₂ × 33 K × (±4 Wm⁻²)/F₀] — see eqns. 5.1 and 5.2 in the paper.

It should now be obvious that Rich’s emulator is nothing like the GCM projection emulator.

Instead, it represents a category mistake. It is not only wrongly derived, it has no analytical relevance at all. It is conceptually adverse to the GCM projection emulator it was composed to critically appraise.

Rich’s emulator is ad hoc. It was constructed with factors he deemed suitable, but with no empirical reference. Theory without empiricism is philosophy at best; never science.

Rich then added in certain values taken from the GCM projection emulator and proceeded to zero out everything else in his equation. The result does not demonstrate equivalence. It demonstrates tendentiousness: elements manipulated to achieve a predetermined end. This approach is diametrical to actual science.

The rest of the Emulator Parameters section elaborates speculative constructs of supposed variances given Rich’s irrelevant emulator. For example, “Now if we choose b = a(c+d) then that becomes (c+d)(t+1), etc. etc.” This is to choose without any reference to any explicit system or any known physical GCM error. The b = a(c+d) is an ungrounded levitated term. It has no substantive basis.

The rest of the variance speculation is equally irrelevant, and in any case derives from an unreservedly wrong emulator.

Nowhere is its competence demonstrated by, e.g., emulating a GCM air temperature projection.

I will not consider his Section D further, except to note that Rich’s Case 1 and Case 2 clearly imply that he considers the variation of model runs about the model projection mean to be the centrally germane measure of uncertainty.

It is not.

The precision/accuracy distinction was discussed in the introductory comments above. Run variation supplies information only about model precision — run repeatability. The analysis in the paper concerned accuracy.

This distinction is absolutely central, and was of immediate focus.

Introduction paragraph 2:

Published GCM projections of the GASAT typically present uncertainties as model variability relative to an ensemble mean (Stainforth et al., 2005; Smith et al., 2007; Knutti et al., 2008), or as the outcome of parameter sensitivity tests (Mu et al., 2004; Murphy et al., 2004), or as Taylor diagrams exhibiting the spread of model realizations around observations (Covey et al., 2003; Gleckler et al., 2008; Jiang et al., 2012). The former two are measures of precision, while observation-based errors indicate physical accuracy. Precision is defined as agreement within or between model simulations, while accuracy is agreement between models and external observables (Eisenhart, 1963, 1968; ISO/IEC, 2008). (bold added)

However, projections of future air temperatures are invariably published without including any physically valid error bars to represent uncertainty. Instead, the standard uncertainties derive from variability about a model mean, which is only a measure of precision. Precision alone does not indicate accuracy, nor is it a measure of physical or predictive reliability. (added bold)

“The missing reliability analysis of GCM global air temperature projections is rectified herein.

It is evidently possible to read the above and fail to grasp it. Rich’s entire approach to error and variance ignores it and thereby is misguided. He has repeatedly confused model precision with predictive accuracy.

That mistake is fatal to critical relevance. It removes any valid application of Rich’s critique to my work or to the GCM projection emulator.

Finally, I will comment on his last paragraph: “Pat Frank’s paper effectively uses a particular W(t;u) (see Equation (8) above) which has fitted mw(t;u) to mm(t), but ignores the variance comparison.  That is, s2 in (8) was chosen from an error term from LCF without regard to the actual variance of the black box output M(t).

The first sentence says that I fitted “mw(t;u) to mm(t).” That is, Rich supposed that my analysis consisted of fits to the model mean.

He is wrong. The analysis focused on single projection runs of individual models.

SI Figure S3-2 illustrates the method: each fit tested a single temperature projection run of a single target GCM, plotted against a standard GHG forcing (SRES, Meinshausen, or other).


SI Figure S3-2. Left: fit of cccma_cgcm3_1_t63 projected global average temperature plotted vs SRES A2 forcing. Right: emulation of the cccma_cgcm3_1_t63 A2 air temperature projection. Every fit had only one important degree of freedom.
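
For illustration, here is what such a one-degree-of-freedom fit amounts to, as a minimal Python sketch. The forcing series and the “GCM” run are synthetic stand-ins (the run is made linear in cumulative forcing, as the paper reports real GCM projections to be), and fCO₂ is the single fitted parameter:

    import random

    # Synthetic illustration only: a made-up forcing series and a fake "GCM" run.
    random.seed(7)
    F0, K33 = 33.3, 33.0
    dF = [0.05 + 0.0005 * i for i in range(80)]            # hypothetical annual GHG forcings
    cumF = [sum(dF[:i + 1]) for i in range(80)]            # cumulative forcing
    gcm = [0.42 * K33 * f / F0 + random.gauss(0, 0.05)     # synthetic "GCM" projection
           for f in cumF]

    # One important degree of freedom: fit fCO2 by least squares through the origin.
    x = [K33 * f / F0 for f in cumF]
    fCO2 = sum(g * xi for g, xi in zip(gcm, x)) / sum(xi * xi for xi in x)
    print(round(fCO2, 3))    # recovers ~0.42; the emulation then tracks the run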

Only Figure 7 showed emulation of a multi-model projection mean. The other 68 were all single-model projection runs, all of which Rich apparently missed.

There is no ambiguity in what I did, which is not what Rich supposed I did.

The second sentence, “That is, s2 in (8) was chosen from an error term from LCF without regard to the actual variance of the black box output M(t).” is also factually wrong. Twice.

First, there is no LCF term in the emulator, nor any standard deviation. The “s2” is a fantasy.

Second, the long wave cloud forcing calibration error in the uncertainty propagator is the annual average error CMIP5 GCMs make in simulating annual global cloud fraction (CF).

That is, LWCF calibration error is exactly the actual [error] variance of the black box output M(t) with respect to observed global cloud fraction.

Rich’s “the actual variance of the black box output M(t).” refers to the variance of individual GCM air temperature projection runs around a projection mean; a precision metric.

The accuracy metric of model variance with respect to observation is evidently lost on Rich. He brought up inattention to bare precision as though it faulted an analysis concerned with accuracy.

This fatal mistake is a commonplace among the critics of my paper.

It shows a foundational inability to effectuate any scientifically valid criticism at all.

The explanation for entering the LWCF error statistic into the uncertainty propagator is given within the paper (p. 10):

“GHG forcing enters into and becomes part of the global tropospheric thermal flux. Therefore, any uncertainty in simulated global tropospheric thermal flux, such as LWCF error, must condition the resolution limit of any simulated thermal effect arising from changes in GHG forcing, including global air temperature. LWCF calibration error can thus be combined with ΔFᵢ in equation 1 to estimate the impact of the uncertainty in tropospheric thermal energy flux on the reliability of projected global air temperatures.”

This explanation seems opaque to many for reasons that remain obscure.

Citations Zhang et al. (2005) and Dolinar et al. (2015) gave similar estimates of LWCF calibration error.
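
For concreteness, the propagation itself takes only a few lines of Python: the per-step uncertainty ±uᵢ(T) = fCO₂ × 33 K × (±4 Wm⁻²)/F₀ is combined in quadrature over the projection steps, using fCO₂ = 0.42 and F₀ = 33.3 Wm⁻², the constants quoted earlier:

    import math

    f_CO2, K33, F0, lwcf = 0.42, 33.0, 33.3, 4.0   # constants quoted above
    u_step = f_CO2 * K33 * lwcf / F0               # per-step uncertainty, ~1.664 K

    for years in (1, 20, 50, 80):
        u_total = u_step * math.sqrt(years)        # root-sum-square over the steps
        print(f"{years:3d} years: +/-{u_total:5.2f} K")

    # An 80-year projection accumulates roughly +/-15 K of uncertainty --
    # an ignorance width, not a predicted physical temperature error.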

Summary conclusions about the Emulator Parameter Section D:

1) The proposed emulator is ad hoc and tendentious.

2) The proposed emulator is constitutively wrong.

· It wrongly includes persistence.

· It wrongly includes a cloud forcing term (or the like).

· It wrongly includes an uncertainty statistic.

3) The uncertainty propagator is confused with the GCM projection emulator.

4) The focus is mistakenly on precision in a study about accuracy.

Or, perhaps, ignorance of the concept of physical accuracy, itself.

5) The study is wrongly imputed to have focused on GCM projection means.

6) The emulator is never once demonstrated to actually emulate.

IV. Problems with “Error and Uncertainty” Section E:

Thus far, we’ve found that Rich’s emulator analysis is ad hoc, tendentious, constitutively wrong, dimensionally impossible, mathematically incoherent, confuses precision with accuracy, includes incongruous variances, and is empirically unvalidated. His analysis could almost not be more bolloxed up.

I here step through a few of his Section E mistakes, which always seem to simplify things for him. Quotes are marked “R:” followed by a comment.

R: “Assuming that X is a single fixed value, then prior to measurement, M-X is a random variable representing the error,…

Except when the error is systematic stemming from uncontrolled variables. In that case M – X is a deterministic variable of no fixed mean, of a non-normal dispersion, and of an unknowable value. See the further analysis in the Appendix.

R: “±sₘ is described by the JCGM 2.3.1 as the ‘standard’ uncertainty parameter.”

Rich is being a bit fast here. He’s implying the JCGM Section 2.3.1 definition of “standard uncertainty” is limited to the SD of random errors.

The JCGM is the Evaluation of measurement data — Guide to the expression of uncertainty in measurement — the standard guide to the statistical analysis of measurements and their errors provided by the Bureau International des Poids et Mesures.

The JCGM actually says that the standard uncertainty is “uncertainty of the result of a measurement expressed as a standard deviation,” which is rather more general than Rich allowed.

The quotes below show that the JCGM includes systematic error as contributing to uncertainty.

Under E.3 “Justification for treating all uncertainty components identically” the JCGM says,

The focus of the discussion of this subclause is a simple example that illustrates how this Guide treats uncertainty components arising from random effects and from corrections for systematic effects in exactly the same way in the evaluation of the uncertainty of the result of a measurement. It thus exemplifies the viewpoint adopted in this Guide and cited in E.1.1, namely, that all components of uncertainty are of the same nature and are to be treated identically. (my bold)

Under JCGM E.3.1 and E.5.2, we have that the variance of a measurement wᵢ of true value μᵢ is given by σᵢ² = E[(wᵢ - μᵢ)²], which is the standard expression for error variance.

After the usual caveats about [the] expectation of the probability distribution of each εᵢ is assumed to be zero, E(εᵢ) = 0, …, the JCGM notes that,

“It is assumed that probability is viewed as a measure of the degree of belief that an event will occur, implying that a systematic error may be treated in the same way as a random error and that εᵢ represents either kind.” (my bold)

In other words, the JCGM advises that systematic error is to be treated using the same statistical formalism as is used for random error.

R: “The real error statistic of interest is E[(M-X)²] = E[((M-mm)+(mm-X))²] = Var[M] + b², covering both a precision component and an accuracy component.”

Rich then referenced that equation to my paper and to long wave cloud forcing (LWCF; Rich’s LCF) error. However, this is a fundamental mistake.

In Rich’s equation above, the bias b = M-mm is a constant. Among GCMs however, step-wise cloud bias error varies across the global grid-points for each GCM simulation. And it also varies among GCMs themselves. See paper Figure 4 and SI Figure S6-1.

The factor (M-mm) = b, above, should therefore be (Mᵢ-mm) = bᵢ because b varies in a deterministic but unknown way with every Mᵢ.

A correct analysis of the case is:

E[(Mᵢ-X)²] = E[((Mᵢ-mm)+(mm-X))²] = Var[M] + Var[b]
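
A toy Monte Carlo in Python illustrates the point. All numbers are hypothetical, and the distributions are mere placeholders (real systematic error need not be normal); the only point shown is that a dispersed bias adds its own variance to the mean-square error:

    import random, statistics

    random.seed(42)
    X, N = 10.0, 200_000               # true value; number of simulated "runs"

    sq_errors = []
    for _ in range(N):
        b_i = random.gauss(0.0, 0.8)   # per-run systematic bias, itself dispersed
        e_i = random.gauss(0.0, 0.5)   # random (precision) component
        M_i = X + b_i + e_i
        sq_errors.append((M_i - X)**2)

    print(statistics.fmean(sq_errors))  # ~0.89 = 0.5**2 + 0.8**2, i.e. Var[M] + Var[b]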

Systematic error is discussed in more detail in the Appendix.

Rich goes on, “But the theory of converting variances and covariances of input parameter errors into output error via differentiation is well established, and is given in Equation (13) of the JCGM.

Equation (13) of the JCGM provides the formula for the combined variance in y, u²c(y), but describes it this way:

The combined variance, u²c(y), can therefore be viewed as a sum of terms, each of which represents the estimated variance associated with the output estimate y generated by the estimated variance associated with each input estimate xᵢ. (my bold)

That is, the combined variance, u²c(y), is the variance that results from considering all forms of error; not just random error.

Under JCGM 3.3.6:

The standard uncertainty of the result of a measurement, when that result is obtained from the values of a number of other quantities, is termed combined standard uncertainty and denoted by uc. It is the estimated standard deviation associated with the result and is equal to the positive square root of the combined variance obtained from all variance and covariance (C.3.4) components, however evaluated, using what is termed in this Guide the law of propagation of uncertainty (see Clause 5). (my bold)

Under JCGM E 4.4 EXAMPLE:

The systematic effect due to not being able to treat these terms exactly leads to an unknown fixed offset that cannot be experimentally sampled by repetitions of the procedure. Thus, the uncertainty associated with the effect cannot be evaluated and included in the uncertainty of the final measurement result if a frequency-based interpretation of probability is strictly followed. However, interpreting probability on the basis of degree of belief allows the uncertainty characterizing the [systematic] effect to be evaluated from an a priori probability distribution (derived from the available knowledge concerning the inexactly known terms) and to be included in the calculation of the combined standard uncertainty of the measurement result like any other uncertainty. (my bold)

JCGM says that the combined variance, u²c(y), includes systematic error.

The systematic error stemming from uncontrolled variables becomes a variable component of the output; a component that may change unknowably with every measurement. Systematic error then necessarily has an unknown and almost certainly non-normal dispersion (see the Appendix).

The JCGM further stipulates that systematic error is to be treated using the same mathematical formalism as random error.

Above we saw that uncontrolled deterministic variables produce a dispersion of systematic error biases in an extended series of measurements and in GCM simulations of global cloud fraction.

That is, the systematic error is a “fixed offset” = bᵢ only in the Mᵢ time-step. But the bᵢ vary in some unknown way across the n-fold series of Mᵢ.

In light of the JCGM discussion, the dispersion of systematic error, bᵢ, requires that any complete error variance include Var[b].

The dispersion of the bᵢ can be determined only by way of a calibration experiment against a known X carried out under conditions as identical as possible to the experiment.

The empirical methodological calibration error, Var[b], is then applied to condition the result of every experimental determination or observation of an unknown X; i.e., it enters the reliability statement of the result.

In Example 1, the 1-foot ruler, Rich immediately assumed away the problem. Thus, “the manufacturer assures us that any error in that interval is equally likely[, but I will] write 12+/-_0.1 …, where the _ denotes a uniform probability distribution, instead of a single standard deviation for +/-.

That is, rather than accept the manufacturer’s stipulation that all deviations are equally likely, Rich converted the uncertainty into a random dispersion, in which all deviations are no longer equally likely. He has assumed knowledge where there is none.

He wrote, “If I have only 1 ruler, it is hard to see how I can do better than get a table which is 120+/-_1.0”.” But that is wrong.

The unknown error in any ruler is a rectangular distribution of -0.1″ to +0.1″, with all possibilities equally likely. Ten measurements with a ruler of unknown specific error can be anywhere from +1″ to -1″ in error. The expectation interval is (1 - (-1))″/2 = 1″. The standard uncertainty is then 1″/sqrt(3) = ±0.58″, thus 120 ± 0.58″.

He then wrote that if one instead made ten measurements using ten independently machined rulers then the uncertainty of measurement = “sqrt(10) times the uncertainty of each.” But again, that is wrong.

The original stipulation is equal likelihood across ±0.1″ of error for every ruler. For ten independently machined rulers, every ruler has a length deviation equally likely to be anywhere within -0.1″ to 0.1″. That means the true total error using 10 independent rulers can again be anywhere from 1″ to -1″.

The expectation interval is again (1 - (-1))″/2 = 1″, and the standard uncertainty after using ten rulers is 1″/sqrt(3) = ±0.58″. There is no advantage, and no loss of uncertainty at all, in using ten independent rulers rather than one. This is the outcome when knowledge is lacking, and one has only a rectangular uncertainty estimate — a not uncommon circumstance in the physical sciences.

Rich’s mistake is founded in his immediate recourse to pseudo-knowledge.

R: “We know by symmetry that the shortest plus longest [of a group of ten rulers] has a mean error of 0…” But we do not know that, because every ruler is independently machined. Every length error is equally likely. There is no reason to assume a normal distribution of lengths, no matter how many rulers one has. The shortest may be 0.02″ too short, and the longest 0.08″ too long. Then ten measurements produce a net error of 0.3″. How would anyone know? One has no way of knowing the true error in the physical length of a shortest and a longest ruler.

The length uncertainty of any one ruler is [(0.1 - (-0.1))/2]/sqrt(3) = ±0.058″. The only reasonable stipulation one might make is that the shortest ruler is (0.05 ± 0.058)″ too short and the longest (0.05 ± 0.058)″ too long. Then 5 measurements using each ruler yields a measurement with uncertainty of ±0.18″.
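
The rectangular-distribution arithmetic above is compact enough to restate in Python (the JCGM rule: standard uncertainty = half-width/sqrt(3)):

    import math

    def rect_u(half_width):
        # Standard uncertainty of a rectangular distribution of the given half-width.
        return half_width / math.sqrt(3)

    print(rect_u(0.1))                  # one ruler length: +/-0.058 in
    print(rect_u(1.0))                  # ten stacked worst-case readings: +/-0.58 in
    print(math.sqrt(10) * rect_u(0.1))  # ten RSS-combined readings: +/-0.18 in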

Complex variance estimates notwithstanding, Rich assumed away all the difficulty in the problem, wished his way back to random error, and enjoyed a happy dance.

Conclusion: Rich’s Section E is wrong, wherever it isn’t irrelevant.

1. He assumed random error when he should have considered deterministic error.

2. He badly misconstrued the message of JCGM concerning systematic error, and the meaning of its equation (13).

3. He ignored the centrally necessary condition of uncontrolled variables, and the consequent unknowable variation of systematic error across the data.

4. He wrongly treated systematic error as a constant offset.

5. His treatment of rectangular uncertainty is wrong.

6. He then wished rectangular uncertainty into a random distribution.

7. He treated assumed distributions as though they were known distributions — OK in a paper on statistical conjectures, a failing grade in an undergraduate instrumental lab course, and death in a real-world lab.

V. Problems with Section F:

The first part concerning comparative uncertainty is speculative statistics and so is here ignored.

Problems with Rich’s Marked Ruler Example 2:

The discussion neglected the resolution of the ruler itself, typically 1/4 of the smallest division.

It also ignored the question of whether the lined division marks are uniformly and accurately spaced — another part of the resolution problem. This latter problem can be reduced with recourse to a high-precision ruler that includes a manufacturer’s resolution statement provided by the in-house engineers.

It ignored that the smallest divisions on a to-be-visually-appraised precision instrument are typically manufactured in light of the human ability to resolve the spaces.

To achieve real accuracy with a ruler, one would have to calibrate it at several internal intervals using a set of high-accuracy length standards. Good luck with that.

VI. Problems with Rich’s thermometer Example 3 Section G:

Rich brought up what was apparently my discussion of thermometer metrology, made in an earlier comment on another WUWT essay.

He mentioned some of the elements I listed as going into uncertainty in the read-off temperature including, “the thermometer capillary is not of uniform width, the inner surface of the glass is not perfectly smooth and uniform, the liquid inside is not of constant purity, the entire thermometer body is not at constant temperature. He did not include the fact that during calibration human error in reading the instrument may have been introduced.”

I no longer know where I made those comments (I searched but didn’t find them) and Rich provided no link. However, I would never have intended that list to be exhaustive. Anyone wondering about thermometer accuracy can do a lot worse than to read Anthony Watts’ post about thermometer metrology.

Among impacts on accuracy, Anthony mentioned hardening and shrinking of the glass in LiG thermometers over time. After 10 years, he said, the reading might be 0.7 ⁰C high. A process of slow hardening would impose a false warming trend over the entire decade. Anthony also mentioned that historical LiG meteorology thermometers were often graduated in 2 ⁰F increments, yielding a resolution of ±0.5 ⁰F = ±0.3 ⁰C.

Rich mentioned none of that, in correcting my apparently incomplete list.

Here’s an example of a 19th century min-max thermometer with 2 ⁰F divisions.


Louis Casella-type 19th century min-max thermometer with 2 ⁰F divisions.

Image from the Yale Peabody Museum.

High-precision Louis Casella thermometers included 1 ⁰F divisions.

Rich continued: “The interesting question arises as to what the (hypothetical) manufacturers meant when they said the resolution was +/-0.25K. Did they actually mean a 1-sigma, or perhaps a 2-sigma, interval? For deciding how to read, record, and use the data from the instrument, that information is rather vital.

Just so everyone knows what Rich is talking about, pictured below are a couple of historical LiG meteorological thermometers.


Left: The 19th century Negretti and Zambra minimum thermometer from the Welland weather station in Ontario, Canada, mounted in the original Stevenson Screen. Right: A C.W. Dixey 19th century Max-Min thermometer (London, after ca. 1870). Insets are close-ups.

The finest lineations in the pictured thermometers are 1 ⁰F and are perhaps 1 mm apart. The Welland instrument served about 1892 – 1957.

The resolution of these thermometers is ±0.25 ⁰F, meaning that smaller values to the right of the decimal are physically dubious. The 1880-82 observer at Welland, Mr. William B. Raymond, age about 20 years, apparently recorded temperatures to ±0.1 ⁰F, a fine example of false precision.

In asking, “Did [the manufacturers] actually mean a 1-sigma, or perhaps a 2-sigma, interval?“, Rich is posing the wrong question. Resolution is not about error. It does not imply a statistical variable. It is a physical limit of the instrument, below which no reliable data are obtainable.

The modern Novalynx 210-4420 Series max-min thermometer below is, “made to U.S. National Weather Service specifications.”


The specification sheet (pdf) provides an accuracy of ±0.2 ⁰C “above 0 C.” That’s a resolution-limit number, not a 1σ number or a 2σ number.

A ±0.2 ⁰C resolution limit means the thermometers are not able to reliably distinguish between external temperatures differing by 0.2 ⁰C or less. It means any finer reading is physically suspect.

The Novalynx thermometers record 95 degrees across 8 inches, so that each degree traverses 0.084″ (2.1 mm). Reading a temperature to ±0.2 ⁰C requires the visual acuity to discriminate among five 0.017″ = 0.43 mm unmarked widths within each degree interval.
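
The acuity arithmetic restated in Python (inputs from the specification above):

    span_deg, span_in = 95, 8.0            # Novalynx scale: 95 degrees over 8 inches
    per_deg_in = span_in / span_deg        # 0.084 in (about 2.1 mm) per degree
    per_deg_mm = per_deg_in * 25.4

    width_02 = per_deg_mm / 5              # five 0.2-degree widths: ~0.43 mm each
    width_01 = per_deg_mm / 10             # ten 0.1-degree widths: ~0.21 mm each
    print(round(per_deg_in, 3), round(width_02, 2), round(width_01, 2))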

Historical thermometers were no better.

This leads to the question: even though the thermometer is accurate to ±0.2 ⁰C, is it still reasonable to propose, as Rich did, that an observer should be able to regularly discriminate individual ±0.1 ⁰C intervals within merging 0.22 mm blank widths? Hint: hardly.

Rich’s entire discussion is unrealistic, showing no sensitivity to the meaning of resolution limits, of accuracy, of the graduation of thermometers, or of limited observer acuity.

He wrote, “In the present [weather thermometer] example, I would recommend trying for t² = 1/100, or as near as can be achieved within reason.” Rich’s t² is the variance of observer error, meaning he recommends reading to ±0.1 ⁰C in thermometers that are not accurate to better than ±0.2 ⁰C.

Rich finished by advising the manufacture of false data: “if the observer has the skill and time and inclination then she can reduce overall uncertainty by reading to a greater precision than the reference value.” (my bold)

Rich recommended false precision; a mistake undergraduate science and engineering students have flogged out of them from the very first day. But one that typifies consensus climatology.

His conclusion that, “Again, real life examples suggest the compounding of errors, leading to approximately normal distributions.” is entirely unfounded, based as it fully is on unrealistic statistical speculations. Rich considered no real-life examples at all.

The moral of Rich’s section G is that it’s not prudent to give advice concerning methods about which one has no experience.

The whole thermometer section G is misguided and is yet another example, after the several prior, of an apparently very poor grasp of physical accuracy, of its meaning, and of its fundamental importance to all of science.

VII. Problems with “The implications for Pat Frank’s paper” Section H:

Rich began his Section H with a set of declarations about the implications of his various Sections, now known to be over-wrought or plain wrong. Stepping through:

Section B: Rich’s emulator is constitutively inapt. The derivation is both wrong and incoherent. Tendentiously superfluous terms promote a predetermined end. The analysis is dimensionally unsound and deploys unjustified assumptions. No empirical validation of claimed emulator competence.

Section C: incorrectly proposes that offsetting calibration errors promote predictive reliability. It includes an inverted Stefan-Boltzmann equation and improperly treats operators as coefficients. As in other sections, Section C evinces no understanding of accuracy.

Section D: displays confusion about precision and accuracy throughout. The GCM emulator (paper eqn. 1) is confused with the error propagator (paper eqn. 5.2) which is fatal to Section D. No empirical validation of claimed emulator competence. Fatally misconstrues the analytical focus of my paper to be GCM projection means.

Section E: again, falsely asserted that all measurement or model error is random and that systematic error is a fixed constant bias offset. It makes empirically unjustified and ad hoc assumptions about error normality. It self-advantageously misconstrued the JCGM description of standard uncertainty variance.

Section F: has unrealistic prescriptions about the use of rulers.

Section G: displays no understanding of actual thermometers and advises observers to record temperatures to false precision.

Rich wrote that, “The implication of Section C is that many emulators of GCM outputs are possible, and just because a particular one seems to fit mean values quite well does not mean that the nature of its error propagation is correct.”

There we see again Rich’s fatal mistake that the paper is critically focused on mean values. He also wrote there are many possible GCM emulators without ever demonstrating that his proposed emulator can actually emulate anything.

And again here, “Frank’s emulator does visibly give a decent fit to the annual means of its target,…

However, the analysis did not fit annual means. It fit the relationship between forcing and projected air temperature.

The emulator itself reproduced the GCM air temperature projections. It did not fit them. Contra Rich, that performance is indeed, “sufficient evidence to assert that it is a good emulator.

And in further fact, the emulator tested itself against dozens of individual GCM single air temperature projections, not projection means. SI Figures S4-6, S4-8 and S4-9 show the fits are decent: the residuals remain close to zero.

The tests showed beyond doubt that every tested GCM behaved as a linear extrapolator of GHG forcing. That invariable linearity of output behavior entirely justifies linear propagation of error.

Throughout, Rich’s analysis displays a thorough and comprehensively mistaken view of the paper’s GCM analysis.

The comments that finish his analysis demonstrate that case.

For example: “The only way to arbitrate between emulators would be to carry out Monte Carlo experiments with the black boxes and the emulators.” recommends an analysis of precision, with no notice of the need for accuracy.

Repeatability over reliability.

If ever there was a demonstration that Rich’s approach fatally neglects science, that is it.

This next paragraph really nails Rich’s mistaken thinking: “Frank’s paper claims that GCM projections to 2100 have an uncertainty of +/- at least 15K. Because, via Section D, uncertainty really means a measure of dispersion, this means that Equation (1) with the equivalent of Frank’s parameters, using many examples of 80-year runs, would show an envelope where a good proportion would reach +15K or more, and a good proportion would reach -15K or less, and a good proportion would not reach those bounds.”

First, it was his Section E, not Section D, that supposed uncertainty to be the dispersion of a random variable.

Second, Section IV above showed that Rich had misconstrued the JCGM discussion of uncertainty. Uncertainty is not error. Uncertainty is the interval within which the true value should occur.

Section D.6.1 of the JCGM establishes the distinction:

[T]he focus of this Guide is uncertainty and not error.

And continuing:

The exact error of a result of a measurement is, in general, unknown and unknowable. All one can do is estimate the values of input quantities, including corrections for recognized systematic effects, together with their standard uncertainties (estimated standard deviations), either from unknown probability distributions that are sampled by means of repeated observations, or from subjective or a priori distributions based on the pool of available information; ...

Unknown probability distributions sampled by means of repeated observations describes a calibration experiment and its result. Included among these is the comparison of a GCM hindcast simulation of global cloud fraction with the known observed cloud fraction.

Next, the ±15 C uncertainty does not mean some projections would reach “+15K or more” or “-15K or less.” Uncertainty is not error. The JCGM is clear on this point, as is the literature. Uncertainty intervals are not error magnitudes. Nor do they imply the range of model outputs.

The ±15 C GCM projection uncertainty is an ignorance width. It means that one has no information at all about the possible air temperature in year 2100.

Supposing that uncertainty propagated through a serial calculation directly implies a range of possible physical magnitudes is merely to reveal an utter ignorance of physical uncertainty analysis.
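
A toy calculation, assuming only a per-year uncertainty statistic, shows why: the projection itself stays smooth and bounded, while the propagated statistic grows as sqrt(n) and implies nothing about where the model output will go:

```python
# Toy contrast between a model trajectory and its propagated uncertainty.
# The per-year uncertainty value is an assumption for illustration.
import numpy as np

u_step = 1.6                          # assumed per-year uncertainty, K
years = np.arange(1, 81)
projection = 0.02 * years             # a smooth, bounded projection
ignorance = u_step * np.sqrt(years)   # grows as sqrt(n)

# Year 80: the projection is ~1.6 K; the ignorance width is ~ +/-14 K.
# The statistic has outgrown the signal without implying any +/-14 K
# physical temperature excursion.
print(projection[-1], ignorance[-1])
```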

Rich’s mistake that an uncertainty statistic is a physical magnitude is also commonplace among climate modelers.

Among Rich’s Section H summary conclusions, the first is wrong, while the second and third are trivial.

The first is, “Frank’s emulator is not good in regard to matching GCM output error distributions.” There are two mistakes in that one sentence.

The first is that the GCM air projection emulator can indeed reproduce all the single air temperature projection runs of any given GCM. Rich’s second mistake is to suppose that GCM individual run variation about a mean indicates error.

Regarding Rich’s first mistake, the Figure below is taken from Rowlands, et al., (2012). It shows thousands of individual HadCM3L “perturbed physics” runs. Perturbed physics means the parameter sets are varied across their uncertainty widths. This produces a whole series of alternative projected future temperature states.

[Figure: Rowlands et al. (2012), HadCM3L perturbed-physics ensemble temperature projections]

Original Figure Legend: “Evolution of uncertainties in reconstructed global-mean temperature projections under SRES A1B in the HadCM3L ensemble.”

This “perturbed physics ensemble” is described as “a multi-thousand-member ensemble of transient AOGCM simulations from 1920 to 2080 using HadCM3L, …”

Given knowledge of the forcings, the GCM air temperature projection emulator could reproduce every single one of those multi-thousand ensembled HadCM3L air temperature projections. As the projections are anomalies, emulator coefficient a = 0. The emulations would proceed by varying only the fCO₂ term. That is, the HadCM3L projections could be reproduced using the emulator with only one degree of freedom (see paper Figures 1 and 9).

So much for, “not good in regard to matching GCM output [so-called] error distributions.”

Second, the variance of the spread around the ensemble mean is not error, because the accuracy of the model projections remains unknown.

Studies of model spread, such as that of Rowlands, et al., (2012) reveal nothing about error. The dispersion of outputs reveals nothing but precision.

In calling that spread “error,” Rich merely transmitted his lack of attention to the distinction between accuracy and precision.

In light of the paper, every single one of the HadCM3L centennial projections is subject to the very large lower limit of uncertainty due to LWCF error, of the order ±15 C, at year 2080.

The uncertainty in the ensemble mean is the rms of the uncertainties of the individual runs. That’s not error, either. Or a suggestion of model air temperature extremes. It’s the uncertainty interval that reflects the total unreliability of the GCMs and our total ignorance about future air temperatures.
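
As a sketch of the rms rule just stated, assume for illustration that every run in a large ensemble carries the same ±15 C uncertainty at 2080:

```python
# The ensemble-mean uncertainty is the rms of the per-run uncertainties.
# Equal per-run values are assumed here for illustration.
import numpy as np

u_runs = np.full(3000, 15.0)           # assumed +/-15 C per run at 2080
u_mean = np.sqrt(np.mean(u_runs**2))   # rms of the individual uncertainties
print(u_mean)   # 15.0: averaging thousands of runs shrinks nothing
```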

Rich wrote, “The ‘systematic squashing’ of the +/-4 W/m^2 annual error in LCF inside the GCMs is an issue of which I for one was unaware before Pat Frank’s paper. The implication of comments by Roy Spencer is that there really is something like a ‘magic’ component R3(t) anti-correlated with R2(t), … GCM experts would be able to confirm or deny that possibility.”

Another mistake: the ±4 Wm⁻² is not error. It is uncertainty: a statistic. The uncertainty is not squashed. It is ignored. The unreliability of the GCM projection remains no matter that errors are made to cancel in the calibration period.

GCMs do deploy offsetting errors, but studied model tuning has no impact on simulation uncertainty. Offset errors do not improve the underlying physical description.

Typically, error (not uncertainty) in long wave cloud forcing is offset by an opposing error in short wave cloud forcing. Tuning allows the calibration target to be reproduced, but it provides no reassurance about predictive reliability or accuracy.
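
A toy version of that offset, with invented numbers, makes the point: the tuned errors cancel in the calibration sum, but the uncertainty statistics combine in quadrature and remain:

```python
# Invented offsetting cloud-forcing errors: the tuned sum cancels, the
# uncertainties do not.
import numpy as np

lwcf_err, swcf_err = +4.0, -4.0        # tuned to cancel, W/m^2
print(lwcf_err + swcf_err)             # 0.0: calibration target matched
u_lwcf, u_swcf = 4.0, 4.0              # per-branch uncertainty statistics
print(np.sqrt(u_lwcf**2 + u_swcf**2))  # ~5.7 W/m^2 of uncertainty remains
```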

General conclusions:

The entire analysis has no critical force.

The proposed emulator is constitutively inapt and tendentious.

Its derivation is mathematically incoherent.

The derivation is dimensionally unsound, abuses operator mathematics, and deploys unjustified assumptions.

Offsetting calibration errors are incorrectly and invariably claimed to promote predictive reliability.

The Stefan-Boltzmann equation is inverted.

Operators are improperly treated as coefficients.

Accuracy is repeatedly abused and ejected in favor of precision.

The GCM emulator (paper eqn. 1) is fatally confused with the error propagator (paper eqn. 5.2).

The analytical focus of the paper is fatally misconstrued to be model means.

The difficulties of measurement error and model error are assumed away by falsely and invariably asserting all error to be random.

Uncertainty statistics are wrongly and invariably asserted to be physical error.

Systematic error is falsely asserted to be a fixed constant bias offset.

Uncertainty in temperature is falsely construed to be an actual physical temperature.

Ad hoc assumptions about error normality are empirically unjustified.

The JCGM description of standard uncertainty variance is self-advantageously misconstrued.

The described use of rulers or thermometers is unrealistic.

Readers are advised to read and record false precision.

Appendix: A discussion of Error Analysis, including the Systematic Variety

Rich also posted a comment under his “What do you mean by ‘mean’” critique here, attempting to show that systematic error cannot be included in an uncertainty variance.

Comments closed on the thread before I was able to finish a critical reply. The subject is important, so the reply is posted here as an Appendix.

In his comment, Rich assumed uncertainty to be the dispersion of a random variable with mean b and variance v. He concluded by claiming that an uncertainty variance cannot include bias errors.

Bias errors are another name for systematic errors, which Rich represented as a non-zero mean of error, ‘b.’

Below, I go through a number of relevant cases. They show that the mean of error, ‘b’, never appears in the formula for an error variance. They also show that the systematic errors from uncontrolled variables must be included in an uncertainty variance.

That is, the foundation of Rich’s derivation, which is:

the uncertainty of a sum of n independent measurements with respective [error] means bᵢ and variances vᵢ is that given by JCGM 5.1.2 with unit differential: sqrt(sumᵢ g(vᵢ,bᵢ)²) where v = sumᵢ vᵢ, b = sumᵢ bᵢ.

is wrong.

Given that mistake, the rest of Rich’s analysis there also fails, as demonstrated in the cases that follow.

Interestingly, the mean of error, ‘b,’ does not enter in the variance equation (10) in JCGM 5.1.2, either.

++++++++++++

For any set of n measurements xᵢ of X, xᵢ = X + eᵢ, where eᵢ is the total error in the xᵢ.

Total error eᵢ = rᵢ + dᵢ where rᵢ = random error and dᵢ = systematic error.

The errors eᵢ cannot be known unless the correct value of X is known.

In what follows, “sumᵢ” means the sum over the series of i, where i = 1 → n, and Var[x] is the error variance of x.

Case 1: X is known.

1.1) When X is known, and only random error is present.

The experiment is analogous to an ideal calibration of method.

Then eᵢ = xᵢ – X, and Var[x] = [sumᵢ(xᵢ – X)²]/n = [sumᵢ(eᵢ)²]/n. In this case eᵢ = rᵢ only, because systematic error = dᵢ = 0.

Then [sumᵢ (eᵢ)²]/n = [sumᵢ(rᵢ)²]/n.

For n measurements of xᵢ the mean of error = b = sumᵢ(eᵢ)/n.

When only random error contributes, the mean of error b tends to zero at large n.

So, Var[x] = sumᵢ[(xᵢ – X)²]/n = sumᵢ[((X + rᵢ) – X)²]/n = sumᵢ[(rᵢ)²]/n

and the standard deviation describes a normal dispersion centered around 0.

Thus, when error is a random variable, the mean of error ‘b’ does not appear in the variance.

In case 1.1, Rich’s uncertainty, sqrt[sumᵢ g(vᵢ,bᵢ)²] is not correct and in any event should have been written sqrt[sumᵢ g(sᵢ,bᵢ)²].
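
Case 1.1 is easy to check numerically. A minimal Monte Carlo sketch, with arbitrary values:

```python
# Case 1.1: X known, random error only. The error mean b tends to zero
# and the variance is the mean square of the random errors; b appears
# nowhere in it. All values arbitrary.
import numpy as np

rng = np.random.default_rng(0)
X, n = 10.0, 100_000
r = rng.normal(0.0, 0.5, n)   # random error only; d_i = 0
x = X + r
e = x - X                     # e_i = r_i
b = e.mean()                  # -> 0 at large n
var = np.mean(e**2)           # -> 0.25 = sum_i(r_i^2)/n
print(b, var)
```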

1.2) X is known, and both random error and constant systematic error are present.

When the dᵢ are present and constant, then dᵢ = d for all i.

The mean of error = ‘b,’ = sumᵢ [(xᵢ – X)]/n = sumᵢ [(X + rᵢ + d) – X]/n = sumᵢ [(rᵢ+d)]/n = nd/n + sumᵢ [(rᵢ)/n], which goes to ‘d’ at large n.

Thus, in 1.2, b = d.

And: Var[x] = sumᵢ[(xᵢ – X)²]/n = sumᵢ{[(X + rᵢ + d) – X]²}/n = sumᵢ [(rᵢ+d)²]/n, which produces a dispersion around ‘d.’

Thus, because X is known and ‘d’ is constant, ‘d’ can be found exactly and subtracted away.

The mean of the final error, ‘b’ never enters the variance.

That is, b => d, a real-number constant that can be known and can be corrected out of subsequent measurements of samples.

This last remains true in other laboratory samples where the X is unknown, because the method has been calibrated against a similar sample of known X and the methodological ‘d’ has been determined. That ‘d’ is always constant is an assumption, i.e., that experimenter error is absent and the methodology is identical.
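
A short sketch of Case 1.2, again with arbitrary values: the calibration against known X recovers the constant ‘d,’ which is then subtracted out:

```python
# Case 1.2: X known, random error plus constant bias d. The mean error
# converges on d, which can then be corrected out. Values arbitrary.
import numpy as np

rng = np.random.default_rng(1)
X, d, n = 10.0, 0.8, 100_000
x = X + rng.normal(0.0, 0.5, n) + d
d_hat = (x - X).mean()          # b -> d at large n
corrected = x - d_hat           # bias removed from the measurements
print(d_hat, corrected.mean())  # ~0.8, ~10.0
```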

Case 2: X is UNknown, and both random error and systematic error are present

Then the mean of xᵢ = [sumᵢ (xᵢ)/n] = x_bar.

As before, let xᵢ = X + eᵢ = X + rᵢ + dᵢ.

Var[x] = sumᵢ[(xᵢ – x_bar)²]/(n-1), and the SD describes a dispersion around x_bar.

2.1) Systematic error = 0.

If dᵢ = 0, then eᵢ = rᵢ is random, and x_bar becomes a good measure of X at large n.

Var[x] = sumᵢ[(xᵢ – x_bar)²]/(n-1) = sumᵢ[((X + rᵢ) – (X + r_r))²]/(n-1) = sumᵢ[(rᵢ – r_r)²]/(n-1), where r_r is the residual of error in x_bar over interval ‘n’.

As above, the mean of error ‘b’ = sumᵢ[(xᵢ – x_bar)]/n = sumᵢ[((X + rᵢ) – (X + r_r))]/n = sumᵢ[(rᵢ – r_r)]/n = sumᵢ(rᵢ)/n – n(r_r)/n, and b = r_bar – r_r, where r_bar is the average of error over the ‘n’ interval.

Then b is a real number which again does not enter the uncertainty variance, and which approaches zero at large n.

2.2) If d is constant = c

Then xᵢ = X + rᵢ + c.

The error mean = ‘b’ = sumᵢ[(xᵢ – x_bar)]/n = sumᵢ[((X + rᵢ + c) – (X + c + r_r))]/n, and b = sumᵢ[(rᵢ – r_r)]/n = sumᵢ(rᵢ)/n – n(r_r)/n, and b = (r_bar – r_r), wherein the signs of r_bar and r_r are unspecified.

The Var[x] = sumᵢ[(xᵢ – x_bar)²]/(n-1) = sumᵢ [(X+rᵢ+c) – (X+r_r+c)]²/(n-1) = sumᵢ[(rᵢ – r_r)²]/(n-1).

The variance describes a dispersion around r_r.

The mean error, ‘b’ does not enter the variance.

Case 3: X is UNknown and systematic error, d, varies due to uncontrolled variables.

Uncontrolled variables mean that every measurement (or every model run) is impacted by inconstant deterministic perturbations, i.e., inconstant causal influences. These modify the value of each result with unknown biases that vary with each measurement (or model run).

Any measurement, xᵢ = X +rᵢ + dᵢ, and dᵢ is a deterministic, non-random variable, and usually non-zero.

Over two measurement sequences of number n and m, the mean errors are bn = sumᵢ(rᵢ + dᵢ)/n and bm = sumj(rj + dj)/m, and bn ≠ bm, even if interval n equals interval m.

Var[x]n = sumᵢ[(xᵢ – x_bar-n)²]/(n-1), where x_bar-n is x-bar over sequence n.

Var[x]m = sumj[(xj – x_bar-m)²]/(m-1)

and Var[x]n = sumᵢ[(xᵢ – x_bar-n)²]/(n-1) = sumᵢ[((X + rᵢ + dᵢ) – (X + r_r + d_bar-n))²]/(n-1) = sumᵢ[(rᵢ – r_r + dᵢ – d_bar-n)²]/(n-1) = sumᵢ[(rᵢ – (d_bar-n + r_r – dᵢ))²]/(n-1).

Likewise, Var[x]m = sumj[(rj – (d_bar-m + r_r – dj))²]/(m-1).

Thus, neither bn nor bm enter into either Var[x], contradicting Rich’s assumption.

The dᵢ, dj enter into the total uncertainty of the x_bar-n, x_bar-m. Further, the variation of dᵢ, dj with each i, j means that the dispersion of Var[x]n,m will include the dispersion of the dᵢ, dj. The deterministic cause of dᵢ, dj will very likely make their distribution non-normal.

That is, when systematic error is inconstant due to uncontrolled variables, dᵢ will vary with each i, and will produce a dispersion represented by the standard deviation of the dᵢ.

This negates the claim that systematic error cannot contribute an uncertainty interval.

Also, x_bar-n ≠ x_bar-m, and (dᵢ – d_bar-n) ≠ (dj – d_bar-m).

Therefore Var[x]n ≠ Var[x]m, even at large n, m and including when n = m over well-separated periods.
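
A numerical sketch of Case 3, with two invented deterministic perturbations standing in for the uncontrolled variables:

```python
# Case 3: X unknown, d_i varies deterministically. Two sequences with
# different (invented) perturbations give different means and variances.
import numpy as np

rng = np.random.default_rng(2)
X, n = 10.0, 5000
i = np.arange(n)
d_n = 1.0e-4 * i                        # slow drift in sequence n
d_m = 0.5 * np.sin(2 * np.pi * i / n)   # a different perturbation in m
x_n = X + rng.normal(0.0, 0.3, n) + d_n
x_m = X + rng.normal(0.0, 0.3, n) + d_m

print(x_n.mean(), x_m.mean())            # x_bar-n != x_bar-m
print(x_n.var(ddof=1), x_m.var(ddof=1))  # Var[x]n != Var[x]m
```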

Case 4: X is known, and d varies due to uncontrolled variables.

This is a case of calibration against a known X when uncontrolled variables are present, and mirrors the calibration of GCM-simulated global cloud fraction against observed global cloud fraction.

4.1) A series of n measurements.

Here, eᵢ = xᵢ – X, and Var[x] = sumᵢ[(xᵢ – X)²]/n = sumᵢ(eᵢ)²/n = sumᵢ[(rᵢ + dᵢ)²]/n = [u(x)²].

As eᵢ = rᵢ + dᵢ, then Var[x] = sumᵢ[(rᵢ + dᵢ)²]/n, but the values of each rᵢ and dᵢ are unknown.

The denominator is ‘n’ rather than (n-1) because X is known and no degree of freedom is lost to a mean in calculating the standard variance.

For n measurements of xᵢ the mean of error = b = sumᵢ(eᵢ)/n = sumᵢ(rᵢ + dᵢ)/n = a variable depending on ‘n,’ because dᵢ varies in an unknown but deterministic way across n.

However, X is known, therefore (xᵢ – X) = eᵢ is known to be the true and complete error in the i-th measurement.

At large n, the sumᵢ(rᵢ) becomes negligible, and Var[x] = sumᵢ[(eᵢ)²]/n = sumᵢ[(dᵢ)²]/n = [u(x)²] at the limit, which is very likely a non-normal dispersion.

The systematic error produces a dispersion because the dᵢ vary. At large n, the uncertainty reduces to the interval due to systematic error.

The mean of error, ‘b,’ does not enter the variance.

The claim that systematic error cannot produce an uncertainty interval is again negated.
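
A sketch of Case 4.1, with an invented varying bias and a small random error, mirrors the cloud-fraction calibration:

```python
# Case 4.1: calibration against known X with an inconstant systematic
# error. The calibration statistic u(x)^2 is computed against X
# (denominator n) and is dominated by the spread of the d_i.
import numpy as np

rng = np.random.default_rng(3)
X, n = 10.0, 100_000
i = np.arange(n)
d = 0.8 * np.cos(2 * np.pi * i / n)**2   # deterministic, varying bias
x = X + rng.normal(0.0, 0.1, n) + d
e = x - X                                # true error, known because X is
u_sq = np.mean(e**2)                     # [u(x)]^2, denominator n
print(np.sqrt(u_sq))                     # ~0.5, mostly from the d_i
```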

Case 5: X is UNknown and dᵢ varies due to uncontrolled variables. The experimental sample is physically similar to the calibration sample in Case 4.

5.1) Let xᵢ’ be the i-th of n measurements of experimental sample 5.

The estimated mean of error = b = sumᵢ(x’ᵢ – x’_bar)/n

When x’ᵢ is measured, and X’ is unknown, Var[x’] = sumᵢ[(x’ᵢ – x’_bar)²]/(n-1).

= sumᵢ[((X’ + r’ᵢ + d’ᵢ) – (X’ + d’_bar + r’_r))²]/(n-1),

and Var[x’] = sumᵢ[(r’ᵢ + (d’ᵢ – d’_bar – r’_r))²]/(n-1).

Again, the mean of error b’ does not enter into the empirical variance.

And again, the dispersion of the implicit dᵢ contributes to the total uncertainty interval.

The empirical error mean ‘b’ is not an accuracy metric because the true value of X is not known.

The empirical dispersion, Var[x’], is an uncertainty interval about x’_bar within which the true value of X’ is reckoned to lie. The Var[x’] does not describe an error interval, because the true error is unknown.

In the event of an available calibration, the uncertainty variance of the mean of the x’ᵢ can be assigned as the methodological calibration variance [u(x)]², as in 4.1, provided the measurement conditions are close to those of the calibration experiment.

The methodological uncertainty then describes an interval within which the true value of X’ is expected to lie. The uncertainty interval is not a dispersion of physical error.

Modesty about uncertainty is deeply recommended in science and engineering. So, if measurement n₅ < calibration n₄, we choose the conservative empirical uncertainty, [u(x’)²] = Var[x’] = sumᵢ[(x’ᵢ – x’_bar)²]/(n-1).

The estimated mean of error, b’, does not enter the variance.

The presence of the d’ᵢ in the variance again ensures that the uncertainty interval includes a contribution from the interval due to variable systematic error.

5.2) A single measurement of x’. The experimental sample is again similar to 4.

One measurement does not have a mean, so (x’ᵢ – x’_bar) is undefined.

However, from 4.1, we know the methodological [u(x)²] from the calibration experiment using a sample of known X.

Then, for a single measurement of x’ in an unknown but analogous sample, we can indicate the reliability of x’ by appending the standard deviation of the known calibration variance above, sqrt(u(x)²) = ±u(x).

Thus, the measurement of x’ is conveyed as x’±u(x), and ±u(x) is assigned to any given single measurement of x’.

Single measurements do not have an error mean, ‘b,’ which in any case cannot appear in the error statement of x’.

However, the introduced calibration variance includes the uncertainty interval due to systematic error.

The uncertainty interval again does not represent the spread of error in the measurement (or model output). It represents the interval within which the physically true value is expected to lie.
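
In code form, the Case 5.2 report is simply the single measurement with the calibration standard deviation appended (numbers reused from the illustrative Case 4 sketch above):

```python
# Case 5.2: one measurement of an unknown X', reported with the
# calibration uncertainty from Case 4 appended. Numbers illustrative.
x_prime = 10.37   # a single measurement of the unknown X'
u_x = 0.5         # sqrt([u(x)]^2) from the calibration experiment
print(f"{x_prime} +/- {u_x}")   # interval where the true X' should lie
```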

Conclusions:

In none of these standard cases does the error mean, ‘b,’ enter the error variance.

A constant systematic error does not produce a dispersion.

When variable systematic error is present and the X is known, the uncertainty variance of the measured ‘x’ represents a calibration error statistic.

When variable systematic error is present and X is unknown, the true error variance in measured x is also unknown, but is very likely a non-normal uncertainty interval that is not centered on the physically true X.

That uncertainty interval can well be dominated by the dispersion of the unknown systematic error. A calibration uncertainty statistic, if available, can then be applied to condition the measurements of x (or model predictions of x).

Rich’s analysis failed throughout.

This Appendix finishes with a very relevant quote from Vasquez and Whiting (2006):

[E]ven though the concept of systematic error is clear, there is a surprising paucity of methodologies to deal with the propagation analysis of systematic errors. The effect of the latter can be more significant than usually expected. … Evidence and mathematical treatment of random errors have been extensively discussed in the technical literature. On the other hand, evidence and mathematical analysis of systematic errors are much less common in literature.

My experience with the statistical literature has been the almost complete neglect of systematic error as well. Whenever it is mentioned, systematic error is described as a constant bias, and little further is said of it. The focus is on random error.

One exception is in Rukhin (2009), who says,

Of course if [the expectation value of the systematic error is not equal to zero], then all weighted means statistics become biased, and [the mean] itself cannot be estimated. Thus, we assume that all recognized systematic errors (biases) have been corrected for …

and then off he goes into safer ground.

Vasquez and Whiting go on: “When several sources of systematic errors are identified, ‘β’ is suggested to be calculated as a mean of bias limits or additive correction factors as follows:

β = sqrt{sumᵢ[u(x)ᵢ]²}

where i defines the sources of bias errors, and [dᵢ] is the bias range within the error source i. Similarly, the same approach is used to define a total random error based on individual standard deviation estimates,

ek = sqrt{sumᵢ[σ(x)ᵢ]²}

That is, Vasquez and Whiting advise estimating the variance of non-normal systematic error using exactly the same mathematics as is used for random error.

They go on to advise combining both into a statement of total uncertainty as,

u(x)total = sqrt[β² +(ek)²].
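
A short sketch of that combination, with placeholder inputs:

```python
# Vasquez-and-Whiting-style combination: bias sources and random sources
# are each summed in quadrature, then merged. Inputs are placeholders.
import numpy as np

u_bias = np.array([0.5, 1.2, 0.8])   # bias-source uncertainties u(x)_i
s_rand = np.array([0.3, 0.4])        # random standard deviations sigma(x)_i
beta = np.sqrt(np.sum(u_bias**2))    # systematic component
ek = np.sqrt(np.sum(s_rand**2))      # random component
u_total = np.sqrt(beta**2 + ek**2)   # total uncertainty
print(beta, ek, u_total)
```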

The Vasquez and Whiting paper completely justifies the method of treating systematic error employed in “Propagation…”

++++++++++++

References:

V. R. Vasquez and W. B. Whiting (2006) Accounting for Both Random Errors and Systematic Errors in Uncertainty Propagation Analysis of Computer Models Involving Experimental Measurements with Monte Carlo Methods. Risk Analysis 25(6), 1669-1681; doi: 10.1111/j.1539-6924.2005.00704.x.

A. L. Rukhin (2009) Weighted means statistics in interlaboratory studies. Metrologia 46, 323-331; doi: 10.1088/0026-1394/46/3/021.
