Guest Post by Willis Eschenbach (@weschenbach on eX-Twitter)
Our estimable Charles the Moderator, who gets my eternal thanks for keeping the hits happening here on WUWT, asked me to take a look at a new paper yclept "Multivariate Analysis Rejects the Theory of Human-caused Atmospheric Carbon Dioxide Increase: The Sea Surface Temperature Rules", by Dai Ato, an independent researcher in Japan. Seems it's been getting some play. I'll refer to this paper as Ato2024.
I wasn’t far in before alarms went off. The study conducted a multivariate analysis using publicly available data to examine the impact of sea surface temperature (SST) and human emissions on atmospheric CO₂ levels.
It concluded that SST was the independent determinant of the annual increase in atmospheric CO₂ concentration. Human emissions were found to be irrelevant in the regression models.
And most revealingly, it says:
Furthermore, the atmospheric CO₂ concentration predicted, using the regression equation obtained for the SST derived from UK-HADLEY centre after 1960, showed an extremely high correlation with the actual CO₂ concentration (Pearson correlation coefficient r = 0.9995, P < 3e-92).
BZZZZT!! Whenever I get an r-value that high, I know for a fact that I’m doing something very wrong … I’ll get back to that.
First, let me start with one of the three variables in his analysis (SST, CO2, and emissions). Here are three reconstructions of SST since 1854 by three different groups.
Figure 1. Global monthly average sea surface temperatures (SSTs). Yellow area at the right is the portion of the record analyzed by Ato.
While there are some differences, overall the pattern is clear. There was SST warming from ~ 1850 to about 1870, cooling to ~ 1910, warming to ~ 1940, cooling to ~ 1965, and warming since then.
Looking at that, I can see why Ato doesn’t want to use the full record—it doesn’t support his claim that SSTs are the independent determinant of atmospheric CO2 levels. The CO2 data (Figs. 2 and 3 below) looks nothing like that.
So how does he justify the cutoff? Well, the Mauna Loa CO2 measurement data starts about 1960. However, it can be extended back beyond that using the ice core CO2 records. Here’s what that looks like.
Figure 2. Mauna Loa and ice core measurements of the background atmospheric CO2 levels, 1000-2010 AD. Data: ice cores, Mauna Loa.
Ato2024 says that the ice core records are not accurate. However, this is belied by the close agreement of the ice core records with each other and with the Mauna Loa measurements as shown above.
Below is a closer view of the recent end of the data since 1850, corresponding to the time frame of the sea surface temperatures (SSTs) in Figure 1.
Figure 3. As in Figure 2, but post-1850 data only
As a result of the good agreement of the ice cores both with each other and with the Mauna Loa data, I see no problem in taking that as a good reconstruction of the post-1850 CO2 levels.
The problem, of course, is that the pre-1960 ocean temperatures do not look anything like the pre-1960 CO2 levels … and this disagreement totally falsifies Ato2024. So he is obliged to ignore it.
Next, how did he get such a great correlation, 0.9995, between SST and CO2 in the post-1960 data? In part the answer lies in what he looked at. Here’s the Mauna Loa post-1960 CO2 record he used. Note that he didn’t use the monthly data, just the annual data. Makes it easier to get a higher Pearson correlation coefficient “r”.
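To illustrate why annual averaging helps, here's a minimal sketch on synthetic data (all numbers are hypothetical, not the Mauna Loa record): a rising series with a seasonal cycle, compared against its own straight-line fit, first month by month and then as annual means. Averaging over each full year cancels the seasonal swing, so the annual r comes out higher.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length arrays."""
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
years = 10                                  # hypothetical short record
t = np.arange(12 * years) / 12.0            # time in years, monthly steps
# Hypothetical CO2-like series: 2 ppm/yr rise + seasonal cycle + small noise
monthly = 320 + 2.0 * t + 3.0 * np.sin(2 * np.pi * t) + rng.normal(0, 0.3, t.size)
trend_monthly = np.polyval(np.polyfit(t, monthly, 1), t)

# Annual means: the 12 monthly samples of the sine cycle sum to zero
annual = monthly.reshape(years, 12).mean(axis=1)
t_annual = t.reshape(years, 12).mean(axis=1)
trend_annual = np.polyval(np.polyfit(t_annual, annual, 1), t_annual)

r_monthly = pearson_r(monthly, trend_monthly)
r_annual = pearson_r(annual, trend_annual)
print(f"monthly r = {r_monthly:.4f}, annual r = {r_annual:.4f}")
```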
Figure 4. Mauna Loa Observatory CO2 observations, along with the linear trend line.
The recent increase in CO2 is a very slowly accelerating curve which is nearly a straight line. This leads to many false correlations because such a curve is easy to replicate as we’ll see below. This is a recurring problem in climate science.
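Here's a minimal sketch of that false-correlation trap, using two synthetic series (hypothetical numbers) that share nothing but an upward trend. The raw correlation is high; once each series has its own straight-line trend removed, the correlation collapses toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(63)  # 63 hypothetical annual steps
# Two synthetic series with no causal link; each is just a trend plus its own noise
co2_like = 315 + 0.8 * t + 0.012 * t**2 + rng.normal(0, 0.3, t.size)  # slowly accelerating
index_like = 0.015 * t + rng.normal(0, 0.05, t.size)                  # any steadily rising index

r_raw = np.corrcoef(co2_like, index_like)[0, 1]

def detrend(y):
    """Remove the least-squares straight line (against t) from y."""
    return y - np.polyval(np.polyfit(t, y, 1), t)

r_detrended = np.corrcoef(detrend(co2_like), detrend(index_like))[0, 1]
print(f"raw r = {r_raw:.3f}, detrended r = {r_detrended:.3f}")
```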
But that’s just the first problem. The main problem is the procedure that he used. Here’s the description from the paper.
Note that the symbol delta (∆) in the equations means “change in”. So ∆CO2 is the change in CO2 from one year to the next.
Translated, that says:
- Calculate the best-fit linear estimation of the annual changes in CO2 (∆CO2), based on the Hadley HadSST sea surface temperature.
- The predicted atmospheric CO2 is then the starting atmospheric CO2 plus the cumulative sum of the estimated annual changes in CO2.
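The two steps above can be sketched in a few lines of Python. This is my reading of the procedure, not Ato's code: the helper name is hypothetical, the data are synthetic, and pairing each annual change with the explanatory value of the year in which that change ends is an assumption about how the series were aligned.

```python
import numpy as np

def ato_style_reconstruction(co2, x):
    """Hypothetical helper sketching the two-step procedure.

    co2 : annual CO2 concentrations (ppm), length n
    x   : annual values of the explanatory variable (e.g. an SST anomaly), length n
    Returns the reconstructed CO2 series, length n.
    """
    co2 = np.asarray(co2, dtype=float)
    x = np.asarray(x, dtype=float)
    d_co2 = np.diff(co2)  # annual change, length n - 1
    # Step 1: least-squares line d_co2 ~ slope * x + intercept
    # (assumption: each change is paired with x for the year it ends in)
    slope, intercept = np.polyfit(x[1:], d_co2, 1)
    d_co2_hat = slope * x[1:] + intercept
    # Step 2: starting CO2 plus the cumulative sum of the estimated changes
    return np.concatenate(([co2[0]], co2[0] + np.cumsum(d_co2_hat)))

# Usage with synthetic, purely illustrative numbers
rng = np.random.default_rng(0)
t = np.arange(63)
co2 = 316 + 0.8 * t + 0.012 * t**2 + rng.normal(0, 0.2, t.size)
sst = -0.2 + 0.012 * t + rng.normal(0, 0.08, t.size)
co2_hat = ato_style_reconstruction(co2, sst)
```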
Here’s a graph of the first part of that calculation, fitting the SST to the annual change in CO2.
Figure 5. Post 1960 annual change in atmospheric CO2 (∆CO2), along with the linear trend line of ∆CO2, and the best estimation of ∆CO2 based on the Hadley HadSST4.0.1.
Now, there’s an oddity about graphing delta CO2, or ∆ anything for that matter. It involves a couple of curious changes. I’ll use graphing ∆CO2 as in Fig. 5 as my example.
First, any overall linear trend in the CO2 data is converted into an overall offset from zero (a non-zero average) in the ∆CO2 graph.
Second, any overall acceleration in the CO2 data is converted into an overall linear trend in the ∆CO2 graph.
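A short worked example shows both conversions at once. Suppose, purely for illustration, that CO2 follows a quadratic in time t (in years) with coefficients a, b, and c, and take the change from the previous year:

```latex
\begin{aligned}
\mathrm{CO_2}(t) &= a + b\,t + c\,t^2 \\
\Delta\mathrm{CO_2}(t) &= \mathrm{CO_2}(t) - \mathrm{CO_2}(t-1) \\
&= b + c\,\bigl(t^2 - (t-1)^2\bigr) \\
&= (b - c) + 2c\,t
\end{aligned}
```

The starting level a drops out entirely, the linear-trend term b shows up as (most of) the constant offset, and the acceleration term c becomes the slope 2c. A perfectly straight CO2 line (c = 0) gives a flat ∆CO2 sitting at height b.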
So from looking at Figure 5, we can see that the ∆CO2 data has both a non-zero average and a positive trend, which means the underlying CO2 data has both a positive trend and an acceleration. We can see both of those in Figure 3 above.
And now that we’ve fitted the SST to the ∆CO2 data so we can estimate the ∆CO2, we simply sum those changes cumulatively to estimate the underlying CO2 data. Here’s that result.
Figure 6. Mauna Loa CO2 data, and Ato2024 estimation of the Mauna Loa CO2 data
At this point, I’ve replicated his results.
Now, remember that I said that a correlation coefficient of 0.999+ means there’s some fatal flaw in the logic. So … what’s not to like?
In his note asking me to take a look at this paper, Charles The Moderator included an interesting AI analysis of the paper, viz (emphasis mine):
Based on my analysis of the paper, the key issue of circular reasoning appears to be in the methodology used to predict atmospheric CO2 concentrations from sea surface temperature (SST) data. Specifically:
• The author uses multiple linear regression to derive an equation relating annual CO2 increase to SST for the period 1960-2022.
• This equation is then used to “predict” CO2 concentrations for the same 1960-2022 time period.
• The predicted and measured CO2 concentrations are found to have an extremely high correlation (r = 0.9995).
The circular reasoning occurs because the same data is used both to derive the equation and to test its predictive power. The key equations involved are:
The regression equation (from Step 7 in the paper):
Annual CO2 increase = 2.006 × HAD-SST + 1.143 (after 1959)
The prediction equation:
$$[\mathrm{CO_2}]_n = \sum_{i=1}^{n} [\Delta\mathrm{CO_2}]_i + C_{st}$$
Where [CO2]n is the predicted CO2 concentration, [ΔCO2]i is the annual increase calculated from the regression equation, and Cst is the actual CO2 concentration in the starting year.
By using this method, the author is essentially fitting the equation to the data and then using that same fitted equation to “predict” the data it was derived from. This guarantees an extremely high correlation that does not actually demonstrate any predictive power or causal relationship.
A proper analysis would use separate training and testing datasets, or employ techniques like cross-validation, to avoid this circularity.
The extremely high correlation reported is almost certainly an artifact of this flawed methodology rather than evidence of a genuine relationship between SST and atmospheric CO2 levels.
And the AI is right. Well, partly right. It's right to say that the problem is not that Ato2024 fitted SST to CO2. The problem is that Ato2024 didn't withhold half the data to verify the results. It's easy to predict something when you already know the outcome …
HOWEVER, and it's a big however … while that problem alone is enough to totally falsify the conclusions, there's another really big problem. To illustrate that, I've used the Ato2024 method. But instead of using sea surface temperature as the input to be fitted to the ∆CO2 data as Ato2024 does, I've fitted a straight line to the ∆CO2 data. It's the blue line in Figure 5 above.
And using the Ato2024 method, I’ve converted that straight line to the equivalent CO2 data shown in red in Figure 7 below.
Figure 7. As in Figure 6, plus a red line showing the result of using a simple straight line in place of the sea surface temperature (SST) used by Ato2024.
Interesting. Using the Ato2024 method of fitting a variable to ∆CO2, a straight line as input does just as well as using the SST as input.
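Here's a minimal sketch of that comparison on synthetic data (hypothetical numbers, not the real records). One input is a noisy, warming "SST-like" anomaly; the other is nothing but the calendar year. Both go through the same fit-the-changes-then-sum procedure, and both reconstructions correlate with the CO2-like curve at well above 0.99.

```python
import numpy as np

def reconstruct(co2, x):
    """Fit annual change in co2 to x by least squares, then rebuild co2 by cumulative sum."""
    d_co2 = np.diff(co2)
    slope, intercept = np.polyfit(x[1:], d_co2, 1)
    return np.concatenate(([co2[0]], co2[0] + np.cumsum(slope * x[1:] + intercept)))

rng = np.random.default_rng(2)
t = np.arange(63)  # hypothetical annual steps
# Hypothetical CO2-like series: slowly accelerating curve plus small noise
co2 = 316 + 0.8 * t + 0.012 * t**2 + rng.normal(0, 0.2, t.size)
# Hypothetical SST-like anomaly: warming trend plus year-to-year wiggles
sst = -0.2 + 0.012 * t + rng.normal(0, 0.08, t.size)
straight_line = t.astype(float)  # nothing but the calendar

results = {}
for name, x in [("SST-like", sst), ("straight line", straight_line)]:
    results[name] = np.corrcoef(co2, reconstruct(co2, x))[0, 1]
    print(f"{name:14s} r = {results[name]:.4f}")
```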
But that doesn't really show the full scope of the problem. To do that, I first divided the SST, the straight line, and the ∆CO2 data into two halves. I used the first half for fitting either the SST or the straight line to the ∆CO2. Then I used those results to estimate the change in CO2. Figure 8 shows that result.
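A minimal sketch of that split-half check, on synthetic data (hypothetical numbers): the fit sees only the changes in the first half, the frozen fit is then applied to every year, and the second half is scored with a plain error measure (RMSE, in ppm) alongside r. The straight line is used as the input here; an SST-like series could be swapped in for `x`.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(63)  # hypothetical annual steps
co2 = 316 + 0.8 * t + 0.012 * t**2 + rng.normal(0, 0.2, t.size)  # synthetic CO2-like curve
x = t.astype(float)  # the straight-line "predictor"

half = t.size // 2
d_co2 = np.diff(co2)  # d_co2[i] is the change ending in year i + 1
# Fit only on changes that end inside the first half (training period)
slope, intercept = np.polyfit(x[1:half], d_co2[: half - 1], 1)
d_hat = slope * x[1:] + intercept  # apply the frozen fit to every year
co2_hat = np.concatenate(([co2[0]], co2[0] + np.cumsum(d_hat)))

held_out = slice(half, None)  # the fit never saw these years
rmse_held_out = np.sqrt(np.mean((co2_hat[held_out] - co2[held_out]) ** 2))
r_full = np.corrcoef(co2, co2_hat)[0, 1]
print(f"held-out RMSE = {rmse_held_out:.2f} ppm, full-period r = {r_full:.4f}")
```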
Figure 8. As in Figure 7, but using only the first half of the data to fit the model, and then using the full data to see how well it performs.
This graph reveals two separate problems. First, although the fit is considerably poorer than in Figure 6, the Pearson correlation coefficient “r” is basically unchanged … meaning that it is not an appropriate measure for this particular issue.
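The reason r can't see this kind of failure is that Pearson's r is unchanged by adding a constant to a series or multiplying it by a positive factor. Here's a minimal sketch on synthetic data (hypothetical numbers): a "prediction" that is 10% too steep and 15 ppm too high still scores a perfect r, while the RMSE exposes the error. Real half-data misfits aren't purely of this offset-and-scale form, so treat this as an illustration of the blind spot rather than a model of Figure 8.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(63)
actual = 316 + 0.8 * t + 0.012 * t**2 + rng.normal(0, 0.2, t.size)  # synthetic CO2-like
# A systematically wrong "prediction": 10% too steep and 15 ppm too high
bad_prediction = 1.10 * (actual - actual[0]) + actual[0] + 15.0

r = np.corrcoef(actual, bad_prediction)[0, 1]
rmse = np.sqrt(np.mean((bad_prediction - actual) ** 2))
print(f"r = {r:.6f}, RMSE = {rmse:.1f} ppm")
```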
Next, the straight line continues to perform just as well as using the SST as the independent variable … no bueno. This indicates a profound problem with the underlying Ato method.
To show the problem, I’m gonna re-show Figure 5 from above.
To recap, first, any overall linear trend in the CO2 data is converted into an overall offset from zero (a non-zero average) in the ∆CO2 graph.
Second, any overall acceleration in the CO2 data is converted into an overall linear trend in the ∆CO2 graph.
And here's the key. When you fit the SST data (or, more importantly, almost any trending data) to the ∆CO2 data, you end up with a fitted signal that has the same non-zero average as the ∆CO2 data and much the same trend.
Not only that, but the fit will be balanced, with the amount above and the amount below the trend line being equal.
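That balance is a built-in property of least squares with an intercept: the residuals sum to zero, so the estimated annual changes sum to exactly the actual total change. Here's a minimal sketch on synthetic data (hypothetical numbers) showing the consequence for the cumulative-sum step. Even with pure random noise as the "explanatory" variable, the reconstruction starts and ends exactly on the data. The endpoints are pinned whatever the input; how closely the middle tracks depends on how well the input's trend matches that of ∆CO2.

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(63)
co2 = 316 + 0.8 * t + 0.012 * t**2 + rng.normal(0, 0.2, t.size)  # synthetic CO2-like
d_co2 = np.diff(co2)

# Any explanatory series will do here; pure noise is the extreme case
x = rng.normal(0, 1, d_co2.size)
slope, intercept = np.polyfit(x, d_co2, 1)
d_hat = slope * x + intercept

residual_sum = np.sum(d_co2 - d_hat)  # least squares with an intercept: ~0
co2_hat = np.concatenate(([co2[0]], co2[0] + np.cumsum(d_hat)))
print(f"sum of residuals = {residual_sum:.2e}")
print(f"final actual = {co2[-1]:.2f}, final reconstructed = {co2_hat[-1]:.2f}")
```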
And all of that guarantees that if you start out trying to predict a smooth curve, when you reconstruct the signal using the method of Ato2024, you’ll get an answer that is VERY close to the smooth curve regardless of what variable you use to reconstruct the signal.
And that is why using the straight line does just as well as using the SST, or any other variable, as the basis for the estimation of CO2.
I weep for the death of honest peer-review …
My best to everyone,
w.
Yeah, you’ve heard it before: When you comment please quote the exact words you’re discussing. It avoids endless misunderstandings.
via Watts Up With That?
September 12, 2024 at 12:03PM
