The Interpretation of Interpolation

Guest Post by Willis Eschenbach

Over in the comments on a post about a totally different subject, there’s a debate about interpolating values for areas where there is no data. Here are a few examples, with names left off.

•

Kriging is nothing more than a spatially weighted averaging process. Interpolated data will therefore show lower variance than the observations.

The idea that interpolation could be better than observation is absurd. You only know things that you measure.

•

I’m not saying that interpolation is better than observation. I’m saying interpolation using a locality-based approach is better than one that uses a global approach. Do you disagree?

•

I disagree. Generally, interpolation in the context of global temperature does not make things better. For surface datasets I have always preferred HadCRUT4 over others because it’s not interpolated.

Once you interpolate you are analysing a hybrid of data+model, not data. What you are analysing then takes on characteristics of the model as much as the data. Bad.

•

How do you estimate the value of empty grid cells without doing some kind of interpolation?

•

YOU DON’T! You tell the people what you *know*. You don’t make up what you don’t know and try to pass it off as the truth.

If you only know the temp for 85% of the globe, then just say “our metric for 85% of the earth is such and such. We don’t have good data for the other 15% and can only guess at its metric value.”

•

If you don’t have the measurements, then you cannot assume anything about the missing data. If you do, then you’re making things up.

Hmmm … folks who know me know that I prefer experiment to theory. So I thought I’d see whether I could fill in missing data and get a better answer than I’d get by leaving the gap empty. Here’s my experiment. I start with the CERES estimate of the average surface temperature, 2000–2020.

Figure 1. CERES surface temperature average, 2000-2020

Note that the average temperature of the globe is 15.2°C, of the land 8.7°C, and of the ocean 17.7°C. Note also that the Andes Mountains, along the western side of northern South America, are much cooler than the rest of the South American land.
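As a quick check, these three averages hang together. Assuming all three are area-weighted averages over the same grid, the global mean is a mix of the land and ocean means weighted by the land fraction f, so the numbers above imply:

```latex
\bar T_{\text{globe}} = f\,\bar T_{\text{land}} + (1-f)\,\bar T_{\text{ocean}}
\quad\Longrightarrow\quad
f = \frac{\bar T_{\text{ocean}} - \bar T_{\text{globe}}}{\bar T_{\text{ocean}} - \bar T_{\text{land}}}
  = \frac{17.7 - 15.2}{17.7 - 8.7} = \frac{2.5}{9.0} \approx 0.28
```

That is close to the roughly 29% of Earth’s surface that is land, and the gap is within what rounding each average to 0.1°C can produce.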

Next, I punch out a chunk of the data. Figure 2 shows that result.

Figure 2. CERES surface temperature average with removed data, 2000-2020

Note that with the data missing, the global average is now cooler: 14.6°C versus 15.2°C for the full data, a significant error of 0.6°C. The land and ocean averages are too low as well, by 1.3°C and 0.4°C respectively.
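Why does leaving the hole empty bias the result? An area-weighted average that simply skips the missing cells gives exactly the same number as one that fills every missing cell with the average of the cells that remain. In other words, “not interpolating” still makes an assumption about the missing area. The sketch below shows this on a tiny grid. It assumes a regular latitude-longitude grid with cell areas proportional to cos(latitude); the function names and temperatures are hypothetical, made up for illustration, and are not the CERES data.

```python
import math


def area_weighted_mean(grid, lats_deg):
    """Cosine-latitude weighted mean of a regular lat-lon grid.

    grid[i][j] is the value of the cell centred at latitude lats_deg[i];
    None marks a missing cell, which is simply skipped.
    """
    total = 0.0
    weight_sum = 0.0
    for lat, row in zip(lats_deg, grid):
        w = math.cos(math.radians(lat))  # cell area is proportional to cos(latitude)
        for value in row:
            if value is not None:
                total += w * value
                weight_sum += w
    return total / weight_sum


def fill_with_mean(grid, fill_value):
    """Return a copy of grid with every None replaced by fill_value."""
    return [[fill_value if v is None else v for v in row] for row in grid]


# Hypothetical 3 x 4 grid (degrees C): a warm equatorial row
# between two cooler rows at 30 degrees south and north.
lats = [-30.0, 0.0, 30.0]
full = [
    [18.0, 17.0, 16.0, 15.0],
    [27.0, 28.0, 26.0, 27.0],
    [14.0, 15.0, 13.0, 12.0],
]

# Punch a hole in the warm row.
holed = [row[:] for row in full]
holed[1][1] = None
holed[1][2] = None

m_full = area_weighted_mean(full, lats)
m_holed = area_weighted_mean(holed, lats)
print(f"full grid:     {m_full:.2f}")
print(f"with hole:     {m_holed:.2f}")  # cooler, because warm cells were dropped

# Skipping the missing cells gives the same answer as filling them
# with the mean of the cells that remain.
m_implied = area_weighted_mean(fill_with_mean(holed, m_holed), lats)
print(f"implicit fill: {m_implied:.2f}")
```

On this toy grid the hole lowers the average from about 19.39°C to about 17.69°C, and filling the hole with 17.69°C reproduces that same lowered average.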

Next, I use a mathematical analysis to fill in the hole. Here’s that result:

Figure 3. CERES surface temperature average with patched data, 2000-2020

Note that the errors for the land, ocean, and global averages have all gotten smaller. In particular, the land error has dropped from 1.3°C to 0.1°C. The patch makes the ocean too warm in some areas, as can be seen in Figure 3. Even so, the global average ocean temperature is still better than it is when the data are simply left out (a 0.1°C error rather than 0.4°C).

My point here is simple. When data are missing, you can often use knowledge about the overall parameters of the system to get a better estimate than you would by leaving the gap empty.
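For illustration only, and not the method used for Figure 3 (which is left unspecified here), here is one simple way to use that kind of system knowledge. Temperature varies strongly with latitude and between land and ocean, so a hypothetical `band_fill` function fills each gap with the mean of the observed cells in the same latitude band that have the same surface type. The grid layout, land mask, and values are all assumptions made up for this sketch.

```python
def band_fill(grid, is_land):
    """Fill missing cells (None) using the same latitude band and surface type.

    Hypothetical illustration, not the method behind Figure 3: each missing
    cell gets the plain mean of the observed cells in its own row (latitude
    band) that share its land/ocean flag. If there are none, it falls back to
    the mean of all observed cells in the row; a row with no observations at
    all is left missing. Cells in one row share a latitude, hence the same
    area on a regular grid, so no area weighting is needed within a row.
    """
    filled = []
    for row, land_row in zip(grid, is_land):
        new_row = []
        for value, land in zip(row, land_row):
            if value is not None:
                new_row.append(value)  # keep every real observation untouched
                continue
            same_type = [v for v, other in zip(row, land_row)
                         if v is not None and other == land]
            candidates = same_type or [v for v in row if v is not None]
            new_row.append(sum(candidates) / len(candidates) if candidates else None)
        filled.append(new_row)
    return filled


# Hypothetical two-band grid in degrees C; True marks a land cell.
is_land = [
    [False, True, True, False],
    [False, True, True, False],
]
holed = [
    [16.0, 9.0, 11.0, 17.0],
    [27.0, None, 29.0, None],
]
patched = band_fill(holed, is_land)
print(patched[1])  # [27.0, 29.0, 29.0, 27.0]
```

Grouping by latitude band and surface type is a crude stand-in for “knowledge about the system”. Kriging and similar spatial methods instead weight the observed cells according to how strongly they are correlated with the missing one, which typically means nearby cells count more.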

And how did I create the patch to fill in the missing data?

Well … I think I’ll leave that unspecified at this time, to be revealed later. Although I’m sure that the readers of WUWT will suss it out soon enough …

My best wishes to all,

PS—To avoid the misunderstandings that are the bane of the intarwebs, PLEASE quote the exact words that you are discussing.