by Dr. Alan Welch FBIS FRAS — 4 September 2024
Having used Excel for the last 7 years to study Sea Levels I have discovered various pit falls, techniques and useful functions. I wish to share these with the wider WUWT community. To some it may be like teaching your Granny to suck eggs (as we say in the UK) but hopefully some may pick up a few useful facts or ideas.
To illustrate findings, I have used the New York Battery data as recently discussed by Kip Hansen (1). These cover the period 1856 to 2024 but due to the hiatus in data between 1878 and 1893 only the period 1893 to 2023 will be considered as 131 continuous years. Note there were a few months missing in 1920, but this has little impact.
To reduce the quantity of data and remove any seasonal effects, each year will be averaged producing 131 yearly data values. Feeding these data into Excel and producing a standard presentation results in the graph shown in Figure 1.
Figure 1
Trend Line Equations
As values the coefficients of the Trend Line Equations are accurate to the precision given and usable as such. The problem arises if the equations, especially the quadratic coefficients, are used in subsequent analyses such as graph plotting and, although not recommended, extrapolation. The problems in plotting are illustrated below where the trend line equations are subjected to “Trend Line Formatting” increasing the significant decimal places to 5 and 6 respectively. With the standard Excel presentation there is a consistent error of nearly 200 mm. This comes about because the date values are squared and when multiplied by a coefficient with only 4 decimal places result in the calculated sea levels becoming basically the difference between two large numbers. Even with 5 decimal places there is a general error of about 15 mm. It is only with 6 decimal places that the error reduces to less than 1 mm.
If Cubic or higher polynomial curves are involved this problem is further exacerbated. These higher polynomial curves are not generally recommended but, on some occasions, can point to other behaviour such as would occur with decadal oscillations although using a moving average trend line would probably be more suitable. Extrapolation of these higher polynomial curves is even a more “no-no” as the highest-powered term soon dominates the calculation.
Figure 2
There are several ways the above problem can be overcome in most cases.
1. Reduce the date values by deducting a suitable figure such as 1900 so the dates now only vary from -7 to 123.
2. Generally show more decimal places.
3. The surest way is to make use of the “LINEST” function in EXCEL. This function is generally described as returning the parameters of a linear trend and has the form, assuming the year and sea level data are stored in columns A and B and rows 1 to 131
The 2 coefficients will then appear in the cell where the function was entered and the adjacent cell.
It is not well known that the LINEST function can be extended to solve a much wider range of regression analyses including cubic, quartic etc curves.
To solve for a quadratic the function is entered as
= LINEST(A1:A131,B1:B131^{1,2})
And for a cubic
= LINEST(A1:A131,B1:B131^{1,2,3})
In these cases, the coefficients will appear in 3 or 4 adjacent cells.
Having obtained values for the 2, 3 or 4 coefficients they can be captured as named values in subsequent calculations by using the “Name Box” and then use Formulas>Defined Names>Define Name. In my work I called the values LINEAR1 and LINEAR such that
Sea Level = LINEAR1 * year + LINEAR
Sea Level = QUAD2 * year2 + QUAD1 * year + QUAD
or for cubic curve
Sea Level = CUBIC3 * year3 + CUBIC2 * year2 + CUBIC1 * year + CUBIC
These formulae will then produce accurate values for the sea levels.
Curve Fitting 1
The quadratic curve may be one of many other equally well-fitting curves. In this instance another class of curve could be long term sinusoidal variation and the following shows how a series of sinusoidal curves can be generated and compared.
A general sinusoidal curve can take the form (in Excel Format)
=CONST+AMP*SIN(((SHIFT+2*A1)/PERIOD)*PI())
where CONST is a constant mean sea level (mm)
AMP is a +/- amplitude variation (mm)
SHIFT is a phase shift (years)
And PERIOD is a period of a complete oscillation (years)
How can these four values be derived. Not knowing any mathematical process to home in on the four values that give the “best fit” the following method was used. The question of what constitutes a best fit is not clear at this stage so will be addressed as the process progresses.
Stage 1 was to choose PERIOD, say 1000 years.
Stage 2 was to set CONST at zero.
That leaves AMP and SHIFT. Choose a value of AMP such as 1000mm and any value of SHIFT and plot the result of applying all 4 values over the total period covered by the data. It will probably be totally wrong so modify SHIFT until the portion of the sinusoidal curve roughly resembles the data/quadratic curve in form.
Adjust SHIFT and AMP until the portion of the sinusoidal curve looks approximately “parallel” to the quadratic curve at which stage a value of CONST can be added to make the 2 curves roughly coincide. Final tuning of CONST, AMP and SHIFT can be carried out either visually or by introducing some “best fit” criteria. In this instant the visual inspection was considered sufficient but later with a different case the question of best fit will be addressed.
After much blood, sweat and toil the following parameters were homed in on.
CONST = 223 mm AMP = 600 mm SHIFT = -300 years PERIOD = 1000 years
A second curve based on a period of 1500 years was then more easily found giving
CONST = 695 mm AMP = 1090 mm SHIFT = -510 years PERIOD = 1500 years
Using interpolation and some small fine tuning 3 other curves were found
CONST = 342 mm AMP = 722 mm SHIFT = -355 years PERIOD = 1125 years
CONST = 462 mm AMP = 845 mm SHIFT = -410 years PERIOD = 1250 years
CONST = 578 mm AMP = 968 mm SHIFT = -469 years PERIOD = 1175 years
Figure 3
Figure 3 shows the Quadratic Curve and the 5 Sinusoidal Curves. They all lie within about 3 mm of each other which is minimal considering the range of the basic data.
The following statement occurred to me, but I am not sure if it is profound or naive.
“There is one Quadratic Curve and a multitude of Sinusoidal Curves. Why would Nature choose the Quadratic Curve?”
Figures 4 and 5 show the 1st and 2nd derivatives for the 6 curves. The 2nd derivatives all vary greatly from the Quadratic as would be expected as the time scale covered is a sizeable portion of the periods involved.
Figure 4

Figure 5
Figure 6 shows the 6 curves extrapolated over 2000 years. You pays your money and takes your choice.
Figure 6
Moving Averages
Excel plots Moving Averages in a strange way when presenting engineering or scientific data. It plots the average curve right shifted whereas a more informative form would have the average values plotted a mid-point of the range used when averaging. It is very simple to produce your own averages and plot to be more useful.
Figure 7 shows the data plotted with both forms of presentation.
Figure 7
As “Ol’ Blue Eyes” said I will continue to do it “My Way”.
Curve Fitting 2
In the previous section on Curve Fitting long term (around 1000 year) periods were considered but now short term (few decades) will be discussed these possibly occurring due to decadal oscillations in climate behaviour.
With the data being used this variation will first be found by estimating how much the fitted curve varies from the actual readings. Values calculated on the fitted curve are subtracted from the data and this set of values forms a set of residuals as shown in Figure 8.
Figure 8
The target equation will have the same form as shown above but this method could be used more generally unless only a part of a cycle exists.
The 21 year moving averages were calculated as shown in Figure 9.
Figure 9
A first estimation for the equation used
CONST = 0 mm AMP = 15 mm SHIFT = 50 years PERIOD = 65 years
Figure 10 shows the data with the fitted curve.
Figure 10
The judgement criteria chosen, C, was the square root of the sum of the squares of the errors which for the starting estimation was 49.6. A convergence strategy involved recalculating the value of C using each of the parameters in turn changed by +1 and -1. The parameter change that reduced C the greatest was retained, and the process repeated until no further reduction in C occurred. This resulted in
CONST = 1 mm AMP = 13 mm SHIFT = 54 years PERIOD = 67 years
with C reducing to 43.2.
A second convergence sequence was carried out with the parameters now changed by +0.5 and -0.5 resulting in
CONST = 1 mm AMP = 13.5 mm SHIFT = 55.5 years PERIOD = 67.5 years
and a C value of 42.4. This final curve is shown in figure 11.
Figure 11
In carrying out this convergence process it was very useful to make use of “named” variables for the parameters and the incremental changes in this form.
=CONST+AMP*SIN(((SHIFT+2*A13)/PERIOD)*PI()) for the basic central value
=(CONST-DCONST)+AMP*SIN(((SHIFT+2*A13)/PERIOD)*PI())
for the incremental reduction in CONST
And so on for the remaining 7 incremental changes up to
=CONST+AMP*SIN(((SHIFT+2*A13)/(PERIOD+DPERIOD))*PI())
for the incremental increase in PERIOD.
From previous experience in using this technique in engineering the final solution is not guaranteed to be the true optimum as false minimums can be reached which the process is not capable of getting away from or the process can be stuck in a “valley”. The existence of such dips and valleys is difficult to visualize with four variables as the results would form a 4-dimensional graph. An extra check may be to start with a completely different set of starting parameters.
Another technique was to change to diagonal increments which with 4 dimensions involves 16 combinations of changes followed by alternate use of both approaches. The process stops when both processes cease changes in the value of the judgement criteria, C.
This had little effect the final values being
CONST = 1.5 mm AMP = 14 mm SHIFT = 57.5 years PERIOD = 68.5 years
and a value of C of 41.95 as shown in figure 12.
The final graph shows improvements but strangely the value of CONST=1.5mm is difficult to explain but averaged data only cover a range of 111 years which is only 1.62 periods .
Figure 12
R Squared
R Squared was a term I was not familiar with but can appear along with the trend line equation as a measure of the fit. I tried to make use of it as a measure of the accuracy between 2 sets of numbers during the previous convergence process using the function as follows
RSQ( a1:a131,b1;b131)
but it is only measuring the “the proportion of variance in the dependent variable that can be explained by the independent variable” and not the accuracy between 2 sets of numbers so I abandoned it. I.e. if the above function gave 0.8 and all b values were doubled or had 10 added the RSQ function would still give 0.8.
Googling R Squared does produce some negative remarks about it.
In conclusion it is hoped that some of the above topics may be found useful.
In a comment to one of my previous papers I was called a “Cyclomaniac”. Not sure if this was a compliment or a criticism. Interesting to see if this comment keeps getting repeated!!
References
# # # # #
Note: As usual, I have facilitated Dr. Welch in preparing this essay for publication here at WUWT. The content and ideas expressed are entirely those of Dr. Welch. I do this in the spirit that all scientific opinions deserve an airing and that you, the readers, have the right to read and comment on these opinions. As I have said previously in reference to Dr. Welch’s work, I am not a fan of curve fitting. That should not affect your view of his excellent work on sea level and the possibility that sea level is accelerating or has accelerated or your opinion on his insight in using MS Excel functions to analyze the data. He does good work. — Kip Hansen
# # # # #
via Watts Up With That?
September 3, 2024 at 08:02PM
