McShane and Wyner are two statisticians who aim to analyze the statistical methods used in proxy-based climate reconstruction, focusing mostly on the reconstructions by Mann and collaborators. In summary, the paper has two parts. In the first one, the authors present a critique of what they think are the reconstruction methods used by 'climatologists'. This part is very weak. It seems that their knowledge of the papers by Mann et al. is only indirect - from what they may have read in blogs - and that they did not actually read the papers themselves. They test and criticize a statistical method which, to my knowledge, has not been used for climate reconstructions, and in contrast they barely mention the methods that are indeed used. Had they analyzed the latter methods, climatologists would have benefited much more strongly from this study. The second part, in which they focus on the estimation of uncertainties, is somewhat better. They claim that the uncertainties are much larger than those included in the 'hockey stick' reconstruction and that the shaft of the hockey stick is rather an artifact of the method. These conclusions are, however, hardly new. The flatness of the shaft is actually only defended by Mann et al., and more recently in a much weaker fashion than 10 years ago.
The introduction already contains a terrible and unnecessary paragraph, full of errors:
For example, Antarctic ice cores contain ancient bubbles of air which can be dated quite accurately. The temperature of that air can be approximated by measuring the ratio of ions and isotopes of oxygen and hydrogen.
Well, past temperatures are reconstructed from the isotope ratio of water molecules in the ice, and not from air in the air bubbles. The air bubbles themselves cannot be dated accurately, since air can flow freely in the upper 50 meters or so of firn, and the bubbles are only sealed when ice is finally formed. Thus the time resolution of the age of the air bubbles is rather 70 years or so, depending on the site. The isotope ratio in the trapped air, for instance oxygen 18, is only a very indirect measure of global temperature and rather reflects the size of the biosphere through the fractionation that occurs in photosynthesis (Dole effect). It is not even a proxy for local temperature. Furthermore, the temperature of the air bubbles and of the ice layers is continuously changing, driven by the heat flow from the surface and from the rock. I am still wondering which 'isotopes of hydrogen' can be analyzed in trapped air (did they mean the hydrogen in the molecules of water vapor in the bubbles?). The authors probably confused here the analysis of past CO2 concentrations in trapped air bubbles with the estimation of past temperatures from the isotope ratio in ice.
This error is not relevant for the paper itself, and this paragraph is unnecessary, but it does tell me a few things: the authors did not consult with any climatologist; they feel confident enough to write about things of which their knowledge is very superficial; and the editors did not find it necessary for the manuscript to be reviewed by someone with some knowledge about proxies.
Further misunderstandings, this time about climate models:
one such source of evidence. The principal sources of evidence for the detection of global warming and in particular the attribution of it to anthropogenic factors come from basic science as well as General Circulation Models (GCMs) that have been fit to data accumulated during the instrumental period (IPCC, 2007).
Although climate models contain parameters that may be tuned, climate models are not really fit to observations. If that were the case, the models would all reproduce perfectly the observed global trend. We all know this is not the case, and that the spread is quite large.
Summarizing the previous work of McIntyre and McKitrick on the hockey stick, they write:
M&M observed that the original Mann et al. (1998) study (i) used only one principal component of the proxy record and (ii) calculated the principal components in a ”skew”-centered fashion such that they were centered by the mean of the proxy data over the instrumental period (instead of the more standard technique of centering by the mean of the entire data record)

This paragraph, and later other similar paragraphs, tells me that the authors have not really read the original paper by Mann, Bradley and Hughes (1998). MBH never used 'only one principal component of the proxy record'. The authors, again, are probably confused by what they may have read in blogs. MBH did calculate the principal components of some regional subsets of proxy records, in areas where the density of sites was very high, for instance, tree-ring sites in the US Southwest. This was done as a way to come up with a regional index series representative of that area, instead of using all series from a relatively small area and thus over-representing it in the global network. The issue of the un-centered calculation of principal components is already quite clear (the way in which MBH conducted the analysis is not correct). But other than that, the MBH reconstruction is not based on 'principal components of the proxy record'. It is based on the principal components of the observed temperature field. For the millennial reconstruction, MBH estimated that only one PC of the instrumental temperatures could be reconstructed. They did use only the first principal components of the US Southwest tree-ring network, but never 'only one principal component of the proxy record'. For instance, for the first part of the millennial reconstruction, 1000-1400, MBH used an inverse regression method with 12 proxy indicators and one principal component of the temperature field.
This point is so clear in the MBH paper that it really shows that McShane and Wyner actually did not read MBH98.
Further down in the paper, the authors go into the problems posed by the large number of proxy records and the short period available for calibration of the statistical models, 1850-1998. Again, they claim that Mann et al. used principal components to reduce the dimensionality of the covariates:
to achieve this. As mentioned above, early studies (Mann et al., 1998, 1999) used principal components analysis for this purpose. Alternatively, the number of proxies can be lowered through a threshold screening process (Mann et al., 2008)
Again, wrong. Correctly or incorrectly, this is not what MBH did. Although the number of proxy indicators is reduced in areas with high proxy density through a regional principal components analysis, the way MBH deal with the risk of overfitting is by using inverse regression. This means that in their statistical model they write the proxy vector (around 100 proxies) as a linear function of the principal components of the temperature (about 8 principal components). This problem is always well-posed. MBH never conducted a principal components analysis of the global proxy network.
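To make the distinction concrete, here is a minimal sketch of such an inverse regression, with hypothetical sizes (149 calibration years, 100 proxies, 8 temperature PCs) and synthetic data; it is not MBH's actual code, only an illustration of why each step stays well-posed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: ~100 proxies, 8 temperature PCs, 149 calibration years.
n_cal, n_proxy, n_pc = 149, 100, 8
U = rng.standard_normal((n_cal, n_pc))            # instrumental temperature PCs
G_true = rng.standard_normal((n_pc, n_proxy))     # assumed "true" loadings
P = U @ G_true + 0.5 * rng.standard_normal((n_cal, n_proxy))  # noisy proxies

# Calibration: regress the proxies on the temperature PCs (inverse regression).
# Each of the 100 proxy regressions has only 8 predictors, so it is well-posed.
G, *_ = np.linalg.lstsq(U, P, rcond=None)

# Reconstruction: for a pre-instrumental year, recover the 8 PC amplitudes
# from the proxy vector by inverting the calibrated model via least squares.
u_true = rng.standard_normal(n_pc)
p_past = u_true @ G_true                          # noise-free synthetic proxies
u_hat, *_ = np.linalg.lstsq(G.T, p_past, rcond=None)
print(u_hat.shape)  # (8,)
```

Note that the direction of the regression (many proxies on few PCs, not the reverse) is what keeps the calibration over-determined rather than over-fitted.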
Section 3.2 is the core of the first part of the paper. In this section, the authors seem to propose a linear regression statistical model to reconstruct past temperatures in which the predictand is the Northern Hemisphere annual temperature and the predictors are the available proxy records (they use 1209 proxies). Then they argue that this model leads to overfitting and that the number of predictors (proxies) has to be restricted. They propose the Lasso method to screen the set of proxies. They compare temperature hindcasts of the last 30 years obtained using the real proxy records with those obtained from simpler benchmarks: imputing just the mean of the calibration period, assuming an autoregressive process for the mean temperature and extrapolating forward, or, finally, constructing synthetic proxy records that mimic some statistical characteristics of the real proxy records. They find that using the real proxy records does not produce a significantly better hindcast than using synthetic proxies.
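The flavor of this exercise can be sketched as follows, with entirely synthetic data and hypothetical settings (the proxy counts, holdout length, and Lasso penalty are illustrative, not the paper's actual configuration):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)

# Hypothetical setup: 149 "instrumental" years, 1209 candidate proxies,
# of which only a handful are assumed to carry any temperature signal.
n_years, n_proxy, n_signal = 149, 1209, 20
temp = np.cumsum(0.1 * rng.standard_normal(n_years))   # a red-noise "temperature"
proxies = rng.standard_normal((n_years, n_proxy))
proxies[:, :n_signal] += temp[:, None]                 # signal-bearing proxies

# Hold out the last 30 years, as in the paper's validation exercise.
train, test = slice(0, n_years - 30), slice(n_years - 30, n_years)

# Lasso both fits the regression and screens the proxies (zeroed coefficients).
model = Lasso(alpha=0.1).fit(proxies[train], temp[train])
rmse_lasso = np.sqrt(np.mean((model.predict(proxies[test]) - temp[test]) ** 2))

# Simplest benchmark: impute the calibration-period mean.
rmse_mean = np.sqrt(np.mean((temp[train].mean() - temp[test]) ** 2))

print(f"Lasso RMSE {rmse_lasso:.2f} vs mean benchmark {rmse_mean:.2f}")
print(f"proxies retained: {np.sum(model.coef_ != 0)}")
```

The paper's point is that on real proxies the left-hand number is not convincingly smaller than what pure-noise pseudo-proxies achieve.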
Well, this result may be interesting and probably correct, but I doubt it is useful, since I am not aware of any reconstruction using this statistical regression model. The reconstruction methods for the Northern Hemisphere mean temperature that I am aware of are:
-CPS, in which only one free parameter is calibrated (the variance ratio between an all-proxy-mean index and the instrumental temperature).
-MBH, which as indicated before is an inverse multivariate regression method based on the principal components of the temperature field.
-RegEM, an iterative method originally employed to fill in data gaps in incomplete data sets.
-Principal components regression (actually this method has only been used to reconstruct regional temperatures, not hemispheric means), a multivariate direct regression method in which the principal components of the target variable are written as a linear function of the proxy records.
- BARCAST, a Bayesian method to reconstruct the temperature field, mentioned later in this paper.
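To illustrate how little is actually fitted in the simplest of these, here is a minimal CPS sketch on synthetic data (the series lengths and proxy count are hypothetical): the composite is rescaled so that its variance and mean over the calibration window match the instrumental series, and that rescaling is the single calibrated parameter.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: a 1000-year composite, calibrated against 149
# instrumental years using 30 standardized proxies.
n_years, n_cal, n_proxy = 1000, 149, 30
temp_cal = rng.standard_normal(n_cal)          # instrumental hemispheric mean
proxies = rng.standard_normal((n_years, n_proxy))

composite = proxies.mean(axis=1)               # all-proxy-mean index
cal = composite[-n_cal:]                       # overlap with the instruments

# The one free parameter: match variance (and mean) over the calibration window.
scale = temp_cal.std() / cal.std()
offset = temp_cal.mean() - scale * cal.mean()
reconstruction = scale * composite + offset
print(reconstruction.shape)  # (1000,)
```

Because only a scale and offset are estimated, CPS cannot overfit in the way the paper's 1209-predictor regression can, which is why testing the latter says little about the former.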
The closest situation to the model proposed by McShane and Wyner would be the regression at the core of the RegEM method. To regularize this regression, which indeed would involve too many predictors, several versions of the RegEM method have been proposed (truncated total least squares, ridge regression). McShane and Wyner just mention this in passing. So I am surprised that McShane and Wyner only test and analyze a method that is not actually used. A really useful contribution of this type of work would have been to analyse those methods that are actually used, which admittedly still present problems and uncertainties. For instance, the same test they propose for the Lasso method to reconstruct the Northern Hemisphere average could have been applied to the RegEM to reconstruct the temperature field. This would have been something interesting. Other potential problems of the RegEM method are only briefly mentioned. For instance, RegEM requires that the missing data (the temperatures to be reconstructed) be distributed at random within the data set. In the set-up of climate reconstructions this is clearly not the case, as the temperature values to be reconstructed are clustered at one end of the data set.
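A sketch of why that inner regression needs regularization, and of the ridge variant mentioned above (sizes and penalty value are hypothetical; this is not the full iterative RegEM algorithm, only its regularized regression step):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical ill-posed setting: 149 calibration years, 1000 predictors.
# Ordinary least squares is degenerate here (X'X is rank-deficient).
n, p = 149, 1000
X = rng.standard_normal((n, p))
y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(n)

# Ridge regression: add a penalty lam * I so the normal equations become
# invertible and the coefficient vector is shrunk toward zero.
lam = 10.0                                   # hypothetical penalty value
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta.shape)  # (1000,)
```

Truncated total least squares plays the same regularizing role in other RegEM variants, discarding small singular values instead of penalizing coefficients.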
The authors unfortunately do not go into a deeper analysis. Questions of proxy selection, underestimation of past variability (the failure of their method to reproduce the trend in the last 30 years could equally well be due to this problem), the role of non-climate noise in the proxies, and finally the tendency of almost all methods to produce spurious hockey sticks are all related to some degree. For instance, the presence of noise in the proxy records alone could, regardless of the statistical method used, lead to underestimation of past variations. A method based on some RMSE minimization would tend to produce reconstructions that revert to the long-term mean whenever the information in the proxies tends to be mutually incompatible, and would only produce the right amplitude of past variations if all proxies 'agree'. Two schools of thought depart here. One, represented for instance by Mann, attempts to use all proxy records available and design a statistical method that can somehow extract the signal from the noise. The paper by McShane and Wyner also fits within this school, trying to apply the Lasso method for proxy selection. It fails, according to the authors, but this may be due to characteristics of the Lasso method that render it inadequate, or perhaps due to the impossibility of designing a statistical method that can successfully screen the proxy data. It is not clear what the real reason is.
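The variance-loss mechanism is easy to demonstrate in miniature. Below, a synthetic proxy carries the true signal plus an equal amount of noise; the least-squares slope is then attenuated, and the reconstruction's variance comes out well below the truth regardless of sample size (all numbers here are illustrative, not from any real proxy):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic example: proxy = temperature signal + noise of equal variance.
n = 10_000
temp = rng.standard_normal(n)
proxy = temp + rng.standard_normal(n)

# Regressing temperature on the noisy proxy attenuates the slope
# (classical errors-in-variables attenuation), here toward ~1/2.
slope = np.cov(temp, proxy, ddof=0)[0, 1] / proxy.var()
reconstruction = slope * proxy

print(f"true variance {temp.var():.2f}, "
      f"reconstructed variance {reconstruction.var():.2f}")
```

Any RMSE-minimizing method inherits some version of this shrinkage toward the calibration mean, which is why noisy proxies alone can flatten the shaft of a reconstruction.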
Another idea, portrayed in our recent paper, argues that a different way forward is to select good proxies, for which there is a priori a very good mechanistic understanding of the relationship between proxy and temperature. Here, the observed correlation between proxy and temperature is secondary, and proxies with good correlations would be rejected if the mechanistic understanding is absent or dubious. Once a good set of proxies is selected, with minimal amounts of noise, any method should be able to provide good reconstructions.
The last part of the paper is related to the estimation of uncertainties, mostly by setting up a Bayesian method. As far as I could understand, their method is a simplified version of what Tingley and Huybers or Li et al. have already put forward. The main difference, they note, is that these latter authors have only conducted methodological tests with pseudo-proxies and not produced actual reconstructions. This is the part I most agree with, but their conclusions are hardly revolutionary. Already the NRC assessment on millennial reconstructions, and other later papers, indicated that the uncertainties are much larger than those included in the hockey stick and that the underestimation of past variability is ubiquitous.
Almost at the end of the paper they include a paragraph that has been misunderstood, either by the authors themselves or by readers in the blogosphere:
Climate scientists have greatly underestimated the uncertainty of proxy-based reconstructions and hence have been over-confident in their models
It is not clear which models the authors are referring to. If they mean 'statistical models' to reconstruct past temperatures, I would agree with them. If they mean 'climate models', they are again dead wrong, since climate models and climate reconstructions are so far completely separate entities: climate models are not tuned to proxy-based reconstructions, and proxy-based reconstructions do not use any climate model output.
In summary, climate scientists have admittedly produced bad papers in the past for not consulting professional statisticians. The McShane and Wyner paper is an example of the reverse situation. What we need is an open and honest collaboration between both groups.