Although it is one of my personal principles not to read other people’s correspondence, that was hard to avoid recently when “Climategate” overwhelmed us. One of the frequently heard allegations drawn from the illegally published emails was that the original data were intentionally withheld from free public access in order to conceal the “tricks” applied to them to increase the amplitude of anthropogenic warming. Although I must confess that I am very much in favour of the idea of free data access for everybody, we must also be aware of the dangers implied in this nice principle. I want to argue here that some of those “tricks” are simply necessary to make data collections fit for climate analysis – the community I am part of calls these tricks “homogenizing”.
In the field of analysing climate variability, trends, oscillations and other things that we nowadays tend to gather under the umbrella of “climate change”, we must be aware that “original climate time series” never contain climate information exclusively. In fact they contain much random noise, and (even worse) systematic breaks, or (worst of all) trends and other features representing not climate but growing cities, growing trees, technological progress in measuring instruments, data processing, quality control mechanisms and a number of other non-climatic factors.
People from universities or other research institutes usually consider climate data coming from weather services to be a kind of “official” data of high quality. Working in a weather service, I am glad about this and I can confirm it. We spend much time and invest much money, manpower and expertise in our quality controls. But the aim is to produce data that are internally and spatially physically consistent according to the current state of the respective measuring site. It is these data which are stored in the databanks, exchanged all over the globe, and published in yearbooks. Looking after the long-term stability of the data does not belong to the principal duties of weather services.
Therefore a free and unrestricted data policy for longer climate time series of original data, easily and comfortably accessible from institutions like CRU, NOAA, NASA and others, opens the door not only to serious research but also to (deliberate or unintentional) misuse under the quality seal of these institutions.
I want to illustrate this with one example, which I found some years ago in the best-selling book “State of Fear”. The author’s main intention is to reveal a presumed worldwide conspiracy of alarmist NGOs to draw as much attention as possible to the case of global warming. One of his arguments was only possible through NASA’s liberal data policy. Michael Crichton simply had to quickly download a number of apparently “original” long-term temperature series, some from American cities and some from rural sites, and then select some urban ones with strong warming trends and some rural ones with weaker or even cooling trends. The convincing argument “global warming is not real but an artefact of increasing urban heat islands” was ready for use, underpinned by “high-quality original data of a trustworthy American research institution”.
In real life we can show – but only after investing the additional and painstaking work of homogenizing – that such urban and other biases can be, have to be and in fact are removed in the respective high-quality datasets. This is no “faking” or “tricking” but the intention to provide a data basis fit for the special application of time series analysis. Being part of a group specialised in the field of homogenization, I do not want to bore the readers with the details of our “tricks”. I only want to mention some basic findings from our experience:
- No single long-term climate time series is a priori homogeneous (free from non-climatic noise)
- On average, a break that significantly modifies the series is introduced every 20 to 30 years
- Many, but not all, of these single breaks are random and cancel out when a regional (or global) sample is analysed
- Even regionally or globally averaged series therefore contain biases of the order of the real climate signal
- There are a number of mathematical procedures which – preferably combined with metadata information from station history files – are able to detect and remove (or at least reduce) the non-climatic information (a minimal sketch of the basic idea follows this list)
- This is much work, so it should preferably be done by specialised regional groups close to the metadata – this produces the best results, is more efficient and saves the time of research groups wanting to analyse the data
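To give non-specialists at least a flavour of these “tricks”, here is a minimal sketch of the basic idea of relative homogenization, written in Python with purely synthetic data. The break year, the magnitudes and the crude test statistic are my illustrative assumptions, not any group’s operational procedure: subtracting a well-correlated neighbour series removes the common climate signal, so that a station-specific break stands out and can be detected and adjusted.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic annual mean temperatures for a candidate station and a
# well-correlated neighbour (purely invented data, for illustration only).
years = np.arange(1900, 2000)
regional_signal = 0.005 * (years - years[0])   # shared climate signal
reference = regional_signal + rng.normal(0.0, 0.2, years.size)
candidate = regional_signal + rng.normal(0.0, 0.2, years.size)
candidate[60:] += 0.8                          # artificial break in 1960,
                                               # e.g. a station relocation

# Relative homogenization: the difference series cancels the common
# climate signal, so the station-specific break stands out clearly.
diff = candidate - reference

def detect_break(d, min_seg=10):
    """Crudest possible single-break detection: try every split point and
    keep the one with the largest Welch-type statistic between segments."""
    best_k, best_stat = None, 0.0
    for k in range(min_seg, d.size - min_seg):
        left, right = d[:k], d[k:]
        se = np.sqrt(left.var(ddof=1) / left.size + right.var(ddof=1) / right.size)
        stat = abs(right.mean() - left.mean()) / se
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

k, stat = detect_break(diff)
print(f"Break detected at {years[k]} (statistic {stat:.1f})")

# Adjust the earlier segment to the level of the current station state.
adjustment = diff[k:].mean() - diff[:k].mean()
homogenized = candidate.copy()
homogenized[:k] += adjustment
```

Operational procedures such as SNHT or the Caussinus–Mestre approach are far more refined (multiple breaks, several reference series, seasonal resolution, significance testing), but the underlying principle is exactly this comparison with neighbours, ideally confirmed by station history files.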
A number of such regional groups are active in the homogenizing business, but I must also clearly state that the job is not yet done completely and globally. We are working on it, and already now I can advise everyone to use original data only for controlling the quality of the respective homogenization attempts, not for the analysis itself, if the goal is a timeframe of 20 years or more – a length usually necessary to gain statistical significance given the high-frequency variability of climate.
Finally, I want to illustrate with a single but perhaps astonishing example how strongly and how systematically a simple matter – the installation height of meteorological instruments at regular weather service sites – has changed over the instrumental period. The two figures display the great variability, but also the average systematic trend, of the height above ground of the thermometers and rain gauges of a larger sample of long-term series in central Europe for which we were able to produce the respective metadata series. There obviously was a change in measuring philosophy from “preferably remote from surrounding obstacles” (on measuring platforms, towers, rooftops) to “near to the ground”.
A research group using the “original data” would have had no chance to invest the time to go into these details. Such original data would have produced a significant “early instrumental bias”: too cold maximum temperatures, too warm minimum temperatures and too dry precipitation totals. The temperature biases are of the order of 0.5°C each, reducing the mean diurnal range (MDR) by as much as 1°C in some cases; the precipitation bias produces a deficit near 10%.
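To make this arithmetic concrete, here is a minimal sketch; the station values are hypothetical, and only the bias magnitudes come from the paragraph above.

```python
# Illustrative magnitudes only: 0.5 °C and 10% are the rounded figures
# from the text above, not adjustment values for any real station.
tmax_early, tmin_early, precip_early = 18.0, 8.0, 600.0  # hypothetical raw values

tmax_adj = tmax_early + 0.5      # early maxima were measured too cold
tmin_adj = tmin_early - 0.5      # early minima were measured too warm
precip_adj = precip_early / 0.9  # raw totals are only about 90% of the truth

mdr_raw = tmax_early - tmin_early  # 10.0 °C in the raw data
mdr_adj = tmax_adj - tmin_adj      # 11.0 °C after correction: the raw
                                   # series understate the MDR by 1 °C

print(f"MDR: raw {mdr_raw:.1f} °C, corrected {mdr_adj:.1f} °C")
print(f"Precipitation: raw {precip_early:.0f} mm, corrected {precip_adj:.0f} mm")
```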
I hope my plea for “tricking” is not misunderstood but regarded as what it is – an attempt at a more differentiated and nuanced view. A completely liberal data policy may seem the only acceptable and achievable alternative at first sight. But not every modification of the original data has the intention to “hide the truth” – on the contrary, the overwhelming majority of such modifications are meant to help effectively unveil the truth.