Wednesday, May 30, 2012

Oliver Krüger and Frederik Schenk: The Long Run

1 Introduction

Just recently, klimazwiebel released an interview with Reiner Grundmann, in which he reports on his struggles to publish a somewhat controversial paper. The story we are about to tell fits neatly with that interview.

2 The Story

Half a year ago, we planned to write a comment on Donat et al. (2011) in Geophysical Research Letters (GRL). Because GRL had changed its policy regarding comments on already published manuscripts, we prepared a short paper instead. During the ensuing peer review we realized that getting published would be difficult. We went through two rounds of reviews at GRL, which ultimately rejected our manuscript.

Later on we revised the manuscript and submitted it to Environmental Research Letters (ERL), where it was rejected after one round of reviews. After that, we submitted to an open-discussion journal, Climate of the Past (CP). CP rejected the manuscript immediately at the initial review stage. This initial review, done by one of the editors, is supposed to be a low bar to enter the open discussion, but we failed to clear it nevertheless.

In total we received seven reviews that led to rejections at three journals in a row. These seven reviews varied significantly in their opinions, ranging from minor to major comments, and from positively to negatively minded. We also received "interesting" comments. For instance, one reviewer suggested that the average of +1 and -1 is 0.

The editor who did the initial review for CP was more open to questions regarding his rejection. He stated that our results were plausible, but not convincing (even though he said he believed them).

After these episodes, we decided to change our publication strategy. We put the manuscript on arXiv.org to make it publicly available. At the same time, we submitted it to the Journal of Climate, from which we are awaiting word on whether they are willing to start the review process.

3 The Manuscript

The manuscript we are talking about is called "Inconsistencies between long-term trends in storminess derived from the 20CR reanalysis and observations" by Krueger, Schenk, Feser, and Weisse.
In the letter to the editor we wrote: “In the manuscript we analyze storminess derived through a pressure-based proxy (extreme percentiles of geostrophic wind speed) in the 20th Century Reanalysis dataset 20CR over the Northeast Atlantic and compare our findings with results obtained from pressure observations.”
The method strictly follows Alexandersson et al. (1998, 2000) (their results made it into the last IPCC WG1 report as Fig. 3.41, online at http://www.ipcc.ch/publications_and_data/ar4/wg1/en/figure-3-41.html).
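To give non-specialist readers the gist of the method, here is a minimal sketch in Python. It illustrates the general idea only, not our analysis code; the constant air density, the exact plane fit, and the percentile choice are simplifications:

import numpy as np

RHO = 1.25          # air density [kg/m^3], treated as constant here
OMEGA = 7.292e-5    # Earth's rotation rate [rad/s]

def geostrophic_speed(xy_m, p_pa, lat_deg):
    """Geostrophic wind speed from three simultaneous SLP readings.
    xy_m: (3, 2) station coordinates [m] in a local plane; p_pa: (3,) SLP [Pa].
    """
    # Fit the plane p(x, y) = a + b*x + c*y exactly through the 3 stations;
    # b and c are the pressure-gradient components.
    A = np.column_stack([np.ones(3), xy_m])
    _, dpdx, dpdy = np.linalg.solve(A, p_pa)
    f = 2.0 * OMEGA * np.sin(np.radians(lat_deg))  # Coriolis parameter
    u, v = -dpdy / (RHO * f), dpdx / (RHO * f)     # geostrophic components
    return np.hypot(u, v)

def annual_percentile(speeds, years, q=95):
    # Annual upper percentile of geostrophic wind speed: the storminess proxy.
    return {y: np.percentile(speeds[years == y], q) for y in np.unique(years)}

The annual series of such upper percentiles, one station triangle at a time, is what gets compared between the observations and 20CR.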

And continuing: “Our findings are based on a relatively simple, yet robust method for analyzing storminess over the large and well studied area of the Northeast Atlantic. The results point to a marked inconsistency between storminess in the reanalysis dataset and storminess derived from observations, which casts doubt on the use of 20CR to describe long term trends, at least in terms of storminess. We believe that changes in the number of stations assimilated into 20CR are a plausible explanation for the discrepancies.
The 20th Century Reanalysis dataset 20CR is a new climate dataset that reaches back to 1871. Because it is nearly 140 years long, scientists hope to use it for long-term trend analyses. With our work, we are assessing how realistically 20CR describes such long-term trends in terms of storminess. We chose to restrict our analyses to the Northeast Atlantic region, as this region has been the focus of several past studies dealing with storminess. Ideally, results obtained through 20CR and observations would agree with each other, not least because the pressure observations analyzed in those past studies have very likely been assimilated into 20CR. Unfortunately, as mentioned above, storminess in 20CR and observed storminess differ significantly.”

The manuscript is available online at http://arxiv.org/abs/1205.5295

4 Remarks

Despite being rejected by several journals, we are continuing our efforts, because we believe it is worth doing so. We do not know yet how many attempts it will take. Even though our manuscript seems quite controversial, we are willing to initiate a discussion about the topic if somebody lets us.

References

Alexandersson, H., T. Schmith, K. Iden, and H. Tuomenvirta, 1998: Long-term variations of the storm climate over NW Europe. The Global Atmosphere and Ocean System, 6 (2), 97–120.

Alexandersson, H., H. Tuomenvirta, T. Schmith, and K. Iden, 2000: Trends of storms in NW Europe derived from an updated pressure data set. Climate Research, 14 (1), 71–73.

Donat, M., D. Renggli, S. Wild, L. Alexander, G. Leckebusch, and U. Ulbrich, 2011: Reanalysis suggests long-term upward trends in European storminess since 1871. Geophysical Research Letters, 38 (14), L14703.

19 comments:

ghost said...

Is it the case in climate science that every paper gets accepted? In computer science it is not. Sometimes papers simply get rejected; sometimes the justifications are better, sometimes worse; sometimes individual reviewers are not neutral; and sometimes papers (especially mine) are simply rubbish. Those are my experiences.

Do computer scientists therefore write blogs lamenting the political views of the computer science community? Not that I know of.

I also know similar stories from other fields, from strange comment odysseys to "third reviewer" stories.

EVERYWHERE people grumble, joke and laugh about this, and people get seriously frustrated too. That is simply peer review with its weaknesses. A subjective opinion is only objective up to a certain degree.

BUT ONLY in climate science do people step forward, produce a paper that is more or less good (usually less), then whine about it and trot out their conspiracy stories. Have you completely lost your minds? Do you climate scientists think your field is something special, that it is somehow different? Are you really that far gone?

All I can say is: embarrassing!!! (with three exclamation marks!!!)

ghost said...

Note: probably the most notorious story is this one: http://frog.gatech.edu/Pubs/How-to-Publish-a-Scientific-Comment-in-123-Easy-Steps.pdf

Once you have been through something like that, you may complain.

Karl Kuhn said...

Dear Ghost,

don't you have the guts to draft your insulting response in English?

Generally there is a big problem with the anonymous peer review system. I personally think it would be better to de-anonymize it.

Without a distinct polarization in a scientific discipline, unjustified rejections should be distributed without bias with respect to any conceivable criterion.

But we should become alert when the impression arises that only a certain kind of paper is rejected. This is difficult to monitor, as there is no such thing as 'The Journal of Rejected Papers Relevant to Climate Change' (which would perhaps make nice entertainment).

Generally, it is often papers questioning the mainstream that are rejected. Papers regurgitating established scientific stereotypes, on the other hand, have a good chance of getting the nod: they are easier to wave through with minimal effort by the reviewers. The result is that innovative approaches are massively discouraged, as they usually question the (sometimes life-long) achievements and convictions of the old boys who are asked to review.

If, additionally, the field is totally politicized, polarized, and riddled with groupthink, anonymous peer review no longer works as a quality check.

MODERATORS: By the way ... the spam deterrent has somehow changed and is now very difficult to overcome for a normal-sighted person.

hvw said...

Finally some science here again, thanks!

ghost, your pathetic nihilism shows that you are not interested in, or do not understand, science. Isn't it a boring life to be a little cogwheel in a machine, not caring what it does or how it works, if it works at all? Your aim is producing papers: published, checkmark; rejected, oh well, it was probably crap again or just fate, forget it and aim for the next checkmark. Some people, in contrast, are driven by ideas and aim to produce knowledge. If the system seems to stand in the way of their work bearing fruit (knowledge and progress, not publication-list entries), how can you not think about what exactly happened and how to maybe even change the system?

Whiny or not, the authors have laid their cards on the table for the wider community to check. That is laudable and the right thing to do. It remains to be seen whether the community actually looks at this and goes further. Interested people should write reviews of this; why not, for journals you do the same work for free as well.

The discussion should not begin with assumptions about whether publication proved difficult in this case because of political contamination of the publishing process. Thankfully, the authors did not make such claims, and I also do not see evidence for this hypothesis. There are a number of ways in which this piece could have been rejected three times in a row without political bias being involved.

hvw said...

This article is certainly topical and well written, and its result, should it hold water, has the potential to be extremely useful, namely for improving the brand-new and longest-reaching reanalysis dataset available (20CR). However, it shows that this was originally intended as a comment, and I can imagine that it would need some more development to be accepted as a useful contribution in some journals.

The authors reproduce a statistic based on SLP observations originally published in 1998 and apply the same method to SLP data extracted from 20CR. Whether that statistic is meaningful and robust as an indicator of storminess is open to question, but I did not dig into this; that question is not important here. They then observe a remarkable difference between 1881 and 1930 and conclude that this is because of the (backwards in time) decreasing number of stations assimilated into 20CR. This seems not very convincing to me at first glance. Apparently, going back in time from 1930, the number of stations assimilated into 20CR approaches the roughly 10 stations, and apparently the very same stations, that were used in the reproduction of Alexandersson. Maybe this is too naive, but should the derived statistics then not also approach each other? It is clear where to look first for the cause of the discrepancy, and that is in a comparison of the observed SLP with the 20CR-derived SLP. I would expect a direct comparison between the two, at minimum (see the sketch below). Further, knowing the actual stations used in 20CR is a must-have. "Not documented"? WTF? Did a handwritten sheet fall under Compo's desk? I can't believe that this information is lost. The interaction of the differing time resolutions with the geostrophic approximation could be looked into. What about looking at seasonally stratified data instead of annual data? ...
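To make that concrete, something like the following would already do. This is a rough sketch of my own; the file names, the station coordinates, and the variable name "prmsl" are placeholders, not taken from the manuscript:

import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical inputs: a station SLP series and a 20CR SLP file.
station = pd.read_csv("station_slp.csv", index_col="time", parse_dates=True)
ds = xr.open_dataset("20cr_prmsl.nc")
# 20CR SLP at the grid point nearest to the (made-up) station location.
grid = ds["prmsl"].sel(lat=55.0, lon=5.0, method="nearest").to_series()

# Align the two series on common time steps, then look at the annual bias.
both = pd.concat([station["slp"], grid], axis=1, join="inner",
                 keys=["obs", "20cr"])
print((both["20cr"] - both["obs"]).resample("A").mean())  # annual mean bias
print(np.corrcoef(both["obs"], both["20cr"])[0, 1])       # overall correlation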

Lots could be done to provide at least a starting point for the inquiry into what actually happens here. Yet nothing is provided. Instead, there are some bold statements about the uselessness of 20CR, in particular in different regions of the world. What if the problem lies in a region-specific artifact of the assimilation procedure? Also, the authors share with many of their peers a pathological obsession with looking at time series in terms of linear trends; but that is just my personal minority view.

verdict: publish with major revisions that include a significant extension of the Discussion, including at least one step towards solving the riddle.

Now keep in mind that the above is not even an attempt at a review; I did not invest anything near the time a reviewer should, and I am not familiar with the literature. But feel free to shoot me down anyways.

ghost said...

#3 I do not see any sign here that says: please write in English. The authors of the post do speak German. I do not agree with the post and I am not convinced at all, because the post contains only accusations without proof, and insinuations. It is an embarrassing and really low-quality post. That's my opinion. That is not an insult. It's pathetic.

Well, I think the post is a perfect example of why the climate debate is so wrong at the moment. Maybe you should think about it... In computer science, one would just discuss the problems in the peer review process (there are many, and there are many ideas... but that is not the point here; that would be rational); in climate science it is the biggest conspiracy since the communist fluoridation of drinking water to impurify our bodily fluids.

Hans von Storch said...

Karl Kuhn, we accept all languages, as long as they are understood in Hamburg. Slang vaguely reminiscent of English is not among these languages.

VickyS said...

Rather than de-anonymizing, I think the peer-review process should be double-blinded: as well as the authors not knowing who the reviewers are, the reviewers should not know the identity of the authors. In an ideal world, it would even be triple-blinded: the editor wouldn't know who the authors are either. This is probably not practical in a small community, though.

ghost said...

@VickyS

Good idea, but yes, if you are in the research field, you can recognize the authors. There are many other ideas, for example the inverse situation: authors are anonymous, reviewers are not; open reviews; awards for the best reviews; no peer review at all, with new methods like social platforms where the community decides; reviewed reviews; or the opposite direction, even more interaction between reviewers and authors, meaning authors and reviewers discuss directly first, then an improvement phase, and then the decision. Meaning more interaction than in the EGU journals, for example. But there is also the problem of the huge number of reviews. Often established researchers cannot review everything, so PhD students or postdocs review a lot of papers. However, this is not always good. It's a problem.

Anyway, I do not believe the paper was rejected for political reasons until it is proven otherwise.

Karl Kuhn said...

VickyS,

with many journals double-blind review is the norm, but it does not really work ... it is almost impossible to conceal your identity as an author. Moreover, I believe the problem discussed here (maybe not intended by the authors of the post) is the censoring of inconvenient content, not of persons.

But HvW is right: the scientific argument of the authors should be the focus of this discussion, not the shortcomings of peer review.

HvS, the use of German may be accepted by you, but I still find it impolite to respond to English in German and, given the stark language chosen by ghost, not very courageous to shut out the non-German audience when ranting.

Martin Heimann said...

I am very much astonished and dismayed by the statement of the CP editor who declined to accept this manuscript into the discussion phase of the journal on the grounds of not being convinced. I am myself an editor of another EGU open access journal, ACP, and I am very familiar with the criteria to be used in accepting a manuscript for the discussion phase. The so-called access peer review has to check that elementary standards are fulfilled: the manuscript must fall within the scope of the journal, properly acknowledge previous work, and be technically ready, with complete references, appropriate complete tables, graphics, etc. Whether the conclusions of the manuscript are convincing or not is not a criterion here; that is judged in the second stage, when the manuscript is open for discussion, including the formal review. Obviously, the particular CP editor does not understand this.

After reading the manuscript, I would accept it without hesitation for the discussion phase of our journal; it clearly fulfills the basic criteria. Whether it would pass the second stage and then get the formal status of a peer-reviewed publication, however, would depend on the outcome of the discussion phase and the subsequent responses of the authors.

Freddy Schenk said...

Thanks, Martin, for your statement. Our problem in all cases was that we never got the chance of a proper review process. All the issues raised would have been very easy to address, as no serious points were made. We wanted to do science, to discuss and improve our analysis, etc.
But maybe our analysis is too simple to be accepted, so that one reviewer was not sure whether we computed the average of +1 and -1 correctly. We compare a new reanalysis, model A (20CR), which has so far never been properly validated for its long-term behaviour (except recently for Zürich by Brönnimann et al., 2012), with a well-known storminess proxy B that has been in use for 20 years (most recently e.g. Krueger & von Storch, 2011; IPCC, 2007; etc.). We show that the difference between the two time series representing storminess increases considerably back in time. While the proxy B undergoes no change in the data, model A does so over time. We show that the difference between model A and proxy B increases back in time with the decrease of available/assimilated stations in model A (for our domain).
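In code terms the whole check is almost embarrassingly small (illustrative names, not our actual scripts):

import numpy as np

def divergence_check(proxy_20cr, proxy_obs, n_stations):
    # Annual proxy values from 20CR and from observations, plus the number
    # of stations assimilated into 20CR, all indexed by year.
    diff = proxy_20cr - proxy_obs              # grows going back in time
    r = np.corrcoef(diff, n_stations)[0, 1]    # relation to station density
    return diff, r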
The message fits on a beer coaster (which might be our problem). The last politician in Germany who wanted a tax declaration that fits on a beer mat… who was that poor guy?

Brönnimann, S., O. Martius, H. von Waldow, C. Welker, J. Luterbacher, G. Compo, P. Sardeshmukh, and T. Usbeck, 2012: Extreme winds at northern mid-latitudes since 1871. Meteorologische Zeitschrift, in press.

Krueger, O. and H. von Storch, 2011: Evaluation of an air pressure–based proxy for storm activity. Journal of Climate, 24 (10), 2612–2619.

hvw said...

Freddy Schenk,
While the proxy B undergoes no change in the data, model A does so over time. We show that the difference between model A and proxy B increases back in time with the decrease of available/assimilated stations in model A (for our domain).

From your figure 2 c) it appears that the number of stations assimilated into 20CR stayed constant from 1881 to 1890, gained 2 stations between 1890 and 1900, and then stayed constant again until about 1928.

Does that compare well with the divergence of the two proxies between 1881 and 1930 as shown in figure 2 a), if one assumes a causal relationship?

PS: Don't get my previous pseudo-review wrong. Blog-Bla. I really would like to see your work in print, in its current form if need be, since it is a good piece raising a good and important question.

Freddy Schenk said...

That would be too simple a relation, arguing that two stations fewer = x% more difference. We cannot prove that the station density is the (only) cause of the difference in the long-term trend. This should be done by the 20CR community with sensitivity runs using a small but stable station density, in comparison to the currently available "all-in" version.

What is important is the difference before and after around 1940. The positive validation of 20CR for the last decades is not necessarily valid before that period, which has a much lower station density. In contrast, our physically based proxy should show fairly stationary skill over the whole period, i.e. the annual 95th percentile is very robust even if the data quality were lower. It should be noted here that 20CR uses the same stations as the triangles do.
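A toy example of what I mean by robust (made-up numbers, not our data): add sizeable noise to a year of readings and the annual 95th percentile barely moves.

import numpy as np

rng = np.random.default_rng(42)
speeds = rng.weibull(2.0, 4 * 365) * 10.0            # ~4 readings/day, one year
noisy = speeds + rng.normal(0.0, 1.0, speeds.size)   # degraded data quality
print(np.percentile(speeds, 95), np.percentile(noisy, 95))  # nearly the same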

The 20CR model shows good agreement after around 1940 but does not show the rise in storminess before then that is found in many other storminess proxies (not only ours). One possibility, from modelling experience, could be that if few or no stations with SLP are assimilated into the model, strong pressure gradients present in the observations will tend to be underestimated by the model. Comparing the model SLP directly with station SLP, we found that 20CR tends to have higher pressure than observed, increasingly so further back in time. This is surprising because it also applies to grid points where the stations are assimilated…

We acknowledge that our analysis raises many new questions concerning the explanation of these noteworthy discrepancies in 20CR. These questions need to be studied and answered by the 20CR community, and long-term trends should not be derived from 20CR for now.

hvw said...

Freddy Schenk,

thanks for your reply.

One possibility from modelling experience could be that if little or no stations with SLP are assimilated into the model, strong pressure gradients present in observations will tend to be underestimated by the model.

I find this counter-intuitive. The ensemble Kalman filter (EnKF) uses two sources of information: a) the dynamics of the prediction model, and b) the observations. If there are few stations, the initial conditions are poorly known, the ensemble covariance will be large, and the "Kalman step" will weight the observations very highly and discard a). So the final analysis, where and when there are only few stations, should be closer to the original observations than otherwise. But I don't really understand the EnKF, so matters might be more complicated.
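In the scalar case the weighting argument is easy to write down; a toy Kalman update of my own, not 20CR's actual EnKF:

def kalman_update(x_bg, P, y_obs, R):
    # x_bg: background estimate, P: background variance,
    # y_obs: observation, R: observation-error variance.
    K = P / (P + R)                    # Kalman gain: -> 1 as P >> R
    x_an = x_bg + K * (y_obs - x_bg)   # analysis pulled toward the observation
    P_an = (1.0 - K) * P               # reduced analysis variance
    return x_an, P_an

# Poorly known background (P = 100 hPa^2) vs. accurate obs (R = 1 hPa^2):
print(kalman_update(x_bg=1010.0, P=100.0, y_obs=990.0, R=1.0))  # ~(990.2, 0.99)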

Comparing directly the model SLP with station SLP, we found that 20CR tends to have higher pressure than observed i.e. further back in time. This is surprising because it does also apply to grid points where the stations are assimilated

This is a bit worrying because the EnKF assumes the errors to be Gaussian with mean 0. If this is really so, it could indicate a bias in the assimilation procedure. But I can't imagine that this would have escaped the quality control of 20CR. Certainly something somebody should look into.

From Compo (2011) it appears that marine observations from ICOADS were also used. Do you know whether that is the case for the region and time under consideration?

Freddy Schenk said...

If you have little or no data over a large region like the NE Atlantic, it will be more difficult to capture strong pressure gradients. The increased storminess present in our proxy around 1880 requires an above-average accumulation of strong pressure gradients. Obviously, 20CR is not able to "catch" them.

Our aim, however, is not to discuss the EnKF or to speculate about the data assimilation procedures used in 20CR. We can only note that the matter is more than unclear.

Our study was intended to provide a first long-term validation of 20CR using a thoroughly tested proxy for storminess over a well studied region based on relatively homogeneous pressure observations.

The direct validation of station SLP against 20CR SLP is a point we could add (Oliver has already done some comparisons). It would, however, not add more relevant information in this context.

hvw said...

Freddy Schenk,

since that proxy seems to be based on a subset of the data used by 20CR, I fail to see an obvious reason why 20CR should "not be able to catch them".

A number of reasons come to mind. What if 20CR has assimilated additional marine obs? Maybe it is just more realistic. Or maybe the marine obs have a bias (ships not at sea on stormy days). Maybe 20CR trades accuracy at the stations for a better representation of the whole pressure field. Maybe one should look at 20CR storminess directly through the derived wind field.

The direct validation of station SLP with 20CR SLP is a point we could add (Oliver did already some comparison). It would however not add more relevant information in this context.

I think that is an important first step towards figuring out what is really going on. Unless this is resolved, one can't really make useful recommendations. But this here is slipping towards pure speculation and won't help in figuring it out. Time to start calculating.

Many thanks for talking to me about your research anyways.

hvw said...

And good luck in getting this through, finally. Really.

Freddy Schenk said...

since that proxy seems to be based on a subset of the data used by 20CR, I fail to see an obvious reason why 20CR should "not be able to catch them".

The same SLP readings are used in both cases. But the 20CR grid-point SLP deviates from the assimilated station SLP, most pronouncedly further back in time. So does the storminess... 20CR has to find a dynamic solution based on very limited grid-point information here. Also, the assimilated monthly SSTs and sea ice will affect SLP and the large-scale atmospheric circulation. I am curious, e.g., how 20CR makes the very limited information in the early period dynamically consistent (think of north-south temperature gradients, sea-ice extent…). Maybe only with a trade-off also in the SLP assimilated at a certain grid point?

There are many (very interesting!) "maybes" which we cannot answer at this stage. That would be another, independent story. But we have a very reliable proxy instead, in whose long-term behaviour we can be confident.