Peters Guttorp and Craigmile have written an interesting technote – “A combined estimate of global temperature” which appeared recently in my Google Scholar feed. The note is interesting and potentially very useful. [spoiler: I don’t think it realises that potential, but I think it’s still interesting].

The paper takes a set of five global temperature data sets – HadCRUT5, Berkeley Earth, GISTEMP, NOAAGlobalTemp and JMA – along with their estimated uncertainties and combines them statistically to get a single estimate of global temperature with an uncertainty range on it. A lot of people would find this a very useful product as, at the moment, one is obliged to pick one or more of the existing products, work out what the uncertainties might mean and how to use them, and then work out how to combine all that. Packaging all of that into a single estimate would make a lot of people’s work with global mean temperature a lot easier.

The difficulty with this approach is that for the five chosen data sets:

  • the input data are not independent
  • the methods are not independent
  • not all the data sets are equally “mature” (for want of a better word)
  • none of the data sets is complete

It’s not clear that the assumptions made in the paper reflect this* and hence the results should be taken with a pinch of salt. Errors in data sets are assumed to be independent between data sets and independent between time steps for a particular data set. There is a “discrepancy” term which effectively mops up differences between the data sets, but it too is assumed to be independent between data sets and independent between time steps. There’s no room for explicitly modelled temporal dependence in the model except in the slowly-varying part of actual global temperature.
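To see why the independence assumption matters, here is a toy calculation. It is not the model from the paper, just the textbook result for the mean of n estimates that share a common pairwise error correlation. The function name, the error size (0.05 degC) and the correlation (0.6) are made-up values for illustration only.

```python
import math

def sd_of_mean(sigma, n, rho):
    """Standard deviation of the mean of n estimates, each with error
    standard deviation sigma and a common pairwise error correlation rho."""
    variance = (sigma ** 2 / n) * (1.0 + (n - 1) * rho)
    return math.sqrt(variance)

sigma = 0.05  # hypothetical per-data-set error sd in degC (an assumption)
n = 5         # five data sets, as in the paper

independent = sd_of_mean(sigma, n, rho=0.0)  # errors assumed independent
correlated = sd_of_mean(sigma, n, rho=0.6)   # hypothetical shared errors

print(f"independent: {independent:.4f} degC")
print(f"correlated:  {correlated:.4f} degC")
print(f"ratio:       {correlated / independent:.2f}")
```

With independent errors the uncertainty shrinks as one over the square root of n. With positive correlation it shrinks much less, and at rho = 1 averaging buys nothing at all. Treating correlated errors as independent therefore gives an uncertainty range that is too narrow.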

1. Data dependence

All current long data sets are based on heavily overlapping subsets of data. Marine data come from ICOADS (occasionally supplemented by national holdings and/or data gleaned from the GTS) and land data come from a set of archives, including GHCN, which are used in various permutations in these analyses.

This is a problem for combining data sets because if there are unidentified errors in these data sources (say an early warm bias in old style thermometer screens) then they will manifest in all of the results. Any method that seeks to combine them ought to be aware of these potential common errors (only, of course, no one is).

Another way in which data sets covary (and perhaps the most obvious) is that all the data sets have poor data coverage in the same places early in the record and that could lead to a common coverage bias, which is difficult to assess.

Where data sets use sea ice to determine whether an area is open water or not, the data set of choice is HadISST (of one flavour or another).

Etc.

2. Method dependence

In an ideal world, each of the five data sets would use a radically different method** and we’d span a greater part of “method space”, but in this world (which is so far from ideal it is not even spherical) there is substantial overlap.

Between the five data sets, there are three SST analyses – HadSST, ERSST and COBE-SST – all three use night-marine air temperature data to correct for the early use of buckets to collect sea water samples and two (HadSST and COBE) use the same method for adjusting bucket measurements (derived with some modifications from Folland and Parker 1995). In the pre-WWII record, therefore, all three of these data sets are liable to have correlated errors. Aside from the JMA data set, there are four data sets but only two SST analyses. ERSST is used in NOAAGlobalTemp and GISTEMP; HadSST is used in HadCRUT and Berkeley Earth. This will lead to correlated errors between data sets. I think evidence of this can be seen in the residuals plots in the paper.
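The pairing of SST analyses described above can be turned into a rough sketch of an error covariance. This is a toy, assuming each data set's error is the sum of an error shared with any data set using the same SST analysis and an error of its own. The two standard deviations (0.04 and 0.03 degC) and the helper names are hypothetical, and the sketch ignores the further correlation between the three SST analyses themselves.

```python
# Which SST analysis feeds each global data set (as described in the text).
sst_source = {
    "HadCRUT5": "HadSST",
    "Berkeley Earth": "HadSST",
    "GISTEMP": "ERSST",
    "NOAAGlobalTemp": "ERSST",
    "JMA": "COBE-SST",
}

def error_covariance(sst_source, shared_sd, own_sd):
    """Covariance of data set errors: a shared term for data sets that use
    the same SST analysis plus an independent term for each data set."""
    names = list(sst_source)
    cov = {}
    for a in names:
        for b in names:
            c = 0.0
            if sst_source[a] == sst_source[b]:
                c += shared_sd ** 2  # same SST analysis, shared error
            if a == b:
                c += own_sd ** 2     # data-set-specific error
            cov[(a, b)] = c
    return names, cov

def variance_of_equal_weight_mean(names, cov):
    """Variance of the simple average, using the full covariance."""
    n = len(names)
    return sum(cov[(a, b)] for a in names for b in names) / n ** 2

# Hypothetical error sizes in degC, chosen only for illustration.
names, cov = error_covariance(sst_source, shared_sd=0.04, own_sd=0.03)

full = variance_of_equal_weight_mean(names, cov)
# What you get if the off-diagonal (shared) terms are ignored.
diag_only = sum(cov[(a, a)] for a in names) / len(names) ** 2

print(f"sd with shared SST errors: {full ** 0.5:.4f} degC")
print(f"sd assuming independence:  {diag_only ** 0.5:.4f} degC")
```

Only two pairs of data sets are correlated in this toy, yet ignoring them already understates the variance of the average by about a third. Adding the common pre-WWII bucket adjustments would widen the gap further.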

Over areas of sea ice, the consensus approach has become that extrapolated land temperature anomalies are used. While this is a reasonable choice, there are very few observations available to show that it’s the right choice (or at least a good one). If it’s not, then everyone is doing the same wrong thing.

3. Maturity

The GISTEMP, NOAAGlobalTemp, Berkeley Earth and HadCRUT5 data sets have all undergone recent revamps and the latest versions of each are quasi-globally complete and have SST bias adjustments applied through the whole record.

However, the NOAAGlobalTemp data set used in this analysis isn’t the most recent version and it doesn’t include high latitude areas. There is a recent paper that does this (NOAA Interim), but it’s not cited here, so I assume it wasn’t used. The interim product is more different from the other data sets than the regular operational NOAA data set is.

JMA was, and remains, a slight oddity as there is little infilling of the land data and it suffers from the well-known polar-hole bias. There is a consequent cool drift in recent years. The SST data set used (COBE-SST-1, I think) is also not fully bias adjusted – as with other older data sets, adjustments stop in 1941 – and this would also add to a recent cool bias. There’s no peer-reviewed publication describing the data set (at least not that I know of). This is not necessarily a bad thing, but it makes it difficult to assess.

4. Completeness

None of the uncertainty estimates provided with each of the data sets is complete (the dreaded unknown unknowns rear their heads in velvet enveloping darkness). They do not capture all of the uncertainty components equally well and the temporal structure of the associated errors has large meta uncertainties (coverage, SST biases, homogenisation more generally).

Furthermore, recent research indicates that there are uncertainties in the SST data that are not currently quantified in any of the data sets (see work by Duo Chan and Peter Huybers. Are these unknown knowns***?) and these include a warm bias during WWII and relative errors in large subsets of data, which do not cancel in the global average.

5. Umm… not sure what to call this section now I started numbering things. Discussion?

Bearing all this in mind, I have reservations about the combined data set. The inclusion of the JMA data with a known cold bias in recent years is not a choice I’d make and likely leads to an underestimate of recent warming. The assumption of independence of almost everything data-related is unrealistic and suggests to me that uncertainty will be underestimated (even with the inclusion of a discrepancy term).

I have a vague concern about estimates like this in general because they provide a simple (but wrong) alternative to a difficult problem and such shiny things do entice the punters.

There’s a question, I suppose, about whether such a thing is possible to produce in practice. I think that, at the moment, the answer is that we don’t have all the pieces that we need. Some of the problems probably could be fixed by a more thorough analysis, but not all of them. Anyone with the wherewithal to perform such a sophisticated analysis would, I think, be better employed making a new global temperature data set [This is left as an exercise for the reader].

* I’m being terribly English: they don’t.

** Or one method would be sufficient to include all uncertainties adequately. No such method exists and I’m not entirely sure it ever could.

*** I think the Rumsfeld matrix breaks down when talking about uncertainty. Is a “known unknown” a known uncertainty or a known uncertainty that hasn’t been quantified or just a known gap in knowledge?