I like the Hausfather paper on sea-surface temperature data sets, not so much for what it says about the slowdown in warming (I don’t think it does say much about that) but for what it has to tell us about our understanding of sea-surface temperature change in general. Although the Hausfather paper focuses on discrepancies between data sets, it underlines a rather surprising point that we often take for granted.

I’ll get back to that in a bit. First though, a bit of background.

The global temperature data sets are, in Zeke’s now-immortal words, “smushed together”. They* smush** together sea-surface temperature measurements made aboard volunteer observing ships (usually commercial shipping recruited into the VOS programme) with measurements made by various autonomous systems, principally drifting and moored buoys. Neither of these networks was put together with a particularly strong eye to calculating a global average for climate monitoring, but there’s a lot of evidence that on average drifters are much better than ships at measuring SST.

The requirements for climate monitoring of SST are somewhat stringent. The signals we’re looking for are measured in tenths of a degree, stretched over decades of time. If you want to measure changes of a tenth of a degree in something, you need to be able to measure the something to an accuracy of hundredths of a degree and to maintain that level of accuracy over decades. That’s quite a tall order.
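To put rough numbers on that, here is a minimal sketch of how an uncorrected instrumental drift feeds straight into a fitted trend. The figures are made up for illustration (a 0.1 degC per decade signal and a 0.05 degC per decade drift); neither comes from the paper.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Twenty annual values; the start year is arbitrary.
years = list(range(1997, 2017))

# Hypothetical numbers, chosen only to illustrate the argument.
TRUE_TREND_PER_YEAR = 0.10 / 10   # signal: 0.1 degC per decade
DRIFT_PER_YEAR = 0.05 / 10        # instrument drift: 0.05 degC per decade

truth = [TRUE_TREND_PER_YEAR * (y - years[0]) for y in years]
measured = [t + DRIFT_PER_YEAR * (y - years[0]) for t, y in zip(truth, years)]

true_per_decade = ols_slope(years, truth) * 10
measured_per_decade = ols_slope(years, measured) * 10

# Fraction by which the drift inflates the fitted trend.
trend_error_fraction = (measured_per_decade - true_per_decade) / true_per_decade

print(f"true trend:     {true_per_decade:.3f} degC/decade")
print(f"measured trend: {measured_per_decade:.3f} degC/decade")
print(f"relative error: {trend_error_fraction:.0%}")
```

With these invented numbers, a drift of only five hundredths of a degree per decade is enough to inflate the fitted trend by half, which is why stability matters at least as much as absolute accuracy.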

The surprising point that’s overlooked is this: the Hausfather paper shows that over the past twenty years, although the ship data have drifted by a few tenths of a degree relative to the drifter data – an error which is ten times larger than we’d really like – the global average SST from data sets that smush both ship and buoy data has drifted by only a few hundredths of a degree when compared to a set of more homogeneous, albeit shorter records. At a global scale, those adjustments really do seem to work and the ERSSTv4 adjustments work best of all.
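A toy calculation shows how a drift of tenths of a degree in one input can shrink to hundredths in the blend. Everything here is an assumption made for illustration: buoys are treated as unbiased, the ship bias and the ship share of the observations change linearly, and the adjustment is taken to remove 90% of the ship bias. None of these numbers is from the paper, and this is not how any of the real data sets are built.

```python
N_YEARS = 20


def lerp(start, end, i, n=N_YEARS):
    """Linear interpolation from start (i = 0) to end (i = n - 1)."""
    return start + (end - start) * i / (n - 1)


def blended_error(ship_fraction, ship_bias, adjustment=0.0):
    """Error of a ship/buoy blend relative to buoys (assumed unbiased here)."""
    return ship_fraction * (ship_bias - adjustment)


# Hypothetical inputs: ship bias relative to buoys drifts by 0.2 degC,
# while the ship share of observations falls from 80% to 20%.
ship_bias = [lerp(0.10, -0.10, i) for i in range(N_YEARS)]
ship_fraction = [lerp(0.8, 0.2, i) for i in range(N_YEARS)]

# Assumed fraction of the ship bias that the adjustment removes.
ADJUSTMENT_SKILL = 0.9

raw = [blended_error(f, b) for f, b in zip(ship_fraction, ship_bias)]
adjusted = [
    blended_error(f, b, ADJUSTMENT_SKILL * b)
    for f, b in zip(ship_fraction, ship_bias)
]

# Change in error between the first and last year of the toy record.
ship_drift = ship_bias[-1] - ship_bias[0]
raw_drift = raw[-1] - raw[0]
adjusted_drift = adjusted[-1] - adjusted[0]

print(f"ship drift:           {ship_drift:+.3f} degC")
print(f"unadjusted blend:     {raw_drift:+.3f} degC")
print(f"adjusted blend:       {adjusted_drift:+.3f} degC")
```

In this toy setup the ships drift by two tenths of a degree, the unadjusted blend by one tenth, and the adjusted blend by one hundredth. The real adjustments are far more involved, but the arithmetic gives a feel for why a large drift in the ship data need not show up in the smushed global average.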


With any new paper, I always like to work out how it fits with my general understanding of the situation. I see the Hausfather paper as a sort of validation exercise and with validation exercises it pays to be very clear about what exactly it is they have validated and not to make the mistake of over-generalising the conclusions.

In this case, the paper focuses on the global average SST and it only looks at the past twenty years, and within the past twenty years they only have a short period when their most independent datasets*** (Argo and ARC) overlap. Using the buoy data which is not independent, but is much more instrumentally homogeneous, ekes that overlap out to twenty years.

Based on this, we can make a judgement about which smushed data set has an estimate of global average, or – let’s be pedantic – near-global average sea-surface temperature that agrees most closely with those data sets which we believe to be more homogeneous. That data set is ERSSTv4.

What we cannot do is say that ERSSTv4 is the best data set in a blanket sense. The global average SST is not the only metric that matters and the smushed data sets extend way back into the latter half of the 19th century. If you are doing anything other than looking at global SST change over the past 20 years, you probably still want to consider using the full range of available SST data sets to make sure you get a handle on the structural uncertainties.

It’s not especially surprising to me that ERSSTv4 came out best in this comparison*****. It’s the most recent SST data set they analysed and has the most up-to-date adjustments. In particular, they adjust for the specific historical changes in biases in ship measurements, something we don’t do to the same extent in HadSST3. We account for the shift between buckets and engine room measurements, but assume that engine room measurements had, on average, a constant bias. The changes in engine room biases are partly covered by the HadSST3 uncertainty analysis, which allows individual ensemble members to drift away from this average.

There was a bit of a question mark over the ERSST adjustments because they depend on air temperature measurements made largely by the same ships. The Hausfather analysis shows that we should have some confidence that, at the largest scale and in recent years, those adjustments are doing what they’re supposed to.

People who read the Huang and Karl papers closely probably join me in my lack of surprise: Karl et al. note in a blink-and-you’ll-miss-it sentence in the supplementary information (it’s at the bottom of page 2) that the trend in the difference between adjusted ship data and buoy data is -0.002 degrees per decade. There was also a nice blog post around the time Karl et al. came out which talks about how the satellite data since used in the Hausfather paper compare to ERSST.

So, what next?

As ever, with any new analysis, it answers some questions, but creates loads of new ones (Go Science!). Here are a few…

Obviously, it would be nice to see the same kind of analysis extended to look at regional changes. That will give us a better idea of where the changes are happening, how fast, what’s causing them and so on.

Why has the bias in the ship data changed? One possibility is that the measures taken over the past 10 years to improve the quality of measurements from the VOS fleet have paid off and we now have a better, albeit smaller, VOS fleet. That needs looking into.

A mystery, which was highlighted in the Hausfather paper, is that the marine air temperature data set used to adjust the ship SSTs in ERSST also appears to be cooling relative to the instrumentally homogeneous SST (IHSST) data sets. The cooling of marine air temperature relative to the more homogeneous SST series is interesting and warrants further investigation, particularly if it continues to form the basis of the ERSST adjustments in the future.

Finally, there’s a community review paper which was accepted just before the new year which sets out some recommendations for how to push SST bias research forward. It’s open access and you can read it here: http://journals.ametsoc.org/doi/abs/10.1175/BAMS-D-15-00251.1

*For the record “they” includes “me”. But “they” always sounds interestingly ambiguous with sinister undertones of the other.

**Smush is a technical term, obviously, meaning to carefully combine.

***The Argo record is independent of ERSST and HadSST3 as neither data set uses the Argo data. The ARC satellite data set is very nearly independent of the in situ record****. The two – ARC and Argo – overlap for a short period from 2005 to around 2012. The drifter data is not independent as it’s used in both ERSST and HadSST. However, separating it out from all the other in situ data means that you get a more consistent record. I particularly like that the team looked at various ways of processing and cross-checking the drifter data, even to the extent of splitting it in two and examining the agreement between the two strands. It certainly supports their general point.

****As I understand it (from reading the paper and stuff) the “degree of independence of ARC SSTs from in situ SSTs is very high, but not total.” There are second-order dependencies through the use of NWP data to perform cloud screening and also, possibly, via the analyses used to estimate the standard atmospheric profiles. NWP analyses use SST data sets which include satellite and in situ data amongst many other inputs.

***** Huang and his merry band of coauthors have been (to borrow an Americanism drawn from a sport I find incomprehensible******) “knocking it out of the park” with their work since the ERSSTv4 paper came out in early 2015.

****** I went to a baseball game once in Milwaukee. I’m sorry to say it was the most boring experience of my life. I left before the end and, as I wandered through the endless parking lot, the entire stadium seemed to lift temporarily off the ground in a seismic eruption of applause that set off car alarms in nearby streets and continued, it seemed, for several hours.