Deeply learning global temperature patterns

A preprint has appeared on infilling of global temperature data using “deep learning”. On their tests, it performs better than the Kadow et al. method. That’s quite interesting, and new methods for filling the gaps in HadCRUT are always great to see1. What’s potentially more exciting is that they used the same method to infill sea ice concentrations. Sea ice is important for understanding changes in global surface temperature because it has a pronounced effect on sea-surface temperatures (SST) and air temperatures at high latitudes. Current SST reconstructions – widely used for reanalyses and for driving atmosphere-only runs of climate models – are based on sea ice fields that are reconstructed in a relatively basic way from scarce observations. New methods for doing that infilling could make a significant advance in that area.

But, back to the temperature reconstruction…

The method falls under the general category of “inpainting” – filling the gaps in a picture or damaged photo, or, in this case, an incomplete field of temperature anomalies. The method involves deep-learning of some kind, which by my numpty understanding suggests it ought to do a better job of reconstructing realistic temperature patterns. It’s trained on CMIP data because you need complete fields to do the training.
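To make the training setup concrete, here is a minimal sketch (not the preprint’s code – grid size, coverage fraction, and the mean-fill baseline are all my own illustrative choices) of how inpainting training pairs are typically built: take a complete model field, knock out the cells that are missing in the observations, and ask the method to reconstruct the original.

```python
import numpy as np

rng = np.random.default_rng(42)

# Pretend CMIP temperature-anomaly field on a coarse 36x72 (5-degree) grid
complete_field = rng.normal(0.0, 1.0, size=(36, 72))

# A HadCRUT-style coverage mask: True where we "have observations"
obs_mask = rng.random((36, 72)) > 0.4  # ~60% coverage, purely illustrative

# Training input is the masked field; the target is the complete one
masked_input = np.where(obs_mask, complete_field, np.nan)

# A trivial reconstruction baseline: fill gaps with the observed mean.
# A deep-learning inpainter replaces this one step with a learned model.
fill_value = np.nanmean(masked_input)
reconstruction = np.where(obs_mask, complete_field, fill_value)

# Score only the gap cells, since observed cells are copied through
gap_rmse = np.sqrt(np.mean((reconstruction - complete_field)[~obs_mask] ** 2))
print(f"gap RMSE of mean-fill baseline: {gap_rmse:.3f}")
```

Any method that beats this kind of naive fill on held-out gap cells is at least doing something; the interesting question, as below, is what else it should be judged on.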

The metrics used to evaluate the success of this procedure somewhat miss the mark. We are, of course, interested in the pointwise RMSE of the reconstructions, but for assessing and understanding long-term change and other interesting variability, we also want to know how well particular metrics like the global mean temperature or Nino 3.4 SSTs are reconstructed. This is particularly the case in the early period, where there are large, persistent gaps in the coverage and other datasets diverge. It’s not just how those gaps are filled for a single month that’s important, it’s how the fields then behave in those gaps. Do they show appropriate month-to-month and year-to-year correlation and variance? Are the multi-year and multi-decade changes in these regions reasonable? This new reconstruction has a colder period around the 1860s due, they say, to a cooler Antarctica. The authors correctly say that such a feature can’t be ruled out, but they could construct tests for it based on the CMIP data.
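As a sketch of the kind of large-scale diagnostic being argued for here (with made-up fields and an invented noise level standing in for a real reconstruction), one can check the area-weighted global-mean series alongside pointwise RMSE – the two can disagree:

```python
import numpy as np

rng = np.random.default_rng(0)

n_months, n_lat, n_lon = 120, 36, 72
lats = np.linspace(-87.5, 87.5, n_lat)

# Area weights proportional to cos(latitude), normalised to sum to 1
weights = np.cos(np.deg2rad(lats))[:, None] * np.ones((n_lat, n_lon))
weights /= weights.sum()

truth = rng.normal(0.0, 1.0, size=(n_months, n_lat, n_lon))
recon = truth + rng.normal(0.0, 0.3, size=truth.shape)  # imperfect infill

def global_mean(stack, w):
    """Area-weighted global mean for each month in the stack."""
    return (stack * w).sum(axis=(1, 2))

gm_truth = global_mean(truth, weights)
gm_recon = global_mean(recon, weights)

# Pointwise RMSE can look mediocre while the global-mean series is
# excellent (or vice versa), so report both.
rmse = np.sqrt(np.mean((recon - truth) ** 2))
gm_corr = np.corrcoef(gm_truth, gm_recon)[0, 1]
print(f"pointwise RMSE: {rmse:.3f}, global-mean correlation: {gm_corr:.3f}")
```

The same pattern extends to Nino 3.4 or regional series: average the reconstruction over the region of interest and compare the resulting time series (trend, variance, autocorrelation) with the withheld truth.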

The other thing to bear in mind when evaluating these kinds of analyses is observational uncertainty. The statistical models used for reconstruction are based on climate model output which is smooth and well behaved. Observation-based datasets like HadCRUT4 (used here) don’t just have gaps. The observed areas are also prone to measurement errors with complicated structures. They are unsmooth and badly behaved. Even if we allow that the bias adjustments mop up the more intractable errors, tricky correlated errors remain and these persist in both time and space. The effect of these can be clearly seen in Figure S3 from the preprint (reproduced below).

There’s a stripe of much colder than average temperatures running north-south through the central/east Pacific. It could be that this is representative of the true temperature field, but it is more likely a single ship with a badly calibrated thermometer. See how the error propagates into all of the reconstructed fields (panels b through f): not just the kriged estimate, but also the LaMa and PConv estimates. It’s also there in 20CRv3, along with some other likely artifacts.

Identifying and eliminating these kinds of errors is still challenging. HadCRUT5 (one version on from the datasets shown above) uses kriging/Gaussian process regression to infill the data, but it can also make use of full error covariances that capture these kinds of errors. At the same time as filling the gaps, the method can also provide an improved analysis in areas where there are observations, but too few for a completely reliable estimate. The error covariances aren’t perfect because for some of the underlying data we don’t have the ship names, IDs, or callsigns needed to build them2.
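A toy illustration of why the error covariance matters – far simpler than the actual HadCRUT5 analysis, with one spatial dimension and made-up length scales and noise levels – is ordinary Gaussian process regression where the observation-error matrix R contains a correlated block, standing in for one miscalibrated ship affecting several nearby reports:

```python
import numpy as np

rng = np.random.default_rng(1)

def sq_exp(x1, x2, length=10.0, var=1.0):
    """Squared-exponential covariance between two sets of 1-D locations."""
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

grid = np.arange(0.0, 100.0)  # target analysis locations
obs_x = np.sort(rng.choice(100, size=25, replace=False)).astype(float)
truth = np.sin(grid / 15.0)
obs = np.sin(obs_x / 15.0) + 0.1 * rng.normal(size=obs_x.size)

# Error covariance R: an uncorrelated part plus a spatially correlated
# part (e.g. one ship's thermometer bias shared across nearby points)
R = 0.01 * np.eye(obs_x.size)
R += 0.02 * sq_exp(obs_x, obs_x, length=5.0)

# GP posterior mean on the full grid, with R included in the gain
K_oo = sq_exp(obs_x, obs_x)
K_go = sq_exp(grid, obs_x)
analysis = K_go @ np.linalg.solve(K_oo + R, obs)

rmse = np.sqrt(np.mean((analysis - truth) ** 2))
print(f"analysis RMSE vs truth: {rmse:.3f}")
```

With R set to pure white noise, the analysis would chase the correlated error as if it were real signal; including the correlated block down-weights it instead, which is the point of building those covariances.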

These issues mean that it can be difficult to construct good tests for how well a particular reconstruction method is doing. RMSE of individual and accumulated grid cells is part of that, but tests like removing and reconstructing large areas of persistently-missing pixels, as well as mussing up the CMIP fields to look as nasty as the real world, are also needed. I constructed some tests like these for the HadSST4 paper; the processes are described in the appendices. Even then, they’re more ideal than the real world. Making good surrogates and pseudoworlds is hard.
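A minimal sketch of the persistent-gap test (with random fields, an arbitrary Antarctic-like gap, and a deliberately dumb stand-in infill – swap in the method actually under test) looks like this:

```python
import numpy as np

rng = np.random.default_rng(7)

n_months, n_lat, n_lon = 240, 36, 72
fields = rng.normal(0.0, 1.0, size=(n_months, n_lat, n_lon))

# Persistent gap: the southernmost 6 latitude bands, missing every month
gap = np.zeros((n_lat, n_lon), dtype=bool)
gap[:6, :] = True

def infill_with_obs_mean(stack, gap_mask):
    """Stand-in infill: fill gap cells with each month's observed mean."""
    out = stack.copy()
    for t in range(stack.shape[0]):
        out[t][gap_mask] = stack[t][~gap_mask].mean()
    return out

recon = infill_with_obs_mean(fields, gap)

# Score the gap region's long-term behaviour, not just its monthly RMSE
truth_gap_series = fields[:, gap].mean(axis=1)
recon_gap_series = recon[:, gap].mean(axis=1)
decadal_bias = (recon_gap_series - truth_gap_series).reshape(2, 120).mean(axis=1)
print("decadal-mean bias in the gap:", np.round(decadal_bias, 3))
```

The multi-year bias in the withheld region is the quantity that would speak to features like the cold 1860s Antarctica: if the method systematically drifts in persistent gaps, it shows up here even when single-month scores look fine.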

So, new method: awesome. Claims to provide an improvement over existing methods: tentatively awesome, with a pinch of salt.

  1. It’s better to use the latest version of the data set though. A few datasets weren’t included in IPCC AR6 because they had been run on HadCRUT4 rather than HadCRUT5. ↩︎
  2. There are methods for reconstructing ship tracks where this information is missing. That would help to improve reconstructions and uncertainty estimation. ↩︎

