Very exciting newish global temperature dataset

A new version of NOAAGlobalTemp is out. They have finally cast aside1 the EOTs that have been filling the gaps in their dataset since the dawn of time and have embraced artificial neural networks to do the job instead. The paper they reference is from 2022, so it’s taken a while to materialise2: “Improvements to the Land Surface Air Temperature Reconstruction in NOAAGlobalTemp: An Artificial Neural Network Approach”. The paper only describes the land and sea ice component of the dataset, so it’s not entirely clear if they did anything different for the oceans. The press release, and a flow chart bundled with the documentation, suggest it is still ERSSTv5, in which case there’s no neural innovation for the SSTs3. Still, this is thrilling: a new global temperature dataset.

The base of the land/ice analysis is the same as in the previous iteration. A low-frequency, low-resolution analysis is performed that catches most of the large-scale, long-term change. It’s still distinctly old school, as are the bias adjustments. That all stays the same too. Because of that, the new global series is very similar to the old global series. Where the new stuff starts is in the analysis of the high-frequency residuals you get when you subtract the low-frequency analysis from the gridded observations. Previously, these residuals were reconstructed using EOTs (which I think of as spatially limited EOFs) where there were sufficient data to constrain the EOT. Now, the residuals are reconstructed using a neural network. This makes a progressively larger difference the further back in time one goes.
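To fix ideas, here’s a minimal sketch of that two-step structure, assuming monthly fields held as arrays with NaNs marking unobserved cells; the names and shapes are mine, not NOAA’s.

```python
import numpy as np

# High-frequency residuals are what's left after removing the
# low-frequency, low-resolution analysis from the gridded observations.
def high_frequency_residuals(gridded_obs, low_freq_analysis):
    # NaNs in gridded_obs mark unobserved cells; those are the gaps the
    # reconstruction (EOTs before, a neural network now) has to fill.
    return gridded_obs - low_freq_analysis

# Once the residuals have been reconstructed everywhere, the final field
# is the low-frequency analysis plus the reconstructed residuals.
def recombine(low_freq_analysis, reconstructed_residuals):
    return low_freq_analysis + reconstructed_residuals
```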

For each month in the dataset, they create a neural network with as many input neurons as there are observed data points (<=1507) and as many output neurons as there are data points in the reconstruction (1507). A third, hidden layer (with 200 neurons) sandwiched between the two is where all the magic happens. As near as I can understand, a separate network is trained for every single month (of which there are now over 2000), because each month has a unique (or near enough) configuration of input grid cells. That’s a lot of training: 540 hours as opposed to 2 minutes for EOTs.
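For what it’s worth, the shape of such a per-month network looks something like this in PyTorch; the activation function and everything else about how it’s trained are my assumptions rather than details from the paper.

```python
import torch.nn as nn

def make_monthly_network(n_observed_cells: int,
                         n_reconstruction_cells: int = 1507,
                         n_hidden: int = 200) -> nn.Sequential:
    """One network per month: inputs are that month's observed residuals,
    outputs are residuals for all 1507 reconstruction cells."""
    return nn.Sequential(
        nn.Linear(n_observed_cells, n_hidden),
        nn.Tanh(),  # the activation is a guess; the paper's choice may differ
        nn.Linear(n_hidden, n_reconstruction_cells),
    )

# Because the set of observed cells (and so the input size) changes from
# month to month, each of the 2000-odd months gets its own network, e.g.:
# networks = {m: make_monthly_network(n_observed[m]) for m in months}
```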

The networks are trained on ERA5 data for the target month (July, say) and a month either side (June to August), using data from 1950 to 2019. The input is ERA5 reduced to observational coverage. The output is ERA5 across the whole reconstruction area. The ERA5 data are further split into training (190 months) and validation (20 months) sets. How exactly ERA5 is made to look like the land data isn’t said. There are potentially interesting details there regarding the calculation of the low-frequency field. The neural network reconstruction has higher correlation, lower RMS differences and less bias than the EOT reconstructions. There’s still a step change in southern hemisphere reconstruction skill prior to 1958, which is when systematic observation of the Antarctic commenced.
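As I read it, the training data for a given month would be assembled along these lines: ERA5 residuals for the target calendar month and its neighbours, 1950 to 2019, masked to that month’s observed cells on the input side and left complete on the target side, then split into roughly 190 training and 20 validation samples. The names and the mechanics of the masking and splitting below are my guesses, not the paper’s.

```python
import numpy as np

def build_training_pairs(era5_residuals, observed_cells, target_month):
    """era5_residuals: dict mapping (year, month) -> 1-D array over the
    1507 reconstruction cells (ERA5 with its low-frequency part removed).
    observed_cells: indices of the cells observed in the month being
    reconstructed. Returns (train, validation) pairs of (inputs, targets)."""
    inputs, targets = [], []
    for year in range(1950, 2020):
        for offset in (-1, 0, 1):
            key = (year, (target_month - 1 + offset) % 12 + 1)
            if key not in era5_residuals:
                continue
            full_field = era5_residuals[key]
            inputs.append(full_field[observed_cells])  # ERA5 reduced to obs coverage
            targets.append(full_field)                 # ERA5 over the whole area
    inputs, targets = np.array(inputs), np.array(targets)
    # 70 years x 3 months = 210 samples, split ~190 training / 20 validation
    order = np.random.default_rng(0).permutation(len(inputs))
    train, val = order[:-20], order[-20:]
    return (inputs[train], targets[train]), (inputs[val], targets[val])
```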

The neural network also does a better job of reconstructing realistic variance. The EOT approach always suffered from a loss of variance because EOTs were only used if there were sufficient data to constrain them. So, in areas with few data, there was no high-frequency analysis. The neural network approach still loses variance eventually, but it’s consistently higher and reasonably constant between 60S and 60N. Perhaps at lower observation density, it optimises for a smoother field to keep the RMS error low, but I’m guessing.
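A quick-and-dirty version of that sort of variance check might look like this; the zonal banding and the names are mine, and the paper’s own diagnostics will differ in the details.

```python
import numpy as np

def variance_ratio_by_latitude(reconstruction, target, cell_lats, band=10.0):
    """reconstruction, target: arrays of shape (n_months, n_cells);
    cell_lats: latitude of each cell. Returns band centres and the mean
    ratio of temporal variances (reconstructed / target) in each band;
    values well below 1 indicate a loss of variance in the reconstruction."""
    ratio = reconstruction.var(axis=0) / target.var(axis=0)
    edges = np.arange(-90.0, 90.0 + band, band)
    centres = 0.5 * (edges[:-1] + edges[1:])
    means = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_band = (cell_lats >= lo) & (cell_lats < hi)
        means.append(ratio[in_band].mean() if in_band.any() else np.nan)
    return centres, np.array(means)
```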

All of which suggests that the new reconstruction is a step up on the old one. There’s not a lot to be done about the high-latitude southern hemisphere pre-1958, but in the examples given, the reconstruction looks more realistic over Antarctica, at least to the unaided eyeball. Whether it’s the right kind of realistic is another matter, and one that’s hard to assess.

There’s no uncertainty analysis. That’s promised for a later paper. This is a typical pattern for NOAA surface temperature datasets: update the base dataset, then update the blended dataset, then update the uncertainty analysis. Rinse and repeat. It’s a nice approach for writing papers – neater to get them out one at a time – but when it comes to data provision, the best-estimate analysis is often left without an uncertainty estimate or out of sync in some other way. In this case, the uncertainty columns in the v6 files are currently empty4 and there’s no description paper5 for the v6 blend of land and SST. The ensemble approach used to estimate uncertainties in previous versions is only intermittently updated. Whether this bothers you rather depends on what you’re using the data for. A lack of updated uncertainty for the global means is mildly annoying for applications like the update of IPCC indicators, and can be worked around. For more detailed work, the lack of an updating ensemble is a blocker and a real shame, because the ensemble approach (at least in the last iteration) was nicely diverse.

The method employed here is very different to Kadow et al. and Bochow et al., who also used neural networks for infilling. Kadow et al. used an in-painting method based on a network for filling gaps in damaged photographs6. Structurally speaking it’s quite a different creature from the method employed here, which is great. In both cases, I wonder how the network reacts to the noise one gets with real observations that isn’t present in ERA5 (or the twentieth century reanalysis and model-based training datasets used by Kadow et al.). As the observing network gets sparser – further back in time, and towards the edges of the observed world – the number of stations going into any grid-cell average decreases. This naturally increases the variance, but it also makes the data harder to quality control and homogenise, which tends to push the variance up further. There’s also the concern that for some months – between 1950 and 2019 – the target data might be in the training data. The way the low-frequency field is handled for the ERA5 data is also of interest. It’s not described in the paper, and residuals from a period in the 1850s are unlikely to have the same structure as residuals from a period in the 2010s. On the other hand, by subtracting off the low-frequency field, the neural network gets to do only the easy bit (if you like) or a simpler problem than doing everything (if you prefer).

Anyway. A fascinating new dataset to download and play with…

-fin-

  1. Well, almost
  2. Intriguingly, it appeared in print before it appeared online, which is weird. I guess they gotta throw the print crowd a bone every now and then.
  3. Though I understand that a new version of ERSST is also being developed. As much as I hate conferences, I miss a lot by not going…
  4. As they were in v5.1.
  5. And only limited documentation that I could find.
  6. Damaged photographs have a very different geometry to poorly observed planets, one being a rectangle, the other an oblate spheroid. By construction the Kadow method doesn’t know that temperatures should be continuous at the date line or that the top and bottom edges of the photograph represent only two locations. It sort of picks this up during learning, but plotting the data with different base longitudes is revealing.

