The earth is a sphere

Peters Guttorp and Craigmile have written an interesting technote – “A combined estimate of global temperature” – which appeared recently in my Google Scholar feed. The note is potentially very useful. [spoiler: I don’t think it realises that potential, but I think it’s still interesting].

The paper takes a set of five global temperature data sets – HadCRUT5, Berkeley Earth, GISTEMP, NOAAGlobalTemp and JMA – along with their estimated uncertainties and combines them statistically to get a single estimate of global temperature with an uncertainty range on it. A lot of people would find this a very useful product as, at the moment, one is obliged to pick one or more of the existing products, work out what the uncertainties might mean and how to use them, and then work out how to combine all that. Packaging all of that into a single estimate would make a lot of people’s work with global mean temperature a lot easier.

The difficulty with this approach is that for the five chosen data sets:

  • the input data are not independent
  • the methods are not independent
  • not all the data sets are equally “mature” (for want of a better word)
  • none of the data sets is complete

It’s not clear that the assumptions made in the paper reflect this* and hence the results should be taken with a pinch of salt. Errors in data sets are assumed to be independent between data sets and independent between time steps for a particular data set. There is a “discrepancy” term which effectively mops up differences between the data sets, but it too is assumed to be independent between data sets and independent between time steps. There’s no room in the model for explicitly modelled temporal dependence except in the slowly-varying part of actual global temperature.
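To see why the independence assumption matters, here is a minimal sketch of combining several estimates of a single year's global temperature by inverse-variance weighting. All the numbers are invented for illustration (they are not taken from the actual data sets), and `rho` is a hypothetical error correlation between data sets, of the kind that shared input data and shared methods would produce.

```python
# Sketch: combining several global-temperature estimates for one year.
# All values are made up for illustration; sig holds each data set's
# quoted 1-sigma uncertainty and rho is an assumed error correlation
# between data sets (shared inputs and methods make rho > 0).
import numpy as np

est = np.array([0.92, 0.95, 0.90, 0.93, 0.88])   # hypothetical anomalies (degC)
sig = np.array([0.05, 0.05, 0.06, 0.06, 0.07])   # hypothetical 1-sigma uncertainties

# Independent-errors assumption: inverse-variance weights.
w = 1.0 / sig**2
combined = np.sum(w * est) / np.sum(w)
var_indep = 1.0 / np.sum(w)

# Same weights, but with correlated errors: var = w' C w / (sum w)^2,
# where C has off-diagonal terms rho * sig_i * sig_j.
rho = 0.5
C = rho * np.outer(sig, sig)
np.fill_diagonal(C, sig**2)
var_corr = w @ C @ w / np.sum(w)**2

print(f"combined estimate: {combined:.3f}")
print(f"1-sigma, independent errors: {np.sqrt(var_indep):.3f}")
print(f"1-sigma, rho = 0.5:          {np.sqrt(var_corr):.3f}")
```

Assuming independence when the errors are in fact correlated gives the smaller of the two uncertainties, which is the sense in which I'd expect the paper's combined uncertainty to be underestimated.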

1. Data dependence

All current long data sets are based on heavily overlapping subsets of data. Marine data come from ICOADS (occasionally supplemented by national holdings and/or data gleaned from the GTS) and land data come from a set of archives, including GHCN, which are used in various permutations in these analyses.

This is a problem for combining data sets because if there are unidentified errors in these data sources (say an early warm bias in old style thermometer screens) then they will manifest in all of the results. Any method that seeks to combine them ought to be aware of these potential common errors (only, of course, no one is).

Another way in which data sets covary (and perhaps the most obvious) is that all the data sets have poor data coverage in the same places early in the record and that could lead to a common coverage bias, which is difficult to assess.

Where data sets use sea ice to determine whether an area is open water or not, the data set of choice is HadISST (of one flavour or another).


2. Method dependence

In an ideal world, each of the five data sets would use a radically different method** and we’d span a greater part of “method space”, but in this world (which is so far from ideal it is not even spherical) there is substantial overlap.

Between the five data sets, there are three SST analyses – HadSST, ERSST and COBE-SST. All three use night-marine air temperature data to correct for the early use of buckets to collect sea water samples, and two (HadSST and COBE) use the same method for adjusting bucket measurements (derived with some modifications from Folland and Parker 1995). In the pre-WWII record, therefore, all three of these data sets are liable to have correlated errors. Setting aside the JMA data set, there are four data sets but only two SST analyses: ERSST is used in NOAAGlobalTemp and GISTEMP; HadSST is used in HadCRUT and Berkeley Earth. This will lead to correlated errors between data sets. I think evidence of this can be seen in the residuals plots in the paper.

Over areas of sea ice, the consensus approach has become that extrapolated land temperature anomalies are used. While this is a reasonable choice, there are very few observations available to show that it’s the right choice (or at least a good one). If it’s not, then everyone is doing the same wrong thing.

3. Maturity

The GISTEMP, NOAAGlobalTemp, Berkeley Earth and HadCRUT5 data sets have all undergone recent revamps and the latest versions of each are quasi-globally complete and have SST bias adjustments applied through the whole record.

However, the NOAAGlobalTemp data set used in this analysis isn’t the most recent version and it doesn’t include high latitude areas. There is a recent paper describing a version that does include them (NOAA Interim), but it’s not cited here, so I assume it wasn’t used. The interim product is more different from the other data sets than the regular operational NOAA data set is.

JMA was, and remains, a slight oddity as there is little infilling of the land data and it suffers from the well known polar-hole bias. There is a consequent cool drift in recent years. The SST data set used (COBE-SST-1, I think) is also not fully bias adjusted – as with other older data sets, adjustments stop in 1941 – and this would also add to a recent cool bias. There’s no peer-reviewed publication describing the data set (at least not that I know of). This is not necessarily a bad thing, but it makes it difficult to assess.

4. Completeness

None of the uncertainty estimates provided with each of the data sets is complete (the dreaded unknown unknowns rear their heads in velvet enveloping darkness). They do not capture all of the uncertainty components equally well, and the temporal structure of the associated errors has large meta-uncertainties (coverage, SST biases, homogenisation more generally).

Furthermore, recent research indicates that there are uncertainties in the SST data that are not currently quantified in any of the data sets (see work by Duo Chan and Peter Huybers; are these unknown knowns***?). These include a warm bias during WWII and relative errors in large subsets of data, which do not cancel in the global average.

5. Umm… not sure what to call this section now I started numbering things. Discussion?

Bearing all this in mind, I have reservations about the combined data set. The inclusion of the JMA data, with a known cold bias in recent years, is not a choice I’d make and likely leads to an underestimate of recent warming. The assumption of independence of almost everything data-related is unrealistic and suggests to me that uncertainty will be underestimated (even with the inclusion of a discrepancy term).

I have a vague concern about estimates like this in general because they provide a simple (but wrong) alternative to a difficult problem and such shiny things do entice the punters.

There’s a question, I suppose, about whether such a thing is possible to produce in practice. I think that, at the moment, the answer is that we don’t have all the pieces that we need. Some of the problems probably could be fixed by a more thorough analysis, but not all of them. Anyone with the wherewithal to perform such a sophisticated analysis would, I think, be better employed making a new global temperature data set [This is left as an exercise for the reader].

* I’m being terribly English: they don’t.

** Or one method would be sufficient to include all uncertainties adequately. No such method exists and I’m not entirely sure it ever could.

*** I think the Rumsfeld matrix breaks down when talking about uncertainty. Is a “known unknown” a known uncertainty or a known uncertainty that hasn’t been quantified or just a known gap in knowledge?

Was 2020 the warmest year on record?

In which I continue an occasional series (2014, 2015) on whether particular years are the warmest on record.

A question I have been asked many times since about March 2020 was whether 2020 was going to be the warmest year on record and then, once the new year was rung in, whether 2020 was the warmest year on record. In a sense, the answer – the correct answer, I would aver – hasn’t changed since March: it was maybe then, and it is maybe now. Some years are too close to call and 2020 was one such.

While 2020 was still underway, that indeterminate answer was found to be acceptable. The year was not over and the remainder of months yet to be experienced impressed upon the asker the unavoidable existence of some small doubt. In late November, with ten months in hand, 2020 looked set to land somewhere amongst the three warmest years. Although some interpreted this as a euphemistic admission that 2020 was merely the third warmest year, others rightly apprehended it to mean that the final answer was not yet decided.

November and December often hold surprises for the careful watcher of global temperature and 2020 had surprises indeed, furnishing first a November whose warmth was almost without equal in our short records and then a December that showed the first cold bite of La Niña. In short, 2020 kept us guessing to the very end.

In January, the situation stood thus. The nominal rankings – those that take the central estimate as precise and true – of 2020 were: 1st (GISTEMP), 2nd (NOAAGlobalTemp), 2nd (HadCRUT5), 2nd (ERA5), 2nd (Berkeley Earth), 2nd (HadCRUT4), 3rd (JRA-55). A data set produced by JMA from in situ measurements also had it in first place. But we know that global temperatures are not precise and though, by January, we had data for every month of 2020, we did not have data everywhere. Data gaps in space have an analogous effect to temporal gaps impeding our ability to state with ultimate precision what the temperature of the year is and how it sits in relation to its near-neighbours.

The spread between data sets is one indication of this and includes in it the effects of other factors deleterious to our ability to provide for the curious to their exact satisfaction. No measurement is perfect and the margins of error attendant on each one, though diluted by averaging across so many thousands of independent recordings, still serve to perturb the resulting global average from its true value. What that “true” value is is also a matter of small debate. Each centre that exerts itself in calculating a global mean temperature provides as a result – intended or otherwise – an answer to a problem that differs in important matters from that of its peers. Whether one uses sea-surface temperatures or air temperatures measured just above the marine surface affects the final answer, as do myriad decisions that must be made practically in the construction of a data set. Even if we could measure the individual data points precisely, our failure of definition, which stems perhaps from a more fundamental inability to specify precisely what global temperature is, would still lead to a range of answers. Though as for that, the divergence between these estimates in recent years is very small.

Relying simply on the spread of answers available from these global centres, we cannot say with certainty whether 2020 was the first, second or third warmest. The data sets, albeit alike in their claim to authority, disagree. However, none contradicts the statement that 2020 was one of the three warmest years. There is an advantage, I would argue, to a statement that cannot be refuted by the available evidence, and its advantages exceed those of a statement which, though having the attraction of boldness and apparent certitude, is nonetheless susceptible to counter-examples.

This is not to say that the causes for the differences between data sets should not be explored. I believe strongly that they should. By better understanding the reasons for them, we might succeed in bringing our ideas into sharper focus. To this worthy aim, I append two caveats. First, we should not seek simply to reduce disagreement; we should seek out and understand the causes of these disagreements and thereby enact improvements across the full range of data sets. Second, there may be necessary limits on how far it is practical, if not possible, to pursue this aim, with improvements being harder to find the closer the agreement becomes.

When considering how to rank the year, we might also consider the estimates of uncertainty provided with some of these data sets. GISTEMP, NOAAGlobalTemp, HadCRUT4, HadCRUT5 and Berkeley Earth routinely generate such information and it figures in some of the statements made by these groups. GISTEMP, which finds 2020 ahead of 2016 by a margin smaller than the estimated uncertainty, noted that 2020 was the joint warmest year on record. Copernicus, relying on their ERA5 reanalysis, likewise noted that 2020 was joint warmest on record, though they do not routinely publish an estimated uncertainty. Berkeley Earth note that despite its nominal ranking of second place, 2020 was not so different from 2016 that their positions might not in reality be exchanged. This they summarised by saying that “2020 and 2016 could also be regarded as effectively tied”.

Later on though, Berkeley Earth say “As a result, 2020 is nominally the second warmest year to have been directly observed, though the three warmest years are all tightly clustered together relative to the uncertainty in these measurements.” Indeed, 2020 sits almost exactly half way between 2016 and 2019 in their analysis. 2020 is in the deliciously poised position of being effectively tied with the nominally warmest and third warmest years. Although it is very unlikely that 2019 was warmer than 2016, 2020 could flip places with either one if the years are jiggled round within their uncertainty bounds. Therefore, it is not quite correct to say that 2016, 2020 and 2019 are statistically indistinguishable from each other as 2016 and 2019 can be told apart. A similar calculation made using the HadCRUT5 ensemble finds that while 2020 is most likely second warmest on record, there is a fair chance of it being first or third warmest. The precise probabilities of these outcomes are themselves uncertain and vary from data set to data set.
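Statements like "most likely second but a fair chance of first or third" can be made concrete with a simple Monte Carlo: draw many plausible realisations of each year's anomaly from its uncertainty range and count how often each year comes out warmest. The values and uncertainties below are illustrative only, not taken from any particular data set, and for simplicity the sketch assumes the years' errors are independent (in reality they are correlated between years, which changes the probabilities).

```python
# Sketch: Monte Carlo ranking of three closely-spaced years.
# Anomalies and 1-sigma uncertainties are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
years = {"2016": (1.02, 0.05), "2019": (0.98, 0.05), "2020": (1.00, 0.05)}

n = 100_000
draws = {y: rng.normal(mu, s, n) for y, (mu, s) in years.items()}
samples = np.stack([draws[y] for y in years])           # shape (3, n)
winner = np.array(list(years))[np.argmax(samples, axis=0)]

for y in years:
    print(f"P({y} warmest) ~ {np.mean(winner == y):.2f}")
```

With numbers like these, no year gets a probability near one, which is exactly the "too close to call" situation: the nominal ranking hides a substantial chance that the order is otherwise.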

By now, it ought to be clear that any attempt to talk specifically about rankings of a year like 2020 is doomed to confusion. If one person says something that sounds definitive, it will seem to contradict what another person has said with equal conviction and the unpicking this necessitates* distracts from the essential message, which is this: it’s no surprise that 2020 was an exceptionally warm year, warmer for sure than 168 years in the global temperature record and very likely much longer than that. The decade on decade rise in temperature is unmistakable. The decade 2011-2020 was clearly warmer than 2001-2010, which was warmer than 1991-2000, which was warmer than 1981-1990 and indeed any decade prior to that. Global temperatures have clearly increased and, to borrow the words of the IPCC, “most of the observed increase in global average temperatures since the mid-20th century is very likely due to the observed increase in anthropogenic greenhouse gas concentrations”. We see physically consistent effects in other indicators like ocean heat, sea level, glacier and ice-sheet mass balance, humidity and many others.

* For a perfect example see this Guardian article, which back-pedals from its headline from the very first sentence.

Urban heat islands can explain anything

Scafetta 2021 was published in Climate Dynamics and it’s the kind of publishing decision that makes you rethink the reputation of a journal. The author thanks three reviewers for their useful comments but the paper reads as if those comments were not heeded. A quick read through is enough to identify several obvious weaknesses that should have been addressed before publication*.

Here I pause to note that a flaw in the methods does not mean the conclusions are wrong; it indicates merely that the conclusions do not have the support the author believes they do. Also, if an important detail of the method is missing, it means we don’t know for sure what was done, not that what was done was incorrect.

So, where to start? A list of notes of the sort that I’d write into a review:

  1. The data set used in the analysis, CRU TS, is not designed for this kind of analysis. To quote (extensively, the relevant caveats being extensive) from the text of the paper describing CRU TS: “Because of the overriding objective to present complete coverage of land surfaces (excluding Antarctica) from 1901 onwards, CRU TS is not necessarily an appropriate tool for assessing or monitoring global and regional climate change trends. Nevertheless, with care taken to identify and avoid trend artefacts caused by changing data coverage or data inhomogeneities, then CRU TS can be used for global and regional trend analysis. The first issue is that unlike, for example, CRUTEM, regions uninformed by observations are not left missing but instead are replaced by the published climatology. This has the advantage of being a known entity, rather than an estimate, but has the unavoidable side effect of decreasing variance. Additionally, the numbers and locations of stations contributing to any grid cell will change over time. Both effects can potentially give rise to trend artefacts.” The description of CRU TS given in the paper does not give a clear indication of how these issues were addressed, if they were addressed at all. The remedy here would be to describe in more detail how these issues were dealt with, if they were, or to reprocess the data to address them if they weren’t. In addition, another max-min data set could be used, or a reanalysis, or both.
  2. Scafetta uses the ensemble mean of the climate models. Why? “[T]he ensemble mean rather than the envelope of the full range of models or individual model runs is used because the latter appear too random at the required decadal scale.” It’s hard to know what to make of this statement. If individual model runs deviate from the ensemble mean, which of course they will because the ensemble mean is not a physically realisable state of the system, then that is telling us something about how much we would expect the real world to deviate from the ensemble mean, or else telling us something about the reliability of individual models. Either way, it is not “random” and it is of interest for the analysis. If we allow that the “envelope of the full range of models” is telling us something about expected deviations, then the deviations that the real world exhibits from the ensemble mean can be compared to that. If they fall within the envelope, then there may be nothing to explain. If they fall without, there may be something to explain. Scafetta ignores the envelope entirely, so there is no way to assess whether the differences seen in Figures 7 or 9 are meaningful or not. A way forward here would be to use the ensemble spread to define an expected range of variability for single model runs around the ensemble mean at the decadal scale, which could then be used to assess the observed deviations.
  3. After glossing over caveats in the observational data, and disregarding model information that would help to judge the importance of the deviations between models and observations, Scafetta then moves on to assessing the differences between the two. In a section entitled “Visual evidence for UHI biases in macro‐regions“, Scafetta overlays the locations of larger cities on a map of the change in DTR between two periods. The general thesis is that urban areas have seen the greatest drop in DTR, but a quick perusal of the map shows little correlation with the city locations. When I say this, I do not necessarily contradict the author who helpfully points out large areas where the data have changed contrary to the proposed general pattern. These are pointed out only so that they can be explained away – “The bluish area of Bolivia appears to be due to the deforestation of the area” for example (although no evidence is provided for this statement), or “Also, the blue desert regions in Algeria, east Ethiopia, and south-west Arabia seem influenced by the presence of some cities.” However, statements like “In general, the salmon–yellow–green areas of Fig. 9 do not present large urban agglomerates.” are clearly wrong as there are obvious counter examples: Argentina and large areas of central Europe, among others. The problem here is that (previous weaknesses aside) there is no attempt made to predict what the deviations should be. It’s clear from the description (and the lengthy introduction) that Scafetta expects various influences – chiefly UHI – to lead to differences between the models and observations, but without a means of predicting what those differences are, the “visual evidence” is unconvincing and amounts to little more than handwaving and post hoc explanation. This section reads like an initial exploratory analysis. 
Combining with point 2 above, it’s clear that at this point the reader neither knows what the null hypothesis looks like, nor what the proposed hypothesis looks like so there is no means to differentiate between the two.
  4. The next section to look at is – “A tentative estimate of the non-climatic land and global warming bias” – in which the data set analysed switches to HadCRUT4. HadCRUT4 is a different data set from CRU TS, as is noted in the CRU TS paper, so the extent to which the previous (flawed) analysis of CRU TS is actually informative about HadCRUT is unknown. That aside… Scafetta assumes that the ratio of land-to-ocean warming is right in the models, and that the rate of ocean warming is correct in the observations, and uses these two factors to estimate a “correction” for the observed rate of land warming**. One thing to highlight is that Scafetta refers to HadCRUT-land and HadCRUT-sea data sets but never says how these are calculated nor how their model equivalents are calculated. Is the Arctic Ocean part of the “sea” and, if so, how was that dealt with? How was missing data dealt with in the comparison? Were the areas covered by the models and observations matched? These are all important unanswered questions for understanding the obs-model differences. One assumption made is that observed SSTs are unbiased from 1940 onwards. In fact, HadSST3 under-estimates the rate of change from the 1940s to the present because it assumes an essentially unchanging bias for ships (when the ship bias actually declined from the 1960s to the 2000s) and overestimates the fraction of bucket measurements from the late 1940s onwards. Both these problems were addressed in HadSST4 and earlier papers, which were available long before Scafetta’s manuscript was even submitted. Alternative data sets were also available, e.g. NOAAGlobalTemp or GISTEMP, which use ERSST. Even so, uncertainties generally remain larger on the SST side of things than the land side.
  5. One thing that’s not mentioned in the paper is the fact that reanalyses such as the ERA family and JRA-55 find very similar trends of land temperatures to data sets like CRUTEM or HadCRUT. These reanalyses assimilate vast numbers of observations from satellites as well as in situ sources. In addition there are long-running reanalyses – such as the 20th century reanalysis and ERA-20C – that use only SST and surface pressure observations. These are independent of the temperature measurements from land surface stations, but yield very similar trends.
  6. A recurring problem with simplistic analyses of UHI is that they substitute a difficult question such as “how has urbanisation changed in an area and what effect does that have on estimates of temperature change over wider areas” with a simpler question such as “are trends different in areas with currently high populations”. The “visual evidence” provided by Figures 7 and 9 in the paper suffers from this problem. The figures show cities of a certain size as they currently exist, but don’t show how the urban part of the landscape has changed, which is, presumably, what is important for understanding changes in DTR.
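The remedy suggested in point 2 above is simple enough to sketch. The idea is to ask where an observed deviation from the ensemble mean sits within the envelope of deviations shown by individual model runs; everything below is synthetic, stand-in data, not output from any actual model ensemble.

```python
# Sketch: judging an observed deviation against the model-run envelope.
# All values are synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic decadal-mean anomalies: 30 individual model runs and one
# hypothetical "observed" value.
runs = rng.normal(loc=0.80, scale=0.10, size=30)
obs = 0.68

ens_mean = runs.mean()
deviation = obs - ens_mean

# Where does the observed deviation sit within the spread of run
# deviations? A percentile well inside the envelope (say 5-95%)
# suggests there may be nothing to explain.
run_devs = runs - ens_mean
pct = np.mean(run_devs < deviation) * 100
print(f"obs - ensemble mean: {deviation:+.2f}")
print(f"percentile within run envelope: {pct:.0f}%")
```

Without a comparison of this sort, a plot of "observations minus ensemble mean" carries no yardstick, which is the core of the objection in point 2.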

This is, by no means, an exhaustive list. It’s just the major things that I saw in a first quick look at the paper. Perusing it again, I see more. On a minor point, the paper has a higher-than-normal number of typos. This bothers me out of all proportion to its importance.

I shall finish, though, with the Appendix:

Colour-blind readers could have some difficulty in correctly interpret Figs. 5, 6, 7 and 9 because the multicolor images also include red and green colors. Using specific software, I simulate how these figures would appear to various types of color-blindeness and it seems to me that the different marked regions could be still recognizable although they would appear in different colors. However, specific software tools are freely available to transform image colors into those discriminable for various people with color vision deficiency: e.g. Visolve,

There are many things one could pick out from this awful paragraph – the almost-consistent spelling of “colo[u]r”; the mangled grammar of the first line; the phrase “Using specific software” but not specifying what it is; the offering of a solution (again with “specific software”), but not actually implementing it and, instead, writing this unapologetically ridiculous paragraph – but they are mere cherries and sprinkles on a giant arrogance cake – why should I fix it once, the author says, when every reader who has a problem with it can just download and install a piece of software and run it on the figures***? Normally, journals catch instances of creative grammar and every journal has a style guide that specifies which of color or colour is correct, so more demerits to Climate Dynamics for sloppiness.

* I’m not the world’s greatest fan of peer review. It seems – from all sides – a frustrating process. Authors do not heed my sound advice as a reviewer and as an author I find that reviewers often misunderstand and malign my beautiful words. From the inside, peer review seems to achieve its desired aim of improving manuscripts only some of the time. Sometimes a dreadful paper gets rejected, which might be counted a win (although they seem to get published elsewhere when that happens), but the rest of the time – the majority even – it just seems to slow down publication of OK works and generally wastes a lot of everyone’s time. From the outside, the process is opaque and arcane and accusations of all sorts get flung about.

However, there is an interesting asymmetry between peer reviewed (PR) papers and non-peer-reviewed (NPR) papers which is that during the transition from NPR to PR, authors are more or less obliged to correct errors and respond to criticisms (or debate the point till everyone comes round to their way of thinking be it through superior argumentation or a greater tolerance for boredom), but once a paper is published the onus is on the “reviewer” to not only point out problems, but to do the work to show that the problems materially affect the analysis and then get that work into a state where it can be submitted, reviewed and published in turn. This is a much heavier burden and prohibitive.

For example, using CRU TS, as Scafetta did, seems a poor choice, but now the paper is published, someone objecting “officially” would be required to code up Scafetta’s analysis (or something sufficiently close that both parties could agree it was close enough) and then substitute a better choice of data set. This is rather a pain. If they don’t go to this trouble then all Scafetta need do is ignore the criticism, or say, “but you haven’t shown it is important”. This happens.

** The resulting global temperature series – found by assuming observed SSTs are correct and the land-ocean difference in models is correct – is then used to show that “corrected” observations warm less than models… and conclude that models are incorrect. It’s not even a circular argument, it just sort of spirals loopily outwards in ever increasing irrelevance.

*** I tried. The results aren’t great, but then having OK colour vision, I’m not the best judge of this.

Uncertainty myths page

I’ve added a “page” to the blog, a link to which should be hovering near the top of your screen. It goes through some uncertainty myths and dismantles them briefly. These are some bad arguments that crop up repeatedly.

The most annoying ones are variants on “climate scientists don’t know X” where X is something we deal with routinely and constitutes a large part of the literature. For example, X might be “that systematic errors are a thing”. Often such statements are accompanied by a link to an article written by a climate scientist discussing exactly that. I have had my 2011 papers (Kennedy et al. 2011a and Kennedy et al. 2011b) quoted at me to prove that climate scientists believe that all errors are independent and vanish upon averaging. Neither paper says this. Indeed, they both say the opposite – that the presence of pervasive systematic error means that there are fundamental, difficult to assess uncertainties in historical SST data.

Less annoying are myths that seem to arise when someone casts their mind back decades to when they were in university and fumblingly grasps at some half-remembered rule. This also seems to happen when someone has learnt to apply a rule and applied it for years without ever thinking about what it actually means. In both cases, the “rules” are presented rather as if carried down from the mountain and not, as it were, dredged up from the sump. The most common example is the claim that the output of a calculation cannot be less uncertain than its inputs. I’ve seen this argument made by otherwise perfectly reasonable people, and it seems to arise from a widespread over-application of a student lab heuristic.
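The counter-example is the most elementary result in statistics: average n values with independent errors and the error of the average shrinks like 1/sqrt(n). The simulation below uses made-up numbers (a true value of 15 degC and a 0.5 degC single-reading error) purely to make the point.

```python
# Sketch: the uncertainty of an average CAN be smaller than the
# uncertainty of any individual input, provided the errors are
# independent. Values are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
true_value = 15.0    # hypothetical true temperature (degC)
sigma = 0.5          # hypothetical 1-sigma error of a single reading
n_obs = 10_000

# 500 repeated experiments, each averaging n_obs noisy readings.
readings = true_value + rng.normal(0.0, sigma, size=(500, n_obs))
mean_errors = readings.mean(axis=1) - true_value

print(f"single-reading error (1-sigma): {sigma}")
print(f"error of the average (1-sigma): {mean_errors.std():.4f}")
print(f"theory, sigma/sqrt(n):          {sigma / np.sqrt(n_obs):.4f}")
```

The output uncertainty is a hundred times smaller than the input uncertainty, which is exactly what the half-remembered lab heuristic denies. The catch, of course, is the word "independent", which is where the interesting problems live.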

And then there are the myths that might be variously interpreted as counsels of despair or sticking your fingers in your ears and wah-wahing “I can’t hear you”. The chief among these is simply to declare that the data are “unreliable” and there’s simply nothing to be done with them. This is belied by the fact that the data inaccurately deemed to be unreliable consistently match data that are deemed to be reliable. After adjustment, ship data match SSTs from Argo floats and from satellite data processed independently of in situ data.

I aim to add to this list as time goes on, which is why it’s a page rather than a blog post. If you know of any other myths, let me know.

An initial look at Kadow et al.

A paper published in Nature Geoscience yesterday uses a neural network* to fill gaps in the HadCRUT4 global temperature data set. I’m always excited to see new approaches to reconstructing historical data and this paper uses a technique that is very different from those employed by other teams that have had a go at the problem. That alone, I think, makes it valuable – it is important to explore structurally-different approaches to the problem, the better to explore structural uncertainty. Anyway, go and read it. It’s short, nicely written and very accessible. Then come back here and laugh at my terrible description.

Structural uncertainty/error

I have been re-reading Povey and Grainger, “Known and unknown unknowns: uncertainty estimation in satellite remote sensing”, which is a wonderful paper in many ways – nicely written, comprehensive, scattered with good examples – and has as its subject matter a topic very close to my own interests. As a bonus it invokes the Rumsfeld matrix*.

I was thinking about one particular paragraph, which, uncharacteristically, skips lightly across a rather difficult topic:

“The combined impact of approximation, resolution, and system errors was defined as “structural uncertainty” by Thorne et al. (2005). Their emphasis was that the choices made by different investigators in the analysis of the same data can produce discrepancies. The terminology proposed above clarifies the type of choices which introduce such errors to an analysis and delineates by the manner in which they would be assessed. Regardless, this paper would prefer “structural error” as it is the error that is structural, not its uncertainty. The term “structural uncertainty” is used by Draper (1995) to describe system errors, though with respect to statistical rather than physical models.”

This somewhat misses the point of structural uncertainty (or error), which might also be termed hubristic uncertainty. It’s the inevitable uncertainty that you missed, for whatever reason, even if you tried really really really hard to be thorough. Sometimes it’s because we just don’t know about key processes or sources of error. Add to that a few bits and pieces where it was just a bit too difficult to quantify the associated uncertainty, or you neglected a term on the grounds of its being negligible**, or you missed a key correlation in a source of error. That stuff snowballs and a little correlation goes a long way, particularly if that correlation goes a long way.

It also arises because, perhaps, you didn’t fully understand the problem, or perhaps no one does. This is one reason I disfavour “error” which is often defined as difference from the “truth” which assumes there is a singular “truth”. Structural uncertainty can (almost certainly does in many cases) incorporate doubt about the thing being measured, where it’s not entirely clear what the “truth” is or insufficient care has been taken to define it precisely.

This is a more common problem than you might expect. Take, for example, the notion of global mean temperature. It can seem simple – just average 2m air temperature across the surface of the Earth – but there are all sorts of odd cases: caves and overhangs, say, or areas where there are two “surfaces”, such as a forest where there is ground and a solid canopy above it. How does one deal with snow, particularly when that snow lands and then compacts into an ice sheet or glacier? Then there are questions raised by the way that temperature is generally measured. WMO standards recommend a particular kind of environment that is intended to maximise representativeness, but that is not (ironically) necessarily representative of large areas of the world (forested areas, say, or extensive wetlands). People have also asked whether global temperature should be defined using air temperatures everywhere, or whether sea-surface temperatures are the correct thing to use over the oceans, as well as operational questions like what to do with sea ice.

Answering these questions more precisely – or indeed finding out whether such considerations matter – is an important part of improving our understanding of the climate.


*though sadly, it avoids the difficult unknown known quadrant. Everyone knows about Unknown Unknowns – what we don’t know we don’t know – and Known Unknowns – what we know we don’t know – as well as Known Knowns – what we know we know – but there are also Unknown Knowns – what we don’t know we know. This fascinates me.

**amusing anecdote redacted.

National Climate Monitoring Products

A National Climate Monitoring Product is something like the “mean temperature anomaly for the UK” or “Highest daily rainfall total in Spain in 2019”. It’s a summary of the weather and climate for a particular country.

The WMO Expert Team on National Climate Monitoring Products has written guidance to define a set of six basic National Climate Monitoring Products (let’s call them NCMPs from now on) which are:

  1. Mean temperature anomaly averaged across the country
  2. Total rainfall anomaly averaged across the country
  3. Standardised precipitation index averaged across the country
  4. Warm days averaged across the country
  5. Cold nights averaged across the country
  6. Counts of station records (the odd one out)

The idea is that all Met Services that measure temperature and rainfall – which, by survey, we know is most of them – can calculate these products and use them to understand how weather and climate are changing in their country. Some countries produce some or all of these already and the idea’s not to replace what’s already being done, but to give countries with fewer resources a core set of products to aim for.
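To give a rough flavour of what the first of these products involves, here is a minimal sketch of an area-weighted national mean of gridded temperature anomalies. This is my own illustration, not the ETNCMP code: the function name, the toy grid, the latitudes and the country mask are all made up for the example, and real software would handle missing data and cell geometry far more carefully.

```python
import numpy as np

def national_mean_anomaly(anoms, lats, mask):
    """Area-weighted national mean of gridded temperature anomalies.

    anoms : 2-D array (lat, lon) of anomalies, NaN where missing
    lats  : 1-D array of grid-cell centre latitudes (degrees)
    mask  : boolean 2-D array, True for cells inside the country
    """
    # cos(latitude) is a simple proxy for grid-cell area on a regular grid
    w = np.cos(np.radians(lats))[:, None] * np.ones_like(anoms)
    valid = mask & np.isfinite(anoms)
    return np.where(valid, anoms * w, 0.0).sum() / w[valid].sum()

# Toy 3x4 grid; the hypothetical country occupies the left two columns
anoms = np.array([[0.5, 0.7, np.nan, 1.0],
                  [0.2, 0.4, 0.6, 0.8],
                  [0.1, 0.3, 0.5, np.nan]])
lats = np.array([50.0, 52.0, 54.0])
mask = np.zeros(anoms.shape, dtype=bool)
mask[:, :2] = True
print(f"national mean anomaly: {national_mean_anomaly(anoms, lats, mask):.3f}")
```

The weighting matters: without it, high-latitude cells (which cover less area) would count the same as low-latitude ones and bias the national average.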

NCMPs are useful for understanding the local climate, the relationship between climate variability and change, providing context for what’s happening, raising awareness of the importance of monitoring the climate, for public interest and, well, lots of other reasons. From a different angle, it helps to harmonise what countries are producing, which means that consistent information can then form the basis of regional and global assessments like the BAMS State of the Climate report.

There are numerous potential barriers to doing this. To overcome some of them, the team put some quite-easy-to-use software together that covers the whole process: quality control of observations, calculation of extremes indices, gridding of the data and calculation of the NCMPs.
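To give a flavour of the extremes-indices step, here is a simplified warm-days calculation. This is not the ETCCDI definition (the real TX90p index uses a percentile computed in a window around each calendar day over a fixed base period, with a bootstrap correction); it is just a sketch of the basic idea – count days whose maximum temperature exceeds the base-period 90th percentile. All the data here are synthetic:

```python
import numpy as np

def warm_days_percent(tmax_year, tmax_base):
    """Percentage of days whose daily maximum temperature exceeds the
    90th percentile of a base-period sample (a simplification of TX90p)."""
    threshold = np.percentile(tmax_base, 90)
    return 100.0 * np.mean(tmax_year > threshold)

rng = np.random.default_rng(0)
base = rng.normal(15.0, 5.0, size=365 * 30)   # synthetic 30-year base period
year = rng.normal(16.0, 5.0, size=365)        # synthetic, slightly warmer, year
print(f"{warm_days_percent(year, base):.1f}% warm days")
```

By construction, an unremarkable year would score about 10%; a year whose distribution has shifted warm, as here, scores noticeably more.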

You can find the National Climate Monitoring Product code and instruction manuals on GitHub. The code was written – adapted in part from ETCCDI software – by Megan Hartwell and Lucie Vincent at Environment Canada and then worked on further by Simon Grainger at the Bureau of Meteorology. The different parts of the code produce a range of useful by-products: quality-controlled data, extremes indices, gridded maps of data and the NCMPs themselves.

The code has been run on data from a number of countries including Tanzania, Canada, the UK, Morocco, the USA and Australia, and has been trialled in other countries.