Sticking em where the sun don’t shine

Thermometers, that is.

OK, that’s not much better…

You stick them in a Stevenson screen, or something like it. The clever design means that air can circulate, but thanks to the louvered sides, the sun is kept off the thermometer itself. If the sun were to shine directly onto the thermometer bulb, it would warm up beyond the temperature of the air it’s supposed to be measuring, but that rarely happens1.

I like Stevenson screens. They’re in that satisfying niche of low tech and high2 sophistication. In one sense, it’s just a thermometer in a box, but the box is clever in all sorts of ways as well as being relatively cheap and not excessively onerous to maintain.

They’re not perfect, of course, and better measurements can be made, but their virtue is their simplicity and, via their simplicity, their consistency. A thermometer in a Stevenson screen (or at least one like it) is how you can measure temperature anywhere in the world and that’s fine as long as what you want to know is the temperature in the box.

Whether we do want to know that or not is a tricky question. All else being equal, a thermometer in a Stevenson screen in an open field with short grass will record a different temperature to one in a rocky clearing, or a parking lot, or on top of a building, or under a tree. Various guidelines stipulate that the screen should be placed in a site with an open aspect, short grass (or vegetation representative of the surrounding area), no nearby trees or buildings, out of any shade, on a low gradient, far away from water (unless water is significant for the region), not in a climatologically weird area like a frost hollow, and so on. There’s a long list.

The reason for these rules is that there can only be so many weather stations, so it’s best to put them where you get the most out of them. Temperatures measured at a station sited in the above-mentioned perfect locations are – theoretically – representative of temperatures over a wide area. Obviously, they’re not perfectly representative because temperatures do vary from place to place, but those variations are more closely tied to the large-scale features of the synoptic situation, which makes such stations great for, say, weather forecasting.

A system has been devised that categorises weather station siting, qualitatively assessing how far from the ideal it is for measuring each meteorological element (temperature, precipitation, wind). Stations that achieve Class 1 for an element3 are the best; elements at Class 5 are the worst. If a single rating is needed for the site overall, the worst rating across all elements is used. For temperature, a quantitative scheme has also been added, giving representative uncertainties for the different siting classes. Classes 1 and 2 have negligible uncertainties in this scheme (less than 1°C); Classes 3, 4 and 5 have uncertainties of 1°C, 2°C and 5°C respectively. These values aren’t clearly justified in the document.
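As a sketch of how the scheme’s numbers combine – the class values and uncertainties are those quoted above, but the helper names are my own invention, not part of any official implementation:

```python
# Representative uncertainties per siting class, as quoted above.
# Classes 1 and 2 are "negligible"; recorded here as 0.0 for simplicity.
SITING_UNCERTAINTY_C = {1: 0.0, 2: 0.0, 3: 1.0, 4: 2.0, 5: 5.0}

def overall_class(element_classes):
    """Overall site rating: the worst (highest) class across all elements."""
    return max(element_classes.values())

# A hypothetical station, rated separately for each element.
station = {"temperature": 2, "precipitation": 3, "wind": 4}
print(overall_class(station))                        # 4 (worst element wins)
print(SITING_UNCERTAINTY_C[station["temperature"]])  # 0.0
```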

That’s all grand, but if we think that a parking lot isn’t representative of a grassy open area, then a grassy open area isn’t representative of a parking lot (or a mossy clearing, or the banks of a mighty river etc) and, for better or worse, parking lots, mossy clearings and mighty rivers are things in the world. If we’re measuring the temperature of the surface of the whole earth then we need to know the temperatures in all those parking lots, as well as over every other surface there is. As the classification system notes, “Complex terrain or urban areas generally lead to high class numbers”.

Related to this question is the question of what those nominal “uncertainties” are relative to4. One obvious answer5 is “relative to what a screen would measure in an ideal siting at the same location”. This is not actually something that can be realised except where the station is already sited in such a place. One must be slightly wary of a measurand that doesn’t exist and, for obvious reasons, can’t.

This is a somewhat different question from whether the recorded temperature matches the “true” air temperature. Screens are good, but they still don’t measure the “true” air temperature at that location whatever that is. For all that air can circulate through the box, it doesn’t always. And, while the box is white to reflect solar radiation, under strong insolation the sun can warm the box and the box can warm the air inside it6. Both these can lead to differences in the measured air temperature relative to an aspirated thermometer7.

Then again, one can place multiple sensors in close proximity to one another and actually measure how “representative” each one is of the others. The Climate Reference Network stations do this as a matter of course, placing three sensors in very close proximity to one another8, and they tend to agree very closely. Experiments have also been carried out with sensors at varying degrees of separation, from tens of metres to kilometres. Even at the tens-of-metres scale there can be differences of a few degrees, and even at the scale of single metres there can be significant variations in air temperature, even when station siting doesn’t change significantly.
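A toy version of that redundancy check – the 0.3°C tolerance and the sensor names are made up for illustration, not anything the CRN actually uses:

```python
def flag_outliers(readings, tol=0.3):
    """Flag any sensor whose reading differs from the median of all
    simultaneous readings by more than tol degrees C (tol is invented)."""
    values = sorted(readings.values())
    median = values[len(values) // 2]
    return [name for name, t in readings.items() if abs(t - median) > tol]

# Hypothetical triple of co-located sensors; C is drifting warm.
print(flag_outliers({"A": 15.01, "B": 15.04, "C": 15.62}))   # ['C']
```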

All of these are different from another question, which might be more pertinent for climate monitoring: how different are changes in the recorded temperature from what the (perhaps mythical) well-sited, perfectly-instrumented station in the same location would measure? For climate monitoring and estimating long-term change, it’s often more important to know how temperature changed at a location than what the specific temperatures are at any time. Actual temperatures can vary rapidly over metres and kilometres, but anomalies – that is, differences from a long-term average for the station – tend to be coherent over much wider areas. The uncertainties associated with different classes of station would be different for anomalies than for actual temperatures (assuming we can agree on a measurand).
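The offset-cancelling property of anomalies can be seen in a few lines. The station values here are invented, and this isn’t how any particular dataset does it – just the arithmetic of the idea:

```python
def anomalies(series, baseline_indices):
    """Subtract the mean over a baseline period from every value."""
    base = sum(series[i] for i in baseline_indices) / len(baseline_indices)
    return [round(t - base, 6) for t in series]

station = [14.2, 14.5, 14.1, 14.9, 15.0, 15.3]   # annual means, degrees C
warmer_site = [t + 1.5 for t in station]          # same weather, offset siting
baseline = range(3)                               # first three years as baseline

# The constant siting offset cancels out of the anomalies.
print(anomalies(station, baseline) == anomalies(warmer_site, baseline))   # True
```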

These considerations are all very well, but why should we care? Well, the answer’s not easy. It really depends what you want the data for.

Questions of station siting and instrumentation inevitably arise whenever there’s a new temperature record, be it a national record or an international one. Typically, a local met service will make sure that the candidate record temperature was made at a station with reasonably good exposure, with properly maintained and calibrated equipment, and without any interference. They also tend to cross-check with nearby stations and weather forecasts to ascertain that the value is indeed reasonable9. For international records, the WMO has a special rapporteur who will generally convene a team of relevant experts to investigate the exact circumstances of the measurement. Now, while this is a sensible approach, it does mean that for any particular record there will be places, and perhaps people, that experienced more extreme temperatures, or else milder ones, because they didn’t have the good (or bad) luck10 to find themselves in a grassy field.

The flipside of this is in urban monitoring. Urban areas are very complex. The linked presentation shows some examples of urban monitoring projects, but these only begin to map out the complex nature of temperature variations in urban environments and other environments typically neglected by traditional meteorological networks. In urban environments, it’s not entirely clear what representative might mean.

The other time station siting is frequently mentioned is in studies of long-term temperature change. These mentions often cite the classification scheme and claim that, because many stations do not qualify as Class 1 or 2, their measurements are suspect and, consequently, estimates of global temperature change are also suspect. Some argue, more specifically, that urbanisation affects so many stations that estimates of global temperature change are unreliable and biased high. This thought has occurred to climate scientists, of course, and there’s an extensive literature on the subject.

When considering long-term change and the suitability of individual stations, the situation is not generally one of perfect knowledge. We don’t know (unfortunately) the precise conditions at each and every station at all times from their inception to their closure. We do know that stations are prone to being moved. Formerly rural station sites can get swallowed up by expanding urban areas and need to be relocated. Stations in small cities can, over time, become stations in megacities. Not all changes are so extreme though. The process of identifying and removing artificial changes due to station siting (and other causes) is known as homogenisation. It makes use of the fact that, while actual temperatures can vary greatly over short distances, changes in temperature and temperature anomalies vary considerably less, particularly at the kinds of time scales that climate scientists are interested in11. If a station move is not documented, it can still be identified by comparing its temperature series with those of its neighbours, assuming they didn’t all move at the same time12.
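As a crude sketch of that neighbour-comparison idea – a single-breakpoint toy with invented numbers; real homogenisation algorithms are far more sophisticated than this:

```python
def best_breakpoint(diff):
    """Find the split of a candidate-minus-neighbour difference series that
    maximises the jump in its mean: a one-breakpoint toy detector."""
    best_k, best_shift = None, 0.0
    for k in range(1, len(diff)):
        shift = sum(diff[k:]) / (len(diff) - k) - sum(diff[:k]) / k
        if abs(shift) > abs(best_shift):
            best_k, best_shift = k, shift
    return best_k, best_shift

# Hypothetical anomaly series: the candidate jumps by about +1 degree C at
# index 5 (an undocumented move, say); the neighbour carries on as before.
candidate = [0.1, -0.2, 0.0, 0.2, -0.1, 1.1, 0.9, 1.2, 1.0, 1.1]
neighbour = [0.1, -0.2, 0.0, 0.2, -0.1, 0.1, -0.1, 0.2, 0.0, 0.1]
k, shift = best_breakpoint([c - n for c, n in zip(candidate, neighbour)])
print(k, round(shift, 2))   # 5 1.0
```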

The greatest proponent of the siting issue is Anthony Watts. Years ago, he started a project to document temperature stations in the US. It seemed like a good idea to me. It can’t hurt to know what the station siting is like. I could have lived without the annotations on each photo and the two-footed leaps to grand conclusions, but hey! I wasn’t getting off my butt to go photograph weather stations so who was I to criticise. After the longest time sniping at NOAA, and gleefully posting pictures of poorly sited stations, he finally got round to (someone else) writing up the results of the analysis and found that for mean temperatures… it made little difference if you picked only the best stations, or threw in everything. Maximum and minimum temperatures were affected individually but in equal and opposite ways. Rather than meekly accept the results of his own study, Anthony vowed to revisit the issue in another, better paper. We are still waiting.

That’s not to say that there aren’t issues. There are. Systematic studies of homogenisation algorithms suggest that while they move things in the right direction, they don’t always move them far enough. There are also other approaches such as that taken by Berkeley Earth.

But I digress. What is strange, if you think about it for a while (that’s enough!), is that a station which doesn’t move and has no change in instrumentation might be considered to be in error just because a tree grew nearby or someone built a barn (or a megacity). It’s still measuring air temperature at that location and that location is still part of the Earth’s surface. Temperatures at that location might really be higher or lower because of that change. Why then do climatologists go to such lengths to remove these actual changes in temperature? Again (you can think again now, sorry), the answer comes down to representativity. A tree growing at this particular location (or what have you) means that the trend in temperature is not representative of the wider area, except in the philosophically untenable situation that a tree grew everywhere13. In fact, if trees grew over a wide area, then there would be no weather stations that were both representative of that _and_ impeccably sited. The same goes for areas that are widely and rapidly urbanising. Urban heating is a thing because urban areas are warmer, but it’s only an issue for global temperature estimates if urbanising areas are over-represented in global temperature datasets and homogenisation doesn’t work. Urban areas account for around 3% of the global surface area, so it’s probably the case that they’re over-represented14. More broadly, the way that land is used (or left unused) can affect local and regional climate.

Ironically, the notions that underlie our measurements of long-term change aren’t ready to deal with long-term change because it happens to include lots of things that scientists have been at pains to leave out.

Anyway, it’s not a simple subject – when is it ever – and when using station data, it is necessary to think carefully about how these (and other) issues relate to the particular problem you are trying to solve. But anyone doing serious research on the matter knows that already.

-fin-

  1. If there’s snow on the ground, sunlight can bounce off it and up through the slats, scatter, and illuminate the thermometer bulb, giving spurious readings. Anecdotally, I’ve heard of errors of 5°C. ↩︎
  2. high-ish. Let’s not get carried away. ↩︎
  3. The system decries the imperfect state of the world, which implies that a perfect world would be flat (or at most, gently undulating) and uniformly covered in well-tended lawn. ↩︎
  4. I’m interpreting “uncertainty” here in a sense loosely associated with the Guide to the Expression of Uncertainty in Measurement, which is something like the range in which the true value of the measurand is likely to be found. ↩︎
  5. Another is that the temperature is representative of the atmospheric temperature free of the boundary layer. But again, that’s not something we regularly measure. And, again, it’s application dependent. A dataset designed for measuring global surface temperature change might legitimately take a different approach to one used to diagnose regional population exposure. The more questions climate science is called on to answer (or pipes up to answer whether called on or not), the more a one-size-fits-all approach to datasets seems unhelpful. ↩︎
  6. Not to mention the aforementioned snow glow which presumably occurs in other situations where the surface has a high albedo. ↩︎
  7. One can ask whether an aspirated thermometer measures the “true” temperature too. The simple answer is no. No measurement is perfect; there is always uncertainty. ↩︎
  8. This is done so that problems with one sensor can be quickly identified and resolved. If one sensor behaves differently to the other two – more noise, a drift etc – then it’s likely there’s a problem with that sensor. Redundancy in measurement is key to catching errors and understanding uncertainty. ↩︎
  9. A recent measurement of nearly 53°C was reported in India in the past week. However, the met service swiftly concluded that the measurement was erroneous. No other measurements in the vicinity were close. ↩︎
  10. While we’re thinking about these things, wildfires affected millions of hectares in Canada in 2023 (just as an e.g.). What’s the air temperature during a forest fire? Part of my brain says we should include such things in a global average, another part says that kind of thing ought to be filtered out if it ever gets recorded. ↩︎
  11. Traditionally, decades and more, but awkwardly these days, it means all time scales: the stats of hourly or sub-hourly temperatures (and other variables of course) can vary on climatic timescales. ↩︎
  12. Hypothetically, a country might consider doing something like this by switching instrumentation overnight across a whole national observing network, but slower, piecemeal change is far more common and evidence from pseudo-data tests suggests that homogenisation algorithms can deal with this. ↩︎
  13. “Trees grew everywhere” sounds better but is wrong. The issue is that you can’t duplicate “a single tree grew here” everywhere because that’s not the same as trees growing everywhere because then lots of trees would be growing at any particular location and that’s not what we want (or maybe it is). ↩︎
  14. Added 2024.06.20: Urbanization exacerbates continental- to regional-scale warming suggests that we need to take the issue more seriously when dealing with long-term change. Old assumptions like SST being a good proxy for marine air temperature need to be revisited from time to time to ensure that they still hold. ↩︎


