Are you human? Take our easy test today

Sometimes, science is inaccessible; it requires deep and specialised knowledge possessed by a few and then only after years of study. Sometimes, science is hard; it requires mastery of facts and theories each incomplete or uncertain and only a part of an unguessable whole, an effort equal parts intellect and imagination. Sometimes, science is a slog; the patient accumulation of data, the careful compilation of statistics, the fight to publish papers no one reads, the endless endless meetings.

But sometimes, science is easy.

You think I’m kidding? See if you can spot the error…

According to Google, the average human bean1 is five feet and five inches tall (165 cm). I am five feet and nine inches tall. Therefore, I am not human.

Easy, right2? By spotting the error in the previous paragraph, you have already outsmarted a professor of physics and it’s not even lunchtime3. Let’s try a slightly more complex case:

The average outcome from rolling a six-sided dice is 3.5. I have rolled a six-sided dice ten times4 and got the following sequence – 1, 6, 4, 3, 5, 6, 4, 2, 6, 2. As none of these is equal to 3.5, I conclude that the sequence was not generated by a six-sided dice.

It’s the exact same mistake, just repeated. We don’t expect every individual or every dice roll to exactly match the average and that’s because the average isn’t something an individual has. An average is something a group of things has whether that’s all the people on Earth, or the rolls of six-sided dice.

If you break the world’s population up by height and count how many people are five foot tall, how many are five foot one inch tall, five foot two inches… and so on from the very shortest to the very tallest person, and then graph them up, you will get something like the (in)famous bell curve. The average tells you roughly where the middle of that curve is. There are more people close to the average and fewer people who are much shorter or taller than the average. My height – five foot nine – is above the average but well within the range of human heights. So, we can’t rule out my humanity based solely on this measure.

Another property of the curve is a measure of how wide it is. There are various ways of expressing this. One could take the difference between the heights of the tallest and shortest people alive today, or choose a range that has in it 99% of all people, or something like that, but statisticians often use something called the standard deviation. The standard deviation is a measure of how wide a distribution is and, like the mean, is a property of a group of things. But back to this later.

For a dice, we know that each of the numbers one through six is equally likely. In this case, rolling a dice over and over won’t get you a bell curve; instead you will end up with a flat distribution with equal counts for each of the numbers. At least, you will if you do it enough times. “Enough times” is one of those things that scientists and mathematicians say and might, if you are minded to do the experiment, take you quite a while. Anyway. My sequence of six-sided dice rolls, and any other, will be within that range – one to six – and so, again, we can’t conclude that it’s not a six-sided dice.
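If you don’t fancy rolling a dice all morning, here is a minimal sketch of the same argument in Python. It assumes a fair six-sided dice simulated with the standard `random` module; the function name `roll_dice`, the seed and the number of rolls are illustrative choices of mine, nothing more. It checks that no roll in the sequence above equals 3.5, and that a long run of simulated rolls gives roughly equal counts and an average close to 3.5.

```python
import random
from collections import Counter

def roll_dice(n_rolls, seed=0):
    """Simulate n_rolls throws of a fair six-sided dice (the seed is arbitrary)."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n_rolls)]

# The ten rolls quoted earlier in the post
sequence = [1, 6, 4, 3, 5, 6, 4, 2, 6, 2]
print(sum(sequence) / len(sequence))      # average of the ten rolls: 3.9
print(any(r == 3.5 for r in sequence))    # False: no single roll is ever 3.5

# "Enough times": a long run of simulated rolls
rolls = roll_dice(60_000)
counts = Counter(rolls)
mean = sum(rolls) / len(rolls)

for face in range(1, 7):
    # Each face should turn up in about one sixth of the rolls
    print(face, round(counts[face] / len(rolls), 3))
print(round(mean, 2))                     # close to 3.5, a property of the group
```

The average belongs to the whole list of rolls; every individual roll is a whole number between one and six.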

When we’re comparing things, we need to be clear what we are comparing and what question exactly we are asking. In the first case, the question is “am I human?” and, as we approach the question armed only with the knowledge that I am five feet and nine inches tall, this comes down to working out whether my height falls within the range of human heights.

Of course, there are complications. Obviously, Google haven’t measured everyone’s height, so the number they give is an approximation, an estimate based (one hopes) on some data. And, while Google will tell me the average height of a human and, if asked correctly, will tell me the heights of the tallest and shortest adults ever recorded, it doesn’t have the standard deviation of human heights at its fingertips.

For lots of tasks, we don’t have Google’s ready-made answers, but we do have some data. Working out what we can do with some data is where statistics comes in.

Once, in a science lesson at school, the teacher got us all to measure our heights. I wasn’t quite the shortest person in the class – if I stood on tippy-toes, I was marginally taller than the actual shortest person (also standing on tippy-toes) – but I was down there at the bottom end. The tallest person was around six feet and several inches tall, nearly a foot taller than I was5 (though various people noted the thickness of the soles of his boots).

The teacher amassed all the figures on the chalkboard and we calculated the average and standard deviation. I forget what they were, but they were almost certainly different from the google number or the answer you would get if you somehow measured everyone in the world. For a start, we were all British teenagers and so we didn’t look (even from the limited perspective of a tape measure) like the global population does. Everyone was the same age to within a year or so. Second, even if we had, by some miracle, been a representative sample of the global population, there were only about thirty of us. Thirty people is nowhere near enough to get an accurate picture of global heights.

If you think that you have a representative population, then there are some things you can calculate that will tell you how far the average you just worked out is likely to be from the true global average6. This number is called the standard error of the mean. It’s usually calculated by dividing the standard deviation by the square root of the number of people in the group (or data points). If the standard deviation is large and the number of data points is small, the standard error of the mean can be quite large, but if you have lots and lots of data points, the standard error of the mean can be teeny-tiny. The more data you have, the smaller it becomes. The two – standard deviation and standard error of the mean – ought not to be confused. They’re very different things.
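The calculation described above can be sketched in a few lines of Python. The heights here are made up: I’m assuming, purely for illustration, that heights are drawn from a bell curve centred on 165 cm with a standard deviation of 10 cm (the 10 cm is my assumption, not a Google number), and `standard_error_of_mean` is just a helper name I’ve chosen.

```python
import math
import random
import statistics

def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(values) / math.sqrt(len(values))

rng = random.Random(42)   # arbitrary seed so the sketch is repeatable
results = {}
for n in (30, 300, 30_000):
    # Hypothetical heights in cm: mean 165 (from the text), sd 10 (assumed)
    sample = [rng.gauss(165.0, 10.0) for _ in range(n)]
    sd = statistics.stdev(sample)
    sem = standard_error_of_mean(sample)
    results[n] = (sd, sem)
    print(f"n={n:>6}  sd={sd:6.2f} cm  sem={sem:6.3f} cm")
```

The standard deviation stays in the neighbourhood of 10 cm however many people you measure, because it describes the spread of individuals. The standard error of the mean shrinks as the group grows, because it describes how well you have pinned down the average.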

At this point, you are probably wondering why the hell I’m telling you all this patronisingly simple nonsense. Your cornflakes are getting soggy and your coffee is cold and I still haven’t got to the damn point.

OK. Back to the professor you outwitted earlier on.

He’s called Nicola Scafetta. He wrote a paper in 2022 in which7 he compared my height to the average height of humans and concluded I wasn’t human. No wait, that was my example, and not even the more pertinent one. What he actually did was more akin to the dice problem: he compared global temperatures in the real world (a sequence of dice rolls/a single thing) to the average of what happened in a climate model world (3.5 over and over again/a property of a group of things). He was comparing the sequence of actual temperatures to a sequence of averages and concluding that one could not be the other. This is, as you know, an elementary mistake.

Gavin Schmidt, Gareth Jones and I wrote a comment on the paper pointing this out (we used more technical language, but I think I’ve transmitted the gist) and Scafetta responded. The comment and response are now out in Geophysical Research Letters (GRL) 17 months after the original paper was published. It took some time because, until this point, GRL didn’t have a comment and response policy, but that’s another saga.

Actually, the situation is more complex than my example above, but only very slightly. We’re not comparing a single being to a single large group of humans. There are several different climate models, which Scafetta grouped together according to an estimate of how much the world warmed in each one under a set of standard conditions. The situation is more like trying to work out what age group I’m in by comparing my height to the average of different age groupings and then confusing the standard deviation of each group with the standard error of the mean. It’s not the first time this mistake has been made and corrected.
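To see why confusing the two matters, here is a toy simulation. It is not Scafetta’s test, nor the one in our comment; it is a sketch under assumptions I’ve picked for illustration: a bell-shaped distribution, groups of 30, and a “two widths either side of the average” rule. Each trial draws a group and one extra individual from the same distribution, then asks whether the individual would be rejected as not belonging, using either the standard deviation or the standard error of the mean as the width. The name `fraction_rejected` is mine.

```python
import math
import random
import statistics

def fraction_rejected(n_group=30, n_trials=2000, seed=1):
    """Fraction of genuine group members that fall outside mean +/- 2 * width,
    with width taken as the SD (spread of individuals) or the SEM."""
    rng = random.Random(seed)
    rejected_sd = 0
    rejected_sem = 0
    for _ in range(n_trials):
        group = [rng.gauss(0.0, 1.0) for _ in range(n_group)]
        individual = rng.gauss(0.0, 1.0)       # drawn from the same distribution
        mean = statistics.fmean(group)
        sd = statistics.stdev(group)
        sem = sd / math.sqrt(n_group)
        distance = abs(individual - mean)
        rejected_sd += distance > 2 * sd       # compare with the spread of individuals
        rejected_sem += distance > 2 * sem     # compare with the uncertainty in the average
    return rejected_sd / n_trials, rejected_sem / n_trials

frac_sd, frac_sem = fraction_rejected()
print(f"rejected using SD:  {frac_sd:.2f}")
print(f"rejected using SEM: {frac_sem:.2f}")
```

Under these assumptions, the test built on the standard deviation wrongly rejects only a small fraction of genuine members, while the test built on the standard error of the mean rejects most of them. Every rejected individual here really did come from the group, which is the whole problem.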

It’s important to point out here that CMIP6 – the collection of models under study – has been shown to contain models that warm unrealistically. The upper end of what this group of models predicts for any scenario is probably unrealistic. This has been shown elsewhere. Scafetta’s analysis, while situated notionally within that larger debate, does not add anything useful to it and, independently of that, its flaws should disqualify it from serious consideration.

The argument across Scafetta’s original paper, our comment and his reply is a mess (for which, of course, I should claim some responsibility) and his response doesn’t make anything clearer. He adds a layer of nomenclature which wasn’t there in the original paper (macro-GCMs?!), shifts the terms of the argument and bungs in a bunch of textbook material. The whole thing smacks of Pat-Frankish hagfishing (see also the molehills of madness and the final few lines of this horror) and is considerably longer than our comment. Since publishing his original paper, Scafetta has also squeezed out two more which respond to our criticisms without having to respond directly to them and he cites both here to add extra layers of complication.

In one of the two new papers, he almost gets the point and compares the real world to the models using a measure of the full spread rather than the narrower standard error of the mean, but, at the same time, he changes the nature of the test. If you have a look at the data, you’ll see why: there’s a gap in the distribution at a crucial point. Given that this later paper effectively concedes one of our points, it’s strange that he argues so vociferously against that element of our comment (he argues vociferously and verbosely against them all). He cites both of his new papers in his reply as if they somehow contradict our argument. They don’t, although a generous person might allow that they improve his argument in so far as it moves slightly, grudgingly in the direction we proposed. His estimate of observational uncertainty diagnosed from the spread of observational products seems a good way to go, but doesn’t get entrained in his analysis. His estimate of the uncertainty based on the propagation of errors formula is a clear underestimate and is, of course, the one he prefers.

Some of the stuff in his reply is a trifle wild. The annotation of Figure 1, labelled “Logical Inconsistency”, is just comical and any paper which contains the sentence “Only the quantum world8 presents instances of true randomness” has underestimated itself. Aside from the funny funny stuff, he also asserts (or at least seems to) that climate models ought to be able to get every El Niño, La Niña and climate-mode shift in the right places every time9. No one expects that.

If we take this hypothesis seriously (for a moment) then maybe Scafetta’s paper could be interpreted as ruling out the joint hypothesis that {climate sensitivity is high} AND {that climate models are able to accurately nail the timing of every up and down in the observed record}. While there is evidence that some large scale shifts in what were previously considered purely “natural” modes of variability may have been wholly or partly forced, it’s a long way from asserting that they all are. ENSO variability can’t be reliably predicted a year ahead, let alone divined from forcings. Given that we can already reject that part of the joint hypothesis, logically we can’t make strong claims about the other part based on Scafetta’s analysis. It is uninformative.

But, as I said before, strong claims about high sensitivity models in the CMIP6 collection have been made. That’s not the problem. The problem is Scafetta’s crummy analysis being crummy, remaining crummy, and not supporting the significantly stronger claims that he wishes to make.

For a more intelligent and in-depth analysis of this (and almost everything else), see the post at RealClimate.

-fin-

Update: Thanks to those who took the time to point out my typos.

-finfin-

  1. A typo, as pointed out by numerous people, but… BFG ↩︎
  2. I am human, by the way and not, for example, an LLM tasked with writing a breezy blog post in a style that irks you in a way you can’t quite put your finger on. I am a human writing a breezy blog post in a style that irks you in a way you can’t quite put your finger on. ↩︎
  3. For the purposes of this discussion, you’ll have to imagine you are reading this over breakfast because I am writing it over breakfast. I might have written it in darkest night because I had a belly ache from eating a chocolate mousse whose expiry date had come and gone and couldn’t sleep. But I didn’t. This is probably fortunate. The things I write at 3am are terrifying and much harder to explain and you would have to imagine reading them at 3am which, in the bright light of day is never really possible. ↩︎
  4. In case you are wondering, the sequence was generated by rolling five six-sided dice twice and reading them from left to right. I have a lot of dice on my desk (see the header image). Not all of them have six sides. Yes, I’m one of those, or I was thirty years ago. By the way, if you ever need to simulate dice throws from a twenty-sided dice and don’t have one to hand, just google “d20”. If you want to throw three twelve-sided dice, then “3d12” will do it. ↩︎
  5. The tallest person in the school was already well over six feet tall at the age of thirteen. He was the school’s shot-putter on the grounds that, simply by extending his arm and dropping the shot put, he could make it land further away from him than almost anyone else in his age range could throw it. In addition, of course, the shot put started much further from the ground so any horizontal momentum he imparted to it counted that bit extra. ↩︎
  6. If you say this sentence out loud in front of a mirror, a statistician will magically appear to correct you. ↩︎
  7. The paper is called “Advanced Testing of Low, Medium, and High ECS CMIP6 GCM Simulations Versus ERA5*-T2m“. I contend that the use of the word “advanced” is funny quite aside from the paper’s contents. Every time I read it, I hear a voice** saying “Oh, so it’s the advanced testing you’re using now, is it, professor?”. What Scafetta considers “advanced” rather spoils the joke. ↩︎
  8. If I thought this sentence meant anything at all, I would be interested to know what other kind of world he thinks there is. But no, it’s all muddled. His arguments about what is and isn’t “random” and his contrasting of that with what is “physical” betray an underlying confusion. He’s willing to treat observational uncertainty as random, for example, but every observational error has its origin in the physical world. The two aren’t somehow exclusive; their relationship is conditional upon our state of knowledge. Anyway. The paper is rife with that kind of rabbit hole. It’s mostly rabbit hole as far as I can see, which is, structurally speaking, a difficult trick to pull off. ↩︎
  9. He actually says: “In fact, the GCMs should hindcast the primary dynamical patterns produced by the climate system, including those generated by the global atmospheric and oceanic circulation.” While my reading of this might be too strong, he’s really not especially clear on this point and I’m only making that reading so that other parts of his analysis make sense. If he does mean that a model’s ensemble mean should match the observational record, and his analysis follows from there, then it can be dismissed as irrelevant simply on those grounds (but I’m repeating myself, or I’m about to repeat myself***). ↩︎

* In the original paper, ERA5 is still cited as Huang et al. 2017, which is the ERSSTv5 paper.

** An Irish voice, more precisely.

*** My footnotes occasionally take on a life of their own, which is non-sequential, circular, possibly fractal, and a fatally unbalanced mix of self-referential and centrifugal. The centre cannot hold, deaf falcons etc etc.



6 responses to “Are you human? Take our easy test today”

  1. I can tell you why GRL didn’t have a comment and response policy in place, if you want to hear it. They used to have one but got rid of it a decade ago when a certain individual raised a ruckus about a bad paper.

    1. Ahhhh. I can perhaps guess.

  2. > It’s the exact same mistake, just repeated.

    In that it’s another wrong term for the entity described?

    (You’re rolling one die, singular; “dice” is always and only plural.)

    1. Thank you for this correction.

  3. […] In addition to these interesting and/or thought-provoking papers, there are the kinds of papers and blog posts that provoke a whole grab bag of negative responses: sarcasm, withering scorn, resignation. It’s always fun to write this kind of thing and these posts have proved quite popular, but at the same time my feelings about them are ambivalent. This started with a series of posts on a bad uncertainty analysis, nonsense about rounding errors, even more on rounding errors, bad logic and komodo dragons, why I don’t generally engage directly with sceptics (and again), embarrassingly basic science errors, the unique argumentation style of Pat Frank, a paper the Royal Society should be ashamed of, and finally the whole saga of a rebuttal of a paper in GRL. […]
