I’ve been staring intently at this diagram (from Evidence for a limit to human lifespan) for a while now, particularly panel b. Panel b shows linear regressions of the logarithm of the number of survivors per 100,000 people at different ages from 70 up to 110. Each coloured line represents a linear regression for a particular age, with the colours indicating roughly what age that is.
I’ve been trying to make my own version of this diagram from data in the Human Mortality Database. I’m not sure if
(a) I have the correct file (fltper_1x1.txt) or
(b) whether I have the correct column in that file (lx) and then
(c) even assuming I do have the correct file and column, I’m not clear exactly how to do the linear regressions because for some of the more advanced ages, there are no survivors per 100,000 people early on in the 20th century and zero doesn’t log very well.
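One way around the zeros in (c), sketched here under my own assumptions rather than taken from the paper, is to leave out the years where lx is zero and fit the line only to what remains. The function name `fit_log_survivors`, the use of the natural logarithm and the numbers below are all made up for illustration; none of them come from the Human Mortality Database or the paper.

```python
import numpy as np

def fit_log_survivors(years, lx):
    """Fit a straight line to log(lx) against year, skipping years where lx is zero."""
    years = np.asarray(years, dtype=float)
    lx = np.asarray(lx, dtype=float)
    positive = lx > 0  # log(0) is undefined, so those years are left out of the fit
    if positive.sum() < 2:
        raise ValueError("need at least two non-zero values to fit a line")
    # Ordinary least squares fit of log(lx) = slope * year + intercept
    slope, intercept = np.polyfit(years[positive], np.log(lx[positive]), 1)
    return slope, intercept, int(positive.sum())

# Made-up survivors per 100,000 at some advanced age (NOT real HMD values)
years = np.arange(1900, 1910)
lx = np.array([0, 0, 0, 1, 2, 2, 3, 4, 6, 8])

slope, intercept, n_used = fit_log_survivors(years, lx)
print(f"slope = {slope:.3f} per year, fitted to {n_used} of {len(years)} years")
```

Dropping the zero years is only one option, and it means the lines for the oldest ages describe a shorter period than the lines for the younger ages.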
However, in mulling and intently staring, I noticed that something odd is happening in panel (b): the lines cross. One might interpret that as meaning that at some point before 1920, you were more likely to live out the year if you were 110 than if you were 105, which seems unlikely. Another way to think about it is that linear regression might not be the best way to represent these data.
A simpler problem is to look only at the period from 1980, during which the data are non-zero in all age ranges given in the data set. With the caveat that I’m not sure if I’m looking in the right file, the pattern I see looks like this.
There are two lines for each age. One is wiggly and that’s the log of the data values that came out of the file. The other is straight and is the linear regression of that same logged data. The gradient of the lines increases with increasing age over this period, which, on the face of it, would contradict the claim made in the paper that the rate of change shows diminishing gains at the higher end of the age range.
Behind the face of it, there are some caveats that need to be considered. First, the data are heterogeneous. Since 2005, the input data on numbers of deaths lump all deaths at ages of 105 and up together. Before 2005, deaths are recorded at each age all the way up to 124. There’s a change in the way the input data are presented at that point.
Second, a series of calculations (making a range of necessary assumptions) are performed on the data to convert the reports of births, deaths and censuses into a consistent format and to derive the statistic I plotted – survivors per 100,000 at age x. What effect these assumptions and calculations have, particularly at the very upper end of the age range where individual deaths can make quite a difference, isn’t clear to me.
What this means for the analysis in the paper, I don’t know. It might, of course, mean nothing. This process of learning about how the data were gathered and processed, and just what exactly they mean, is always an interesting aspect of exploring new datasets. It does, however, confirm my initial concern about the heterogeneous nature of the data and it’s the kind of detail I’d like to have seen explored in the manuscript.
I spent a little more time looking at the data from the paper on “Evidence for a limit to human lifespan”. This figure is the result. The data are available upon registration from http://www.supercentenarians.org/
(a) shows the Maximum Reported Age at Death (MRAD) for four countries (France, UK, USA, Japan). The blue squares show the age of the oldest person who died each year as represented in the available data from those countries. The database only has this information for people whose age at death was 110 or higher, so I’ve indicated those years where there are no data with red lines. I’ve marked the earliest and latest entry for each country too. Only the UK data go back to 1968. The French and US data stop in 2003 and the Japanese data in 2005.
(b)-(d) show the Maximum Reported Age at Death (MRAD), the 2nd highest Reported Age at Death (2RAD) and the third highest in various colours. All the other data points from the four countries are shown in grey.
(e) shows the annual average age at death from the data already described. Years with no data are shown in red.
(f)inally, the total number of entries per year in the database from the four countries (France, UK, USA, Japan) is shown as blue dots.
What, if anything, do these plots tell us? I’m not sure really. I was concerned that the missing data in (a) would affect the regressions (not that the regressions would be especially meaningful even if performed correctly). Ordinary Least Squares fits to the blue dots, ignoring the missing years, yield the values shown in the original paper. I was informed by a proper statistician that there are standard techniques for working with missing data, but it would appear that either they weren’t applied or they give the same answer as OLS to a few d.p.
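For reference, an OLS fit that simply ignores the missing years can be sketched as below. This is my own minimal version, not the code used in the paper: the function name `ols_ignoring_missing` is hypothetical, missing years are assumed to be marked as NaN, and the ages in the example are invented rather than taken from the database.

```python
import numpy as np

def ols_ignoring_missing(years, values):
    """Ordinary least squares line through the years that have data (NaN = missing)."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    present = np.isfinite(values)  # keep only the years with an entry
    slope, intercept = np.polyfit(years[present], values[present], 1)
    return slope, intercept

# Invented maximum ages at death with gaps (NOT values from the database)
years = np.arange(1968, 1978)
mrad = np.array([110.5, np.nan, np.nan, 111.0, 110.2,
                 np.nan, 112.1, 111.4, 110.9, 112.6])

slope, intercept = ols_ignoring_missing(years, mrad)
print(f"trend: {slope:.3f} years of age per calendar year")
```

A fit like this treats the years with no entry as if they did not exist, which is not the same as treating them as years in which nobody in the database died at 110 or older.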
Second, the number of entries each year varies a lot, as does which of the four countries are contributing at any one time. This is likely to affect both (a) and (e). In particular, the first several and the very last (2006) data points are based only on the UK data. This shift between a mix of countries with different age distributions might have had an effect on the shapes of the time series shown here. That’s particularly obvious for (e), which was Fig 2c in the original paper. The cubic spline smoothed series had a sort of double peak, and the earlier peak was based on the small number of UK entries, probably with no allowance for the missing data in that earlier period.
Third, I’m not sure quite why these four countries were chosen. Other countries – Spain and Italy were two I looked at – have longish datasets which weren’t used. They don’t add masses of data, but it’s not clear what the selection criteria were.
Anyway, the data seem to be more heterogeneous than I had initially thought and the problems with Figure 2 are somewhat deeper than I’d thought.
This is a slight departure, being a sort of review of (grumble about) a paper that appeared in Nature recently (Evidence for a limit to human lifespan) but is not at all related to weather or climate. It’s about how long people can live and the authors claim that their results “strongly suggest that the maximum lifespan of humans is fixed and subject to natural constraints”.
This essay has appeared, or reappeared; it looks like it was first published some years ago. It concludes, paradoxically:
“In our view, the fact that so many scientists agree so closely about the earth’s warming is, itself, evidence of a lack of evidence for global warming.”
The argument that leads to this conclusion is not clearly laid out and verges on complete incoherence. Nevertheless, some people seem to find it meaningful, so I thought I would look at it in more detail.
Occasionally, I see it stated that averaging of repeated measurements only reduces the uncertainty if they are repeated measurements of the same thing.
This is simply not true.
It is, however, a good place to start thinking about the general problem. If we take three measurements of (say) the temperature of a water bath M1, M2 and M3, the average is (M1+M2+M3)/3.
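A quick simulation makes the arithmetic concrete. It is only a sketch under assumptions of my own: the errors in M1, M2 and M3 are taken to be independent, Gaussian and of equal size (`sigma`), and the function name and the numbers are invented for illustration. Under those assumptions, the error in the average should have a spread of sigma divided by the square root of 3.

```python
import numpy as np

def spread_of_average_error(true_values, sigma, n_trials=200_000, seed=0):
    """Standard deviation of the error in the average of one measurement per true value."""
    rng = np.random.default_rng(seed)
    true_values = np.asarray(true_values, dtype=float)
    # Assumption: independent Gaussian measurement errors, all with the same sigma
    errors = rng.normal(0.0, sigma, size=(n_trials, true_values.size))
    measurements = true_values + errors
    # Error in the average = measured average minus the true average
    average_error = measurements.mean(axis=1) - true_values.mean()
    return average_error.std()

sigma = 0.5
same_thing = spread_of_average_error([20.0, 20.0, 20.0], sigma)
different_things = spread_of_average_error([15.0, 20.0, 25.0], sigma)

print(f"single measurement:           {sigma:.3f}")
print(f"average, same temperature:    {same_thing:.3f}")
print(f"average, three temperatures:  {different_things:.3f}")
print(f"sigma / sqrt(3):              {sigma / np.sqrt(3):.3f}")
```

In this sketch the error in the average is just the average of the three errors, so its spread does not depend on whether the three true temperatures are equal.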
Globally speaking, November was a warm month, at least for a November. GISTEMP, NOAAGlobalTemp and HadCRUT4 all nominally have November as the warmest November on record. Part of the warmth, at least the part that distinguishes it from 2014, say, is likely due to the El Nino that matured mid-year. There are various estimates of the effect that El Nino has on monthly temperatures (directly, because the El Nino region is rather large and part of the surface, and indirectly, because El Nino warms other areas with some lag), but they are around a tenth of a degree or so.
December 2015 was the last month of 2015. It had 31 days. December was also an unusually warm December, globally. The temperature anomaly in December was the highest recorded by some margin. In the NOAAGlobalTemp analysis the margin was 0.29 °C. In HadCRUT4, it was around 0.31 °C. Uncertainties in monthly values are typically larger than they are for annual values, but even so, the margin is significant in so far as the 95% uncertainty ranges for this December and the previous record holder don’t overlap.