I spent a little more time looking at the data from the paper on “Evidence for a limit to human lifespan”. This figure is the result. The data are available upon registration from http://www.supercentenarians.org/
(a) shows the Maximum Reported Age at Death (MRAD) for four countries (France, UK, USA, Japan). The blue squares show the age of the oldest person who died each year, as represented in the available data from those countries. The database only has this information for people whose age at death was 110 or higher, so I’ve marked the years with no data with red lines. I’ve marked the earliest and latest entry for each country too. Only the UK data go back to 1968. The French and US data stop in 2003 and the Japanese data in 2005.
(b)-(d) show the Maximum Reported Age at Death (MRAD), the second-highest Reported Age at Death (2RAD) and the third-highest in various colours. All the other data points from the four countries are shown in grey.
(e) shows the annual average age at death from the data already described. Years with no data are shown in red.
(f)inally, the total number of entries per year in the database from the four countries (France, UK, USA, Japan) is shown as blue dots.
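For anyone wanting to reproduce the panels, here is a rough sketch of how the per-year quantities in (a)-(f) could be computed. It is not the code I used for the figure. The function name `yearly_summary`, the (year, age) record layout and the toy values are all assumptions made for illustration; the real database has its own format.

```python
from collections import defaultdict
from statistics import mean


def yearly_summary(records, first_year, last_year):
    """Summarise (year_of_death, age_at_death) pairs year by year.

    Years with no entries map to None, so gaps stay visible
    instead of being silently dropped.
    """
    ages_by_year = defaultdict(list)
    for year, age in records:
        ages_by_year[year].append(age)

    summary = {}
    for year in range(first_year, last_year + 1):
        ages = sorted(ages_by_year.get(year, []), reverse=True)
        if not ages:
            summary[year] = None  # no recorded deaths at 110+ this year
            continue
        summary[year] = {
            "mrad": ages[0],                            # highest age at death
            "rad2": ages[1] if len(ages) > 1 else None,  # second highest
            "rad3": ages[2] if len(ages) > 2 else None,  # third highest
            "mean": mean(ages),                          # annual average
            "count": len(ages),                          # entries that year
        }
    return summary


# Made-up records for illustration only, not values from the database.
toy = [(1990, 112.3), (1990, 110.1), (1992, 111.0), (1992, 113.5), (1992, 110.4)]
result = yearly_summary(toy, 1990, 1992)
```

In this toy example 1991 comes back as `None`, which is the kind of year drawn in red in (a) and (e).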
What, if anything, do these plots tell us? I’m not sure really. I was concerned that the missing data in (a) would affect the regressions (not that the regressions would be especially meaningful even if performed correctly). Ordinary Least Squares (OLS) fits to the blue points in (a), ignoring the missing years, yield the values shown in the original paper. I was informed by a proper statistician that there are standard techniques for working with missing data, but it would appear that either they weren’t applied or they give the same answer as OLS to a few decimal places.
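To be concrete about what “ignoring the missing years” means, here is a minimal sketch of an OLS straight-line fit that simply drops any year without a value. This is the naive complete-case approach, not one of the standard missing-data techniques the statistician had in mind. The function name `ols_ignore_missing` and the toy numbers are made up for illustration.

```python
def ols_ignore_missing(years, values):
    """Least-squares line through (year, value) pairs, skipping None values.

    Returns (slope, intercept). Missing years are dropped entirely,
    so they have no influence on the fit.
    """
    pairs = [(x, y) for x, y in zip(years, values) if y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two non-missing points")

    x_mean = sum(x for x, _ in pairs) / n
    y_mean = sum(y for _, y in pairs) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("all non-missing points share the same year")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pairs)

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up series with two missing years (None), for illustration only.
toy_years = [2000, 2001, 2002, 2003, 2004]
toy_mrad = [110.0, None, 111.0, None, 112.0]
slope, intercept = ols_ignore_missing(toy_years, toy_mrad)
```

Note that the years are missing here because nobody in the database died at 110 or older, so the gaps are not random. That is exactly why dropping them might matter.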
Second, the number of entries each year varies a lot, and so does the set of countries contributing in any given year. This is likely to affect both (a) and (e). In particular, the first several data points and the very last (2006) are based only on the UK data. This shift between mixes of countries with different age distributions might have had an effect on the shapes of the time series shown here. That’s particularly obvious for (e), which was Fig 2c in the original paper. The cubic-spline-smoothed series had a sort of double peak, and the earlier peak was based on the small number of UK entries, probably with no allowance for the missing data in that earlier period.
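A quick way to see the changing mix is to list which countries have at least one entry in each year. The sketch below assumes records reduced to (year, country) pairs; the function name `countries_by_year` and the toy records are hypothetical.

```python
from collections import defaultdict


def countries_by_year(records):
    """Map each year to the sorted list of countries with an entry that year."""
    contributing = defaultdict(set)
    for year, country in records:
        contributing[year].add(country)
    return {year: sorted(names) for year, names in sorted(contributing.items())}


# Made-up records for illustration only, not values from the database.
toy_records = [(1970, "UK"), (1990, "UK"), (1990, "France"), (1990, "USA"), (1990, "UK")]
mix = countries_by_year(toy_records)
```

Years where the list shrinks to a single country are the ones where a change in the plotted series may reflect who is in the sample rather than any change in lifespan.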
Third, I’m not quite sure why these four countries were chosen. Other countries – Spain and Italy were two I looked at – have longish datasets which weren’t used. They don’t add masses of data, but it’s not clear what the selection criteria were.
Anyway, the data seem to be more heterogeneous than I had initially thought, and the problems with Figure 2 run somewhat deeper too.