in 40 years, 13 out of every 10 …

In 40 years 13 out of every 10 US citizens will not have landlines.  The kind of reasoning that leads to this conclusion also supported the recent claim that in 40 years 10 out of 10 Americans will be obese.  As the Numbers Guy in the Wall Street Journal points out:

The phone forecast is impossible, of course, but it’s arguably no less solidly grounded than the obesity forecast. The weight projection uses three data points spread out over nearly three decades to estimate a linear trend — then brazenly draws that line into the future.

Human beings have real deficits in reasoning about probability, and that can include people giving medical advice, but you don’t really expect it to show up in a scientific journal, even an online journal like Obesity (link corrected thanks to Noumena in comments), where it was published.  Certainly not in one backed by the highly respected Nature Publishing Group.

The recent study was intended by lead author Youfa Wang “to send a message” to public-health officials, he says. Dr. Wang, associate professor of international health and epidemiology at Johns Hopkins University, adds that there is no conflict between this goal and the standards of scientific inquiry. He notes the scientific pedigree of his co-authors, who include Hopkins colleagues, and researchers at the University of Pennsylvania and at the federal Agency for Healthcare Research and Quality.

And the response when confronted with the problem?

“This study isn’t designed to predict what the future actual situation will be,” Dr. Wang says. “We just say, if you take these assumptions, this is what the future may be.”

Richard Bergman and David Allison, editor and associate editor, respectively, of Obesity, wrote in an email: “Each scientific paper is hoped to be an ever closer approximation to the best possible answer to a question than was the work that had gone before. We believe Dr. Wang’s paper fulfills that spirit.”

On the face of it, I’d say the paper deserves a C or a D and a good scolding that centers around not presenting propaganda as though it were truth.  And the editors?  Well, I just hope they are not the sort that grumble at cocktail parties about how post-modernism is the curse of academia.

Is that too harsh?  What do you think?

(And thanks to Tara Parker-Pope’s article for the link to the Numbers Guy.)

10 thoughts on “in 40 years, 13 out of every 10 …”

  1. It took me forever to track down this article. Confusingly, Nature appears to have two distinct publications on obesity: the International Journal of Obesity, and Obesity (formerly Obesity Research). You linked to the former, `The Numbers Guy’ didn’t provide any references, and the article is actually published in the latter. Anyways, the abstract is here. Unfortunately, neither the abstract nor the Johns Hopkins press release describes the methodology in any detail, in particular what kind of regression was used to make the projections. For some reason, my university doesn’t have a subscription to Nature’s preprints, so I can’t read the full article until my interlibrary loan version arrives.

    Frankly, I’m not sure that I trust `The Numbers Guy’. The study he chastises for using linear regressions to arrive at the conclusion that `more than 100% of American adults will not have landlines’ in 2048 (he has a link on the blog post accompanying the column) constructs no regressions of any kind, and makes no quantitative predictions.

  2. Noumena, my bad about the link. I’m impressed by your diligence here. I usually do background checking and I didn’t on this one principally because the conclusion is false.

    The comments of the author were also not reassuring. Still, you’ve raised a question about my grading. Let’s see if anyone else can be helpful.

    In case my assertion about the conclusion seems cavalier, just think of the factors that will keep some people from becoming obese, including genetics, illness, career pressures, etc.

  3. Right, to be clear, I’m not saying the study is all that great. My point is that the reporting is sketchy, so until I get the actual article, I don’t feel I can really assess it one way or the other.

  4. Noumena, here’s a related study by the same person:
    http://epirev.oxfordjournals.org/cgi/content/abstract/mxm007v1
    That page ought to have a link to the full text.

    In looking at Wang’s work on the web, it’s clear he’s doing really valuable work. I’m beginning to wonder if there’s a cultural factor that was involved in publishing a study with that obvious falsehood in it. Perhaps it’s a reductio. Hmmmmmm. Or not.

  5. Thanks, jj!

    Okay, this paper does use a linear regression to make projections into the future. And that is a problem, for exactly the reasons `The Numbers Guy’ gave. For anyone who hasn’t taken statistics recently/at all, here’s an example: their model predicts that the percentage of Americans 20 years old and older who are `overweight’ (meaning they have a BMI at least 25) is increasing by 0.772 percentage points every year. Based on this, they predict that 74.68% of Americans at least 20 years old will be `overweight’ in 2015. But a linear model assumes that the rate of increase is constant — even beyond 2015, the percentage of `overweight’ American adults will increase by 0.772 percentage points every year. This means that, in 2048 (2015+33), 100.156% (74.68 + .772*33) of American adults will be `overweight’. In 2058, that number goes up to nearly 108%.

    The response might be that the model isn’t supposed to be used for projections 40+ years into the future. But it’s based on data that go back to 1971 — it looks 36 years into the past — and it’s a very, very good model of that historical data, at least when it doesn’t control for gender (R^2 >= 0.97 is very accurate). Fairly accurate (ceteris paribus!) predictions 40 years out aren’t too much to demand. And that leads you to the absurdity of more overweight adult Americans than adult Americans.

  6. Noumena, I’m not sure I agree with your evaluation of the statistics. An R^2 of 0.97, while very high in some contexts, isn’t actually that unusual for a time series regression, since both the independent and the dependent variables typically follow time trends. Even unrelated random walks tend to have a high correlation over time. I think it may be unrealistic to expect the model to predict further into the future than the length of time covered by the dataset.

    On the other hand, perhaps the author should have used a model that is incapable of generating a nonsense result of that sort (probit, for example, which always generates values between 0 and 1). However, such a model isn’t necessarily any more accurate for predicting short term trends, such as the 2015 calculation, and it can be more difficult to interpret. If the author had no interest in drawing conclusions more than 8 years into the future, then the linear model used may not be unreasonable.

  7. Fractal, perhaps I’m not understanding your point, but the independent variable here is time. The regression is the time trend, whether actual or spurious.

    Also, it’s ambiguous when you say that `I think it may be unrealistic to expect the model to predict further into the future than the length of time covered by the dataset.’ Do you actually mean the length of time — because, as I said above, the length of time covered by the dataset is just a few years less than the predictions that take us above 100% — or the time period, ie, the years the dataset is drawn from?

  8. My point is that even unrelated variables typically trend over time, and will therefore be correlated (positively or negatively, it doesn’t matter which). This correlation results in “explanatory power” and a typically high R^2 for time series regressions, even though it doesn’t necessarily mean anything (other than that the variables are correlated).

    Cross-sectional datasets, on the other hand, rarely have so ubiquitous yet unhelpful a connection between variables as “time” is in a time-series. This leads to much lower R^2 for regressions on cross-sectional data, as well as greater apparent meaning for the R^2 value in those cases (particularly if the number of included variables is controlled for, with an adjusted-R^2 statistic).

    Hmm, do you mean that the original paper included no explanatory or control variables other than time, and simply presented a linear regression of obesity on time? If so, then the result (regression coefficient) given is an interesting but well-known fact, and should not be more than the most preliminary result presented in any respectable paper.

    In response to your question, I mean that if data stretches from time 0 to time X, using it to extrapolate out past time 2X is a lot to ask. The closer you stay to your data, the more accurate a first-order (or second-order, or third-order) approximation is likely to be. (An approximation is all we are going for, as I doubt anyone, including the author, believes that obesity truly follows a simple linear trend over time, because of precisely the sort of absurd results showcased in the post at the top of the page.)

  9. In case it’s getting lost in technicalities, my overall point is that linear regressions often have value even if the author does not believe that “reality” is linear. Linear regression can provide a useful approximation, so long as interpretation is conservative and limited to the questions the model was originally intended to answer.
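
For anyone who wants to check the arithmetic in comment 5, here is a minimal Python sketch. The slope (0.772 percentage points per year) and the 2015 figure (74.68%) are the numbers quoted in that comment; the function names are made up for illustration. The second half is only a rough illustration of the bounded-model idea from comment 6: it swaps in a logistic (log-odds) transform instead of the probit mentioned there, because that needs nothing beyond the standard library, and it is anchored on the linear model's own values for 2007 and 2015 rather than on the paper's data. The choice of 2007 as the reference year is an assumption, so treat the bounded curve as a shape, not a forecast.

```python
import math

# Figures quoted in comment 5 (not taken from the paper directly).
SLOPE_PER_YEAR = 0.772   # percentage points per year
BASE_YEAR = 2015
BASE_PERCENT = 74.68     # model's predicted % of adults with BMI >= 25 in 2015


def linear_projection(year):
    """Extend the straight line from the 2015 value, with no ceiling."""
    return BASE_PERCENT + SLOPE_PER_YEAR * (year - BASE_YEAR)


def first_year_above(limit=100.0):
    """First year at or after BASE_YEAR whose linear projection exceeds `limit`."""
    year = BASE_YEAR
    while linear_projection(year) <= limit:
        year += 1
    return year


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map any real number back to a proportion between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_projection(year, ref_year=2007):
    """Hypothetical bounded alternative: a straight line on the log-odds scale.

    The line passes through the linear model's values at `ref_year` and
    BASE_YEAR. `ref_year` is an assumption made for illustration only.
    """
    p_ref = linear_projection(ref_year) / 100.0
    p_base = BASE_PERCENT / 100.0
    slope = (logit(p_base) - logit(p_ref)) / (BASE_YEAR - ref_year)
    return 100.0 * inv_logit(logit(p_base) + slope * (year - BASE_YEAR))


if __name__ == "__main__":
    for year in (2015, 2030, 2048, 2058):
        print(year,
              f"linear: {linear_projection(year):.3f}%",
              f"log-odds line: {logit_projection(year):.3f}%")
    print("linear trend first exceeds 100% in", first_year_above())
```

The straight line crosses 100% in 2048, as comment 5 says, while the log-odds line flattens out below 100% however far it is extended. That says nothing about which is the better model over the next eight years, which is the point made in comment 6.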
