Eat chocolate -> get wicked smaht?

From Messerli, Franz. Chocolate Consumption, Cognitive Function, and Nobel Laureates, New England Journal of Medicine 2012, 367:1562-1564, October 18, 2012

This post first appeared on Bittersweet Notes.

I managed to score last week’s issue of absurdist scientific humor publication The New England Journal of Medicine, which includes a hilarious note on “Chocolate Consumption, Cognitive Function, and Nobel Laureates.” As I continued reading the issue and failed to see the humor in such knee-slappers as “Fibulin-3 as a Blood and Effusion Biomarker for Pleural Mesothelioma” and “Evaluation and Initial Treatment of Supraventricular Tachycardia,” I quickly came to the realization that NEJM is not intended as a satirical magazine. It is, in fact, among the world’s most prestigious peer-reviewed medical journals.

Inspired by recent findings that compounds in chocolate improve cognitive function, cardiologist Franz Messerli’s note questions whether there is “a correlation between a country’s level of chocolate consumption and its population’s cognitive function.” Using the number of Nobel laureates per capita as a “surrogate end point” for a population’s percentage of wicked smahties, the study finds a “surprisingly powerful correlation between chocolate intake and the number of Nobel laureates in various countries” (23 in all). While he concedes that correlation does not imply causation, Messerli writes, “since chocolate consumption has been documented to improve cognitive function, it seems most likely that in a dose-dependent way, chocolate intake provides the abundant fertile ground needed for the sprouting of Nobel laureates.”


Reportedly, when contacted by the Associated Press, “Sven Lidin, the chairman of the Nobel chemistry prize committee, had not seen the study but was giggling so much when told of it that he could barely comment.”

Indeed, one doesn’t require a doctorate in statistics to find serious flaws in the study. It was clearly intended as tongue-in-cheek to some degree by Messerli (who has, according to NPR, published around 800 peer-reviewed papers) and NEJM (which, also according to NPR, has a history of occasional tomfoolery), though to what degree I can’t quite ascertain. Scientists’ riotous senses of humor aside, I would have expected the dozens of more subtly troubling logical leaps to be followed by winky faces.

Given the absence of sufficient semicolon close parentheses, I worry about the misinformation generated by this study. The media has run wild with it in the past week, citing it widely with often far too little skepticism – an excellent example of a phenomenon I’ve recently started calling quantitative exceptionalism. A comment cardiologist Sanjay Kaul provided to CardioBrief sums up the dangers well: “This article highlights, with a touch of whimsy, caveats that challenge the interpretation of findings of observational studies. From the use of surrogate endpoints (based on biological plausibility and the results of preclinical studies) to the distinction between correlation and causation, confounding (whether the effect size is too large to be explained away by confounding), and the hypothesis-generating nature of the inferential process. Careful consideration of these issues is likely to help navigate through the labyrinth of misinformation and disinformation these types of studies are particularly prone to generating.”

Messerli is no stranger to the harmful effects scientific misinformation can have. Last year, he was quoted in a Wall Street Journal article about mistakes in scientific studies as one of a large number of doctors who (understandably) fell prey to an erroneous paper in the Lancet, another highly respected medical journal. Hundreds of thousands of patients were affected, and Messerli argued that the Lancet had a “moral obligation” to withdraw the paper. Granted, doctors around the world aren’t likely to begin writing prescriptions for dangerously high doses of chocolate based on Messerli’s note in NEJM any time soon, but the difference is one of magnitude rather than direction.

A few examples of things I found more troubling slash hilarious about Messerli’s note:

  • The use of the number of Nobel laureates as a surrogate endpoint for cognitive function is…how do I say it?…strange. In fact, the number of Nobel laureates probably has a lot more to do with a country’s wealth. As Nobel laureate Eric Cornell told Reuters, “National chocolate consumption is correlated with a country’s wealth and high-quality research is correlated with a country’s wealth…therefore chocolate is going to be correlated with high-quality research, but there is no causal connection there.”
  • Messerli writes: “Obviously, these findings are hypothesis-generating only and will have to be tested in a prospective, randomized trial.” Considering that countries in the study have at most a few Nobel laureates per million population, imagine the enormous expense, financial and otherwise, of such a trial. A properly controlled study would deprive millions of the joys of chocolate.
  • While the note warns in multiple places that causation has not been proven, its language repeatedly justifies causation based on tenuous logic. For example, Messerli writes that “it would take about 0.4 kg of chocolate per capita per year to increase the number of Nobel laureates in a given country by 1” and even refers to a “minimally effective chocolate dose.” He justifies such remarks only with references to prior studies linking cacao consumption and cognitive function, which are many leaps-of-faith removed from these conclusions.
  • Messerli writes, without offering any justification: “it is difficult to identify a plausible common denominator that could possibly drive both chocolate consumption and the number of Nobel laureates over many years. Differences in socioeconomic status from country to country and geographic and climatic factors may play some role, but they fall short of fully explaining the close correlation observed.”
  • The study appears to use chocolate rather than flavanol or cacao consumption figures, and the types of chocolate consumed in the studied countries vary significantly. Another gem from Cornell’s interview in Reuters: “It’s one thing if you want like a medicine or chemistry Nobel Prize, ok, but if you want a physics Nobel Prize it pretty much has got to be dark chocolate.” I wonder how considering less economically correlated forms of flavanols like green tea would change the results.
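For the skeptics in the audience, here is a rough sketch of Cornell’s wealth argument in Python. Everything in it is an assumption made up for illustration: the 5,000 hypothetical “countries,” the noise levels, and the names `pearson`, `partial_corr`, and `simulate_confounding`. None of it is Messerli’s data or method. Wealth drives both chocolate and laureates, and chocolate has no effect on laureates at all.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_confounding(n_countries=5000, seed=0):
    """Toy model (invented numbers): wealth causes both variables."""
    rng = random.Random(seed)
    wealth = [rng.gauss(0, 1) for _ in range(n_countries)]
    # Chocolate and laureates each depend on wealth plus independent noise.
    # Neither depends on the other.
    chocolate = [w + rng.gauss(0, 0.5) for w in wealth]
    laureates = [w + rng.gauss(0, 0.5) for w in wealth]
    raw = pearson(chocolate, laureates)
    controlled = partial_corr(chocolate, laureates, wealth)
    return raw, controlled


raw, controlled = simulate_confounding()
print(f"chocolate vs. laureates:           r = {raw:.2f}")
print(f"same, after controlling for wealth: r = {controlled:.2f}")
```

In this toy world the raw correlation comes out strong, and it all but vanishes once wealth is held fixed. That is the “plausible common denominator” the note says is difficult to identify.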

Is Messerli deserving of an Ig Nobel Prize for this gem? According to the Annals of Improbable Research, which awards the prizes annually: “Every Ig Nobel Prize winner has done something that first makes people LAUGH, then makes them THINK.”

Regardless, I’m left wondering what foods predispose you to becoming an Ig Nobel laureate. Foods that leave a funny taste in your mouth? Personally, I’m going to stick with salad.

Quantitative exceptionalism

Image courtesy of Wikimedia Commons

At its heart, the field of statistics deals with determining what inferences can be drawn from data. Causality, bias, significance, and experimental reproducibility are its lifeblood, and one doesn’t have to wander too many pages into a standard introductory statistics textbook before encountering these issues.

Most readers of this blog will not have too much trouble coming up with examples of real world situations in which the improper application of statistics can result in spurious conclusions. As a very simple example, if the average height of a population is estimated on the basis of a survey, and younger people (who tend to be shorter) have a lower response rate, the result may overestimate average height.
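If you’d rather see it than take my word for it, here is a small Python sketch of the height survey. All of the numbers are invented for illustration (the 30% share of younger people, the 160 cm and 172 cm average heights, the 20% and 60% response rates), and `simulate_height_survey` is a hypothetical helper, not anything from a real survey.

```python
import random


def simulate_height_survey(n_people=20000, seed=0):
    """Toy survey in which younger (shorter) people respond less often.

    Returns (true population mean, mean among respondents), in cm.
    All parameters below are made-up assumptions.
    """
    rng = random.Random(seed)
    all_heights = []
    respondent_heights = []
    for _ in range(n_people):
        is_young = rng.random() < 0.3               # assumed share of younger people
        mean_height = 160.0 if is_young else 172.0  # assumed group averages (cm)
        height = rng.gauss(mean_height, 7.0)
        all_heights.append(height)
        response_rate = 0.2 if is_young else 0.6    # younger people respond less
        if rng.random() < response_rate:
            respondent_heights.append(height)
    true_mean = sum(all_heights) / len(all_heights)
    survey_mean = sum(respondent_heights) / len(respondent_heights)
    return true_mean, survey_mean


true_mean, survey_mean = simulate_height_survey()
print(f"true average height:   {true_mean:.1f} cm")
print(f"survey average height: {survey_mean:.1f} cm")
```

Because the shorter group is underrepresented among respondents, the survey average lands above the true average, even though nobody lied and nothing was miscalculated.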

There is a substantial pop culture literature on such examples and how to avoid them (for instance, check out Darrell Huff’s classic How to Lie with Statistics or Joel Best’s Damned Lies and Statistics series). This phenomenon goes beyond statistics to all situations involving quantitative information or reasoning. Numbers and equations are apparently intoxicating to the uninitiated, like the narcotic lotus flowers of Greek mythology that reduced Odysseus’ crew to a state of peaceful apathy and nearly caused them to lose their way.

Too often, the curiosity and skepticism demonstrated by otherwise intelligent humans comes to a grinding halt when numbers are involved.

“Quantitative exceptionalism” is the widespread and often harmful belief that insights reached via quantitative means form an exceptional class. This term has both positive and negative connotations. Quantitative arguments are often assumed to be of high quality a priori, perhaps due to their relative inaccessibility, and those who employ them erudite. Humans are by nature fallible and what we do with numbers is subject to human error, yet people so often blindly trust quantitative arguments. Sometimes the errors are subtle; other times, not so much. The result is lower standards for scientific and mathematical rigor, with immense downstream impact.

Quantitative exceptionalism is widespread in academic, business, political, and popular discourse. In many scholarly disciplines, numerical data and quantitative arguments are given less scrutiny than their qualitative counterparts. In business and government, decisions are made on questionable calculations at an ever accelerating rate, fueled by a big data revolution that is a lot heavier on technology than it is on basic science. Educators in STEM fields could do more to encourage interrogation. Journalists using numbers and infographics could inquire more critically. Politicians…don’t get me started.

So don’t judge a number by its cover. Be curious. Be a skeptic. Avoid the lotus.

(For curious readers, the coinage “quantitative exceptionalism” is inspired by the terminology “American exceptionalism” and MIT linguist Michel DeGraff’s “Creole exceptionalism” [pdf].)

We’ll be talking a lot more about quantitative exceptionalism on this blog. In the meantime, share your thoughts or examples you’ve witnessed in the comments.

What do unstructured data and Santa Claus have in common?

Santa Claus as a huge balloon

Image courtesy of Bart Fields

There is no such thing as unstructured data. There, I said it.

Structure is inherent in the definition of data. No structure means no information means no data. Like “clearly misunderstood,” “unstructured data” is an oxymoron.

Some have proposed “semi-structured data” to overcome this logical issue, but this alternative is no less discriminatory. Whatever part of the data lacks structure has no informational content and is thus not data. Contradiction persists.

“Big data” generally refers to data whose large size or structure or rate of change or complexity contributes to the difficulty of working with it in some way. “Unstructured data” refers to the subset of big data for which structure is part of the problem. These phrases are subjective and point to shortcomings in tools or users rather than something inherent about the data itself. They are highly context dependent and, as such, are often misused and misunderstood.

Which causes miscommunication. Unstructured communication, if you will.

Please regard this as a cease and desist letter for the use of the phrase “unstructured data” unless in a defamatory or humorous context (warning: it’s not that funny, either). We have plenty of words to describe specific instances of The Data Formerly Known as Unstructured: “text,” “non-relational,” “the Web.”

If you absolutely need a term, perhaps “differently structured data” would be somewhat more palatable to my fellow pedants. Another possibility, “multi-structured data,” is beginning to gain some momentum. Just keep the context crystal clear and we won’t come looking for you.

P.S. Apologies to any children whose Yuletide dreams were just crushed.


Image courtesy of Argonne National Laboratory

Google will learn just a tiny bit more about me (and you, the reader) from this post, enabling the search engine giant to (probabilistically) increase its bottom line through better targeted advertisements. You’re welcome, Google.

A hugely transformative data revolution is upon us. Machine readable information is being generated and captured at an astounding and rapidly accelerating rate. At the same time, a ballooning army of alchemists and applications attempts to transform it into value of various forms. Which results in even more data. Kaboom.

On the data generation side are technologies that digitize and enable the creation of new digital information: the vast human-generated data factories of the internet, increasingly sophisticated devices in our homes and handbags, and sensors galore in places most people don’t imagine.

On the value generation side are data scientists (also in places most people don’t imagine), enthusiasts, knowledge workers, and an ever expanding array of hardware and software.

At the center of the data revolution are quantrepreneurs, who innovate new ways to generate, capture, transform, and wring value from data, linking the two sides and propelling the revolution forward, evolving the information age into the age of actionable insight. This is their story.

Telling the full story of this revolution necessitates touching on many topics: data (of course), science, technology, engineering, math, business, innovation, entrepreneurship, privacy/transparency and the law, design, storytelling, statistics, and more. Content will be wide ranging and will include case studies, opinions, thought experiments, predictions, rants, musings, and practical advice.

Non-specialists are welcome. In fact, this blog is really for you. Despite the recent explosion of online chatter about the data revolution, many sources still make it seem like magic. It’s not magic. Data Bitten aims to provide a voice that is significantly underrepresented in the conversation by taking a first principles approach to the data revolution. We’ll demystify and debunk. We’ll be skeptical. We’ll expose the magic for what it is: simply good science or iterative engineering or a smart idea. The goal is to make this world accessible to a wider audience and, through a focus on the practical, to enable and encourage greater understanding and participation.

Are you ready to get bitten by the data bug?
