Following a failure to replicate, “Super Size Me” paper turns out to be full of numerical inconsistencies

When Tunca, Ziano and Xu (2022) set out to replicate Dubois et al. (2012), they had no idea that their efforts would lead the original authors to request retraction of the original study. The original study claimed that consumers view larger food options as a status signal. For example, consider the picture above. Look at the size of the coffee that guy ordered! Does that make him seem higher status? According to the original article, it should. The replication team was skeptical, but what made the case so interesting was that the effects reported in the original paper were very large and very consistent. Large, consistent effects replicate. They just do. So what was going on here?

The first clue

As the replication team did their due diligence before running the replication, numerical inconsistencies began to emerge. Most notably, the effect sizes from the original study (Cohen’s d in the table below) were not consistent with the means and standard deviations they should have been derived from: the recalculated effect sizes were much higher than those reported in the original study. Some of the observed inconsistencies are shown in the table below.

Table 3 from Tunca, Ziano & Xu 2022

This was strange, but at that point it could still be explained away as sloppiness.
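Checking a reported effect size against the reported means and standard deviations is straightforward. Here is a minimal sketch of that kind of recomputation, using hypothetical numbers rather than the values from the original paper:

```python
# Recompute Cohen's d from reported cell means and standard deviations.
# All numbers below are hypothetical, for illustration only.

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / pooled_var**0.5

# If a paper reports these cells but a much smaller d, something is off.
d = cohens_d(m1=5.2, sd1=1.0, n1=50, m2=4.0, sd2=1.0, n2=50)
print(round(d, 2))  # 1.2
```

If the d printed here differed substantially from the d reported alongside those means and SDs, that would be exactly the kind of inconsistency the replicators flagged.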

My interest was piqued

I got involved after the replicators publicized their findings: they had failed to replicate the original study. I was first struck by how consistent the effects reported in the non-replicating paper were. Most replication failures in marketing can easily be explained by p-hacking in the original study, but not this one. Nothing happened at that point because I got busy and moved on. Nearly a year later, however, the replicators called attention to the study again when their replication was published in Meta-Psychology. Looking at it with fresh eyes, I was struck by the large effect sizes and thought, “How could such large effects not replicate?” When I called attention to the large effect on Twitter (d = 1.1), one of the replication authors pointed out that it was miscalculated: properly derived from the means and standard deviations, it should have been d = 1.5. At this point, the paper started to look like a crime scene to me. 🚨 This definitely warranted further investigation.

When I dug into the paper, I found quite a few more numerical inconsistencies. In fact, it was difficult to find any numbers in the paper that were consistent with each other; nearly every number appeared to be inconsistent with every other number. The first author, who collected all of the data and ran all of the analyses, was able to provide a small portion of the participant data, which helped resolve some minor issues with Study 3, but the bulk of the problems remained.
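One simple screening check of this kind asks whether a reported p-value is consistent with the reported test statistic and degrees of freedom. A sketch in Python using scipy, again with hypothetical numbers rather than values from the paper:

```python
# Does a reported p-value match the reported statistic and df?
# Hypothetical values, for illustration only.
from scipy import stats

def p_from_t(t, df):
    """Two-tailed p-value implied by a t statistic."""
    return 2 * stats.t.sf(abs(t), df)

def p_from_F(F, dfn, dfd):
    """p-value implied by an F statistic."""
    return stats.f.sf(F, dfn, dfd)

# Suppose a paper reports t(98) = 2.50, p < .001. The implied p is
# about .014, so the reported p-value is inconsistent with the statistic.
print(round(p_from_t(2.50, 98), 3))
```

Running every reported statistic through a check like this is tedious by hand but trivial in code, which is why these inconsistencies tend to surface once someone looks closely.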

Altogether, I identified five categories of numerical inconsistency:

1. Effect sizes not consistent with means and standard deviations (discovered by Tunca, Ziano & Xu 2022)
2. P-values not consistent with F and t statistics (disclosed by the original authors, who had it pointed out to them by a reader in 2012)
3. Impossible means and SDs (GRIM/SPRITE issues) (discovered by Charlton)
4. ANOVA not consistent with means and standard deviations in one study (discovered by Charlton)
5. Impossible mediation: (a) too much asymmetry (point estimate near the lower endpoint instead of the midpoint) and (b) an indirect effect far too weak to fully mediate the X-Y relationship, and numerically inconsistent with the other paths (discovered by Charlton)
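The GRIM test mentioned in category 3 checks whether a reported mean is mathematically possible given the sample size when responses are integers (e.g., a single Likert item). A minimal sketch, with hypothetical values and assuming single-item integer-scale responses:

```python
# Minimal GRIM check: with n integer responses, the mean must equal
# (some integer total) / n after rounding. Hypothetical values only.
import math

def grim_consistent(mean, n, decimals=2):
    """True if some integer total of n integer responses could yield
    the reported mean after rounding to `decimals` places."""
    target = round(mean, decimals)
    for total in (math.floor(mean * n), math.ceil(mean * n)):
        if round(total / n, decimals) == target:
            return True
    return False

print(grim_consistent(5.18, 28))  # True  (145 / 28 rounds to 5.18)
print(grim_consistent(5.19, 28))  # False (no integer total works)
```

A mean that fails this check cannot have been computed from the reported sample, no matter what the individual responses were.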

Read the full report on the data anomalies >>

This is another wake-up call for the field of marketing and for other fields that are slow to adopt better scientific practices

This is a case study in the benefits of direct replication. If nobody had tried to replicate it, people would keep citing this flawed study, graduate students would keep trying to get the effect to work, and so on. But thanks to the replication study we learned both that (a) the replication team couldn’t get it to replicate, and (b) the original study was deeply flawed in a way that merited retraction, or at least the original authors thought so. Seven months after their request to retract, it still has not happened. Assuming the Journal of Consumer Research can figure out how to do the right thing in this case, science can move forward and be better, thanks to replication.

Unfortunately, marketing journals are completely unwilling to publish direct replications. To date, only one direct replication has ever been published in a marketing journal, and that journal has since been shut down. That means marketing findings are never challenged by neutral third parties. By design, this protects the careers and reputations of marketing scholars at the expense of scientific progress. Fortunately, many marketing studies have recently been replicated in blog posts or non-marketing journals, and that’s how we know the replication rate for marketing is abysmal. It’s also how we know about the problems in Dubois et al. (2012). Thankfully, Meta-Psychology was willing to publish the replication study. Hopefully, this triumph of direct replication will encourage marketing’s gatekeepers to be more open to the practice in the future.

It’s also a case for open data. Very little of the data from this study was available. The data that was provided to me did help resolve some minor concerns. If all the data were available by default, we could either (a) have figured out what happened in this paper, or even (b) have prevented whatever happened here from happening. When there is zero accountability, as is the case with the closed-data model that marketing academia seems to prefer, there is obviously more room for things to go wrong.
