
We’ve known for a while that marketing studies consistently fail to replicate. What could be at the root of these replication failures? Some have posited that it is marketing’s changing context. Others have argued that the original findings were mostly false positives, likely produced by p-hacking. Mounting evidence supports the latter view: the p-values reported in the original studies themselves indicate that those studies are unreliable. Here is a look at the evidence:
In marketing, p-values are generally too close to p = .05
Multiple research teams have used a bias detection tool called “Z-curve” to show that p-values in marketing cluster far too close to the .05 threshold to be credible. Z-curve derives several metrics (the distribution of z-scores, estimated replicability, etc.) directly from reported p-values. The first to use z-curve to call out marketing in this way was Uli Schimmack, who developed the tool, in a blog post titled “If Consumer Psychology Wants to be a Science It Has to Behave Like a Science.” One critique of Uli’s analysis is that he used mechanically scraped p-values rather than manually selecting the core hypothesis tests.
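To make the diagnostic concrete, here is a minimal Python sketch (an illustration, not the z-curve software itself) of the tool’s first step: converting reported two-sided p-values into absolute z-scores and checking how many land just past the z = 1.96 significance threshold. The p-values below are hypothetical stand-ins for values scraped from articles; the real z-curve goes on to fit a mixture model to the significant z-scores to estimate replicability.

```python
import numpy as np
from scipy import stats

def p_to_z(p_values):
    """Convert two-sided p-values to absolute z-scores: z = Phi^{-1}(1 - p/2)."""
    p = np.asarray(p_values, dtype=float)
    return stats.norm.isf(p / 2.0)  # inverse survival function of the standard normal

# Hypothetical hand-entered p-values standing in for scraped values.
reported_p = [0.049, 0.041, 0.032, 0.048, 0.012, 0.044, 0.003, 0.046]

z_scores = p_to_z(reported_p)

# Share of "significant" results that only barely clear z = 1.96;
# a large pile-up here is the red flag that z-curve formalizes.
barely_significant = np.mean((z_scores >= 1.96) & (z_scores < 2.2))

print("z-scores:", np.round(z_scores, 2))
print(f"share barely past z = 1.96: {barely_significant:.0%}")
```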

However, Krefeld-Schwalb and Scheibehenne (2022) ran z-curve both ways (automatically scraped p-values and manually harvested core hypothesis tests) and obtained similarly poor results either way. They also showed that the troubling pattern of p-values in marketing has not improved since 2011, despite the replication crisis, and that as sample sizes have grown with the convenience of online samples, reported effect sizes have shrunk.
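That joint pattern of growing samples and shrinking effects is what you would expect if results are selected at the significance threshold: the smallest effect that clears p < .05 falls mechanically as n rises. A rough sketch, assuming a simple two-group comparison and the large-sample approximation SE(d) ≈ √(2/n):

```python
import numpy as np

def just_significant_d(n_per_group, alpha_z=1.96):
    """Smallest Cohen's d that reaches p < .05 (two-sided) in a two-group
    comparison, using the large-sample approximation SE(d) ~= sqrt(2/n)."""
    return alpha_z * np.sqrt(2.0 / n_per_group)

# Typical lab samples vs. large online panels.
for n in (50, 100, 400, 1000):
    print(f"n per group = {n:5d} -> smallest 'significant' d = {just_significant_d(n):.2f}")
```

Under these assumptions the just-significant effect drops from about d = 0.39 at n = 50 per group to about d = 0.09 at n = 1,000, so a literature that selects on significance will report smaller effects as samples grow.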
In their supplemental materials, Dougherty and Horne (2022) found that marketing fared very poorly on scraped p-values, a la z-curve, compared to every field of psychology they examined (below left). Not only were marketing’s p-values much worse than other fields’, but there was also a negative relationship between journal impact factor and the quality of reported p-values (below right). The implicit message to early-career marketing researchers is: “you must p-hack to get into an ‘A’ journal.”


This problem of p-values clustering just below .05 in marketing has also been shown in the following contexts:
- An analysis of MTurk studies in which marketing fared poorly relative to every other field examined
- Nudge interventions: (a) No reason to expect large and consistent effects of nudge interventions, (b) No evidence for nudging after adjusting for publication bias, (c) Left-truncated effects and overestimated meta-analytic means
- What works to increase charitable donations?
- Message persuasiveness
This problem has also been shown in several subdomains that heavily overlap with marketing.
In marketing, confidence intervals for mediation tests are generally too close to zero
Mediation tests in marketing also show a highly suspicious pattern. Rather than being distributed like a bell curve that only slightly overlaps zero, as we would expect from well-powered studies, the reported confidence intervals sit as close to zero as possible, barely excluding it. This is not consistent with well-powered studies but closely matches what low-powered and null studies produce. The distribution of confidence intervals reported in marketing also compares very poorly with that of the Journal of Personality and Social Psychology. The chart below shows how close simulated mediation-test confidence intervals come to zero in aggregate (colored) versus the reported results (black outline).
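For intuition about what “too close to zero” means, here is a rough Python sketch (an illustration only, not a reproduction of the published analysis) of a percentile-bootstrap confidence interval for an indirect effect a*b from one simulated dataset; the quantity of interest is how far the lower bound of the interval sits from zero.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_mediation(n, a, b, c_prime=0.0):
    """Generate X -> M -> Y data with path coefficients a (X->M) and b (M->Y)."""
    x = rng.normal(size=n)
    m = a * x + rng.normal(size=n)
    y = b * m + c_prime * x + rng.normal(size=n)
    return x, m, y

def ols_coef(design, outcome, col):
    """OLS coefficient for one column of the design matrix."""
    return np.linalg.lstsq(design, outcome, rcond=None)[0][col]

def indirect_effect(x, m, y):
    """a*b estimated from two regressions: M ~ X and Y ~ M + X."""
    ones = np.ones_like(x)
    a_hat = ols_coef(np.column_stack([ones, x]), m, col=1)
    b_hat = ols_coef(np.column_stack([ones, m, x]), y, col=1)
    return a_hat * b_hat

def bootstrap_ci(x, m, y, n_boot=2000, level=0.95):
    """Percentile bootstrap CI for the indirect effect."""
    n = len(x)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boots[i] = indirect_effect(x[idx], m[idx], y[idx])
    return np.quantile(boots, [(1 - level) / 2, (1 + level) / 2])

# A weak, marginally powered mediation: when a study like this comes out
# "significant", the lower CI bound tends to sit only just above zero.
x, m, y = simulate_mediation(n=100, a=0.25, b=0.25)
lo, hi = bootstrap_ci(x, m, y)
print(f"95% bootstrap CI for a*b: [{lo:.3f}, {hi:.3f}] (lower bound's distance from zero: {lo:.3f})")
```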

Conclusion

The problem of low replicability in marketing can be traced back to Type I error in the original studies. It is easy to obtain false positive results when one runs test after test until p < .05 appears. Bear in mind that statistical inference is a tool to *test* an idea, not to *find* one.
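As an illustration of that last point (a sketch under simple assumptions, not a model of any specific study), the snippet below simulates a researcher who re-runs a two-sample t-test after every new batch of participants and stops as soon as p < .05. With no true effect at all, the false positive rate lands far above the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def optional_stopping_fp_rate(n_sims=2000, batch=10, max_n=200, alpha=0.05):
    """Simulate repeated peeking at a two-sample t-test under a true null effect.
    The 'study' stops (and is declared significant) the first time p < alpha."""
    false_positives = 0
    for _ in range(n_sims):
        group_a, group_b = [], []
        while len(group_a) < max_n:
            group_a.extend(rng.normal(size=batch))
            group_b.extend(rng.normal(size=batch))
            p = stats.ttest_ind(group_a, group_b).pvalue
            if p < alpha:  # stop as soon as the test "works"
                false_positives += 1
                break
    return false_positives / n_sims

print(f"Nominal alpha: 0.05, observed false positive rate with peeking: "
      f"{optional_stopping_fp_rate():.2f}")
```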