What kind of p-values should we expect from true effects?

There has been a lot of discussion about what is mathematically possible in terms of p-values. Can a true effect produce .01 < p < .05? Yes. Can it do so several times in a row? Mathematically speaking it's possible, even if unlikely. But I would like to skip over the math and talk about the real world.
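For readers who do want a rough sense of "possible, even if unlikely," here is a minimal sketch. It is not taken from any of the papers discussed; it assumes a two-sided z-test as a stand-in for the real tests, and the function names and power levels are my own illustrative choices. It asks: for a true effect studied at a given power, what share of significant results land in .01 < p < .05, and how likely is it that four significant studies in a row all land there?

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def power_two_sided_z(delta: float, alpha: float) -> float:
    """P(p < alpha) for a two-sided z-test whose statistic has mean delta, SD 1."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(delta - z_crit) + _STD_NORMAL.cdf(-delta - z_crit)

def delta_for_power(target_power: float, alpha: float = 0.05) -> float:
    """Find, by bisection, the true standardized effect that gives target_power."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if power_two_sided_z(mid, alpha) < target_power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def share_between_01_and_05(target_power: float) -> float:
    """Among significant results (p < .05), the share with .01 < p < .05."""
    delta = delta_for_power(target_power)
    below_05 = power_two_sided_z(delta, 0.05)
    below_01 = power_two_sided_z(delta, 0.01)
    return (below_05 - below_01) / below_05

# Illustrative power levels (assumptions, not estimates for any real paper).
for power in (0.5, 0.8, 0.95):
    share = share_between_01_and_05(power)
    print(f"power={power:.2f}  share in (.01, .05)={share:.2f}  "
          f"four in a row={share ** 4:.4f}")
```

Under these assumptions, the better powered the studies are, the smaller the share of significant p-values that fall just under .05, so a long run of them becomes increasingly hard to explain with a true effect alone.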

What do we see in preregistered marketing studies that don’t show signs of p-hacking?

I try to keep a list of preregistered studies that don’t show signs of p-hacking. One thing I’ve noticed is that in marketing, you have two kinds of preregistered studies:

  1. Likely p-hacked. In these papers nearly all of the p-values are p > .01. The key construct is not well-defined and shifts from study to study. The dependent variable changes frequently. Mediators, moderators, and other model complexity appear in every study. These papers never really test the same thing twice despite consistent use of the phrase “we replicated.”
  2. Probably not p-hacked. The second kind of preregistered paper in marketing has clearly defined constructs and replicates the same relationships with the same variables from study to study. The key to identifying this type of paper is consistency. The models are also less complex. In these papers, many or most of the p-values are p < .001.

On my list I only record the second category (not p-hacked) and disregard the first (p-hacked). The list hasn’t been updated in a while, and several good papers have come out since. If you want me to add one, let me know. You can see the critical p-values for each of these papers below or see the full list.

Notice that nearly all of the p-values are p < .01 and a lot of them are p < .001. Critical p-values are taken from the core statistical test of each study, usually the test of the main hypothesis. There is a lot of information on the P-curve website that can help you choose the most appropriate test.

What do we see in successful replications of marketing studies?

In marketing up to this point, only five studies have ever withstood the scrutiny of direct, high-powered, preregistered replication. Let’s consider all five:

| Replication target | Replication study | Effect | Statistical test in original study | Notes |
| --- | --- | --- | --- | --- |
| Tully, Hershfield and Mevis (2005) Study 5 | O’Donnell (2021) | As predicted, participants who were asked to consider their financial constraints were more likely to prefer the material options than were participants in the control condition. | F(1, 375) = 10.02, p = .002 | |
| Cook and Sadhegein (2018) Study 3 | O’Donnell (2021) | There was a marginally significant three-way interaction among perceived consequences (loss/gain), lending options, and liquidity, illustrating how individuals who were in a loss mindset, did not have any other lending options, and were without liquidity (i.e., “the triple scarcity effect”) were negatively influenced (H5). | F(1,189) = 3.00, p = .08 | In this case, the original seems inconclusive since they didn’t get a significant result, nor did they do contrasts. In the replication they also didn’t do any contrast tests. |
| Cian, Longoni & Krishna (2020) Study 2 | Data Colada Blog | and planned contrasts revealed that the progression ad was more credible than both the before/after ad | F(1,210)=7.13, p=.008 | |
| Klink (2000) Study 1 | Motoki & Iseki (2022) | The results suggest that products with brand names containing front vowels, as opposed to back vowels, are perceived as smaller, lighter (relative to darker), milder, thinner, softer, faster, colder, more bitter, more feminine, friendlier, weaker, lighter (relative to heavier), and prettier. | 13 DVs all p < .001 | |
| Shrum et al (2012) Studies 1a, 1b & 1c | Motoki & Iseki (2022) | particular words will be preferred as brand names when the phonetic connotations of the words are consistent with the product attributes. | F(1,367)=63.87, p<.001 | |

Note that only one of the studies that “replicated” did not have p < .01 in the original. That is the study I had previously flagged: the replication authors reported a successful three-way interaction but said nothing about the shape of the interaction. Because it’s a three-way interaction, it could be in the wrong direction in several different ways.
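The tally from the table above can be sketched in a few lines. The p-values are the ones reported in the table; coding "p < .001" as 0.001 is my own simplification (an upper bound, not the exact value), and the variable and function names are just for illustration.

```python
# Critical p-values from the originals of the five replicated studies.
# Assumption: results reported only as "p < .001" are stored as 0.001.
replicated_originals = [
    ("Tully, Hershfield and Mevis (2005) Study 5", 0.002),
    ("Cook and Sadhegein (2018) Study 3", 0.08),
    ("Cian, Longoni & Krishna (2020) Study 2", 0.008),
    ("Klink (2000) Study 1", 0.001),
    ("Shrum et al (2012) Studies 1a, 1b & 1c", 0.001),
]

def share_below(studies, threshold):
    """Fraction of studies whose critical p-value is below the threshold."""
    hits = sum(1 for _, p in studies if p < threshold)
    return hits / len(studies)

def exceptions(studies, threshold):
    """Labels of studies whose critical p-value is not below the threshold."""
    return [label for label, p in studies if not p < threshold]

share = share_below(replicated_originals, 0.01)
not_below = exceptions(replicated_originals, 0.01)
print(f"Share with p < .01: {share:.0%}")
print("Not below .01:", not_below)
```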

What’s the solution?

Given the severity of the problem, I’d propose that any marketing paper should be published with a table that shows the critical statistical test for each study along with a 3-digit p-value. It could be something like this:

| Study | Hypothesis | Critical test | p-value | Effect size (d) |
| --- | --- | --- | --- | --- |
| 1 | H1: Participants asked to consider their financial constraints will be more likely to prefer the material options relative to the control condition | F(1, 375) = 10.02 | p = .002 | d = .3 |
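If a paper reports only the F statistic, a reader can roughly recover the effect-size column. The sketch below is an approximation under assumptions that may not hold for a given study: a two-condition design (one numerator degree of freedom) with equal group sizes, so that the total sample is the error degrees of freedom plus two. The function name is hypothetical.

```python
import math

def cohens_d_from_f(f_value: float, df_error: int) -> float:
    """Approximate Cohen's d from F(1, df_error).

    Assumes two equal-sized groups, so t = sqrt(F) and
    d = t * sqrt(1/n1 + 1/n2) = 2 * t / sqrt(N), with N = df_error + 2.
    """
    total_n = df_error + 2
    return 2 * math.sqrt(f_value) / math.sqrt(total_n)

# The example row above: F(1, 375) = 10.02.
d = cohens_d_from_f(10.02, 375)
print(f"d is roughly {d:.2f}")
```

For the example row this comes out close to the d = .3 shown in the table. With unequal groups or a more complex design, the conversion would need the actual cell sizes.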

That would allow readers, reviewers, and editors to tell at a glance whether the paper is likely to have been p-hacked.

Solutions that others have proposed would also help: preregistration, better training, registered reports, etc. I have noticed, however, a problem with p-hacking of preregistered studies in marketing. Preregistration can easily be misused.


Replicable research tends to have p < .01. If a paper is full of p > .01, it’s highly unlikely to replicate. The most likely reason? Unintentional p-hacking. In my opinion, one of the things that would help the most is a table that shows the critical hypothesis test and corresponding p-value for each study.
