So cool that another fraudulent paper was discovered and outed. I noticed conflicts between the author’s statement (he seems to blame his industry partner?) and other facts of the case. I want to highlight those conflicts here because they need a better explanation if we are going to trust this author going forward. The author is Dan Ariely, by the way. This refers to Data Colada #98.
First of all, let’s look at Dan Ariely’s statement:
The data were collected, entered, merged and anonymized by the company and then sent to me. This was the data file that was used for the analysis and then shared publicly. I was not involved in the data collection, data entry, or merging data with information from the insurance database for privacy reasons. [link]
But what are the conflicts with Dan’s statement?
- According to the Excel metadata, Dan Ariely both created the Excel file and was the last person to modify it before sending it in its fraudulent form to coauthor Nina Mazar (she still has the email; footnote 14). Update: A member of Dan’s lab, who joined years after the incident and has no inside information on what happened, correctly pointed out that if a company sends you a .csv and you save it as an .xls file, you will show as the author. Update 2: Apparently a .csv file saved as .xls doesn’t explain the problem, because .csv files don’t save font information and the fraudster used two different fonts (Cambria and Calibri). Thanks to Krzysztof Cipora for pointing that out.
- Dan Ariely admitted to miscoding a variable: “First, the effect observed in the data file that Nina received was in the opposite direction from the paper’s hypothesis. When Nina asked Dan about this, he wrote that when preparing the dataset for her he had changed the condition labels to be more descriptive and in that process had switched the meaning of the conditions, and that Nina should swap the labels back. Nina did so.” (footnote 14). You can’t miscode a variable if someone else does all the data work and you didn’t touch it.
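The point in Update 2 about .csv files is easy to check for yourself. Here is a minimal Python sketch, using made-up column names and values (nothing here comes from the real dataset), showing that a CSV holds only delimited text:

```python
import csv
import io

# Hypothetical rows; the column names and values are made up for illustration.
rows = [
    ["policy_id", "condition", "miles"],
    ["1", "sign_top", "12000"],
    ["2", "sign_bottom", "15000"],
]

# Write the rows as CSV into an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()

# The whole "file" is just values, commas, and line breaks.
print(csv_text)

# Reading it back yields plain strings only; the format has no place
# to store a font, a color, or any other cell formatting.
parsed = list(csv.reader(io.StringIO(csv_text)))
```

So if a file arrived as a .csv, any fonts in the later spreadsheet had to be applied after it was opened in a program like Excel.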
This is all from footnote 14: Dan Ariely emailed an Excel file to Nina Mazar in which the effect was the opposite of what they expected (see right). When she contacted him about it, he wrote, in Data Colada’s words, that “when preparing the dataset for her he had changed the condition labels to be more descriptive and in that process had switched the meaning of the conditions, and that Nina should swap the labels back.” That means that Dan Ariely had relabeled every single value of column A (13,488 values), and relabeled them incorrectly. Of course, Excel has ways to do this in bulk, so he didn’t literally go through the values one at a time. The point remains that he admits to changing every single value of the variable, and to changing it so that it was wrong. He manipulated the data, and he failed to mention that in his statement. Perhaps he didn’t remember. Perhaps there is a harmless explanation. For now, I still consider it to be a conflict.
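To make the “in bulk” point concrete, here is a minimal sketch of what swapping two condition labels across a whole column looks like. It is written in Python rather than Excel, the label names are hypothetical, and the column is generated for illustration; only the row count (13,488) comes from the post above.

```python
# Hypothetical labels; the real file's condition names are not reproduced here.
SWAP = {"sign_top": "sign_bottom", "sign_bottom": "sign_top"}


def swap_labels(column):
    """Return a copy of the column with the two condition labels exchanged."""
    return [SWAP[label] for label in column]


# Build a made-up column with the same number of rows as column A (13,488).
column_a = ["sign_top" if i % 2 == 0 else "sign_bottom" for i in range(13488)]

# One call relabels the entire column.
relabeled = swap_labels(column_a)

# Count how many values differ from the original after the swap.
changed = sum(old != new for old, new in zip(column_a, relabeled))
print(changed)  # 13488: every value differs from the original

# Swapping again restores the original column, which is what
# "swap the labels back" amounts to.
restored = swap_labels(relabeled)
```

A single operation touches all 13,488 values, so the relabeling takes seconds but still counts as editing every row of that variable.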