Personality Psychology Has a Serious Problem (And So Do Many Other Areas of Psychology)
Brent W. Roberts
Recently, a spate of integrity issues has struck our top journal, JPSP. Daryl Bem published an article showing evidence for the existence of ESP, which has caused much consternation and embarrassment. More problematic was what did not occur: JPSP refused to publish papers demonstrating the inability to replicate Bem's findings, on the grounds that the journal "does not publish replications". Then, of course, an esteemed social psychologist, whose work has often appeared in JPSP, was caught faking data (for another journal, thank goodness). Nonetheless, the question arises whether anyone ever replicated his JPSP work, which, given JPSP's policy, would never see the light of day anyway. Unknown to most people was a slightly older kerfuffle, in which the social network analyses of some of our most esteemed colleagues, published in JPSP, were called into question. Once again, JPSP refused to even review the critique, which pointed out that many of the original paper's findings were incorrectly interpreted (e.g., Lyons, 2011).
As a personality psychologist, one could potentially discount these events as problems with the first section of JPSP, where most of our social psychological brethren reside. I think this would be a mistake for two reasons. First, as go the social psychologists, so go the personality psychologists. Personality psychology no longer exists as a stand-alone field separate from social psychology. This means that social psychology's issues are our issues too, as most other guilds in psychology tend to lump us together. Second, and more importantly, the issues that come to light with these papers and the attempts to correct them are just as applicable to the research found in the third section of JPSP and to all of the journals dedicated to personality psychology. That is to say, their methods are our methods, their publication processes are our processes, and our integrity is just as impugned as that of the social psychologists because of these facts.
I believe that the basic problem facing personality psychology is one of evolutionary sociology. In the words of one of my esteemed colleagues, a publication is only worthy if “it shows something” (i.e., costly signaling—“look, I’m a fit researcher.”). In no uncertain terms, “showing something” means showing that something is statistically significant (i.e., not zero). Thus, to be successful in our field we must publish a string of articles that reveal statistically significant results, even if the point of the article is fanciful. I believe that this has led to widely accepted practices that undermine our ability to create a foundation of reliable scientific findings. Moreover, my current concern is that if we fail to address our issues constructively, the effect will be to further marginalize our field at a time when many important institutions, such as the National Institutes of Health, already question the usefulness of our scientific contributions.
To this end, I have done two things.¹ First, below, with the input of many colleagues, I diagnose the problem by documenting some of the practices that I think lie at the root of our integrity problem. The list is only provisional. Please add to it or modify it to your liking. Second, I have proposed one potential solution to the problem: a new journal built on simple practices that would help us overcome our current inertia and change our ways. Preferably, our current journals would change their policies to endorse these practices as well. I believe that if a journal is created that uses these practices, and/or an existing journal adopts them, then it will, with time, become the most important outlet for our science, trumping even JPSP.
Problematic Practices in Psychological Science
Before I dive right into the laundry list of problematic practices, let me confess: I've committed many of the sins documented below myself. This list is not intended to shame our field so much as to get us all to confess to the same sins.
1. Null Hypothesis Significance Testing (NHST): We compare our results to the null hypothesis, despite the fact that we are almost never testing something that has not been tested before. We also misuse NHST, for example by comparing one result that is significantly different from zero to one that is not and inferring that the two effects differ from each other; the difference between "significant" and "not significant" is not itself a significance test (a worked example follows this list). This was one of the major points of the critique of Cacioppo, Fowler, and Christakis (2009). There are many other sins associated with NHST that better writers have described; Jacob Cohen, for example, should be required reading for most of us (Cohen, 1990). Despite many luminaries pointing out the failings of our addiction to NHST, we still cling to it like a crutch. We are smarter than this and we should show it.
2. Not valuing the null hypothesis. One of our explicit goals in personality science is to produce "knowledge". Our system for creating knowledge is showing that something is "statistically significant". This creates a situation where we do not value null results, which are intrinsically necessary for creating knowledge because they show us where our ideas and/or constructs do not work. Without null findings, our confirmatory findings mean nothing. Personality psychologists should be especially aware of this issue, as one of the hallmarks of construct validity is discriminant validity: if one of our measures correlates with everything, then we think it is invalid. Given our sensitivity to this basic issue, how is it, then, that we tolerate and reward the reporting of only significant findings (e.g., only convergent correlations)? A field of only confirmatory findings, like a measure that shows no discriminant validity, produces a cacophony of findings that add up to nothing.
3. Data churning: Running repeated experiments or studies until you get a statistically significant finding; surfing the correlation matrix in search of a statistically significant finding. We are not held accountable for these practices, since we never report them because our journals want us to “show something” (i.e., something statistically significant). In fact, admitting to null findings in a manuscript is not only punished with negative reviews, but even when a paper is reviewed positively, reviewers and editors often recommend trimming the null findings from the paper. Why?
4. Not replicating. Most discussions of the scientific method put replication as the cornerstone of scientific practice. In personality psychology, and most other areas of psychology, we actively devalue direct replication. The fact that JPSP, and most other major journals (e.g., Journal of Personality, Journal of Research in Personality), devalue direct replications means that our system cannot directly correct itself. If we cannot correct our mistakes, then we don’t practice science as much as publish something like a blog that only looks systematic.
5. Not reporting the failure to replicate our own research. Not publishing others’ failures to replicate our own research.
6. Peeking: Checking data as it is being collected and discovering "significant" effects along the way. According to a nice write-up by Rich Lucas, this inflates the Type I error rate quite nicely (see also Simmons, Nelson, & Simonsohn, 2011); a short simulation after this list shows by how much.
7. HARKing: Hypothesizing After the Results are Known.
8. Data topiary: The process of pruning insignificant findings or findings that contradict hypotheses followed closely by changing one’s hypotheses (see HARKing).
9. Launching Outcome Fragmentation Grenades (OFGs). Collecting so many outcomes that something is bound to hit.
10. Betting against the house. Running underpowered studies, which can leave you with no better than a 50:50 chance of detecting an effect even when it exists. This leads to inconsistent and incoherent research findings. Do a power analysis before running your study (a sketch after this list shows how little code this takes). Even SPSS offers power analysis now.
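To make the fallacy in item 1 concrete, here is a minimal sketch in Python; the correlations and sample sizes are invented for illustration, not drawn from any real study. One correlation clears p < .05 and the other does not, yet a direct test of their difference is nowhere near significant.

```python
import numpy as np
from scipy import stats

def r_to_p(r, n):
    """Two-sided p-value for H0: rho = 0, via the t distribution."""
    t = r * np.sqrt((n - 2) / (1 - r**2))
    return 2 * stats.t.sf(abs(t), df=n - 2)

# Two illustrative correlations from equal-sized samples.
r1, n1 = 0.25, 80   # p ~ .025 -> "significant"
r2, n2 = 0.15, 80   # p ~ .18  -> "not significant"
print(f"p1 = {r_to_p(r1, n1):.3f}, p2 = {r_to_p(r2, n2):.3f}")

# The correct move is to test the difference between the two
# independent correlations directly (Fisher z-transformation).
z1, z2 = np.arctanh(r1), np.arctanh(r2)
se_diff = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
z = (z1 - z2) / se_diff
p_diff = 2 * stats.norm.sf(abs(z))
print(f"difference: z = {z:.2f}, p = {p_diff:.2f}")  # p ~ .52, nowhere near .05
```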
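Item 6 is just as easy to verify by simulation. The sketch below assumes an entirely arbitrary design (a one-sample t-test, checked after every 10 participants up to 200) with a true effect of exactly zero; stopping at the first p < .05 pushes the realized Type I error rate to several times the nominal level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_max, batch = 5_000, 200, 10   # arbitrary illustrative settings
false_positives = 0

for _ in range(n_sims):
    data = rng.standard_normal(n_max)   # the true effect is exactly zero
    for n in range(batch, n_max + 1, batch):
        p = stats.ttest_1samp(data[:n], 0.0).pvalue
        if p < .05:                     # "significant" -- stop and write it up
            false_positives += 1
            break

print(f"nominal alpha = .05, realized Type I rate = {false_positives / n_sims:.3f}")
# With 20 peeks, the realized rate lands in the neighborhood of .20-.25,
# four to five times the advertised 5%.
```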
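And the power analysis urged in item 10 takes only a few lines. This sketch uses the statsmodels package, one tool among many (G*Power is a common point-and-click alternative), and assumes a "medium" effect of d = .5 purely for illustration.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed for 80% power to detect d = .5 at alpha = .05.
n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)
print(f"required n per group: {n_needed:.0f}")      # ~64

# Conversely, the power of a small study: n = 25 per group, same effect.
power = analysis.power(effect_size=0.5, nobs1=25, alpha=0.05)
print(f"power with n = 25 per group: {power:.2f}")  # ~.41, worse than a coin flip
```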
These are just ten of the practices common in our field and many others (e.g., Munafo & Flint, 2010). I also believe that it is these practices that led to Bem's studies not only being conducted, but also being reviewed positively and published. His article is a tour de force of our current practices. If you accept our practices, then you would have to accept his paper.
A Modest Proposal: The Journal of Reproducible Results in Personality Psychology
There are many potential ways of addressing these integrity issues. Personally, I would not condone shaming our field into submission or calling for high levels of regulation in order to curtail these practices. If past articles saying similar things had worked, we wouldn't be committing these sins now. The problem is better addressed by changing the incentives for publication. If we change the publication process so as to reward different practices, then it is quite possible that we would more willingly change our modal approach to research and publication. I also believe that many of these practices can be addressed with one simple policy change: we should replicate our research and reward researchers with publications for doing so. That way, researchers can peek, HARK, churn, and surf as much as they want, as long as they show that the resulting findings replicate.
With these ideas in mind, I have spelled out some provisional policies that could be implemented in this hypothetical journal or, better yet, in any existing journal interested in seriously addressing the integrity issues facing our field.
1. All original submissions must contain at least two studies: an original study and a direct replication of it. Conceptual replications will not suffice; the replication has to be identical to the first study.
2. Any subsequent study that directly replicates the method and analyses used in the original set of studies, regardless of the results, will be published. The subsequent studies will be linked to the original study.
3. When evaluating results, researchers will present point estimates, preferably in the form of an effect size indicator, and confidence intervals around them. The use of null hypothesis significance testing should be minimized (a brief sketch follows this list).
4. All data analyzed for the published studies will be submitted as an appendix and made available to the scientific community for re-analysis. This will permit the most fundamental form of replication to occur, in which others can re-examine the original data and possibly analyze it in a better way.
5. IRB documentation, including the date of first submission, subsequent re-approvals, and number of participants run under the auspices of the IRB submission, will be provided as an appendix. If there is a discrepancy between the number of participants run under the study IRB and the published research, an explanation will be required and recorded in the published manuscript. Trust me, I think this is onerous too. That said, if we really want to keep ourselves honest about the work we do, we must be honest about all of the research we’ve done in pursuit of an idea. The only way to avoid churning through studies and fields of correlations until we hit a statistically significant effect is to fess up about all of the research we conduct.
6. Similarly, authors will provide documentation of all of the variables collected in the study and some explanation of why the particular set of variables used in the published article was selected. This might diminish the use of OFGs.
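As a minimal sketch of what policy 3 asks for, the snippet below reports Cohen's d with a 95% confidence interval instead of a bare p-value. The data are simulated placeholders, and the standard error uses the standard large-sample approximation for d.

```python
import numpy as np

rng = np.random.default_rng(1)
group1 = rng.normal(0.0, 1.0, 100)   # placeholder data
group2 = rng.normal(0.4, 1.0, 100)   # a true effect of d = .4 is built in

# Cohen's d from the pooled standard deviation.
n1, n2 = len(group1), len(group2)
pooled_sd = np.sqrt(((n1 - 1) * group1.var(ddof=1) +
                     (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2))
d = (group2.mean() - group1.mean()) / pooled_sd

# Large-sample standard error of d and a 95% confidence interval.
se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
print(f"d = {d:.2f}, 95% CI [{d - 1.96 * se:.2f}, {d + 1.96 * se:.2f}]")
```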
I think it is clear that either starting this type of journal or changing the policies of an existing journal would be difficult. Publishers would hate this approach, as it would actively undermine the indices used to hold journals in high esteem, such as the impact factor and the rejection rate. That said, we have to seriously examine our fealty to these indices as they help to promote the problematic policies exhibited by JPSP and other journals—policies that we follow like lemmings.
I also readily acknowledge that much work would not be published in this journal. For example, I do a lot of longitudinal research, which is almost impossible to directly replicate. Therefore, my work would seldom find its way into this type of journal. I’m not too worried about this fact though. There are many, many journals out there and most of them don’t care about replication anyway.
I have penned this missive because I believe that the alternative is to continue blithely down the path of increasing irrelevance as our work becomes easier and easier to dismiss for other scientists, the federal government, state governments (seen Florida lately?), and ultimately, the general public, who directly or indirectly support our efforts through their tax dollars. We need to rectify the practices in our field that undermine the integrity of our work and the reputation that we work so hard to establish and maintain. I also think that personality psychology is an optimal place to initiate some form of change. We are a small group, and therefore more nimble than many of the other guilds in psychology. Changing the publishing guidelines for one of our journals or starting a new one (on-line maybe?) would not require tremendous effort. And, finally, why not? Why don't we lead on this one instead of waiting for someone else to do the right thing?
References
Bem, D.J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100, 407-425.
Cacioppo, J.T., Fowler, J.H., & Christakis, N.A. (2009). Alone in the crowd: The structure and spread of loneliness in a large social network. Journal of Personality and Social Psychology, 97, 977-991.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312.
Lyons, R. (2011). The spread of evidence-poor medicine via flawed social-network analysis. Statistics, Politics, and Policy, 2.
Munafo, M.R., & Flint, J. (2010). How reliable are scientific studies? The British Journal of Psychiatry, 197, 257-258.
Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359-1366.
¹ To be clear, this was not a solo effort. I received a lot of feedback along the way, for which I am grateful. That said, many of my points are controversial, at least to my close colleagues. Therefore, I will take full responsibility for the content of the essay.
Have a reaction to this feature? Submit your brief comments to arpnewsletter@gmail.com. A selection of comments will be published in the next issue of P!