Academic journals and the press regularly serve
up fresh helpings of fascinating psychological
research findings. But how many of those experiments
would produce the same results a second time around?
According to work
presented today in Science, fewer than
half of 100 studies published in 2008 in three top
psychology journals could be replicated
successfully. The international effort included 270
scientists who re-ran other people's studies as part
of The
Reproducibility Project: Psychology, led by
Brian
Nosek of the University of Virginia.
The eye-opening results don't necessarily mean
that those original findings were incorrect or that
the scientific process is flawed. When one study
finds an effect that a second study can't replicate,
there are several possible reasons, says co-author
Cody Christopherson of Southern Oregon
University. Study A's results may be false, or Study
B's may be false, or there may be subtle
differences in the way the two studies were
conducted that affected the results.
“This project is not evidence that anything is
broken. Rather, it's an example of science doing
what science does,” says Christopherson. “It's
impossible to be wrong in a final sense in science.
You have to be temporarily wrong, perhaps many
times, before you are ever right.”
Across the sciences, research is considered
reproducible when an independent team can conduct a
published experiment, following the original methods
as closely as possible, and get the same results.
It's one key part of the process for building
evidence to support theories. Even today, 100 years
after
Albert Einstein presented his general theory of
relativity, scientists regularly repeat tests of its
predictions and look for cases where his famous
description of gravity does not apply.
"Scientific evidence does not rely on trusting
the authority of the person who made the discovery,"
team member
Angela Attwood, a psychology professor at the
University of Bristol, said in a statement. "Rather,
credibility accumulates through independent
replication and elaboration of the ideas and
evidence."
The Reproducibility Project, a community-based
crowdsourcing effort, kicked off in 2011 to test how
well this measure of credibility applies to recent
research in psychology. Scientists, some recruited
and some volunteers, reviewed a pool of studies and
selected one for replication that matched their own
interest and expertise. Their data and results were
shared online and reviewed and analyzed by other
participating scientists for inclusion in the large
Science study.
To help improve future research, the project
analysis attempted to determine which kinds of
studies fared best, and why. The team found that
surprising results were the hardest to reproduce,
and that the experience or expertise of the
scientists who conducted the original experiments
had little to do with successful replication.
The findings also offered some support for the
oft-criticized statistical tool known as the
P value, which estimates how likely it is that a
result at least as extreme would turn up if chance
alone were at work. A higher value suggests a result
may well be a fluke, while a lower value is taken as
evidence that the result is statistically
significant.
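To make the idea concrete, here is a minimal sketch in Python of how a P value arises from a simple two-group experiment. The group sizes, effect size, and the choice of a two-sample t-test are illustrative assumptions, not details of any study in the project:

    # Hypothetical two-group experiment analyzed with a two-sample t-test.
    # Sample sizes and the effect size are invented for this sketch.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    control = rng.normal(loc=0.0, scale=1.0, size=30)  # no effect
    treated = rng.normal(loc=0.6, scale=1.0, size=30)  # modest true effect

    t_stat, p_value = stats.ttest_ind(treated, control)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    # A p value below the conventional 0.05 cutoff is read as evidence
    # against the "no difference" hypothesis, not as proof of an effect.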
The project analysis showed that a low P
value was fairly predictive of which psychology
studies could be replicated. Twenty of the 32
original studies with a P value of less
than 0.001 could be replicated, for example, while
just 2 of the 11 papers with a value greater than
0.04 were successfully replicated.
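Expressed as replication rates, the contrast is stark; a quick calculation with the figures quoted above:

    # Replication rates implied by the figures reported above.
    low_p_replicated, low_p_total = 20, 32    # original p < 0.001
    high_p_replicated, high_p_total = 2, 11   # original p > 0.04

    print(f"p < 0.001: {low_p_replicated / low_p_total:.0%} replicated")   # about 62%
    print(f"p > 0.04:  {high_p_replicated / high_p_total:.0%} replicated")  # about 18%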
But Christopherson suspects that most of his
co-authors would not want the study to be taken as a
ringing endorsement of P values, because
they recognize the tool's limitations. And at least
one P value problem was highlighted in the
research: The original studies had relatively little
variability in P value, because most
journals have established a cutoff of 0.05 for
publication. The trouble is that value can be
reached by
being selective about data sets, which means
scientists looking to replicate a result should also
carefully consider the methods and the data used in
the original study.
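A small simulation shows why that kind of selectivity is corrosive: even when no real effect exists anywhere, testing enough noise-only data sets and reporting only the "winner" produces a publishable-looking P value. The parameters below are invented for illustration:

    # With the null hypothesis true everywhere, roughly 1 in 20 comparisons
    # still crosses the 0.05 cutoff purely by chance.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    trials, hits = 20, 0
    for _ in range(trials):
        control = rng.normal(size=30)  # no true difference
        treated = rng.normal(size=30)
        _, p = stats.ttest_ind(treated, control)
        if p < 0.05:
            hits += 1

    print(f"{hits} of {trials} noise-only tests came out 'significant'")
    # Reporting only the "significant" data set manufactures a finding.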
It's also not yet clear whether psychology might
be a particularly difficult field for
reproducibility—a similar study is currently
underway on cancer biology research. In the
meantime, Christopherson hopes that the massive
effort will spur more such double-checks and
revisiting of past research to aid the scientific
process.
“Getting it right means regularly revisiting past
assumptions and past results and finding new ways to
test them. The only way science is successful and
credible is if it is self-critical,” he notes.
Unfortunately there are disincentives to pursuing
this kind of research, he says: “To get hired and
promoted in academia, you must publish original
research, so direct replications are rarer. I hope
going forward that the universities and funding
agencies responsible for incentivizing this
research—and the media outlets covering them—will
realize that they've been part of the problem, and
that devaluing replication in this way has created a
less stable literature than we'd like.”