Do Peer Reviewers Prefer Significant Results?

I’ve long been writing about problems in how science is communicated and published. One of the best-known concerns in this area is publication bias — the tendency for results that confirm a hypothesis to get published more easily than those that don’t.

Publication bias has many contributing factors, but the peer review process is widely seen as a key driver. Peer reviewers, it is widely believed, tend to look more favorably on “positive” (i.e., statistically significant) results.

But is the reviewer preference for positive results actually real? A recently published study suggests that the effect does exist, but that it’s not a large one.

Researchers Malte Elson, Markus Huff and Sonja Utz carried out a clever experiment to measure the effect of statistical significance on peer review evaluations. The authors were the organizers of a 2015 conference to which researchers submitted abstracts that were subject to peer review.

The keynote speaker at this conference, by the way, was none other than “Neuroskeptic (a pseudonymous science blogger).”

Elson et al. created a dummy abstract and had the conference peer reviewers review this fictitious “submission” along with the real ones. Each reviewer was randomly assigned to receive a version of the abstract with either a significant result or a nonsignificant result; the details of the fictional study were otherwise identical. The final sample size was n=127 reviewers.

The authors do discuss the ethics of this slightly unusual experiment!

It turned out that the statistically significant version of the abstract was given a higher “overall recommendation” score than the nonsignificant one. The difference, roughly 1 point on a 10-point scale, was statistically significant, though only marginally so (p=.039).
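To get an intuition for what a roughly 1-point difference at n=127 looks like statistically, here is a minimal simulation sketch. The group means, standard deviation, and even split between conditions are invented for illustration — they are not the study’s actual data — and the p-value uses a simple normal approximation rather than the authors’ analysis.

```python
import math
import random

random.seed(1)
n = 127        # total reviewers, as in the study
half = n // 2  # assumed roughly even split between conditions

def clip(x):
    """Keep simulated ratings on a 1-10 recommendation scale."""
    return max(1.0, min(10.0, x))

# Invented group means with a ~1-point gap, mirroring the reported difference
sig_group = [clip(random.gauss(6.0, 2.5)) for _ in range(half)]
null_group = [clip(random.gauss(5.0, 2.5)) for _ in range(n - half)]

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

t = welch_t(sig_group, null_group)
# Two-sided p-value via a normal approximation (reasonable at this n)
p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
print(f"t = {t:.2f}, p ~= {p:.3f}")
```

Run it a few times with different seeds and the p-value bounces around quite a bit, which is one reason a single marginal p=.039 warrants the authors’ cautious interpretation.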

The authors conclude that:

We observed some evidence for a small bias in favor of significant results. At least for this particular conference, however, it is unlikely that the effect was large enough to notably affect acceptance rates.

The experiment also examined whether reviewers had a preference for original studies vs. replication studies (so there were four versions of the dummy abstract in total). This revealed no difference.

Effects of significant vs. nonsignificant results

(Credit: Elson et al. 2020)

So this study suggests that reviewers, at least at this conference, do indeed prefer positive results. But as the authors acknowledge, it’s hard to know whether this would generalize to other contexts.

For example, the abstracts that were reviewed for this conference were limited to just 300 words. In other contexts, notably journal article reviews, reviewers are provided with far more information on which to base an opinion. With just 300 words to go by, reviewers in this study may have paid attention to the results simply because there wasn’t much else to judge on.

On the other hand, the authors note that the participants in the 2015 conference may have been unusually aware of the problem of publication bias, and thus more likely to give null results a fair hearing.

For the context of this study, it is relevant to note that the department (and its leadership at the time) can be characterized as rather progressive with regard to open-science ideals and practices.

This is certainly true; after all, they invited me, an anonymous guy with a blog, to speak to them, purely on the strength of my writings about open science.

There have only been a handful of previous studies using similar designs to probe peer review biases, and they generally found larger effects. One 1982 paper found a large bias in favor of significant results at a psychology journal, as did a 2010 study at a medical journal.

The authors conclude that their dummy submission method could be useful in the study of peer review:

We hope that this study encourages psychologists, as individuals and on institutional levels (associations, journals, conferences), to conduct experimental research on peer review, and that the preregistered field experiment we have reported may serve as a blueprint for the kind of research we argue is necessary to cumulatively build a rigorous knowledge base on the peer review process.