"Replication crisis" in science

VorZakone · Jul 24, 2022

Comment from r/medicine.

I hate to be the one to say this.

But is it any wonder that the general public has lost so much faith in science? And other institutions?

We all know that even in the face of something like this, science is still better than the alternatives. But for everyday people going about their lives, how are they supposed to know that? How are they supposed to trust an institution that preaches "follow the data" then can't replicate even half of the data they want everyone to follow? And the replication crisis is even worse in some other academic fields.

We are better than the alternative. But we are also under greater scrutiny because we claim to have science on our side. So it falls to us to make sure that what we are practicing is science and not fabrication.

"Publish or perish" has got to go. We've taken the capitalist version of academia far past its ideal level and well into the stage where competitive pressure is crushing scientific integrity as a motivation. Our academic systems don't just allow fabricators to exist (as they always have), we are selecting for them by rewarding quantity over quality.

I don't know how to get rid of it, and I wish I did

nimic · Jul 24, 2022

Do you have the link?

VorZakone · Jul 24, 2022

nimic said:
Do you have the link?

nimic · Jul 24, 2022

The article is honestly a lot more interesting than the comment in response. Everyone should take a look, it's good.

https://www.science.org/content/art...mages-threatens-key-theory-alzheimers-disease

Murder on Zidane's Floor · Jul 25, 2022

The recent stuff about SSRIs comes to mind when I think of this.

VorZakone · Sep 4, 2022

Just stumbled on this. How true is this? They claim there is no evidence of a replication crisis.

We show that OSC's article contains three major statistical errors and, when corrected, provides no evidence of a replication crisis. Indeed, the evidence is also consistent with the opposite conclusion -- that the reproducibility of psychological science is quite high and, in fact, statistically indistinguishable from 100%. The moral of the story is that meta-science must follow the rules of science.

https://projects.iq.harvard.edu/psychology-replications

PedroMendez · Sep 4, 2022

VorZakone said:
Just stumbled on this. How true is this? They claim there is no evidence of a replication crisis.

https://projects.iq.harvard.edu/psychology-replications

There was a project (OSC) that ended around 2015, that tried to reproduce 100 published studies and they concluded that replication rates are problematic:

Aarts et al. describe the replication of 100 experiments reported in papers published in 2008 in three high-ranking psychology journals. Assessing whether the replication and the original experiment yielded the same result according to several criteria, they find that about one-third to one-half of the original findings were also observed in the replication study.

The paper you referenced (Gilbert et al.) criticized the methodology of this project. They argue, that the authors of the OSC project didn't account for three reasons, that can explain the differing results. They argue, that if these three aspects are taken into account, the results wouldn't support the conclusion that reproducibility is an issue.

There are also responses to Gilbert et al., that argue that these authors themselves ignore various relevant aspects and so on. The original study has countless citations. Its hard to judge the validity unless you want to go down a rabbit-hole of technical details.

To give one example:
Gilbert et al argue, that various of the replication studies differed from the original studies and they listed various examples. Following a passage from their comment:

In fact, many of OSC’s replication studies differed from the original studies in other ways as well. For example, many of OSC’s replication studies drew their samples from different populations than the original studies did. An original study that measured American’s attitudes toward African Americans (3) was replicated with Italians, who do not share the same stereotypes; an original study that asked college students to imagine being called on by a professor (4) was replicated with participants who had never been to college; and an original study that asked students who commute to school to choose between apartments that were short and long drives from campus (5) was replicated with students who do not commute to school. What’s more, many of OSC’s replication studies used procedures that differed from the original study’s procedures in substantial ways: An original study that asked Israelis to imagine the consequences of military service (6) was replicated by asking Americans to imagine the consequences of a honeymoon; an original study that gave younger children the difficult task of locating targets on a large screen (7) was replicated by giving older children the easier task of locating targets on a small screen; an original study that showed how a change in the wording of a charitable appeal sent by mail to Koreans could boost response rates (8) was replicated by sending 771,408 e-mail messages to people all over the world (which produced a response rate of essentially zero in all conditions). All of these infidelities are potential sources of random error that the OSC’s benchmark did not take into account.

At first this might sound quite bad and fishy. Why would the people who replicate studies change the original protocols? The answer is, that reproduction isn't always as easy as it seems. Gilbert et al. omit that various of these changes were endorsed as improvements by the original authors of the studies, that were getting replicated. So changes in protocols aren't always a problem. Sometimes this would be a legitimate problem and sometimes changes in the protocols are warranted. One would have to look at every single study to evaluate if changes are a problem. Nobody has time for that. Yet, if I pick it as point of contention, that part should probably included and discussed fairly.

Gilbert et al raise the legitimate point, that reproduction studies can be equally challenging and flawed. They are never the "objective last word" and have to be scrutinized. The big picture is, that there are so many red flags about reproducibility, that their conclusion isn't convincing. A generous interpretation is, that they highlight some weaknesses of the OSC-study, that should be taken into account in future research.

VorZakone · Sep 4, 2022

PedroMendez said:
There was a project (OSC) that ended around 2015, that tried to reproduce 100 published studies and they concluded that replication rates are problematic:

The paper you referenced (Gilbert et al.) criticized the methodology of this project. They argue, that the authors of the OSC project didn't account for three reasons, that can explain the differing results. They argue, that if these three aspects are taken into account, the results wouldn't support the conclusion that reproducibility is an issue.

There are also responses to Gilbert et al., that argue that these authors themselves ignore various relevant aspects and so on. The original study has countless citations. Its hard to judge the validity unless you want to go down a rabbit-hole of technical details.

To give one example:
Gilbert et al argue, that various of the replication studies differed from the original studies and they listed various examples. Following a passage from their comment:

At first this might sound quite bad and fishy. Why would the people who replicate studies change the original protocols? The answer is, that reproduction isn't always as easy as it seems. Gilbert et al. omit that various of these changes were endorsed as improvements by the original authors of the studies, that were getting replicated. So changes in protocols aren't always a problem. Sometimes this would be a legitimate problem and sometimes changes in the protocols are warranted. One would have to look at every single study to evaluate if changes are a problem. Nobody has time for that. Yet, if I pick it as point of contention, that part should probably included and discussed fairly.

Gilbert et al raise the legitimate point, that reproduction studies can be equally challenging and flawed. They are never the "objective last word" and have to be scrutinized. The big picture is, that there are so many red flags about reproducibility, that their conclusion isn't convincing. A generous interpretation is, that they highlight some weaknesses of the OSC-study, that should be taken into account in future research.

But how valid is this claim?

Indeed, the evidence is also consistent with the opposite conclusion -- that the reproducibility of psychological science is quite high and, in fact, statistically indistinguishable from 100%

PedroMendez · Sep 4, 2022

VorZakone said:
But how valid is this claim?

If one accepts their methodological corrections, which I wouldn't, this claim would be true, if reproducibility is understood in a narrow way ("is there an effect"). They don't address, that the OSC study also found, that effect sizes are substantially smaller in the reproduced studies.

berbatrick · Sep 4, 2022

Replicate this :lol:

Tweet
— Twitter API (@user) date

Psychology is bad, but there's also a lot of nonsense in biology that is allowed to go through because of vast sample sizes rather than any actual science. (In this paper, for clarity, the authors are claiming that that distribution tilts towards one end, and those blue/pink are significantly different from a horizontal line...)

PedroMendez · Sep 4, 2022

berbatrick said:
Replicate this

Tweet
— Twitter API (@user) date

Psychology is bad, but there's also a lot of nonsense in biology that is allowed to go through because of vast sample sizes rather than any actual science. (In this paper, for clarity, the authors are claiming that that distribution tilts towards one end, and those blue/pink are significantly different from a horizontal line...)

skimming through the paper, this is a buzz-word paradise for someone without background knowledge. I am surprised that concepts like fluid vs crystallized intelligence are reputable concepts.

VorZakone · Feb 1, 2023

Just read about some study in which Dan Ariely was involved that turned out to be...fraudulent? Even taught in class!
https://www.buzzfeednews.com/article/stephaniemlee/dan-ariely-honesty-study-retraction

Tweet
— Twitter API (@user) date

Revan · Feb 2, 2023

The deeper I go into this, the more I get convinced that outside of math and physics, most of research papers are junk. The further you go from fundamental sciences, the more junk the science becomes.

Of course, science is better than any other alternative, no doubt there. But a large portion of science is fake as feck. Of course, in long term, the fake science papers influence the field less than the good papers, but still.

Murder on Zidane's Floor · Feb 2, 2023

Revan said:
The deeper I go into this, the more I get convinced that outside of math and physics, most of research papers are junk. The further you go from fundamental sciences, the more junk the science becomes.

Of course, science is better than any other alternative, no doubt there. But a large portion of science is fake as feck. Of course, in long term, the fake science papers influence the field less than the good papers, but still.

I have a feeling that it's been corrupted for a while given the financial incentives for certain research/studies to be undertaken and the careers that can be had etc. Once we made it like this, humans will always bend stuff to their favour.

Revan · Feb 3, 2023

Murder on Zidane's Floor said:
I have a feeling that it's been corrupted for a while given the financial incentives for certain research/studies to be undertaken and the careers that can be had etc. Once we made it like this, humans will always bend stuff to their favour.

Yep. The issue is that 'fame' is based on number of citations and h-index. The more famous someone is, the higher chance they have on getting grants. The more grants, the more postdocs and PhD students. Which in turn means more papers and thus more citations and a higher h-index. PhD students get pushed to death to get those two-three papers to give them the PhD, and no one really does long-term studying. It is always push to get the paper out, even if deep down they know that the paper has no impact at all.

Long are the days since people spent a decade on doing something. Nowadays it is mostly on working in a project for the next 3-6 months.

NB: My experience is mostly on the AI field where it is like this, but talking with other people, it is hardly much different in other fields.

VorZakone · Jul 12, 2023

Some Harvard academic caught fabricating?

https://www.theatlantic.com/science...esca-gino-harvard-research-retraction/674630/
https://www.vox.com/future-perfect/...esty-research-fraud-francesca-gino-dan-ariely

NotThatSoph · Jul 12, 2023

VorZakone said:
Just read about some study in which Dan Ariely was involved that turned out to be...fraudulent? Even taught in class!
https://www.buzzfeednews.com/article/stephaniemlee/dan-ariely-honesty-study-retraction

Tweet
— Twitter API (@user) date

Some more Ariely stuff.

Tweet
— Twitter API (@user) date

entropy · Jul 12, 2023

yikes. duke needs to cut all ties with him.

Jev · Jul 12, 2023

NotThatSoph said:
Some more Ariely stuff.

Tweet
— Twitter API (@user) date

Interesting, though a year old. Has there been anything linking Ariely to the alleged Gino fabrications?

Murder on Zidane's Floor · Jul 12, 2023

Revan said:
Definitely true in the field I work (Machine Learning / Computer Vision).

A lot of papers don't have code at all, which makes them non-reproducible by default. Many others have code, but good luck getting the results they claim in them. And then from the remaining, many are reproducible, but the results are not because of the main idea of the paper, but because of a lot of engineering to push the results past state-of-the-art. I know that I have done this for a couple of my papers, cause if not state-of-the-art means reject, and everybody is doing so, which makes it the only way to actually get your idea published. I think this is a syndrome of a broken (to some degree system), people are incentivised to publish as much as they can, not necessarily to really push the science forward. Postdoc positions and tenure-track positions are primarily based on the number of top-venue papers, as are the offers from the big tech companies. Getting meaningful research is a distant second.

Nevertheless, I think that the field has progressed massively because when there are so many top-venue papers each year, some of them are meant to be good. After all, the standards to publish in top venues are very high, and even if 99% of them in grand scheme of things don't do anything, the 1% is gonna push the science forward. There has also been a push to at least publish the code, which makes reproducibility better, and cheating in results more difficult (hard to cheat in results where your code is online). Nowadays, around 80% of top-venue papers have code online, just a few years ago, most of them didn't.

Saying that despite that I think this is a big problem in my field, it is far better than in the others. In medicine or social studies, some of the experiments are completely not reproducible. The venues not being double-blinded also means that at least some of the decision is based on the author's reputation, not on the science itself. With the results not being properly reproducible, it also gets easy to cheat. And at least in neuroscience (I also work a bit in the intersection between ML and neuroscience) the quality of some top-venue papers I read is completely appalling, Nature papers that I would have not accepted as Master thesis.

It is much better in theoretical fields though. For example, in rare occasions I check theoretical physics papers, they are top-notch (as far as I can understand read them). But then, I guess by definition, they cannot have reproducibility issues.

-------

TLDR: Most of the research there, despite being peer-reviewed and top-tier venue (in general, every field has just a few journals/conferences who are top-tier, the rest is somewhere between junk and not worth it to read), most of it is probably non-reproducible and generally useless.

Very interesting.

entropy · Jul 12, 2023

NotThatSoph said:
Some more Ariely stuff.

Tweet
— Twitter API (@user) date

what is your take on behavioral economics as an area of study?

Tweet
— Twitter API (@user) date

neverdie · Jul 12, 2023

entropy said:
what is your take on behavioral economics as an area of study?

Not asking me, but economics is shaman-like smoke blowing. Beyond the basic structural stuff, which is very useful, you're dealing with probabalist models and modelists who do not have a clue what they are doing even as they are supposed top-tier economists (they don't know this). It's an art, not a science, and people often forget that. The only scientific part of it is the structural, logical, stuff, which precludes essentially every single economic model ever devised.

Behavioral economics is literally a joke. Economics itself, however, is an art concealed as a science in 90% of all cases I've ever witnessed. Fortune tellers with statistics. Weather-men with better probability charts. Now, depends whether you're into economic, sociological, analysis at structural levels, which, logically, is about as scientific as it gets or into simply stock-market, quant, type stuff which is wind-pissing based on whichever shaman has the best run of winning horses.

NotThatSoph · Jul 12, 2023

entropy said:
what is your take on behavioral economics as an area of study?

I think it's at times interesting and potentially useful, but that its use is to tweak standard models rather than as a standalone thing. It's not the marginal revolution mk.2, it's not going to change everything, and the initial hype and claims were too much. The pop sci stuff that has reached the mainstream, both in book sales and policy wise, is largely bullshit. Obviously fraud like this Ariely thing is very damaging for credibility, and then you have Nudge which is crap science. I'd consider Ariely more psychology than economics, but his stuff was important for the early development of BE, and of course Thaler (Nudge) won the Nobel Memorial Prize, so it's not a good look.

NotThatSoph · Jul 12, 2023

Jev said:
Interesting, though a year old. Has there been anything linking Ariely to the alleged Gino fabrications?

They co-authored Signing at the beginning makes ethics salient and decreases dishonest self-reports in comparison to signing at the end (2012), which was one of the things that got the ball rolling.

Pogue Mahone · Mar 21, 2024

So this whole “crisis in scientific research” thing seems to be getting worse and worse.

Link

Watchdog groups – such as Retraction Watch – have tracked the problem and have noted retractions by journals that were forced to act on occasions when fabrications were uncovered. One study, by Nature, revealed that in 2013 there were just over 1,000 retractions. In 2022, the figure topped 4,000 before jumping to more than 10,000 last year

The startling rise in the publication of sham science papers has its roots in China, where young doctors and scientists seeking promotion were required to have published scientific papers. Shadow organisations – known as “paper mills” – began to supply fabricated work for publication in journals there.

Can only assume that LLM technology is going to supercharge these paper mills, making the problem even worse.

moses · Mar 21, 2024

Just what we need as inch into our post-truth era.

Pogue Mahone · Mar 21, 2024

moses said:
Just what we need as inch into our post-truth era.

Exactly!

rimaldo · Mar 21, 2024

moses said:
Just what we need as inch into our post-truth era.

you don’t need scientists. just subscribe to my youtube channel ‘rimaldo’s red-hot reckons.’

VorZakone · Mar 26, 2024

Tweet
— Twitter API (@user) date

"Replication crisis" in science

What would Kenny G do?

something nice

What would Kenny G do?

something nice

You'd better not kill Giroud

What would Kenny G do?

Acolyte

What would Kenny G do?

Acolyte

Renaissance Man

Acolyte

What would Kenny G do?

Assumptionman

You'd better not kill Giroud

Assumptionman

What would Kenny G do?

Full Member

Full Member

Full Member

You'd better not kill Giroud

Full Member

Full Member

Full Member

Full Member

The caf's Camus.

Can't We Just Be Nice?

The caf's Camus.

All about the essence

What would Kenny G do?