Replication Crisis: Cultural Norms In The Lab

I recently listened to an excellent episode of the "Stuk Rood Vlees" ("Hunk of Red Meat") podcast that is hosted by the Dutch political scientist Armèn Hakhverdian (@hakhverdian on Twitter). His guest was Daniël Lakens (@lakens) and they talked at great length --- to the extent that the episode had to be split into two 1-hour sections --- about the replication crisis.

This podcast episode was recorded in Dutch, which is reasonable since that's the native language of both protagonists, but a little unfortunate for more than 99.5% of the world's population who don't understand it. (Confession: I'm a little bit lukewarm on podcasts --- apart from ones with me as a guest, which are fantastic --- because the lack of a transcript makes them hard to search, and even harder to translate.)

This is a particular shame because Daniël is on sparkling form in this podcast. So I've taken the liberty of transcribing what I thought was the most important part, just over 15 minutes long, where Armèn and Daniël talk about publication bias and the culture that produces it. The transcription has been done rather liberally, so don't use it as a way to learn Dutch from the podcast. I've run it past both of the participants and they are happy that it doesn't misrepresent what they said.

This discussion starts at around 13:06, after some discussion of the Stapel and Bem affairs from 2010-2011, ending with surprise that when Stapel --- as Dean --- claimed to have been collecting his data himself, everybody thought this was really nice of him, and nobody seemed to find it weird. Now read on...

Daniël Lakens: Looking back, the most important lesson I've learned about this --- and I have to say, I'm glad that I had started my career back then, around 2009, back when we really weren't doing research right, so I know this from first-hand experience --- is just how important the influence of conforming to norms is. You imagine that you're this highly rational person, learning all these objective methods and applying them rigorously, and then you find yourself in this particular lab and someone says "Yeah, well, actually, the way we do it round here is X", and you just accept that. You don't think it's strange, it's just how things are. Sometimes something will happen and you think "Hmmm, that's a bit weird", but we spend our whole lives in the wider community accepting that slightly weird things happen, so why should it be different in the scientific community? Looking back, I'm thinking "Yeah, that wasn't very good", but at the time you think, "Well, maybe this isn't the optimal way to do it, but I guess everyone's OK with it".

Armèn Hakhverdian: When you arrive somewhere as a newbie and everyone says "This is how we do it here, in fact, this is the right way, the only way to do it", it's going to be pretty awkward to question that.

DL: Yes, and to some extent that's a legitimate part of the process of training scientists. The teacher tells you "Trust me, this is how you do it". And of course up to some point you kind of have to trust these people who know a lot more than you do. But it turns out that quite a lot of that trust isn't justified by the evidence.

AH: Have you ever tried to replicate your own research?

DL: The first article I was ever involved with as a co-author --- so much was wrong with that. There was a meta-analysis of the topic that came out showing that overall, across the various replications, there was no effect, and we published a comment saying that we didn't think there was any good evidence left.

AH: What was that study about?

DL: Looking back, I can see that it was another of these fun effects with little theoretical support ---

AH: Media-friendly research.

DL: Yep, there was a lot of that back then. This was a line of research where researchers tried to show that how warm or heavy something was could affect cognition. Actually, this is something that I still study, but in a smarter way. Anyway, we were looking at weight, and we thought there might be a relation between holding a heavy object and thinking that certain things were more important, more "weighty". So for example we showed that if you gave people a questionnaire to fill in and it was attached to a heavy clipboard, they would give different, more "serious" answers than if the clipboard was lighter. Looking back, we didn't analyse this very honestly --- there was one experiment that didn't give us the result we wanted, so we just ignored it, whereas today I'd say, no, you have to report that as well. Some of us wondered at the time if it was the right thing to do, but then we said, well, that's how everyone else does it.

AH: There are several levels at which things can be done wrong. Stapel making his data up is obviously horrible, but as you just described you can also just ignore a result you don't like, or you can keep analysing the data in a bunch of ways until you find something you can publish. Is there a scale of wrongdoing? We could just call it all fraud, but for example you could just have someone who is well-meaning but doesn't understand statistics --- that isn't an excuse, but it's a different type of problem from conscious fraud.

DL: I think this also depends very much on norms. There are things that we currently think are acceptable today, but which we might look back on in 20 years' time and think, how could we ever have thought that was OK? Premeditated fraud is a pretty easy call, a bit like murder, but in the legal system you also have the idea of killing someone, not deliberately, but by gross negligence, and I think the problems we have now are more like that. We've known for 50 years or more that we have been letting people with insufficient training have access to data, and now we're finally starting to accept that we have to start teaching people that you can't just trawl through data and publish the patterns that you find as "results". We're seeing a shift --- whereas before you could say "Maybe they didn't know any better", now we can say, "Frankly, this is just negligent". It's not a plausible excuse to pretend that you haven't noticed what's been going on for the past 10 years.

   Then you have the question of not publishing non-significant results. This is a huge problem. You look at the published literature and more than 90% of the studies show positive results, although we know that lots of research just doesn't work out the way we hoped. As a field we currently think that it's OK to not publish that kind of study because we can say, "Well, where could I possibly get it published?". But if you ask people who don't work in science, they think this is nuts. There was a nice study about this in the US, where they asked people, "Suppose a researcher only publishes results that support his or her hypotheses, what should happen?", and people say, "Well, clearly, that researcher should be fired". That's the view of dispassionate observers about what most scientists think is a completely normal way to work. So there's this huge gap, and I hope that in, say, 20 years' time, we'll have fixed that, and nobody will think that it's OK to withhold results. That's a long time, but there's a lot that still needs to be done. I often say to students, if we can just solve this problem of publication bias during our careers, alongside the actual research we do, that's the biggest contribution to science that any of us could make.

AH: So the problem is, you've got all these studies being done all around the world, but only a small fraction gets published. And that's not a random sample of all the studies being done --- it's certain types of studies, and that gives a distorted picture of the subject matter.

DL: Right. If you read in the newspaper that there's a study showing that eating chocolate makes you lose weight, you'll probably find that there were 40 or 100 studies done, and in one of them the researchers happened to look at how much chocolate people ate and how their weight changed, and that one study gets published. And of course the newspapers love this kind of story. But it was just a random blip in that one study out of 100. And the question is, how much of the literature is this kind of random blip, and how much is reliable?
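Daniël's "one study in 100" point is just the false-positive rate of significance testing at work. As a hedged illustration (my own addition, not from the podcast), here is a minimal Python simulation of many labs all testing an effect that does not exist, with only the conventional p < .05 results "getting published":

```python
import numpy as np

rng = np.random.default_rng(42)

n_studies = 10_000   # imagine 10,000 labs all chasing a non-existent effect
n = 100              # participants per group
significant = 0

for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(0.0, 1.0, n)  # same distribution: the true effect is zero
    # Two-sample test statistic; with n = 100 per group the normal
    # approximation is fine, and |stat| > 1.96 corresponds to p < .05.
    diff = treatment.mean() - control.mean()
    se = np.sqrt(treatment.var(ddof=1) / n + control.var(ddof=1) / n)
    if abs(diff / se) > 1.96:
        significant += 1

rate = significant / n_studies
print(f"'Significant' findings from pure noise: {significant}/{n_studies} ({rate:.1%})")
```

By construction, roughly 5% of these null studies come out "significant" --- and if those are the only ones that appear in journals, the published record on this effect is 100% random blips.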

AH: For many years I taught statistics to first- and second-year undergraduates who needed to do small research projects, but I never talked about this kind of thing. And lots of these students would come to me after collecting their data and say, "Darn, I didn't get a significant result". It's like there's this inherent belief that you have to get statistical significance to have "good research". But whether research is good or not is all about the method, not the results. It's not a bad thing that a hypothesis goes unsupported.

DL: But it's really hypocritical to tell a first-year student to avoid publication bias, and then to say "Hey, look at my impressive list of publications", when that list is full of significant results. In the last few years I've started to note the non-significant results in the Discussion section, and sometimes we publish using a registered report, where you write up and submit in advance how you're going to do the study, and the journal says "OK, we'll accept this paper regardless of how the results turn out". But if you look at my list of publications as a whole, that first-year student is not going to think that I'm very sincere when I say that non-significant results are just as important as significant ones. Young researchers come into a world that looks very different to what you just described, and they learn very quickly that the norm is, "significance means publishable".

AH: In political science we have lots of studies with null results. We might discover that it wouldn't make much difference if you made some proposed change to the voting system, and that's interesting. Maybe it's different if you're doing an experiment, because you're changing something and you want that change to work. But even there, the fact that your manipulation doesn't work is also interesting. Policymakers want to know that.

DL: Yes, but only if the question you were asking is an interesting one. When I look back to some of my earlier studies, I think that we weren't asking very interesting questions. They were fun because they were counterintuitive, but there was no major theory or potential application. If those kinds of effects turn out not to exist, there's no point in reporting that, whereas we care about what might or might not happen if we change the voting system.

AH: So for example, the idea that if people are holding a heavier object they answer questions more seriously: if that turns out not to be true, you don't think that's interesting?

DL: Right. I mean, if we had some sort of situation in society whereby we knew that some people were holding heavy or light things while filling in important documents, then we might be thinking about whether that changes anything. But that's not really the case here, although there are lots of real problems that we could be addressing.

   Another thing I've been working on lately is teaching people how to interpret null effects. There are statistical tools for this ---

AH: It's really difficult.

DL: No, it's really easy! The tools are hardly any more difficult than what we teach in first-year statistics, but again, they are hardly ever taught, which also contributes to the problem of people not knowing what to do with null results.
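One such tool is equivalence testing --- the "two one-sided tests" (TOST) procedure, which Daniël has written a practical primer on elsewhere. As a rough sketch (my own illustration, using a normal approximation rather than the exact t distribution, and an equivalence bound I've picked arbitrarily), you test whether the observed difference is reliably *inside* a band of effects too small to care about:

```python
import math

import numpy as np


def tost_equivalence(x, y, bound):
    """Two one-sided tests (TOST) for equivalence of two group means.

    Returns a p-value for the null "the true mean difference lies outside
    (-bound, +bound)"; a small p supports equivalence. Uses a normal
    approximation to the t distribution, reasonable for largish groups.
    """
    diff = np.mean(y) - np.mean(x)
    se = math.sqrt(np.var(x, ddof=1) / len(x) + np.var(y, ddof=1) / len(y))
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF
    p_lower = 1.0 - phi((diff + bound) / se)  # H0: true difference <= -bound
    p_upper = phi((diff - bound) / se)        # H0: true difference >= +bound
    return max(p_lower, p_upper)              # both nulls must be rejected


# Two groups drawn from the same distribution: the true effect is zero.
rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, 1000)
treatment = rng.normal(0.0, 1.0, 1000)

p = tost_equivalence(control, treatment, bound=0.3)
if p < 0.05:
    print(f"p = {p:.4f}: difference reliably smaller than 0.3 -- an informative null")
else:
    print(f"p = {p:.4f}: inconclusive about equivalence")
```

The point of the exercise is exactly what Daniël says: instead of shrugging at "p > .05, no effect", you can make a positive, publishable claim that any effect is smaller than something you would care about.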

(That's the end of this transcript, at around the 30-minute mark on the recording. If you want to understand the rest of the podcast, it turns out that Dutch is actually quite an easy language to learn.)
