© The Author(s) 2020
T. WitkowskiShaping Psychologyhttps://doi.org/10.1007/978-3-030-50003-0_13

13. Brian A. Nosek: Open Science and Reproducibility Projects

Tomasz Witkowski1  
(1)
Wroclaw, Poland
 
 


Like some of my other interlocutors, Brian Nosek became a psychologist by accident. When he entered the psychology field, he was a computer engineering undergraduate. Toward the end of his third year he enrolled in psychology classes as a break from the really hard courses. He soon realized that he was actually spending all of his time thinking about and working on his psychology courses. He found it much more interesting to do science on humans than to do research on circuits. When he came to the conclusion that he could really make a career of doing science on humans, he was hooked. The following year he switched his major to psychology but finished his degree in computer science as well.

When pursuing his graduate degree in experimental psychology, he started working on the Implicit Association Test, which reveals people’s implicit prejudices with the push of a button. Tap right every time a male name appears on a screen, for example, and left for a female name. This seemingly easy task gives interesting results when you add to the list of names some stereotypically related roles. Even the most liberal minds will sometimes stall when asked to press the same button for the word “executive” and for the name “Susan.” The tests are challenging, informative and kind of fun. So, in 1998, Nosek convinced his mentors, who had developed the test, to put it online. He called it “Project Implicit.” It was a success: About a million people per year now take the test for research, corporate training and other applications. It has done much to spread knowledge about the nature of implicit bias.

Although Nosek became a psychologist by accident, his approach to his work is certainly not accidental. As he says himself, his current activity is the culmination of his lifelong experiences. Both his mother and father, in different ways, have led their lives according to the values that are important to them. His father is a manager for whom ethics and honesty have always formed the basis of people management. His mother worked for the church, providing religious education, and was always focused on whatever would improve her work.

Apart from his devotion to the values he took from home, another factor which has helped him in his projects has been his “ignorance.” As a graduate student he was totally unaware of just how resistant scholars are to innovation. In his opinion, although much is said about innovation, especially in methodology, there is a strong tendency to accept a system that already functions. Nosek considers that if, as a graduate student, he had been aware of how strongly the status quo was accepted, he would never have undertaken any of his projects, either those which he completed successfully or those in which he is currently engaged. Many times, when his applications for research grants were being reviewed, he was called a “Pollyanna” because of his optimistic approach to innovative research methods and tools. However, he also remembers lessons from his mentors, who asserted that not attempting to do something was a guarantee of failure.

His ethical attitude learned at home, his incorrigible optimism undimmed by bitter experience, and the experience he gained cooperating with scientific institutions throughout the world while coordinating Project Implicit enabled him and his collaborators to set up, in 2011, the Reproducibility Project, with the aim of trying to replicate the results of 100 psychological experiments published in respected journals in 2008. During the Reproducibility Project, in 2013, Nosek took leave from his post at the University of Virginia in Charlottesville to co-found and direct the Center for Open Science (COS), a non-profit company that builds tools to facilitate better research methodology. COS continued coordinating the Reproducibility Project, whose results were published in 2015 in Science: only 36 of the 100 replications showed statistically significant results, compared with 97 of the 100 original experiments.

COS has built a team that today numbers around 50 people from across disciplines: astronomers, biologists, chemists, computer scientists, education researchers, engineers, neuroscientists and psychologists. COS operates the Open Science Framework (OSF), an online service where scientists can preregister their research, document their research process and share their data, materials and project outcomes. While OSF initially focused on psychology, it has since broadened to encompass any research field. COS simultaneously runs a number of research projects, including Many Labs I, II, and III, Reproducibility Project: Cancer Biology and many more.

Nor has there been a lack of inspiration and motivation drawn from Nosek’s private life, and one experience had a particularly strong influence on his engagement in the open science movement. In the spring of 2011 Sarah Mackenzie, who had been a witness at Nosek’s wedding, was diagnosed with a rare form of cervical cancer. Sarah and her family were strongly motivated to discover as much as possible about the disease and to ensure she received the best care. They were not scientists, but they started to go through the literature in search of suitable articles. One evening a very angry Sarah called Nosek to complain that every time she found an article which might be significant for understanding her disease, she came up against a demand for payment of 15–40 dollars for access to it. Research which had been financed by public money was inaccessible to her unless she paid. Nosek gave Sarah access to his subscription account so that she could peruse the scientific literature in comfort, but he woke up to the fact that most people in Sarah’s position did not have the luxury of friends working in wealthy academic institutions that could afford subscriptions to most of the scientific periodicals. Such a situation demanded change.

In 2015, Nosek co-authored a set of guidelines for transparency and openness that more than 500 journals have signed up to (Nosek et al. 2015). That same year he was named one of Nature’s 10 and was placed on the Chronicle of Higher Education Influence list. As Nature wrote about him (365 days 2015), he is a “bias blaster” who has “pledged to improve reproducibility in science.”

Prof. Nosek, you’re one of the world’s most well-known advocates of the open science movement, engaged in improving transparency in science. What led you to go beyond your research on implicit cognition and to engage to such a significant degree in improving research methods and collaboration between scientists?

My interest in these issues started in graduate school, when I took Alan Kazdin’s research methods course. This was around 1997, and he had us reading papers from the 1960s and 1970s by Bob Rosenthal, Tony Greenwald, Jacob Cohen and Paul Meehl, in which they articulated challenges like publication bias, lack of replication and low power, and outlined solutions. For example, let’s try to do more replications, let’s increase the power of studies. Even preregistration was mentioned in some of these articles. It was shocking as a grad student in the late 90s to think that methodologists had been outlining these problems and solutions for 30 years, but nothing had changed. Why was that? So, like many graduate student cohorts, we would have discussions after lectures or meetings at the bar and talk about how we would change the system if we could. And, of course, we didn’t have the opportunity to make any substantial changes at the time, but the interest in these issues remained at the core of how I thought about the research we did in our laboratory.

So we started by trying to address the power of our designs. We created a website for collecting data about implicit biases, which became very popular. We achieved very high-powered tests of the kinds of questions we were investigating. When I became a faculty member, we made it a routine practice to share as much of our materials and data as we could on our personal websites. And when services came up, we tried to adopt things that would improve our sharing. In the mid-2000s I started to write grants to the National Institutes of Health and the National Science Foundation to create what we called at that time an open source science framework. We had a technical lab for a long time, we operated this website for collecting data, and we thought it would be useful to have a service that would make it easier to share that data and for others to use it as well. But we couldn’t get it funded then because the reviews ranged widely. Some said this was very important, a necessary change, and others said that people don’t like sharing data. It just wasn’t the right time. But we had a general interest as a laboratory in improving the process of our own work, building tools for the technical portion of what we do, to make it easier for others to do it.

Then, in 2011, a lot of these methodology issues became of broad interest to the research community because of the Diederik Stapel fraud and because of very surprising results being published in leading journals, like Daryl Bem’s ESP work. And then a paper in Psychological Science, “False-Positive Psychology” (Simmons et al. 2011), really crystallized for many people how we have some practices whose implications for the reliability of our results we don’t really understand. They provided a rhetorical tour de force of how that happened, which helped people to understand those behaviors and their consequences.

So all of that was happening around the same time. One of our own failures was with replication: initial studies became public, and people failed to replicate them. As a laboratory, we had been thinking and talking about a lot of these issues for a long time, but we didn’t have anything concrete, no database of evidence on the replicability of findings. We had individual studies that we failed to replicate, but that happens all the time, and you can’t really tell if there’s anything systematic. We decided to think about how to start a collaborative project, to see if we could replicate a meaningful sample of studies in the field. So we had this replication project that turned into the Reproducibility Project: Psychology. We had just started sharing it with others to gauge interest. Lots of people got interested very quickly, and that became a big project.

Here’s the background on starting the COS: Jeff Spies was a grad student in the laboratory with a history as a software developer, and we were looking for ideas for his dissertation. We kept coming back to this old project idea of an open source science framework. And while it was unusual, he decided it was very much in line with his long-standing interests, so he started working on that as a dissertation project. So we had this replication project and this technical project that received some broad attention, which we conceived of as lab projects. We were self-funded, without grants or anything, but then they came to the public’s attention, and some funders started calling. Very rapidly we moved from interesting but small projects to thinking about doing things in a big way. Within a matter of two months we went from a small lab to launching the COS as a non-profit.

Your activity has brought you global recognition, but also a lot of enemies. I am thinking particularly of statements like that of Norbert Schwarz, who said that replication is important, but it’s often just an attack, a vigilante exercise. Still others have described you as replication bullies, false-positive police and data detectives. Susan Fiske called people who publicly discuss the results and methodology of scientific research “methodological terrorists.” Did you expect such harsh attacks, and how do you deal with them?

I think I both expected and did not expect that kind of criticism. Of course, there will be criticism when moving into this kind of work. And that isn’t a bad thing. A lot of what this movement is about is challenging some of the fundamentals of how we do science and training a skeptical eye on the credibility of our existing evidence base. If that didn’t receive a lot of pushback, it would be very surprising. That would mean that we were all putting on a very cynical facade and didn’t actually believe in the research that we do. Of course, there is going to be vigorous debate and pushback over that. I did hope that it would remain more scholarly and collegial than it has been at times, but this gets at a lot of people’s core identities. So the fact is that people respond very strongly, emotionally, with a real sense that this is an attack not just on the science, but on me and my colleagues. It’s not unreasonable that people have those reactions.

It’s important to keep the message very clear regarding what this is and isn’t about, whether that is the replication movement that we have tried to be involved in, or our core principle of always remaining self-skeptical. I think that is a critical value of science, that we should be the most skeptical of our own results. The public at large depends on us to be critical of our own work, so that whatever comes out at the end is something the community can rely on without having the expertise that we have at the beginning. It isn’t about trying to say someone is a bad researcher or doesn’t deserve to be a part of the scientific community. Those are very personal attacks. And a lot of that does occur, because people, whatever their point of view, sometimes can’t help themselves. Sometimes they do dislike a person or an idea, but we try to be a moderating voice in that respect, always with respect for the individuals and gratitude for positive engagement. It’s about the science, and about always trying to live up to the values that we espouse: transparency, reproducibility, openness to criticism. Living those values and putting them on display has, I think, to some degree helped to moderate some, although not all, of that criticism.

Open science can be seen as a continuation of, rather than a revolution in, practices begun in the seventeenth century with the advent of the academic journal, when the societal demand for access to scientific knowledge reached a point at which it became necessary for groups of scientists to share resources. The well-known sociologist of science Robert Merton mentioned communitarianism as one of the foundational values of the scientific ethos. Where did such ambivalent attitudes toward the open science movement come from among those scientists who are the heirs to these traditions and values?

That’s a great question, because at the core of what we’re pursuing as a culture change enterprise is shifting incentives, norms, and policies for science to be closer to those Mertonian norms. We can easily recognize them as the core values of how science is supposed to operate. How we got away from that is a complex question; even Merton himself didn’t think that individual scientists necessarily operate by those norms. His focus was rather on how social structures, the system, needed to live up to those norms, even if individuals themselves were more focused on their own interests, secretive about how they did their work, that sort of thing. But the system needed to operate with that self-skepticism as a system.

That said, we can do better, we can live up to those norms even in our own individual behaviors. And I think that one of the things preventing us from living up to that is at a very base level, which is technical. It has never been as easy to share methodology and data openly as it is now. The internet has changed what it means to be open and sharing. My colleague Tony Greenwald was editor of the Journal of Personality and Social Psychology in the mid-70s, and throughout his editorship he required authors to physically send their data to him for analysis. That kind of expectation in the seventies wasn’t very feasible. Now, it is trivial. So that is a change bringing us closer to the Mertonian norms.

A second factor, in my view, is that science is a competitive system. It’s hard not to see the misalignment between what is good for me as a practicing researcher and what is good for science. The policies, the norms, and the incentives that we create for how science is rewarded, and how those structures work, have not been designed to facilitate those norms as well as they could. That is a key goal for us as an organization: altering policies, norms, and incentives so that researchers can live by their values while reinforcing scientific values at the same time. They can succeed and do science wisely rather than having those goals be in conflict.

The profession of scientist is characterized by almost unlimited freedom. Scientists choose their own subjects and methods of research, and take the majority of decisions themselves. With today’s extensive specialization, few scientists are able to assess the correctness of others’ work. Isn’t this resistance to transparency an attempt to defend scientists’ freedom and, in particular, their freedom from control?

That’s interesting. There certainly are elements of this conception of freedom in how we do things as it relates to scientific advancement. I embrace the idea that science is an open system, and people can make their own decisions about what problems are worthy of study, how they should study them, and what they should critique. Where I don’t share that feeling is the freedom not to be transparent, the belief that I don’t have to tell you how I got my results. Of course, you have that freedom in some general way, but I would say that you’re not participating in a scientific system if you exercise it. Transparency is so fundamental to how science accumulates credible knowledge that if I say “this is my result and you just have to trust me,” then I am excluding myself from engagement with the scientific community and scientific practice as it needs to operate. That’s where we are wrestling with evolving our individual and collective understanding of what it means to have the freedom to study the things we want, while still retaining some sense of how the system has to operate to actually work.

Do you perceive any patterns or rules in terms of who supports the open science movement and who opposes it?

My perceptions are mostly speculative, based on observed experience. Probably the strongest predictor that I have observed is seniority. When these issues come up, people who are at the beginning of their career seem to say, “I thought that’s just what science is. Open science is just science; how would we do it differently?” It is very easy for researchers early in their career to get on board with the concepts of open science, because they are fundamental to how science needs to be. Of course, once we are in the system of doing science, we realize it doesn’t operate that way, so more senior researchers will not necessarily disagree with the core values, but there will be various levels of skepticism about the pragmatics. It is common for senior researchers to say, “Can we actually do it that way? Here are all the reasons that I can’t share my data. Here are all the reasons it doesn’t make sense to preregister these kinds of designs. Here are all the administrative burdens that will actually slow down the pace of science.” There are a variety of real practical issues that it would be very counterproductive to ignore. Senior researchers voice a practical skepticism that says, “Sure, I can believe in your idealism, but let’s think about reality.” That kind of skepticism, I think, increases with seniority because of experience with the challenges that are created by the current system.

There is a second type of skepticism, which I think is also correlated with seniority. It goes, “I’ve been doing it this way for a long time, and it sounds like you’re telling me I’ve been doing it wrong. I don’t like the sound of that. I haven’t been doing it wrong. I’ve been doing it just fine.” Agreeing that I need to change my practices now would be a kind of admission that I don’t feel I need to make. A researcher who has never published an article will not feel that resistance, but a researcher with 500 articles will, of course, feel it. Those are the big challenges. Another one, I think, is located in subcommunities. Different institutions have different subcultures in which these ideas were embraced very rapidly, like the University of Oregon. The psychology department there rallies around open science, whereas other departments may have more resistant subcultures, regardless of seniority.

For several years now, everyone has been talking about a crisis in psychology and in some other sciences, although there are also those who actively deny its existence. But you mentioned Jacob Cohen (1962), who wrote and spoke about the problems resulting from null hypothesis significance testing from the beginning of the 1960s. Around the same time, Leroy Wolins (1962) initiated a discussion about access to raw data. Since the great Cyril Burt affair, frauds by psychologists fabricating their data have been revealed regularly every few years. The problem of the file drawer effect has been discussed since at least the 1970s. John Hunter published his article “The Desperate Need for Replication” in 2000. Other problems have been under discussion for decades now. Why is it that only in the middle of the second decade of the twenty-first century have we openly started discussing the crisis in psychology, and in such a way that outside observers could conclude that its discovery came as a great surprise to scholars?

That is a great question, and I would love a good historical analysis to determine which of many possible reasons is the right one. In one sense, we’ve known about so many of these issues for a long time. The kindling for the reform movement has been there, accumulating for many years, so it had to catch fire sometime. Let’s say 2011 is the starting point. It could be an accident of history that this was the year it finally started. Another possibility is that all of that was accumulating, and some stimulating, singular events occurred that made it a lot easier to confront this at the scale we’re doing now: the Stapel case, Bem’s paper, the Simmons, Nelson, and Simonsohn “False-Positive Psychology” paper. These individual cases captured a great deal of attention beyond just the people who care about methodology, among whom this conversation has been ongoing since the 1960s. They provided some stimulus to really ask “what are we going to do about this?” or “what is this, really?” The fact that they happened close together in time may also be a factor. Instead of just one event, they all started to pile on, and it’s like a dam broke.

Another factor is that this issue is salient not only in psychology. It has been accumulating attention across other fields. Maybe, with the internet and social media, rather than just the thought that in my little field we have this problem, it is a lot easier to see outside our disciplinary silos and say, “my goodness, biomedicine, they’re having this problem too. Oh, economists, they’re having this problem. Hang on a second. Who isn’t having this problem?” Connecting those communities of people who care about these issues across disciplines may have facilitated collective action that made it a lot easier for the movement to become better, faster, and more impactful. I suspect that it’s a complicated mix of many causes, but it’s fascinating that even though those things have been known for a long time, these events congealed into a real movement this time.

Observing the behavior of scientists, I have also noticed that some of them, despite all the problems we are talking about, are full of optimism and announce to the world that we are in an age of great replication which will solve all the problems, and that we are entering the straight and broad path to the truth. Meanwhile, despite some grounds for optimism, the reality looks a little less rosy. In the PsycARTICLES database I checked exclusively peer-reviewed articles. In 2017, 5166 of them were published, of which only 14 contained the word “replication” in the title. In 2018, the total number was 5530, of which 12 contained the word “replication” in the title; in 2019, 6444 were published and 97 contained it. In total, articles on replication accounted for around 0.7% of all publications. Are you yourself one of those optimists who believe we are at the threshold of a better tomorrow in our field?
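Taken at face value, the yearly counts quoted above can be tallied in a few lines. This is only a minimal sketch of the arithmetic behind the quoted 0.7% figure, using the numbers as given in the question:

```python
# Yearly PsycARTICLES figures as quoted in the question.
totals = {2017: 5166, 2018: 5530, 2019: 6444}       # peer-reviewed articles per year
replication_titles = {2017: 14, 2018: 12, 2019: 97}  # titles containing "replication"

all_articles = sum(totals.values())                  # 17140
all_replications = sum(replication_titles.values())  # 123
share = all_replications / all_articles

print(f"{all_replications} of {all_articles} articles = {share:.1%}")
# prints "123 of 17140 articles = 0.7%"
```

The pooled share, 123 of 17140, is about 0.72%, consistent with the “around 0.7%” cited in the question.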

Yes. I am optimistic. And the reason I am is that a sustainable culture change is growing. But this doesn’t mean it’s happening fast. There are a lot of people in this movement who are very discouraged. They think, “we’ve worked this out already, why are people still doing it wrong? Let’s just get it done.” But culture change is hard; changing people’s behavior is hard. We have a great literature that shows how hard it is to make these changes. Let’s pay attention to the parts of our literature that we can trust and apply them as effectively as we can to the movement as it is occurring. There is good reason to be optimistic, because the core challenges are becoming very well known as problems, and it’s hard to be in our field and not be aware of these challenges. That’s a big first step. Whether people will change their behavior or not remains to be seen.

The second reason for optimism is that training is changing. Methodologists care about these things fundamentally and many are changing and updating statistics and methods courses. That matters, because training will stick with the generations as they come through this gradually.

The third reason for optimism is that the stakeholders, meaning funders, publishers, societies, and institutions, are paying attention. Maybe not quickly, but they’re all changing their policies, their norms, and their incentives. Policy change is the best way to achieve sustainable change in the long term. The general shift from not requiring any transparency to encouraging or requiring openness fundamentally changes what comes through a journal. Incentives for preregistration, and badges that make visible that other people are doing these behaviors, can have the long-term accumulated consequence of forming new norms. Finally, Registered Reports are now offered by more than 200 journals. Registered Reports eliminate publication bias by conducting review, and committing to publish, before the outcomes are observed. That is a fundamental change to the publication workflow and will have a lasting impact if it achieves broad adoption.

All of the critiques of the movement saying that it’s not changing very fast are correct, but I don’t know if it can change faster than it is. The groundwork is being laid. The changes we’ve seen are not superficial. It’s not just that one researcher and one team did something, tried to replicate some findings, and that’s the end of it. Changes are happening in the structure of how we do our work, and that’s really what will help sustain it in the long term.

The seriousness of the crisis in psychology is often diminished by describing it as a replication crisis. We know, however, that it consists of many more problems than just the lack of replication. Which of them can be solved by the open science movement, and which should be overcome some other way?

Replication is the low-hanging fruit of improving research practices—here is a finding, here is a methodology, let’s try to see how reliable that methodology is. From this point of view, the replication part of the movement helps stimulate attention to the broader issues of how we can make research progress as quickly as possible. Replication doesn’t solve problems like construct validity, attentiveness to how our theories get refined and formalized so that they are testable, or connecting our theories with our operationalizations and inferential tests. There are big challenges in how we reason, how we accumulate evidence, and how we combine that evidence into theories in science. Open science doesn’t solve these, but open science is an enabler of pursuing solutions to these challenges.

We have a lot of maturing to do in how we develop theories and how we make connections. But we can’t do that work without open evidence, without transparency about how we arrive at our claims, without better sharing, without materials being open and accessible, and without some replication to test our theoretical positions in a more formalized way. To me, the movement, at least in psychology, is finishing phase one, where the major theme was replication. I think it’s entering, or midway into, phase two, which is about generalizability. It’s not just about replicating a finding in one context, but about the breadth of contexts in which we can see that finding: across different operationalizations, across different samples, across different conditions of the experimental context. Next, I think, it’s going to enter a phase focused on measurement and theory. Now that we see what is reproducible among our base claims, how can we better connect this evidence to more theoretical claims?

I see one more problem plaguing contemporary psychology, one which probably cannot be solved by the open science movement, and may perhaps even be deepened by it. I mean the shift from direct observation of behavior, widely regarded as an advance in the development of scientific methodology, back to introspection. This was demonstrated in an outstanding 2006 article by Baumeister et al., and recently confirmed by Doliński (2018), who replicated Baumeister’s investigations. Both articles show that over the last few decades, studies of behavior have become a rarity among psychologists. This issue is brought into sharper relief by the fact that the first of the two articles was published in the middle of the decade that the APA announced with great pomp as the Decade of Behavior. What are your thoughts on this issue?

This is interesting, and I agree that it can’t be separated from how we think about what the open science movement is trying to solve. We have to consider the incentives that shape individual researchers’ behaviors. If the focus of my attention in research is to publish as frequently as I can, in the most prestigious outlets that I can, then I have to make practical decisions about the kinds of questions that I investigate. Measuring real behavior, that’s hard. Sending out a survey, that’s easy. I can generate more publishable units doing that.

Or using Mechanical Turk.

Exactly. Researchers are already stressed about publication demands, and now the open science movement is increasing the pressure for higher power. This makes it even harder to do those behavior studies: now, instead of 50 people you need 500. This is a real issue. Elements of the open science movement, such as increasing power, increasing transparency, and increasing rigor, could be at odds with the goal of getting the research community less focused on self-report surveys and more diverse in how it measures human behavior. If we don’t solve the incentives problem, then we will become a very narrow discipline. In part, that may reflect how hard it is to study the things that we study. We have to recognize where we are as a discipline in terms of our tools and instrumentation, to be able to do some kinds of science. Then, we have to be realistic about what science can be done effectively with the available resources.

In some ways, we have tried to study questions that we just don’t have the technology, the ability, or the power to study effectively. If we take seriously the question of what we can productively investigate with the resources we have available, then we may recognize that a lot of the questions we want to study cannot be studied effectively the way that we do science now. If those questions are really important, then we need to change how we do things. This is where another element of the open science movement offers part of the solution: collaboration.

Right now, we are rewarded for being a vertically integrated system. My small team or I come up with an idea, design a study, collect the data, and write the report, all on our own. This requires lots of resources devoted to small groups. If we move to a more horizontal distribution, where many different people can contribute to a project, we can study many questions that we are presently not able to study well. There might be one team designing the study, and 15 teams that contribute data. With that kind of model, if we get the reward system worked out, we could study some questions more productively and with adequate power to actually make progress on the problem.

The reform movement needs to attend to the effects of each change on the complex system of incentives and rewards: What happens with increasing expectations for power? How will that change what people study? How do we solve that? These systems are complex, and getting all the incentives aligned for maximizing progress is hard.

There is another problem that is troubling psychology, and which you very diplomatically described in one interview as “conceptual redundancy.” We know that it is basically about cluttering our field with needless, often duplicate theoretical constructs: needlessly coining and publishing new concepts for phenomena that have already been described. This conceptual redundancy is increasing at a rather alarming rate. What is your opinion about it?

I think it’s another illustration of a different part of the reward system. In psychology we value the idea that each person has their own theory, and their own conceptual domain that they study that is linked to them and their identity. The consequence is that this incentivizes the splitting approach, where the same concept is being studied by five different groups, each giving it a different label. Early in research, this can be useful. In a generative phase, we have no idea what the problem space is like, and we need various approaches to explore it. What we don’t have is the consolidation phase, where we say, “OK, there are seven different groups that have what seems to be the same kind of idea.” How do we figure out what actually is the same and where the differences are? That lack of consolidation leads to a very fractured and not very cumulative discipline.

The social challenge we need to address is that we’re individually tied to the words we use to describe psychological phenomena: I don’t want my perspective on self-enhancement to be combined now with somebody else’s idea that is the same but uses a different word to describe it. The methodological challenge is to create occasions for similar ideas to be confronted with one another. And there is a great example, in development now, that I think would be useful as a prototype, and that I am hoping will become ordinary practice. The Templeton World Charity Foundation organized a group of neuroscientists who all have different theories of consciousness. The scientists were told, “You’re going to sit together in a room for three days, and you’re going to come up with experiments where at least two theories have different expectations of the outcome.” And of course, it took them two days of yelling at each other to even figure out what the differences in their theoretical expectations were. But if we can stay scholarly, this process can be very productive in providing some clarity about the actual similarities and differences between these theoretical perspectives. They did come up with two experiments for which the theories make different predictions. And now they have funding to do the experiments. That is a great exercise, so I am hoping that some initial prototypes of it will be very widely disseminated, and generate lots of attention for how to create a confrontation and consolidation process to counterweight the splitting process you described.

Daniel Kahneman proposed something that he called the adversarial method, but in a database of scientific articles I was able to locate just a few empirical works carried out in accordance with the recommendations of that method, and a few dozen publications involving discussion of adversarial collaboration.

Yes, it is hard to do. There are not many obvious rewards for entering into adversarial collaboration; people prefer to think, “my findings will stay my findings, let’s just ignore each other, we’ll both be fine.” With confrontation comes risk, so there is a social barrier to adopting it, even though people might recognize that the value to science is high.

In your opinion, is there any chance in the near future of a unifying theory in psychology that would reduce this redundancy a little bit?

I’m skeptical, at least for the near future, because the phenomena that we study are so diverse that there probably isn’t a unifying theory that would be tractable and useful. I do like the emergence of theoretical frameworks, like evolutionary psychology, for example, to organize a set of ideas about human behavior. But I suspect that we are going to have a few different theoretical frameworks for a while. I can’t predict what convergence is possible.

Critics of replication projects have sometimes stressed that a strong emphasis on replication may lead scientists to focus on replication research instead of exploring difficult and serious problems that can have a significant impact on our reality. It is difficult not to admit that they are right. How do you resolve this dilemma?

I think this is a misspecified critique. The idea that we don’t need to replicate, and that we’ll make progress without doing replication, is, I think, why we are facing the problems we have. We have failed to understand the value of replication for theory and discovery. We generate lots of ideas, we have the feeling of making progress, and then we have little actual confidence or credibility in the underlying basis of the claims that we are making. The fact that there are hundreds and hundreds of studies about ego depletion, and we’re still debating whether ego depletion actually occurs, shows how problematic it is to not do things that increase the credibility of the core evidence.

The other thing that I think is misspecified about the argument is that replication is itself a mechanism for theoretical advancement. Replication is sometimes better for theoretical advancement than pure exploration. To do a replication, one must construct a situation in which there is a clear prediction of what should occur based on the prior findings and their theoretical context. Replication is the confrontation of an existing understanding of a phenomenon. The consequences of the outcomes of a replication are either to affirm and create more generalizability for the existing understanding, or to confront that existing understanding with disconfirming evidence. The latter is inevitably theoretically generative because our expectations from prior findings were violated. The combination of replication and explication is one of the most theoretically generative things we can do. I don’t know what the right ratio is between them for maximizing progress, but I think it is incorrect to say that replication is somehow separate from making theoretical progress.

Is there any particular type of criticism aimed at the open science movement that you feel is particularly serious and justified?

I think it’s all serious and justified, unless it’s personal. The claims of the open science movement need to survive critique, just like all of the findings that are being critiqued in replications need to survive their critique. Let me give a couple of examples. There is a critique that preregistration will reduce creativity, make research more boring. I don’t have any evidence to show you that it will not do that. And we can generate plausible stories for how it could. For example, if I have to pre-commit, I might get more conservative, because pre-committing to crazy ideas might be embarrassing, particularly if they are wrong—which most will be—they are crazy ideas after all. But, some crazy ideas are totally transformative. Preregistration could induce risk aversion. If I don’t have to pre-commit and I can just do it, then maybe I’d be more willing to take risks because who cares if I am wrong.

Whenever we’re pursuing culture change there is potential for unintended consequences. In fact, unintended consequences are functionally inevitable, because we don’t know all the consequences of our actions. If the open science movement did not have a skeptical audience constantly evaluating what happens when we make these changes, we would end up doing some things that are counterproductive for research progress.

So, I am very glad that there is positive engagement of skepticism about these changes. What I don’t like is when it gets personal, like the “methodological terrorists” kind of remark. This is completely unproductive. It is also unproductive when the skepticism is so strong that people don’t even want to try. The whole purpose of research is to try something and see what happens, and if we are so ideological that we say, “I cannot share the data because it will screw things up,” so I don’t even try to share, or “pre-registration isn’t even worth trying, because I am sure that it will reduce creativity,” that isn’t really engaging in research. Competing perspectives are valuable as long as we are studying them, learning something, and then figuring out how to do it better.

How is it that in spite of information appearing about the crisis and the general decline in trust toward psychology, the field seems to be attracting ever greater numbers of people? As a major, it is shattering records among students in both Europe and America.

It’s attracting more involvement. My guess is that early students at the undergraduate level are, by and large, unaware of these issues. I think that the emergence of awareness—and this is speculation—comes later in the undergraduate stage at the earliest, which is perhaps after the commitment has already been made.

A second speculation is that a lot of people early in their career find the reform movement exciting rather than unnerving. I think many people are entering the discipline with the feeling that the field is emerging rather than that the field is in tatters, in crisis, or in decline. The reform movement itself is very healthy and expanding, so I think that is having salutary effects on engagement.

Some people claim that psychology attracts people who like fuzzy disciplines, where there are no precise claims and hard knowledge. There is a lot of room for interpretation and no responsibilities required. What’s your opinion?

I think it’s possible that there are people who like the generative and open-ended explanations for why things happen the way they do, and psychology allows some of that freedom. That may even impact some of the issues we have discussed in the research process. Maybe it’s harder to become a physicist and think, “let’s just explore the world as it is, I will generate my own theory of physics.” The frameworks for understanding in physics are stronger, perhaps limiting entry for those who want to be very exploratory and generative. But of course, when you’re out on the edges of physics it is still very exploratory. All kinds of crazy ideas come up in theoretical physics these days; it’s an amazing, generative, and creative field. There is probably a range of motivations and interests that get people into psychology. If there weren’t people who get more singularly focused and want to solve this or that particular problem, it would be problematic for psychology. Psychology has enough to study that an ecumenical approach to who gets involved in research, and to their approach to research, is okay and healthy at the current stage.

Fortunately, apart from problems, our field also has a lot of achievements. Which of the existing psychological discoveries do you consider to be the most significant breakthrough?

I can’t say confidently, because I’d have to review all of the literature to say which of those things are the most important. But what comes to mind as you ask the question is the astonishing progress that has been made on understanding visual perception from what we understood about how the visual system works in 1900. This is a massive transformation. We’ve learned so much, and so much of that work has been applied to computer vision and other kinds of research applications, and to our practical understanding of the visual system in animals and in humans. I am a total outsider to it, but I love what I know of the work.

Kahneman and Tversky’s work on heuristics and biases is, I think, the most directly impactful on questions in the social cognitive domain of judgment. That’s an obvious one to mention, but the degree to which we now understand motivated reasoning in the big picture, and the particular biases in our everyday reasoning, is incredibly important. What we need from that field is to grasp how we deal with those biases in everyday judgment and decision making. We need systems and solutions to address unwanted biases where they occur. It would be transformative for human behavior if we can solve these questions.

The last would be the areas where we have effective treatments for some mental health problems. The fact that we can address many types of phobias in a single session, or in 8 hours with cognitive-behavioral treatments, is astonishing.

Each of these examples shows that basic psychological knowledge can be translated into an understanding of how we improve human behavior.

What do you consider to be the biggest challenges in the field of psychological science generally?

I think the biggest challenge that we’re spending our time worrying about is how to fix the reward system. This is not about research topics, but I think it has direct implications for research topics. And the part that is the hardest to change is hiring and promotion at academic institutions. If we don’t fix the need for more and more publications and more and more prestigious outlets as the criterion, none of the other open science changes will be completely effective. It is very challenging because institutions have their independent, ad hoc criteria. There is no singular policy or decision-making body to facilitate this change. The reason I focus on systemic issues like that is that if we fix these issues, all the other challenges in the field of what research gets done and how it gets done will be much easier to solve. The systemic challenges of how people are rewarded for the kind of work they do are barriers to the more specific challenges of how to do the best research.

What are your thoughts about artificial intelligence and about the future of artificial intelligence in psychological research?

There is a lot of potential in what artificial intelligence approaches can provide as a complement to, or as a replacement for, some of the things that we’re doing in the actual research process. It’s exciting to see the advances being made, including the identification of serious challenges, the black box problem being the most obvious one. If a machine solves the problem, but we have no idea how, then what have we learned, and how do we start to unpack those approaches to generalize them to new problems and solutions? Like many tools, artificial intelligence is exciting, and it’s still in the speculation phase for what we will be able to get from it in advancing psychological knowledge. But some effort testing the opportunities and limits is time well spent.

Which projects are absorbing you at the moment and what are your plans for the future?

The main research project that is dominating our attention is called the SCORE project. It’s a project funded by DARPA. We’re one of multiple teams involved in three technical areas: TA1, which is us; TA2, which has two teams; and TA3, which has three. The goal of the project is related to artificial intelligence: to see if we can create automatic indicators of the credibility of research claims. When you open a paper, each of the claims in the paper could have a score next to it: this one has 72, this one has 15, that one has 99. The machines would give an initial calibration of the confidence that we can have in the credibility of the claim. This is a high aspiration, but pieces of evidence suggest that there is information we could extract from papers, and from the research at large, to help us assess the credibility of different findings as an initial heuristic. How much other evidence supports these findings? How do they fit with other claims? What fits that particular claim or that particular study? It’s an exciting problem to try to solve, and the actual work that we’re doing in our team is extracting claims from the literature.

We took 60 journals and extracted a sample of 50 articles from each year from 2009 to 2018, creating a database of 30,000 articles from those 60 journals. Then we took 10% of those as a stratified random sample and extracted a claim from each article, tracing the claim from the abstract to a statistical inference in the paper supporting that claim. That gave us a database of 3,000 claims. TA2 teams evaluate the credibility of those claims with expert judgment and prediction markets, giving each claim a score. TA3 teams are applying machines to try to assess the credibility of those claims; the machines give them scores based on whatever information they can gather.

While all that is happening, we organize a thousand people in a massive collaboration to do replications of the substance of those claims as the ground truth. We shall see if the people in TA2 and the machines in TA3 can predict successful replication or not. This project will generate very useful data to study many questions in metascience and replicability. It’s an extremely generative project, while simultaneously having a clear structure. And, the problem we are studying is an exciting problem.

The other problem we’re really interested in studying is whether the various interventions we have introduced to improve the research process are working or not. We are running studies, for example, on Registered Reports to assess whether it’s actually meeting the promise that we theorize, and whether there are costs, like the problem of creativity or conservatism that might emerge. Those would be very useful data to really help refine and improve the reforms: let’s keep doing the things that are working, and let’s change or stop the things that aren’t. These are the main areas of focus for me in the next few years.

It sounds promising. I know that most scientists avoid answering questions about the future. Nonetheless, I would like you to try and tell me how you imagine our discipline in 20–30 years’ time.

Those are impossible questions, but I can at least answer what I hope for the discipline. My hope is that openness will be the default. And what I mean by open is the process of doing research, how I arrived at my claims. Maybe my approach included preregistration, or was pure exploration, or was something else. The goal is that you will be able to see my process from where I started to how I got to my claims. You will have access to the materials, the data and the things supporting that claim, to the extent possible. And, it will be easy for you to discover the evidence associated with that claim. Discovery tools will make it easy to say “here is my study, here are the 15 other studies that were done on that same problem.”

If we have that, I think a lot of the other things that we’d like to see happen for the field will become more possible. It would be easier to solve the issue of publication bias, as we’ll have meta-analyses that are much more credible about the state of understanding in a particular domain. More openness will make it easier to repeat things, and we’ll have a more routine ethic of replication, which will assess the credibility of particular claims and evidence. Having this will make it more likely that we’ll start to tackle the really hard problems, and actually make progress on them. If, in 20–30 years, we can get to a place where we can actively see how theories are developing in a systematic way in psychology, that would be great.

I have asked all my interviewees for advice they could give to young psychologists who are just starting their careers. You are my youngest interlocutor, but at the same time you have acquired experience that is completely different from that of others and you look at our field from a very broad perspective. What advice would you give to such young people?

To me, the most important thing is to identify problems that one cares about, because it does matter to care. Just studying something because it’s the thing I am studying won’t keep people up at night in ways that I think are productive for doing scholarly research. Having questions that one is passionate about is a big value. The second thing is really thinking about what it means to me as a researcher to do the best possible research I can, and to prioritize the behaviors that allow my science to be done best. I think it’s very easy to fall into doing something because other people do it this way, or other people say this is the priority. Of course, one needs to be attentive and responsive to the realities of external rewards. But the satisfaction with the work that we do, I think, is largely internally driven. I won’t be happy doing my research if I can’t follow my values of how I think research should be done. That’s important to prioritize, because if I can’t live according to my values then I will just be constantly frustrated and feel like I am undermining my own goals.

Who would you point to as an example to follow?

Tony Greenwald. Tony has a long-standing and well-earned reputation for being hard to deal with, because he has a very clear idea of how he thinks science should be done. This makes him obstinate and impossible at times. I’ve published more papers with him than with anyone else. Each one of them has been torturous in its own way, but also extremely gratifying. The one thing that he does not sacrifice is wanting to be confident in the claims he makes in a paper. So, he is constantly skeptical of his own work, attacking it in different ways, so that we can believe in the output. I’ve tried throughout my career to emulate his methodology, of doing things the way they should be done. Tony is an emblematic example of that. And maybe I try to be a little bit nicer about it than Tony was. He isn’t perfect, but he is fantastic, and I am so grateful for having been trained by him.

While working on this book I asked my readers to submit one question they would like to ask an eminent living psychologist. I received 30 of them. Would you agree to draw and answer one?

Sure, give me number 15.

What psychological idea is ready for retirement?

I’d say the idea that psychology is so different from other scientific disciplines that we can’t use practices from other disciplines to inform how we do ours better. I think this is partly conceit, and partly low collective self-esteem, that we think this way: “The reason that people see us differently, that they don’t respect us, is because we are totally different.” But we aren’t totally different. We are studying hard problems, perhaps the hardest among the sciences. But to become so defensive that we can’t look to other sciences for ideas and new practices to do ours better is a stance that I think needs to die.