Confidence at Scale: Using Technology to Assess Research Credibility

December 22nd, 2022

After multiple years of data collection, the Research team at the Center for Open Science (COS) is preparing for the end of its participation in DARPA’s Systematizing Confidence in Open Research and Evidence (SCORE) program and the transition to the work that follows. SCORE has been a significant undertaking spanning multiple research teams and thousands of researchers and participants throughout the world – all collaborating to answer a single question: Can we create rapid, scalable, and valid methods for assessing confidence in research claims?

Our team has been at the core of the program from the start, responsible for providing two kinds of data that underpin SCORE's success. First, we've provided manually extracted research claims and evidence from a set of 3,900 social-behavioral science articles, each with a detailed record of the research finding produced, the hypothesis tested, and the evidence underlying the finding. We initially generated this dataset with COS personnel alone before expanding the effort to include collaborators from around the world. These collaborators were critical to the effort, helping us both generate the initial claims data and review it to ensure internal consistency and a fair reflection of the underlying articles.
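
To make the shape of these records concrete, here is a minimal sketch in Python; the class and field names are illustrative assumptions, not SCORE's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ClaimRecord:
    """One manually extracted claim; all names here are hypothetical
    illustrations, not SCORE's actual data schema."""
    article_id: str  # identifier of the source article (e.g., a DOI)
    finding: str     # the research finding as reported in the paper
    hypothesis: str  # the hypothesis the finding is taken to test
    evidence: str    # the underlying evidence (e.g., test statistic, effect size)
```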

Our second responsibility in SCORE has been to provide credibility data for hundreds of the research papers in the form of replication attempts and reproduction tests, performed in collaboration with a dispersed team of partner researchers. We've worked side-by-side with these partners, coordinating their data collection and analysis, peer reviewing their preregistrations to ensure the evidence is high quality, and reviewing the results to confirm that the replication and reproduction evidence itself is transparent and reproducible. Thanks in great part to the contributions of these dedicated collaborators, what's emerged from SCORE is a first-of-its-kind dataset in its breadth across fields, years, forms of research, and kinds of credibility indicators measured.

Karolina Urbanska, a social psychologist who has been one of our key partners on SCORE since 2020, recently shared some of her thoughts about her work on the project and her collaboration with COS.

How did you get connected with the SCORE project?
Karolina Urbanska (KU): While pursuing my PhD, I developed an interest in open science practices and in how we can contribute to building transparency in research. COS emerged during this time as a key organization promoting a better way to conduct research. Among the first initiatives I heard about were the Registered Report prizes, and eventually I ended up leading a paper that was published as a Registered Report a couple of years later.

What’s been your primary role with SCORE?
KU: I have been one of the key SCORE collaborators for the duration of the project, leading replications and reproductions using secondary datasets. This involved multiple tasks such as manually extracting key findings from the existing papers, reviewing replication projects proposed by other collaborators, finding and documenting relevant publicly available datasets that may be used for replication, and conducting and auditing statistical analyses.

How do you believe the objectives of SCORE contribute to Open Science more generally?
KU: The scale and rigor behind SCORE are unlike anything achieved so far in the open science research space. Each step of the project has been carefully planned to ensure that we can reach decisive conclusions about the current state of social and behavioral science research, which makes me confident in SCORE's ability to transform our understanding of the field.

Do you believe SCORE is relevant to people who are not scientific researchers? Why or why not?
KU: In recent years, there has been a lot of debate about the trustworthiness of science. We may be tempted to say that we trust science absolutely, but the process of science is far more reliable than any single outcome of it. We need more nuanced discussions about what it means to trust science, as opposed to seeing it as the product of infallible geniuses, and I think the findings from the SCORE project will be a valuable contribution to this conversation.

Why are rigor and transparency in scientific research important to you on a professional level, a personal level, or both?
KU: Rigor and transparency bring us closer to the truth. Without those two principles, we cannot pursue scientific endeavors as a field, nor can we work toward new discoveries if we are working from faulty assumptions. Scientific discoveries are difficult to achieve to begin with, and sloppy findings muddy the waters even more, making it harder to find innovative solutions.

How has your experience been collaborating with COS?
KU: Collaborating with COS has been a highlight of my work over the last few years. I have been involved in a very wide range of tasks, from the more mundane, such as calculating the range of values that may be considered close enough for a reproduction (e.g., values within 15% of the original effect size reported), to the more exciting, such as leading a full replication project using a secondary dataset, responding to peer reviewers, and preparing analytic scripts. The COS team I have collaborated with has always been a source of great support and eager to solve problems together, bringing an additional layer of scrutiny to already high-quality projects. Special shout-out to Priya, Andrew, Zach, and Bri, who made my experience of working on the SCORE project such a joy!
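
As a rough illustration of that "close enough" calculation, here is a minimal sketch; the function name and the symmetric 15% band are assumptions for illustration, not SCORE's actual reproduction criterion:

```python
def within_tolerance(original: float, reproduced: float,
                     tolerance: float = 0.15) -> bool:
    """Return True if the reproduced effect size falls within
    `tolerance` (as a fraction) of the original effect size.
    Hypothetical sketch of the 15% criterion mentioned above."""
    if original == 0:
        # A zero effect has no proportional band; only an exact match counts.
        return reproduced == 0
    lower = original * (1 - tolerance)
    upper = original * (1 + tolerance)
    # min/max handles negative effect sizes, where scaling flips the bounds.
    return min(lower, upper) <= reproduced <= max(lower, upper)

# Example: an original effect of r = 0.30 admits reproductions in [0.255, 0.345].
print(within_tolerance(0.30, 0.27))  # True
print(within_tolerance(0.30, 0.20))  # False
```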

What’s next with SCORE?
As ambitious and extensive as the SCORE project has been, it's really just the first step in creating scalable tools for assessing research credibility. More research, development, and testing are required before the technologies that emerged from SCORE can be used by all the stakeholders we hope will benefit – including research communities, policymakers, practitioners, and the broader public.

We’re excited that COS will be expanding on the efforts that started under SCORE. Over the next three years, we’ll be making progress on a number of fronts:

  • Expanding beyond the core social and behavioral science disciplines to include health research. This will help us understand how generalizable the emerging tools can be.
  • Increasing the number of algorithms providing credibility assessments through a public competition. The original SCORE program involved four algorithm teams that took different approaches and generated independent credibility assessments. The aim now is to strengthen the overall accuracy of predictions and decrease the risk of algorithmic bias by developing many more approaches than the original program generated; a brief sketch of why aggregating independent scores can help follows this list.
  • Prototyping these new tools within the research community to learn how researchers perceive automated scoring of research credibility, as well as what could make it most useful to this community.
  • Conducting research and user-testing on the best ways to convey algorithm scores to encourage appropriate use and mitigate risks.
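
To see why many independent assessments can beat a few, here is a minimal sketch; the scores, their 0-1 scale, and the use of a median are illustrative assumptions, not the program's actual aggregation method:

```python
import statistics

# Hypothetical credibility scores (0-1 scale) for one paper, each produced
# by a different, independently developed algorithm.
scores = [0.72, 0.65, 0.80, 0.70, 0.68]

# A simple aggregate: the median is robust to any single biased or
# outlying model, which is one reason a larger, more diverse pool of
# approaches can reduce the impact of algorithmic bias.
print(statistics.median(scores))  # 0.70
```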

SCORE is one of several initiatives our team is undertaking to help make reproducibility in scientific research the norm. These efforts require significant investments of time and resources, which is why we ask for help from many stakeholders, including individuals like you who care about seeing open science succeed. Please consider supporting COS with a donation before the end of the year.
