The primary objective of SCORE was to create scalable and accurate indicators of repeatability that could be applied across large bodies of research. Central to this aim was the development of algorithmic tools capable of producing confidence scores for research claims, thereby allowing researchers, institutions, and other stakeholders to more efficiently identify claims that warrant further scrutiny.
This effort was supported by systematic examinations of three key dimensions of repeatability: reproducibility, or the ability to obtain the same results from the same data and analysis; robustness, or the extent to which results remain consistent across justified alternative analytical choices; and replicability, or the consistency of results when new data are collected to address the same research question.
With SCORE, we also sought to understand how these measures relate to one another, to expert and machine-generated predictions, and to other potentially relevant indicators such as disciplinary norms or journal policies. A further objective was to generate openly accessible datasets, algorithms, and replication and reanalysis materials, thus supporting continued innovation in credibility assessment across the scientific community.
To achieve its aims, SCORE employed a multi-method approach that combined claim extraction, expert and machine assessments, and large-scale empirical evaluations of repeatability. The program began by identifying thousands of research claims from published articles in the social and behavioral sciences and then generated expert judgments and machine learning predictions about the credibility of those claims. These predictions served as candidate indicators that could be validated through empirical testing. The validation efforts consisted of three major empirical studies: reproducibility, robustness, and replicability.
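To make this pipeline concrete, the sketch below shows one way a claim record could link candidate indicators (expert and machine confidence scores) to the empirical outcomes used to validate them. The field names, identifiers, and values are illustrative assumptions, not SCORE's actual data schema.

```python
# Minimal sketch of a claim record linking candidate indicators to
# empirical outcomes. Field names and values are illustrative assumptions,
# not SCORE's actual schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClaimRecord:
    claim_id: str
    paper_doi: str
    expert_score: float                 # elicited expert confidence, 0-1
    machine_score: float                # algorithmic confidence score, 0-1
    reproduced: Optional[bool] = None   # same data, same analysis
    robust: Optional[bool] = None       # alternative analytical choices
    replicated: Optional[bool] = None   # new data, same question

claims = [
    ClaimRecord("C001", "10.1000/example1", expert_score=0.72, machine_score=0.65, replicated=True),
    ClaimRecord("C002", "10.1000/example2", expert_score=0.40, machine_score=0.35, replicated=False),
]

# Flag claims whose machine-generated score falls below a screening threshold.
needs_scrutiny = [c.claim_id for c in claims if c.machine_score < 0.5]
print(needs_scrutiny)  # ['C002']
```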
The reproducibility study examined whether original findings could be recreated using the same data and analyses. From a stratified random sample of 600 papers, reproducibility assessments were conducted for those papers in which data were publicly available or successfully obtained from the authors.
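As a rough illustration of how a stratified random sample of papers might be drawn, the sketch below assumes strata defined by discipline and publication year with equal per-stratum quotas; the stratum labels, population sizes, and quota are invented for demonstration and do not reflect SCORE's actual sampling frame.

```python
# Illustrative sketch of drawing a stratified random sample of papers.
# Strata (discipline x year), population sizes, and quotas are assumptions.
import random
from collections import defaultdict

random.seed(42)

# Hypothetical population of papers.
papers = []
for discipline in ("economics", "psychology", "sociology"):
    for year in (2010, 2015, 2020):
        for i in range(50):
            papers.append({"id": f"{discipline[:3]}-{year}-{i}",
                           "discipline": discipline, "year": year})

# Group the population by stratum.
strata = defaultdict(list)
for p in papers:
    strata[(p["discipline"], p["year"])].append(p)

# Draw the same number of papers from each stratum (a proportional design
# would instead weight the draw by stratum size).
per_stratum = 5
sample = [p for group in strata.values() for p in random.sample(group, per_stratum)]
print(len(sample))  # 45 papers across 9 strata
```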
The robustness study investigated the degree to which research findings depend on analysts' choices. For each of 100 selected claims, multiple independent re-analysts reanalyzed the same dataset, allowing the program to assess the extent of analytical variability and its implications for scientific conclusions.
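One simple way to summarize that variability for a single claim is to look at the spread and sign agreement of the independently produced effect estimates, as in the sketch below. The estimates themselves are invented for illustration and are not SCORE results.

```python
# Hypothetical sketch: summarizing variability across independent re-analyses
# of one claim. The effect estimates are invented for illustration.
import statistics

# Standardized effect estimates reported by independent analyst teams.
estimates = [0.18, 0.22, 0.05, 0.31, -0.02, 0.14, 0.20]

spread = max(estimates) - min(estimates)
median_effect = statistics.median(estimates)
sign_agreement = sum(e > 0 for e in estimates) / len(estimates)

print(f"median effect: {median_effect:.2f}")
print(f"range of estimates: {spread:.2f}")
print(f"share of analyses with a positive effect: {sign_agreement:.0%}")
```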
The replicability study tested whether original positive findings generalize to new data, using high-powered replication attempts of 274 claims drawn from 164 papers. Across these studies, SCORE integrated the resulting evidence to evaluate how reproducibility, robustness, and replicability relate to each other and to predictive assessments by humans or machines.
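A basic way to relate predictive assessments to replication outcomes is to ask how often claims that replicated received higher confidence scores than claims that did not, a concordance (AUC-style) measure sketched below. The scores and outcomes are invented placeholders, not SCORE data.

```python
# Illustrative sketch: do predicted confidence scores rank replicated claims
# above non-replicated ones? Scores and outcomes below are invented.
from itertools import product

predicted = [0.81, 0.64, 0.72, 0.35, 0.55, 0.28, 0.90, 0.47]
replicated = [True, True, False, False, True, False, True, False]

successes = [s for s, r in zip(predicted, replicated) if r]
failures = [s for s, r in zip(predicted, replicated) if not r]

# Fraction of (success, failure) pairs in which the replicated claim
# received the higher score; 0.5 indicates no discrimination.
pairs = list(product(successes, failures))
auc = sum((s > f) + 0.5 * (s == f) for s, f in pairs) / len(pairs)
print(f"concordance (AUC): {auc:.2f}")
```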
Data, materials, and tools generated through this process are openly shared to support transparency, reuse, and further methodological development.
