Research evaluates scientific ideas.
We evaluate research.

We are always interested in how research is conducted so we can help make it better. What contributes to reproducibility, or failure to reproduce? What best practices can we develop through evaluation that might increase the efficiency of scientific research? Our goal is to investigate and reveal those insights. Below are projects we have been working on.

SMART: Scaling Machine Assessments of Research Trustworthiness

Through a grant from the Robert Wood Johnson Foundation, COS, in partnership with researchers at the University of Melbourne and Pennsylvania State University, has begun the SMART project, which seeks to advance the development of automated confidence evaluation of research claims. SMART will extend the work initiated by the SCORE program by conducting user research and generating additional data to improve the algorithmic and human assessment approaches developed during that program.

Meta Partners with COS to Share Data to Study Well-Being Topics

Using innovative methods from the open science movement to promote rigor and transparency of research, Meta and COS will pilot a new approach to industry-academia partnerships for accessing social media data.

Opening Collaboration for Large-Scale Study on Registered Revisions

COS is looking to partner with journals on a semi-centralized meta-RCT on Registered Revisions. Registered Revisions are a peer-review mechanism used when reviewers request additional data or analyses: authors pre-register the methods they will use to address these requests, and editors and reviewers make their acceptance decision on the basis of this protocol, regardless of the results.

We are conducting an experimental collaborative project in which COS provides a boilerplate study design for journal partners to carry out and publish their own experiments, generating many individual studies under a prospective, living meta-analysis. 

Reproducibility Project: Cancer Biology (RP:CB)

The RP:CB is an initiative to conduct direct replications of 50 high-impact cancer biology studies. The project anticipates learning more about predictors of reproducibility, common obstacles to conducting replications, and how the current scientific incentive structure affects research practices by estimating the rate of reproducibility in a sample of published cancer biology literature. The RP:CB is a collaborative effort between the Center for Open Science and network provider Science Exchange. Are you interested in becoming a panel member to review the reproducibility of these studies?

Research Quality of Registered Reports Compared to the Traditional Publishing Model

More than 350 researchers peer reviewed a pair of papers from 29 published Registered Reports and 57 non-RR comparison papers. RRs outperformed comparison papers on all 19 criteria (mean difference = 0.46), with effects ranging from little difference in novelty (0.13) and creativity (0.22) to substantial differences in rigor of methodology (0.99) and analysis (0.97) and in overall paper quality (0.66). RRs could improve research quality while reducing publication bias and ultimately improve the credibility of the published literature.

Credibility of preprints: an interdisciplinary survey of researchers

Preprints increase accessibility and can speed scholarly communication if researchers view them as credible enough to read and use. Preprint services do not provide the heuristic cues of a journal's reputation, selection, and peer-review processes that, regardless of their flaws, are often used as a guide for deciding what to read. We conducted a survey of 3,759 researchers across a wide range of disciplines to determine the importance of different cues for assessing the credibility of individual preprints and preprint services.

SCORE: Systematizing Confidence in Open Research and Evidence

There is still much to learn about reproducibility across business, economics, education, political science, psychology, sociology, and other social-behavioral sciences. To better assess and predict the replicability of social-behavioral science findings, the Center for Open Science, in partnership with the Defense Advanced Research Projects Agency (DARPA), is working to advance this understanding.

Opening Influenza Research

We invite the influenza research community to “empty the file drawers” and contribute to a thorough aggregation of open and accessible findings to close the gaps in our understanding of influenza.

We invite proposals from the influenza research community that fit the following submission types: 1) existing negative and null results, 2) existing replication studies, and 3) new, proposed, highly-powered replications of important results in influenza research.

Reproducibility Project: Psychology (RP:P)

The RP:P was a collaborative community effort to replicate published psychology experiments from three important journals. Replication teams followed a standard protocol to maximize consistency and quality across replications, and the accumulated data, materials, and workflow are open for critical review on OSF. One hundred replications were completed.

Collaborative Replications and Education Project (CREP)

The Collaborative Replications and Education Project facilitates student research training through conducting replications. The community-led team composed a list of studies that could be replicated as part of research methods courses, independent studies, or bachelor's theses. Replication teams are encouraged to submit their results to an information commons, where they are aggregated for potential publication. This integrates learning with substantive contributions to research.

Crowdsourcing a Dataset

Crowdsourcing a dataset is a method of data analysis in which multiple independent analysts investigate the same research question on the same data set in whatever manner they consider to be best. This approach should be particularly useful for complex data sets in which a variety of analytic approaches could be used, and when dealing with controversial issues about which researchers and others have very different priors. This first crowdsourcing project establishes a protocol for independent simultaneous analysis of a single dataset by multiple teams, and resolution of the variation in analytic strategies and effect estimates among them.

Badges to Acknowledge Open Practices

Openness is a core value of scientific practice. The sharing of research materials and data facilitates critique, extension, and application within the scientific community, yet current norms provide few incentives for researchers to share evidence underlying scientific claims. We demonstrate that badges are effective incentives that improve the openness, accessibility, and persistence of data and materials that underlie scientific research.

Many Labs I

The Many Labs I project was a crowdsourced replication study in which the same 13 psychological effects were examined in 36 independent samples to assess variability in replicability across samples and settings. Key findings:

  • Variations in sample and setting had little impact on observed effect magnitudes
  • When there was variation in effect magnitude across samples, it occurred in studies with large effects, not studies with small effects
  • Replicability depended much more on the effect being studied than on the sample or setting in which it was studied
  • Replicability held across both lab and web administrations and across nations
  • Two effects in a subdomain with substantial debate about reproducibility (flag and currency priming) showed no evidence of an effect in individual samples or in the aggregate.

Many Labs II

Conducted in the fall of 2014, Many Labs II employed the same model as Many Labs I but with almost 30 effects, more than 100 laboratories, and samples from more than 20 countries. The findings should be released in late 2017.

Many Labs III

Many psychologists rely on undergraduate participant pools as their primary source of participants. Most participant pools are made up of undergraduate students taking introductory psychology courses over the course of a semester. Also conducted in the fall of 2014, Many Labs III systematically evaluated time-of-semester effects for 10 psychological effects across many participant pools. Twenty labs administered the same protocol across the academic semester. The aggregate data will provide evidence as to whether time of semester moderates the detectability of effects.