Leveraging the OSF for large-scale collaboration

There is substantial interest in the extent to which published findings in socio-behavioral sciences (SBS) are reproducible. While large-scale replication projects in psychology and economics (e.g. Reproducibility Project: Psychology and Experimental Economics Replication Project) provided initial evidence that results vary dramatically in their ability to be independently reproduced or replicated, there is still much to learn about reproducibility across business, education, political science, sociology, and other areas of socio-behavioral sciences. To that end, the Center for Open Science (COS), supported by the Defense Advanced Research Projects Agency (DARPA), is working to help advance this understanding. DARPA’s Systematizing Confidence in Open Research and Evidence (SCORE) program aims to develop and deploy automated tools that can assign "confidence scores" to published SBS research results and claims. Confidence scores are quantitative measures that could enable consumers of SBS research to understand the degree to which a particular claim or result is likely to be reproducible or replicable (for an overview of the program see this preprint).

The role of COS in the program is twofold. First, we provide the initial data for the program by curating a database of approximately 30,000 scientific papers published between 2009 and 2018 in over 60 academic journals in the socio-behavioral sciences (visit www.cos.io/score for a complete list of journals). Second, we collaborate with researchers across the globe to perform reproductions (re-analysis of existing data) and replications (analysis of new data) of claims identified from papers within this database. To date, over 200 replications and reproductions have been completed, with more being added every day.

With hundreds of teams of collaborators participating in this project and sharing key materials from their replication and reproduction efforts, including data, code, analyses, preregistrations, and final reports, the SCORE team relies heavily on the Open Science Framework (OSF) as an easy-to-use central repository. OSF helps us gather the materials necessary for each project, and its built-in privacy features let us maintain the appropriate level of confidentiality for what is stored. We use internal scripts to interact with the OSF API and seamlessly upload and download specific pieces of information. Data Manager Simon Parsons found that “being able to automate tasks related to OSF, like project and wiki creation, substantially reduced our workload and prevented many human errors.”
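To give a sense of what automating project creation can look like, here is a minimal sketch. It is not the SCORE team's internal script. It assumes the public OSF API v2 at https://api.osf.io/v2, a JSON:API request body of type "nodes", and a personal access token; the function names are hypothetical, and the endpoint and payload details should be checked against the current OSF API documentation before use.

```python
import json
import urllib.request

# Assumption: base URL of the public OSF API, version 2.
OSF_API = "https://api.osf.io/v2"


def build_node_payload(title, category="project", public=False):
    """Return a JSON:API body describing a new OSF node (project or component)."""
    return {
        "data": {
            "type": "nodes",
            "attributes": {
                "title": title,
                "category": category,
                "public": public,  # private by default, to protect confidentiality
            },
        }
    }


def build_create_request(payload, token, parent_guid=None):
    """Prepare, but do not send, the POST request that would create the node.

    With parent_guid set, the node is created as a child component of that
    project; otherwise it is created as a top-level project.
    """
    if parent_guid is None:
        url = f"{OSF_API}/nodes/"
    else:
        url = f"{OSF_API}/nodes/{parent_guid}/children/"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen would send it. Keeping the payload builder separate from the network call makes the script easy to check before it touches real projects.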

We structure each OSF project to correspond to a paper being reproduced or replicated. Its subcomponents host materials from the original paper, power analyses used to determine target sample sizes, and the different replication/reproduction attempts. Within those subcomponents (Figure 1), the team further divides the content into a “methods/materials” section, which hosts items such as participant surveys or stimuli; a “data” section, which contains raw and cleaned data files; and an “analysis” section for analytic code files. Research Scientist Nick Fox noted that “OSF allows us to effectively manage a high-volume program that includes multiple internal handoffs while ensuring necessary confidentiality.” All COS SCORE team members are added as administrators, but individual contributors are added separately to ensure they are assigned to the correct project with the appropriate level of permissions.
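The layout described above can be written down as a small template, which is handy when the same structure has to be created for many papers. The sketch below is illustrative only: the component names are paraphrased from the description, and it assumes every top-level subcomponent gets the same three sections, which may not match the actual SCORE projects.

```python
# Hypothetical names, paraphrased from the structure described in the post.
TOP_LEVEL = ("original materials", "power analysis", "replication", "reproduction")
SECTIONS = ("methods/materials", "data", "analysis")


def plan_components(paper_title):
    """Return a nested dict: paper -> top-level subcomponent -> list of sections."""
    return {paper_title: {name: list(SECTIONS) for name in TOP_LEVEL}}


def flatten(plan):
    """List every (paper, subcomponent, section) triple in creation order."""
    paths = []
    for paper, subcomponents in plan.items():
        for subcomponent, sections in subcomponents.items():
            for section in sections:
                paths.append((paper, subcomponent, section))
    return paths
```

A script could walk the output of flatten(plan_components("Some paper")) and create one OSF component per entry, parents first.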

 

Figure 1: Structure of an OSF page within the SCORE project

We created a centralized database of the Globally Unique Identifiers (GUIDs) provided by the OSF to catalog new projects, navigate to existing projects, and effortlessly find what is needed at a given moment. Being able to easily track and organize the thousands of OSF subcomponents is a big plus for a program of this size.
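A GUID catalog of this kind can be very simple. The sketch below uses an SQLite table, which is an assumption made for illustration; the post does not say how the SCORE database is built. The table layout, function names, and any GUID values passed in are hypothetical. The only OSF-specific fact relied on is that a node's page lives at https://osf.io/ followed by its GUID.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) the catalog; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS projects ("
        "guid TEXT PRIMARY KEY, paper_id TEXT NOT NULL, role TEXT NOT NULL)"
    )
    return conn


def register(conn, guid, paper_id, role):
    """Record a new OSF node, or update the entry if the GUID is already known."""
    conn.execute(
        "INSERT OR REPLACE INTO projects (guid, paper_id, role) VALUES (?, ?, ?)",
        (guid, paper_id, role),
    )
    conn.commit()


def urls_for_paper(conn, paper_id):
    """Return the OSF page URLs of every node catalogued for one paper."""
    rows = conn.execute(
        "SELECT guid FROM projects WHERE paper_id = ? ORDER BY guid", (paper_id,)
    )
    return [f"https://osf.io/{guid}/" for (guid,) in rows]
```

Keying the table on the GUID means each node appears once, so a rerun of a creation script updates the existing row rather than duplicating it.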

Variability in how researchers compile and share their data is a given within large-scale collaboration programs, which has forced us to adapt and change over time. To ensure that the project materials from every replication/reproduction will be maximally useful to future consumers, we request that each collaborator include a description of the materials provided on their OSF project and articulate their sharing permissions. With this step, a future consumer of the collaborator’s research can more easily understand and make use of what has been shared, whether or not the files are organized into their respective subcomponents and whatever their file types.

In addition to functioning as a highly organized central repository, OSF also allows preregistration (i.e., specification of a research plan in advance of conducting the study) of each of the individual replication and reproduction projects. Materials documentation and preregistration are just two examples of how the OSF facilitated collaborators’ adoption of open science practices through direct implementation, an integral part of COS’s broader mission of increasing the openness, integrity, and reproducibility of scientific research.

We still have many replications and reproductions underway, and new collaboration initiatives are just beginning. “OSF made it possible for us to achieve our goals from Phase 1 of the program, and we are excited to see what comes from Phase 2!” remarked Zach Loomas, Project Coordinator. While over 200 projects are completed, we anticipate completing hundreds more. 

If you are interested in participating in our efforts, please consider joining the SCORE collaborators list by filling out this form. And stay tuned to the COS blog for more on our progress during Phase 2.
