Reproducibility Project: Cancer Biology (RP:CB) Overview



The Reproducibility Project: Cancer Biology (RP:CB) is an initiative to conduct direct replications of 50 high-impact cancer biology studies. The project anticipates learning more about predictors of reproducibility, common obstacles to conducting replications, and how the current scientific incentive structure affects research practices by estimating the rate of reproducibility in a sample of published cancer biology literature. The RP:CB is a collaborative effort between the Center for Open Science and network provider Science Exchange, and will be published in eLife.

Through independent direct replication studies, the project aims to identify best practices that maximize reproducibility and facilitate an accurate accumulation of knowledge, enabling potentially impactful novel findings to be effectively built upon by the scientific community.

Additionally, we expect to learn about:

  • The overall rate of reproducibility in a sample of the published cancer biology literature.
  • Obstacles that arise in conducting direct replications of original studies.
  • The feasibility and practical challenges of getting proper materials, methods, and instrumentation for a replication.
  • Predictors of replication success such as the journal in which the original finding was published, the citation impact of the original report, the number of direct replications that have been published elsewhere, the transparency of materials and methods included with the publication, and adherence to publishing checklists and guidelines. 

What The Community is Saying

 "For just $2 million in private funding — less than a typical 5-year grant from the US National Institutes of Health to a single investigator — this replication project shines a very public light on the sticking points of experiments."  

— Editorial in Nature

"The composite picture is, there is a reproducibility problem."  
— John Ioannidis, Stanford University

“More than 50 years ago, the philosopher Thomas Kuhn defined ‘normal science’ as the kind of work that faithfully supports or chisels away at current hypotheses. It is easy to dismiss this as workmanlike and uninteresting. But only by doing such normal science — and doing it well — can we recognize when revolutions are necessary.”

— Editorial in Nature

Frequently Asked Questions About the RP:CB Project

On the importance of this project:

Why is this project important?

This much-anticipated project was initiated in response to multiple reports published from the pharmaceutical industry indicating that more than 70% of published findings could not be reproduced, impacting the ability to build upon these results to develop new therapeutic agents. Despite intense scrutiny around reproducibility in science, this project represents the first practical evaluation of reproducibility rates in the biomedical field with the potential to identify specific methods that result in reproducible studies. This project is completely open and transparent, with all process information and data made publicly available in eLife. In addition, all of the replication studies were conducted by completely independent labs that were part of the Science Exchange network of contract research organizations (CROs), academic labs, and government facilities.

Why does reproducibility matter?

Increasing openness and reproducibility increases the efficiency and quality of knowledge accumulation and application. Increasing access to the content and process of producing research outcomes increases reproducibility of the evidence, and facilitates replication and extension into new domains. False leads can be discovered and discarded more quickly and true leads can be elaborated more efficiently, accelerating the path toward solving humanity’s pressing problems.

What is transparency and why does it matter?

Typically, researchers are rewarded with publication for showing the cleanest and most exciting-sounding results, even though typical research results are nuanced and more difficult to interpret. This reward system results in research papers that only show a summary of the work that was actually conducted in the research lab. Transparency of the entire research workflow allows others in the scientific community to examine, build upon, and accurately evaluate all of the content that is usually not preserved. This includes all of the precise methods used in the work, all of the digital and physical materials used to conduct the work, all of the data collected, and the details of the data analysis.

Do you think there is a crisis in science?

We propose that there is an efficiency crisis, not a science crisis. Scientific progress is still being made, but not at the rate at which it could be, because efficiency is being hampered by an incentive system that does not reward openness and reproducibility, two core values of scholarship. These values underpin the reality that scientific claims do not become credible based on the authority or persuasiveness of their originator. Rather, claims become credible via transparent communication of the supporting evidence and the process of acquiring that evidence.

In the present scholarly culture, openness and reproducibility are values but not standard practice. Incentives driving researchers and service providers do not promote these values. For researchers, the currency of reward is publication. Publishing frequently in the most prestigious outlets possible is the gateway to jobs, promotion, tenure, grants, and awards. Whether the research is open or reproducible is rarely relevant to publication success. Instead, publication depends on achieving novel, positive, clean outcomes. In a competitive marketplace, researchers may make choices, even unwittingly, that increase the likelihood of obtaining publishable outcomes even at the cost of their accuracy. Without transparency or efforts to evaluate reproducibility, the loss of accuracy may go undetected, decreasing the credibility of the published literature.

The need to reform the research scientists’ reward system must be better communicated to the public. Funding sources and policy makers are increasingly accountable to voices of non-scientists and those who, in general, may already be biased against the complexity of accurate data reporting.

RP:CB project info and collaborator overview:

What is the RP:CB project?

The Reproducibility Project: Cancer Biology is a collaboration between Science Exchange and the Center for Open Science, with results published in eLife. Using the Science Exchange network of expert scientific labs, we are independently replicating a subset of experimental results from a number of high-profile cancer biology papers published between 2010 and 2012.

When was the project initiated?


Who is paying for this work?

The Reproducibility Project: Cancer Biology has been funded through a $1.3 million grant from the Laura and John Arnold Foundation, which funds projects to promote transformational change. Top scientific suppliers have provided additional support through research reagents. All funding and supporting organizations can be found on the Funding and Supporting Organizations page. No funding was requested from the authors.

What is the Center for Open Science?

The Center for Open Science (COS) is a non-profit technology startup founded in 2013 with a mission to increase openness, integrity, and reproducibility of scientific research. COS pursues this mission by building communities around open science practices, supporting metascience research, and developing and maintaining free, open source software tools. The Open Science Framework (OSF), COS’s flagship product, connects and supports the research workflow, enabling scientists to increase the efficiency and effectiveness of their research. Researchers use the OSF to collaborate, document, archive, share, and register research projects, materials, and data.

What is Science Exchange?

Science Exchange is the world’s leading marketplace for scientific research services. The company provides secure access to a network of over 3,000 screened and verified contract research organizations (CROs), academic labs, and government facilities that are available to conduct experiments on behalf of scientists. The Science Exchange platform has been used by scientists from over 2,500 different companies and organizations, solving one of the most significant challenges facing the highly trained researchers at these companies: the time and resources spent identifying and managing outsourced research projects. To date, Science Exchange has raised over $30 million from Maverick Capital Ventures, Union Square Ventures, Index Ventures, OATV, the YC Continuity Fund, and others.


What is eLife?


eLife is a unique collaboration between the funders and practitioners of research to improve the way important research is selected, presented, and shared. eLife publishes outstanding works across the life sciences and biomedicine, from basic biological research to applied, translational, and clinical studies. All papers are selected by active scientists in the research community. Decisions and responses are agreed by the reviewers and consolidated by the Reviewing Editor into a single, clear set of instructions for authors, removing the need for laborious cycles of revision and allowing authors to publish their findings quickly. eLife is supported by the Howard Hughes Medical Institute, the Max Planck Society, and the Wellcome Trust.

How did we arrive at the original 50 papers to replicate?

We determined the 400 most-cited cancer biology papers from Web of Science in 2010, 2011, and 2012 and the top 400 most-cited papers from Scopus in 2010, 2011, and 2012. From these, we excluded clinical trials, papers requiring extremely specialized equipment, and case studies. For the remaining papers, we calculated a normalized impact rating and chose the top 16 or 17 papers from each year. See details of the paper selection method here.
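The selection steps above can be sketched in code. This is only an illustration under stated assumptions, not the project's actual method: the field names (`wos`, `scopus`, `excluded`), the per-year quotas, and especially the normalization (each citation count divided by the highest count among eligible papers in the same source and year, then averaged) are hypothetical stand-ins for the "normalized impact rating" described in the linked selection method.

```python
from collections import defaultdict


def select_papers(papers, per_year_quota):
    """Pick the top-rated eligible papers for each publication year.

    papers: list of dicts with keys 'title', 'year', 'wos', 'scopus',
            and 'excluded' (True for clinical trials, case studies, or
            papers needing extremely specialized equipment).
    per_year_quota: dict mapping year -> number of papers to keep.
    """
    # Group the eligible (non-excluded) papers by publication year.
    by_year = defaultdict(list)
    for paper in papers:
        if not paper["excluded"]:
            by_year[paper["year"]].append(paper)

    selected = []
    for year, quota in per_year_quota.items():
        candidates = by_year.get(year, [])
        if not candidates:
            continue
        # Hypothetical normalization: scale each source by its yearly maximum.
        max_wos = max(p["wos"] for p in candidates) or 1
        max_scopus = max(p["scopus"] for p in candidates) or 1

        def rating(p):
            return (p["wos"] / max_wos + p["scopus"] / max_scopus) / 2

        ranked = sorted(candidates, key=rating, reverse=True)
        selected.extend(ranked[:quota])
    return selected


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    demo = [
        {"title": "A", "year": 2010, "wos": 100, "scopus": 120, "excluded": False},
        {"title": "B", "year": 2010, "wos": 80, "scopus": 200, "excluded": False},
        {"title": "C", "year": 2010, "wos": 500, "scopus": 500, "excluded": True},
        {"title": "D", "year": 2010, "wos": 10, "scopus": 10, "excluded": False},
    ]
    for paper in select_papers(demo, {2010: 2}):
        print(paper["title"])
```

Passing the quota in per year keeps the sketch neutral about which years contribute 16 papers and which contribute 17.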

What were the main approaches you took in doing the replications?

Our introductory paper outlines our approach in detail, which is summarized as follows: once we agreed upon which key sets of experiments we would replicate, we wrote detailed protocols based on the methods reported in the original paper. These methods included the necessary reagents, controls, and a detailed, step-by-step experimental protocol. We attempted to get as much additional information as possible from the original authors.

We relayed the detailed protocol to appropriate, verified Science Exchange research service providers. Any additional questions between the service provider and the authors were coordinated through Science Exchange and the Center for Open Science. In the Registered Report, we described all known differences between the original study and the replication study.

How many studies are you planning to work on going forward?

The RP:CB has 29 accepted and peer-reviewed Registered Reports on the eLife site. The final number of replications that will be conducted will optimize the quality of each replication with the funding available.

On Registered Reports, preregistration and transparency:

What is a Registered Report/Replication Study?

Registered Reports are peer-reviewed research plans. Each Registered Report contains a complete description of the work that will be conducted and a plan for how it will be analyzed. By conducting the peer review before results are known, this process eliminates many of the biases that occur after seeing the results. Furthermore, it focuses expert evaluation of the protocols earlier in the process, where improvements can be added. For example, peer reviewers will expect that various quality control steps are conducted at critical steps in the workflow prior to measuring the final results. Finally, it improves the planned statistical analyses by addressing issues of low sample size and appropriate inference by considering the potential need to accurately interpret null results.

The final Replication Study contains all of the results specified in the Registered Report. The statistical analyses that were specified in the Registered Report are provided, and any unplanned analyses are provided as exploratory results. Unexpected but potentially exciting results that appear when exploring a dataset must be replicated with newly-conducted research before those results can be verified. A fundamental problem in the current process of scientific publishing is the unintentional presentation of exploratory results as more rigorous, confirmatory results.

See our Registered Reports page for more details and a complete list of journals using this format.

What is a preregistration?

A preregistration is a research plan created before conducting a study. For the Reproducibility Project: Cancer Biology, each peer-reviewed Registered Report is a preregistration. However, other researchers will often create preregistrations that are not peer-reviewed until the final study is ready to be submitted to a journal for publication, after results are known. This often occurs in clinical sciences, using a registry where planned research studies are deposited before results are known.

A preregistration can include just details about a study plan, which includes information such as the hypotheses, experimental methods, and the design of the study. A preregistration can also include a complete analysis plan, which will specify exactly how the collected data will be analyzed to address the hypotheses. Preregistrations address two current problems with the process of science. First, they “open the file drawer” by ensuring that all conducted research eventually becomes discoverable through the registry in which the preregistration was created. Second, when they also include an analysis plan, they address questionable research practices that lead to the problems that this project demonstrates: the fact that any dataset can unintentionally show statistical significance even when there is no real effect.
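One way to see why a fixed analysis plan matters is a small worked calculation. The sketch below assumes, purely for illustration, that a researcher tries several independent analyses of data containing no real effect, each tested at the same significance threshold; the function name and the independence assumption are ours, not part of the project's methods.

```python
def chance_of_false_positive(alpha: float, n_analyses: int) -> float:
    """Probability that at least one of n independent tests on null data
    falls below the significance threshold alpha."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all n avoid it with probability (1 - alpha) ** n.
    return 1.0 - (1.0 - alpha) ** n_analyses


if __name__ == "__main__":
    for n in (1, 5, 20):
        rate = chance_of_false_positive(0.05, n)
        print(f"{n:>2} analyses -> {rate:.1%} chance of a spurious 'significant' result")
```

Under these assumptions, a single planned analysis at the 0.05 threshold carries a 5% false-positive risk, while 20 unplanned analyses carry roughly a 64% risk. Specifying the analysis in advance keeps the first case from drifting into the second.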

The Center for Open Science is working to broaden the use of preregistrations to the pre-clinical research field and to all empirical research. See the Preregistration Challenge for details.

What is OSF (Open Science Framework)?

OSF (Open Science Framework) provides free and open source project management support for researchers across the entire research lifecycle. As a flexible repository, it lets researchers store and archive their research data, protocols, and materials. As a collaboration tool, it lets researchers work on projects privately with a limited number of collaborators and make parts of their projects public, or make the entire project publicly accessible for broader dissemination. As a workflow system, it lets researchers connect the many services they use to streamline their process and increase efficiency.

  • Structured projects: Researchers can access files, data, code, and protocols in one centralized location and easily build custom organization for each project. Such structure eliminates the need to trawl emails to find files or scramble to recover lost data.
  • Controlled access: OSF users can control which parts of a project are public or private, making it easy to collaborate and share with the community or just one team.
  • Enhanced workflow: Researchers can automate version control, get persistent identifiers for projects and materials, preregister research, and connect third-party services directly to OSF.

Replication success/failure terms and issues:

What steps were taken for quality control of the methodology of the replications?

Each study’s methodology was peer-reviewed before conducting any research using the Registered Reports publication format (see above). It is important that this peer review was conducted before results were known because it ensured that experts could review the exact methodology and analysis without any bias from knowing the results. The peer review also ensured that quality checks were included at appropriate points to allow for interpretation of any result. The authors of the original paper were invited to participate in this peer review before results were known.

What is your definition of "replicate"?

One of the goals of the project is to help clarify what that definition is. The tendency is to answer the question, “Was the study replicable?” with a binary answer, “yes” or “no”, but it is much more nuanced than that. One definition of “replicate” is to reproduce the original processes and results within an acceptable margin of error. What is this acceptable margin of error? There are multiple, complex factors that determine this margin; we present those in each replication study discussion. Another measure of replication is subjective assessment. As part of the RP:CB project, we are gathering this information through a formal survey of independent scientists, and will be analyzing these data at the end of the project along with other aspects of the process. Before the Replication Studies are published, we recruit scientists in biomedical research to review the results of the replications and give their subjective opinion of whether the original result was successfully replicated. This work is ongoing.

What can you conclude about the original study from these replications?

All scientific studies are estimates. Does the experimental treatment have an effect compared to the control? How big was that effect and was it statistically different? Each estimate has a margin of uncertainty. The replication tells us whether the original processes and results can be reproduced within that margin. There are many reasons that two studies of the same phenomenon could yield different results, and only one of those reasons is that the original was a false positive. Other causes for different outcomes between original and replication have implications for understanding the phenomenon itself. For example, the conditions necessary to obtain the result may not yet be understood. There may be many reasons for this: poor data quality, incomplete instructions or processes, or missing components. The point of replication is to increase the likelihood that these shortcomings will be resolved in advance through better transparency, planning, and communication.

Implications of the study:

How can reproducibility be improved?

Improving reproducibility requires culture change in the incentives that drive researchers’ behavior, the infrastructure that supports research, and the business models that dominate scholarly communication. A key challenge is that the decentralized nature of the scholarly community creates a coordination problem. Culture change requires simultaneous movement by funders, institutions, researchers, and service providers across national and disciplinary boundaries. Despite this, the vision is achievable because openness, integrity, and reproducibility are shared values, the technological capacity is available, and alternative sustainable business models exist.

What are the implications of the results of the initial RP:CB studies?

Replication is hard, but critically important. Existing reports (such as those by Amgen, Bayer, and others) confirm that there is a general consensus on reproducibility rates, that replication is important, and that there are commonalities across disciplines regarding the challenge of reproducibility. The RP:CB studies, though still preliminary, are helping to reveal exactly why replication studies aren't done more often, why they take so long, how much they cost, and what we can do to make them easier. One significant barrier to replication is getting information about the original work; it can be extremely difficult and takes the majority of the time. Cooperation is hard to get, but it’s healthy for the practice of science. The project has also revealed that openness and transparency are critical to moving research forward, and that preregistration is critical to the replication process.

What do these results suggest about the scientific process?

It is messy, inconsistent, siloed, and challenging. Science demands collaboration; the more minds that tackle a research hypothesis, the more parameters can be tested, controlling the effects of unexpected bias and ultimately yielding a more accurate theory.