Replicability is a cornerstone of scientific progress. Yet, replications are often undervalued, and are sometimes seen as redundant, unimportant, or lacking novelty. This impedes their broader adoption in research and beyond. In response, the credibility revolution calls for slower, more deliberate science and greater responsiveness to fallibility. In this perspective piece, we argue that (a) replications are essential for validating scientific claims, (b) replications need to be made more visible, recognized, and integrated into research and educational practices, and (c) we can change the way we view and judge replication results. We propose a framework where replication studies can be systematically tracked and normalized through the Replication Hub as part of the Framework for Open and Reproducible Research Training (FORRT) initiative, with the goal of enhancing the visibility, integration, and cumulative impact of replication research across disciplines.
Keywords: Metascience; Open scholarship; Open science; Replicability; Replications; Reproducibility; Replication crisis
Converted from the manuscript (a Word document). Prose reproduced verbatim with docx conversion artifacts cleaned; in-text citations kept as the authors wrote them. Figures (the FReD Explorer, FReD Annotator, and Replication Hub screenshots) are referenced but not reproduced, and the full reference list is omitted — see the source.
Replication is the process by which researchers test whether the same claims, in identical, similar, or varying contexts, lead to conclusions consistent with those of the original study (Parsons et al., 2022). Replications are a cornerstone of empirical research, where independent sources contribute cumulative evidence to support or refute a given claim. However, recent metascientific studies across scientific fields, particularly those in social, cognitive, and behavioral sciences, have found that many prominent findings ‘fail’ to replicate, that is, their results do not converge with those from the target studies and challenge the credibility of previous scientific claims (Brodeur et al., 2024; Ioannidis, 2005; Nosek & Errington, 2020).
Even when studies do replicate, the observed effects are often much smaller (Patil et al., 2016), averaging half of the originally reported effect sizes (Camerer et al., 2018; Open Science Collaboration, 2015). Such patterns appear across fields, including psychology, medicine, biology, economics, and neuroscience. This has been coined a ‘replication crisis’, raising concerns regarding the robustness of scientific knowledge and challenging the validity of decades of research. In response, this so-called crisis has given rise to a grassroots open science reform movement and the emergence of the field of metascience, that is, research on how science is conducted.
Despite this movement and evidence that many studies do not replicate, replication attempts are still rare. Moreover, journals are often unwilling to publish replication studies, which compromises our ability to build robust bodies of evidence to inform policy and practice. It also highlights that replications are not yet given the recognition they deserve, particularly by journal editors, funders, policymakers, and even researchers themselves. In contrast, novel results are often less scrutinized regarding reproducibility, published more readily, and cited more frequently (Scheel et al., 2021; Serra-Garcia & Gneezy, 2021).
In contrast to this underappreciation of replications, especially those that challenge long-established original findings, we argue that replications should be seen as a critical resource for designing, conducting, and interpreting research. Viewing the ‘replication crisis’ as a ‘credibility revolution’ (Korbmacher et al., 2023; Vazire, 2018) and an opportunity (Munafò et al., 2022), we are not alone in calling for slower, more deliberate science and greater responsiveness to fallibility. In the following sections, we discuss (a) the essential role of replications in science, (b) the need for their increased visibility and recognition through systematic tracking, (c) necessary changes in the way we judge replication results, and (d) future directions for replication practices in professional and educational contexts.
There is a common and longstanding narrative of science being built on replications, but recently they have been heralded as a key tool for ‘saving science’ (Edlund et al., 2021). Two fundamental aspects of science make replications indispensable: First, given the probabilistic nature of research and the myriad contextual and random factors affecting outcomes, no single study can be conclusive — including in the social, cognitive, and behavioral sciences. Second, science should be self-correcting, cumulative, and incremental, with progress building on prior work.
Despite this, current scientific practice often prioritizes novelty over replication and treats individual findings as definitive rather than part of a larger evolving picture. The credibility revolution has underscored the dangers of prioritizing flashy and unexpected results over robustness. For example, research on social priming appeared so convincing that Nobel laureate Daniel Kahneman dedicated a chapter to it in Thinking, Fast and Slow (Kahneman, 2011); but once preregistered replications were conducted more systematically, multiple independent teams failed to replicate the originally reported social priming effects, and the field became emblematic of concerns surrounding research integrity.
Direct replications are a crucial safeguard against the immense resource waste of building a literature on false positive findings (Zwaan et al., 2018). By recreating studies with highly similar or identical methods and sample characteristics, direct replications help to identify which findings are reliable (as opposed to the previously more common conceptual replications, which include differences in sample, design, measurement and/or analysis). Given the regular occurrence of false positive results — significantly amplified by publication bias and questionable research practices — multiple and direct replications are essential. Replications per se are not ‘better’ than original studies; each study needs to be judged on its own merits.
Beyond verifying the existence of an effect, it is crucial to estimate effect sizes accurately in order to determine practical significance, especially as science moves towards application. Larger sample sizes yield more precise estimates, whereas biases in the literature (e.g. publication bias) often exaggerate effect sizes. In addition to corroborating or challenging original claims, replications also help identify ‘boundary’ conditions that affect the presence and/or magnitude of effects, particularly when moving beyond limited contexts (e.g. WEIRD populations, i.e. Western, Educated, Industrialized, Rich, and Democratic). Direct or close replications establish whether core effects hold under similar circumstances; conceptual replications are a crucial next step, deliberately varying contextual or methodological features to assess robustness and generalizability (e.g. Tunç and Tunç’s Systematic Replications Framework).
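The relationship between sample size and precision can be illustrated with a minimal sketch. It assumes a two-group design with equal group sizes and uses the common large-sample approximation for the sampling variance of Cohen's d; the function name, the effect size of 0.4, and the group sizes are illustrative choices, not values from the studies discussed here.

```python
from math import sqrt
from statistics import NormalDist


def d_ci_half_width(d: float, n_per_group: int, level: float = 0.95) -> float:
    """Approximate confidence-interval half-width for Cohen's d.

    Assumes two independent groups of equal size and the large-sample
    variance approximation
        var(d) = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)).
    """
    n_total = 2 * n_per_group
    var_d = n_total / (n_per_group ** 2) + d ** 2 / (2 * n_total)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95%
    return z * sqrt(var_d)


if __name__ == "__main__":
    # Illustrative effect size and sample sizes (assumptions, not real data).
    for n in (20, 80, 320):
        print(f"n per group = {n:>3}: d = 0.40 +/- {d_ci_half_width(0.4, n):.2f}")
```

Under this approximation, quadrupling the number of participants per group halves the width of the interval, which is one reason why small original studies leave effect sizes so uncertain.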
While many studies replicate main effects before testing interactions, moderators, or mediators, these tests are rarely labeled as replications and often deviate from original protocols. This lack of consistency in naming and methods limits the accumulation of evidence and the tracing of ‘failed’ replications, which usually remain unpublished. Importantly, 70% of researchers have reported failing to replicate findings at least once (Baker, 2016), yet the low publication rate suggests many of these attempts are left in the metaphorical ‘file drawer’ (Rosenthal, 1979), keeping potentially flawed research lines alive.
Taken together, these developments highlight why replicating results and making them more visible are fundamental to producing reliable, trustworthy science. Fostering a culture of replication offers benefits beyond merely assessing individual claims: the expectation of future replication can improve reporting practices, reduce errors, and potentially even prevent fraud. Despite these promises, existing estimates suggest that between 0.2% and 5% of published studies in psychology are replications, with even lower rates in other fields, and there is no standardized way of indexing them. Developing comprehensive databases of replication studies is one way to remedy this.
Practical solutions are essential to shift replication studies from a niche effort to a mainstream scientific practice. To make replications more mainstream and visible, we created a comprehensive database of replications as a resource for research and teaching. At present, the FORRT Replication Database (FReD) contains a large index of original studies, their replications, and their raw statistics and effect sizes (n = 1,118 original articles and n = 1,137 replication references from 151 different journals and 167 contributors as of 2025-02-11). With over 160 researchers having contributed since its conception in April 2022, we aim for this to be a living, community-driven solution for collecting, updating, and disseminating replications.
This database is embedded within the FORRT Replication Hub, a comprehensive and living resource where authors, reviewers, educators, and editors can log and access replication studies. FReD hosts: (1) the FReD Explorer (a database of original studies and their replications); (2) the FReD Reference Annotator (a tool to check reference lists for replications); and (3) a list of large-scale replication projects. This centralized resource makes replications easier to find, eases their integration into scholarly workflows, and facilitates the citation of replications alongside original studies.
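To give a sense of what checking a reference list against an index of replications involves, the following is a purely hypothetical sketch. It is not the FReD Reference Annotator's implementation or interface: the data structure, function names, outcome labels, and placeholder DOIs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replication:
    reference: str  # citation of the replication study
    outcome: str    # assumed label, e.g. "successful", "failed", "mixed"


# Hypothetical index: DOI of an original study -> known replications.
# The DOIs and references are placeholders, not real database entries.
EXAMPLE_INDEX = {
    "10.0000/original.1": [Replication("Placeholder replication A", "failed")],
    "10.0000/original.2": [
        Replication("Placeholder replication B", "successful"),
        Replication("Placeholder replication C", "mixed"),
    ],
}


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common URL or 'doi:' prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def annotate(reference_dois, index):
    """Map each DOI in a reference list to its known replications (if any)."""
    return {doi: index.get(normalize_doi(doi), []) for doi in reference_dois}


if __name__ == "__main__":
    reading_list = ["https://doi.org/10.0000/ORIGINAL.1", "10.0000/unlisted"]
    for doi, replications in annotate(reading_list, EXAMPLE_INDEX).items():
        print(doi, "->", [r.outcome for r in replications] or "no replications indexed")
```

An empty result only means that no replication is indexed for that reference, not that none exists, which mirrors the coverage limitations of any such database.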
Historically, the initial version of the database was created by gathering instances of replication failures and successes from sources such as scientific mailing lists, blogs, and social media platforms (see also the FORRT Replications & Reversals project). Subsequently, participating FORRT volunteers contributed information about replication studies from their subfields over multiple years and at various hackathons starting in 2018, recording for each study the citation, study design, sample sizes, and effect sizes of both the original and replication work.
This database has some limitations. Due to the self-selected sample of studies, we explicitly refrained from presenting simple summaries or inferential statements about fields or subfields based on the database alone. The resource is not an exhaustive list of replications (or even ‘failed’ replications), as the initial collection process was biased towards famous original works. New evidence is added weekly (still largely volunteer-driven, with recent financial support from the Center for Open Science), and we are making efforts to safeguard against such selection biases. Lastly, our own effort to collate quantitative features of replications has its own subjectivity and researcher degrees of freedom.
(Figure 1: the FReD Explorer — automated summary of selected replications and success rates, with filtering options. Figure 2: the FReD Annotator — checks reference/reading lists to identify replication studies.)
Once researchers begin to conduct more replications, the next challenge is ensuring that replications become a more easily accessible, valued, and normative part of scientific practice. Key interested parties — researchers, journals, funders, and policymakers — play critical roles in embedding replication into the research culture.
The full value of replications can only be realized if they are systematically incorporated into grant applications, publications, and educational curricula. For example, educators could include replication studies in their syllabi and let students conduct their own small-scale replications. Our own bottom-up efforts need to be reinforced by top-down support from journals and funders: explicit incentives for replication research, more replication-specific journals, and revised manuscript evaluation criteria that reduce the emphasis on novelty. Some journals already invite replications (e.g. Replication Research), the Registered Reports format reduces publication bias by reviewing study designs before data collection, and funders like the Dutch NWO and German DFG offer replication grants. Universities can adapt curricula prioritizing transparent and robust science (e.g. FORRT’s Lesson Plans, Clusters, and Curated Resources), and communicators should shift away from “sensational” findings.
Replication plays a critical role in ensuring robustness, but it is vital to acknowledge the complexity behind failed replications, which can arise for many reasons. Understanding these is essential to a constructive — rather than punitive — approach. Potential explanations range from questionable research practices and publication bias to measurement error and the inherent heterogeneity of social and psychological phenomena.
One significant factor is the historic, widespread issue of low statistical power: significant results from underpowered studies are more likely to be false positives and to overestimate effect sizes. The ‘crud factor’ (the tendency for almost everything to be weakly correlated) makes it challenging to distinguish meaningful effects from noise, so large-sample studies may detect effects lacking real-world significance. Moreover, social, cognitive, and behavioral effects are not universal and may vary across time, population, location, or context; heterogeneity can cause genuine effects to fail under different circumstances without invalidating the original findings. Thus replication failures can help identify boundary conditions and reveal moderators or mediators, rather than simply indicating a lack of support for a hypothesis. While there is no consensus on how to classify replications on a spectrum between successful and failed, the credibility revolution gives us the chance to drive reform.
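The link between low power and false positives can be made concrete with a small calculation of the positive predictive value (PPV), that is, the share of statistically significant results that reflect true effects. This is a simplified sketch that ignores bias and questionable research practices; the prior probability that a tested hypothesis is true (10%) and the power values are illustrative assumptions, not estimates from the literature.

```python
def ppv(power: float, alpha: float = 0.05, prior: float = 0.1) -> float:
    """Share of significant results that are true positives.

    power: probability of detecting a true effect
    alpha: false-positive rate when there is no effect
    prior: assumed proportion of tested hypotheses that are true
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Illustrative power levels; the prior of 0.1 is an assumption.
    for power in (0.2, 0.5, 0.8):
        print(f"power = {power:.1f}: PPV = {ppv(power):.2f}")
```

With these illustrative inputs, roughly 31% of significant findings are true at 20% power, compared with 64% at 80% power. The absolute numbers depend entirely on the assumed prior, but the direction of the relationship does not.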
We propose four key features a scientific ecosystem can adopt to take full advantage of replication research: (1) findability of replications, (2) widespread adoption of open science practices, (3) education and training surrounding replications, and (4) incentivizing replications.
First, replication studies should be easy to find. It would be ideal if search engines could automatically tag replication studies, though this is error-prone and human, crowd-sourced validation is likely to remain essential to guarantee accuracy and interpretative nuance — an approach we adopted in developing the FORRT Replication Hub. The hub consolidates human-generated replication projects (the Replications & Reversals project, FReD, and a handbook for conducting replications) and includes a dedicated journal, Replication Research. Other innovations include PubPeer and tools like Zotero plug-ins and Scite.ai that flag articles with replication discussions and retraction notices.
Second, primary research needs to adopt open science practices across the board: at a minimum, detailed methods, open materials, open data (when ethically appropriate), and open analysis code, with preregistration or Registered Reports to clearly label confirmatory vs exploratory analyses. Unfortunately, transparency is still uncommon, and authors are not very responsive to data requests (of 65 contacted researchers from “available upon request” studies, only 27% actually shared data). Journals should make transparency the default.
Third, researchers should be trained in replication-related methodologies (equivalence testing, verification of original studies, reproducibility tests, sample-size planning and power analyses, effect-size and confidence-interval calculations, preregistration, and replication success criteria). Teaching about replication research should be a cornerstone of science education.
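As one example of these methods, equivalence testing asks whether a replication effect is small enough to be treated as practically negligible, rather than merely non-significant. The sketch below implements the two one-sided tests (TOST) procedure using a large-sample normal approximation; the function name, the equivalence bound of 0.3, and the example estimates are illustrative assumptions, and a real analysis would typically use t-distributions and a preregistered bound.

```python
from statistics import NormalDist


def tost_p_value(estimate: float, se: float, bound: float) -> float:
    """Two one-sided z-tests for equivalence within (-bound, +bound).

    Returns the larger of the two one-sided p-values; equivalence is
    declared when this value falls below the chosen alpha level.
    """
    normal = NormalDist()
    # H0: effect <= -bound, rejected when the estimate lies well above -bound.
    p_lower = 1 - normal.cdf((estimate + bound) / se)
    # H0: effect >= +bound, rejected when the estimate lies well below +bound.
    p_upper = normal.cdf((estimate - bound) / se)
    return max(p_lower, p_upper)


if __name__ == "__main__":
    # Illustrative values only: (effect estimate, standard error).
    for estimate, se in ((0.05, 0.08), (0.25, 0.08)):
        p = tost_p_value(estimate, se, bound=0.3)
        verdict = "equivalent" if p < 0.05 else "equivalence not shown"
        print(f"estimate = {estimate:.2f}: p = {p:.4f} ({verdict})")
```

Failing to show equivalence is not evidence of an effect either; it simply means the data are not precise enough to rule out effects as large as the chosen bound.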
Fourth, replication research needs to be rewarded. Universities and funders should officially recognize the value of replication studies. Updated journal submission guidelines could include a Pottery Barn rule (“you break it, you buy it”), which requires journals to publish replications of studies they previously published (a policy implemented by Royal Society Open Science). As of February 2025, 131 journals have implemented policies supporting replication studies (TOP Factor level 3). A more systematic evaluation process based on cost-benefit analyses could help determine which studies most urgently need replication.
(Figure 3: the FORRT Replication Hub. The FORRT tower icon indicates a resource is available in the Hub; all other projects are currently in development.)
Replications are intricate and complex. We recommend that the scientific community adopts a pluralistic and dynamic approach to replication — one that appreciates the various reasons why effects may fail to replicate and avoids treating every replication failure as a definitive refutation. Replications should be valued for their role in refining theories and improving the cumulative understanding of scientific phenomena. Initiatives such as the FORRT Replication Hub provide a platform to make replications more visible, accessible, rewarding, and integral to scientific discourse. Ultimately, replications should not be seen as a final verdict but as a dynamic part of the scientific process that drives progress through a continuous and cumulative reassessment of claims and evidence.