AI Copilots for Reproducibility in Science: A Case Study
- URL: http://arxiv.org/abs/2506.20130v1
- Date: Wed, 25 Jun 2025 04:56:28 GMT
- Title: AI Copilots for Reproducibility in Science: A Case Study
- Authors: Adrien Bibal, Steven N. Minton, Deborah Khider, Yolanda Gil
- Abstract summary: This paper introduces OpenPub, an AI-powered platform that supports researchers, reviewers, and readers through a suite of modular copilots. In this work, we present the Reproducibility Copilot, which analyzes manuscripts, code, and supplementary materials to generate structured Jupyter Notebooks. Results indicate that OpenPub can substantially reduce reproduction time while achieving high coverage of figures, tables, and results suitable for computational reproduction.
- Score: 2.9057981978152116
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open science initiatives seek to make research outputs more transparent, accessible, and reusable, but ensuring that published findings can be independently reproduced remains a persistent challenge. This paper introduces OpenPub, an AI-powered platform that supports researchers, reviewers, and readers through a suite of modular copilots focused on key open science tasks. In this work, we present the Reproducibility Copilot, which analyzes manuscripts, code, and supplementary materials to generate structured Jupyter Notebooks and recommendations aimed at facilitating computational, or "rote", reproducibility. We conducted feasibility tests using previously studied research papers with known reproducibility benchmarks. Results indicate that OpenPub can substantially reduce reproduction time - from over 30 hours to about 1 hour - while achieving high coverage of figures, tables, and results suitable for computational reproduction. The system systematically detects barriers to reproducibility, including missing hyperparameters, undocumented preprocessing steps, and incomplete or inaccessible datasets. These findings suggest that AI-driven tools can meaningfully reduce the burden of reproducibility efforts and contribute to more transparent and verifiable scientific communication. The modular copilot architecture also provides a foundation for extending AI assistance to additional open science objectives beyond reproducibility.
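The abstract describes detecting barriers such as missing hyperparameters and inaccessible datasets. A minimal sketch of that kind of check, purely illustrative (the function, its heuristics, and all names are assumptions, not OpenPub's actual implementation):

```python
import re

def find_barriers(manuscript_text, code_text, available_files):
    """Flag common reproducibility barriers: hyperparameters named in the
    manuscript but absent from the code, and data files cited in the
    manuscript but missing from the artifact."""
    barriers = []
    # Hyperparameters mentioned in the paper (e.g. "learning rate = 0.01")
    mentioned = set(re.findall(r"(learning.rate|batch.size|epochs|seed)",
                               manuscript_text, re.I))
    for hp in mentioned:
        token = hp.lower().replace(" ", "_").replace("-", "_")
        if token not in code_text.lower():
            barriers.append(f"missing hyperparameter in code: {hp}")
    # Data files cited in the paper but absent from the released artifact
    for path in re.findall(r"\b[\w./-]+\.(?:csv|json|npz)\b", manuscript_text):
        if path not in available_files:
            barriers.append(f"inaccessible dataset: {path}")
    return barriers
```

A real system would of course need far richer parsing of manuscripts and code; this only illustrates the category of static check involved.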
Related papers
- AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage [62.049868205196425]
AutoReproduce is a framework capable of automatically reproducing experiments described in research papers in an end-to-end manner. Results show that AutoReproduce achieves an average performance gap of 22.1% on 89.74% of the executable experiment runs.
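The two numbers quoted above pair an execution rate with an average relative gap between reproduced and reported results. A hedged sketch of how such a metric could be computed (illustrative names and data, not AutoReproduce's actual code):

```python
def avg_performance_gap(runs):
    """runs: list of (reported, reproduced) pairs, with reproduced=None
    for experiments that failed to execute. Returns the fraction of
    executable runs and their mean relative performance gap."""
    executed = [(rep, got) for rep, got in runs if got is not None]
    exec_rate = len(executed) / len(runs)
    gaps = [abs(got - rep) / abs(rep) for rep, got in executed]
    avg_gap = sum(gaps) / len(gaps)
    return exec_rate, avg_gap
```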
arXiv Detail & Related papers (2025-05-27T03:15:21Z)
- AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research [58.944125758758936]
The Science of Science (SoS) explores the mechanisms underlying scientific discovery. The advent of artificial intelligence (AI) presents a transformative opportunity for the next generation of SoS. We outline the advantages of AI over traditional methods, discuss potential limitations, and propose pathways to overcome them.
arXiv Detail & Related papers (2025-05-17T15:01:33Z)
- IRIS: Interactive Research Ideation System for Accelerating Scientific Discovery [27.218896203253987]
IRIS is an open-source platform designed for researchers to leverage large language model (LLM)-assisted scientific ideation. IRIS incorporates innovative features to enhance ideation, including adaptive test-time compute expansion via Monte Carlo Tree Search (MCTS), a fine-grained feedback mechanism, and query-based literature synthesis. We conducted a user study with researchers across diverse disciplines, validating the effectiveness of the system in enhancing ideation.
arXiv Detail & Related papers (2025-04-23T14:01:36Z)
- A Dataset For Computational Reproducibility [2.147712260420443]
This article introduces a dataset of computational experiments covering a broad spectrum of scientific fields. It incorporates details about software dependencies, execution steps, and configurations necessary for accurate reproduction. It provides a universal benchmark by establishing a standardized dataset for objectively evaluating and comparing the effectiveness of tools.
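A dataset entry of the kind described above might bundle dependencies, execution steps, and configuration into one record. A hypothetical illustration (field names are assumptions, not the paper's actual schema):

```python
import json

# One hypothetical record: everything needed to re-run one experiment.
entry = {
    "paper": "example/2024.00001",
    "dependencies": {"python": "3.11", "numpy": "1.26.4"},
    "execution_steps": [
        "pip install -r requirements.txt",
        "python train.py --config config.yaml",
    ],
    "configuration": {"seed": 42, "epochs": 10},
}
serialized = json.dumps(entry, indent=2)
```

Serializing such records to a common format is what makes the benchmark usable across tools.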
arXiv Detail & Related papers (2025-04-11T16:45:10Z)
- CycleResearcher: Improving Automated Research via Automated Review [37.03497673861402]
This paper explores the possibility of using open-source post-trained large language models (LLMs) as autonomous agents capable of performing the full cycle of automated research and review. To train these models, we develop two new datasets, reflecting real-world machine learning research and peer review dynamics. Our results demonstrate that CycleReviewer achieves promising performance with a 26.89% reduction in mean absolute error (MAE) compared to individual human reviewers in predicting paper scores.
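The MAE-reduction figure quoted above compares errors in predicted paper scores. A small sketch of that arithmetic with illustrative numbers (not the paper's data):

```python
def mae(pred, true):
    """Mean absolute error between predicted and ground-truth scores."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def mae_reduction(model_mae, human_mae):
    """Percent reduction in MAE relative to the human-reviewer baseline."""
    return 100.0 * (human_mae - model_mae) / human_mae
```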
arXiv Detail & Related papers (2024-10-28T08:10:21Z)
- O1 Replication Journey: A Strategic Progress Report -- Part 1 [52.062216849476776]
This paper introduces a pioneering approach to artificial intelligence research, embodied in our O1 Replication Journey.
Our methodology addresses critical challenges in modern AI research, including the insularity of prolonged team-based projects.
We propose the journey learning paradigm, which encourages models to learn not just shortcuts, but the complete exploration process.
arXiv Detail & Related papers (2024-10-08T15:13:01Z)
- Autonomous LLM-driven research from data to human-verifiable research papers [0.0]
We build an automation platform that guides interacting LLM agents through a complete stepwise research process.
When provided with annotated data alone, the platform raised hypotheses, designed research plans, wrote and interpreted analysis code, and generated and interpreted results.
We demonstrate the potential for AI-driven acceleration of scientific discovery while enhancing traceability, transparency, and verifiability.
arXiv Detail & Related papers (2024-04-24T23:15:49Z)
- MLXP: A Framework for Conducting Replicable Experiments in Python [63.37350735954699]
We propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python.
It streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.
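A minimal sketch of the experiment-management idea behind such tools, under stated assumptions (this is not MLXP's actual API): persist each run's configuration and results under a content hash of the configuration, so identical configurations map to the same record.

```python
import hashlib
import json

def log_run(store, config, results):
    """Store a run keyed by a stable hash of its configuration."""
    canonical = json.dumps(config, sort_keys=True).encode()
    key = hashlib.sha256(canonical).hexdigest()[:12]
    store[key] = {"config": config, "results": results}
    return key
```

Sorting the keys before hashing means the key is independent of the order in which configuration fields were set.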
arXiv Detail & Related papers (2024-02-21T14:22:20Z)
- Data-driven Discovery with Large Generative Models [47.324203863823335]
This position paper urges the Machine Learning (ML) community to exploit the capabilities of large generative models (LGMs) for end-to-end data-driven discovery.
We demonstrate how LGMs fulfill several desiderata for an ideal data-driven discovery system.
We advocate for fail-proof tool integration, along with active user moderation through feedback mechanisms.
arXiv Detail & Related papers (2024-02-21T08:26:43Z)
- MONAI Label: A framework for AI-assisted Interactive Labeling of 3D Medical Images [49.664220687980006]
The lack of annotated datasets is a major bottleneck for training new task-specific supervised machine learning models.
We present MONAI Label, a free and open-source framework that facilitates the development of applications based on artificial intelligence (AI) models.
arXiv Detail & Related papers (2022-03-23T12:33:11Z) - Improving Reproducibility in Machine Learning Research (A Report from
the NeurIPS 2019 Reproducibility Program) [43.55295847227261]
Reproducibility means obtaining similar results to those presented in a paper or talk, using the same code and data (when available).
In 2019, the Neural Information Processing Systems (NeurIPS) conference introduced a reproducibility program designed to improve the standards across the community for how we conduct, communicate, and evaluate machine learning research.
In this paper, we describe each of the program's components, how they were deployed, and what we were able to learn from this initiative.
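The notion of reproducibility above, obtaining similar results with the same code and data, can be operationalized as a tolerance check between reported and re-run numbers. A minimal illustrative sketch (the tolerance and metric names are assumptions):

```python
import math

def reproduces(reported, rerun, rel_tol=0.01):
    """True if every reported metric is matched by the re-run value
    within the given relative tolerance."""
    return all(
        math.isclose(reported[k], rerun[k], rel_tol=rel_tol)
        for k in reported
    )
```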
arXiv Detail & Related papers (2020-03-27T02:16:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.