Enhancing Automated Paper Reproduction via Prompt-Free Collaborative Agents
- URL: http://arxiv.org/abs/2512.02812v1
- Date: Tue, 02 Dec 2025 14:24:23 GMT
- Title: Enhancing Automated Paper Reproduction via Prompt-Free Collaborative Agents
- Authors: Zijie Lin, Qilin Cai, Liang Shen, Mingjun Xiao
- Abstract summary: We propose a prompt-free collaborative agent framework that automatically enhances the quality of paper-to-code generation. Our approach employs two collaborative agents: a verification agent that examines whether the outputs at each step satisfy the requirements specified in the corresponding system prompt, and a refinement agent that revises the outputs based on the identified issues.
- Score: 8.185402940269794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated paper reproduction has emerged as a promising approach to accelerate scientific research, employing multi-step workflow frameworks to systematically convert academic papers into executable code. However, existing frameworks often lack mechanisms to verify and refine the outputs at each generation step, or rely heavily on manually designed prompts for self-refinement, which limits their adaptability and scalability. To address these limitations, we propose a prompt-free collaborative agent framework that automatically enhances the quality of paper-to-code generation. Our approach employs two collaborative agents: a verification agent that examines whether the outputs at each step satisfy the requirements specified in the corresponding system prompt, and a refinement agent that revises the outputs based on the identified issues. Unlike previous methods that require human experts to craft specific refinement prompts for each step, our framework achieves automatic verification and improvement by leveraging only the original system prompts. We integrate our collaborative agents into the Paper2Code framework and conduct comprehensive experiments on PaperBench Code-Dev and Paper2CodeBench datasets. Experimental results demonstrate that our approach significantly improves the accuracy and completeness of reproduced code, achieving performance gains of approximately 15% and 13%, respectively, compared to the baseline without our agents. Furthermore, comparative experiments against Self-Refine validate the robustness and consistency of our prompt-free approach across different datasets.
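The verify-and-refine loop the abstract describes can be sketched in a few lines. The sketch below is a hypothetical illustration, not the authors' implementation: the `verify` and `refine` callables stand in for LLM calls, and only the control flow follows the abstract's description, where a verification agent checks each step's output against that step's original system prompt and a refinement agent revises the output using the issues found.

```python
# Minimal sketch of the prompt-free verify-refine loop. The agent
# callables are hypothetical stand-ins for LLM calls; no refinement
# prompt is hand-written -- both agents work from the step's original
# system prompt, as the paper describes.

from typing import Callable, Optional


def verify_and_refine(
    system_prompt: str,
    output: str,
    verify: Callable[[str, str], Optional[str]],   # (prompt, output) -> issues, or None if OK
    refine: Callable[[str, str, str], str],        # (prompt, output, issues) -> revised output
    max_rounds: int = 3,
) -> str:
    """Iteratively verify a step's output and refine it until it passes."""
    for _ in range(max_rounds):
        issues = verify(system_prompt, output)
        if issues is None:          # verification agent found no violations
            return output
        output = refine(system_prompt, output, issues)
    return output                   # best effort after max_rounds


# Toy stand-ins so the loop is runnable: "verification" requires every
# word of the system prompt to appear in the output; "refinement"
# naively appends the missing requirements.
def toy_verify(prompt: str, output: str) -> Optional[str]:
    missing = [w for w in prompt.split() if w not in output]
    return f"missing: {missing}" if missing else None


def toy_refine(prompt: str, output: str, issues: str) -> str:
    return output + " " + prompt
```

With these stand-ins, `verify_and_refine("config train eval", "config only", toy_verify, toy_refine)` converges in two rounds to an output that passes verification; with real LLM-backed agents, the same loop would run once per workflow step in a pipeline such as Paper2Code.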
Related papers
- What Papers Don't Tell You: Recovering Tacit Knowledge for Automated Paper Reproduction [57.86097956633207]
The method is a graph-based agent framework for generating executable code from academic papers. On an extended ReproduceBench spanning 3 domains, 10 tasks, and 40 recent papers, the method achieves an average performance gap of 10.04% against official implementations.
arXiv Detail & Related papers (2026-03-02T12:33:31Z) - Paper2SysArch: Structure-Constrained System Architecture Generation from Scientific Papers [10.395280181257737]
We introduce a novel benchmark to quantitatively evaluate the automated generation of diagrams from text. It consists of 3,000 research papers paired with their corresponding high-quality ground-truth diagrams and is accompanied by a three-tiered evaluation metric. We propose Paper2Arch, an end-to-end system that leverages multi-agent collaboration to convert papers into structured, editable diagrams.
arXiv Detail & Related papers (2025-11-22T12:24:30Z) - Automatic Building Code Review: A Case Study [6.530899637501737]
Building officials face labor-intensive, error-prone, and costly manual reviews of design documents as projects increase in size and complexity. This study introduces a novel agent-driven framework that integrates BIM-based data extraction with automated verification.
arXiv Detail & Related papers (2025-10-03T00:30:14Z) - Towards Self-Evolving Benchmarks: Synthesizing Agent Trajectories via Test-Time Exploration under Validate-by-Reproduce Paradigm [60.36837655498119]
We propose a Trajectory-based validated-by-Reproducing Agent-benchmark Complexity Evolution framework. This framework takes an original task from an existing benchmark and encourages agents to evolve it into a new task with higher difficulty. Experiments on the GAIA benchmark demonstrate that the TRACE framework consistently enhances task complexity while improving the reliability of correctness.
arXiv Detail & Related papers (2025-10-01T01:52:52Z) - Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization [86.98098988779809]
We propose SummQ, a novel adversarial multi-agent framework for long document summarization. Our approach employs summary generators and reviewers that work collaboratively to create and evaluate comprehensive summaries. We evaluate SummQ on three widely used long document summarization benchmarks.
arXiv Detail & Related papers (2025-09-25T08:36:19Z) - Reflective Paper-to-Code Reproduction Enabled by Fine-Grained Verification [46.845133190560375]
Motivated by how humans use systematic checklists to efficiently debug complex code, we propose RePro, a Reflective Paper-to-Code Reproduction framework. It automatically extracts a paper's fingerprint, referring to a comprehensive set of accurate and atomic criteria serving as high-quality supervisory signals. It achieves a 13.0% performance gap over baselines and correctly revises complex logical and mathematical criteria during reflection.
arXiv Detail & Related papers (2025-08-21T06:57:44Z) - AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage [62.049868205196425]
AutoReproduce is a framework capable of automatically reproducing experiments described in research papers in an end-to-end manner. Results show that AutoReproduce achieves an average performance gap of 22.1% on 89.74% of the executable experiment runs.
arXiv Detail & Related papers (2025-05-27T03:15:21Z) - DocAgent: A Multi-Agent System for Automated Code Documentation Generation [7.653779364214401]
We introduce DocAgent, a novel multi-agent collaborative system using topological code processing for incremental context building. Specialized agents (Reader, Searcher, Writer, Verifier, Orchestrator) then collaboratively generate documentation. We also propose a multi-faceted evaluation framework assessing Completeness, Helpfulness, and Truthfulness.
arXiv Detail & Related papers (2025-04-11T17:50:08Z) - CCA: Collaborative Competitive Agents for Image Editing [55.500493143796405]
This paper presents a novel generative model, Collaborative Competitive Agents (CCA). It leverages the capabilities of multiple Large Language Model (LLM) based agents to execute complex tasks. The paper's main contributions include the introduction of a multi-agent-based generative model with controllable intermediate steps and iterative optimization.
arXiv Detail & Related papers (2024-01-23T11:46:28Z) - Agents meet OKR: An Object and Key Results Driven Agent System with Hierarchical Self-Collaboration and Self-Evaluation [25.308341461293857]
OKR-Agent is designed to enhance the capabilities of Large Language Models (LLMs) in task-solving.
Our framework includes two novel modules: hierarchical Objects and Key Results generation and multi-level evaluation.
arXiv Detail & Related papers (2023-11-28T06:16:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers or information and is not responsible for any consequences of their use.