Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
- URL: http://arxiv.org/abs/2504.17192v4
- Date: Fri, 10 Oct 2025 04:32:30 GMT
- Title: Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
- Authors: Minju Seo, Jinheon Baek, Seongyun Lee, Sung Ju Hwang,
- Abstract summary: We introduce PaperCoder, a framework that transforms machine learning papers into functional code repositories.<n>PaperCoder operates in three stages: planning, designs the system architecture with diagrams, identifies file dependencies, and generates configuration files.<n>We then evaluate PaperCoder on generating code implementations from machine learning papers based on both model-based and human evaluations.
- Score: 70.04746094652653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the rapid growth of machine learning research, corresponding code implementations are often unavailable, making it slow and labor-intensive for researchers to reproduce results and build upon prior work. In the meantime, recent Large Language Models (LLMs) excel at understanding scientific documents and generating high-quality code. Inspired by this, we introduce PaperCoder, a multi-agent LLM framework that transforms machine learning papers into functional code repositories. PaperCoder operates in three stages: planning, where it constructs a high-level roadmap, designs the system architecture with diagrams, identifies file dependencies, and generates configuration files; analysis, which focuses on interpreting implementation-specific details; and generation, where modular, dependency-aware code is produced. Moreover, each phase is instantiated through a set of specialized agents designed to collaborate effectively across the pipeline. We then evaluate PaperCoder on generating code implementations from machine learning papers based on both model-based and human evaluations, particularly from the authors of those papers, with author-released repositories as ground truth if available. Our results demonstrate the effectiveness of PaperCoder in creating high-quality, faithful implementations. Furthermore, it consistently shows strengths in the recently released PaperBench benchmark, surpassing strong baselines by substantial margins. Code is available at: https://github.com/going-doer/Paper2Code.
Related papers
- DeepCode: Open Agentic Coding [11.7906174865581]
DeepCode is a fully autonomous framework for document-to-codebase synthesis.<n>It orchestrates four information operations to maximize task-relevant signals under finite context budgets.<n>Extensive evaluations on the PaperBench benchmark demonstrate that DeepCode achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-08T16:07:13Z) - Bridging the Prototype-Production Gap: A Multi-Agent System for Notebooks Transformation [0.0]
This paper presents Codelevate, a novel multi-agent system that automatically transforms Jupyter notebooks into well-structured, maintainable Python code repositories.<n>Our system employs three specialized agents - Architect, Developer, and Structure - working in concert through a shared dependency tree to ensure architectural coherence and code quality.
arXiv Detail & Related papers (2025-11-10T16:05:10Z) - AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers [9.851681616116718]
We introduce Paper-to-Code'' (P2C), a novel task that transforms the multimodal content of scientific publications into fully executable code repositories.
We propose AutoP2C, a multi-agent framework based on large language models that processes both textual and visual content from research papers to generate complete code repositories.
arXiv Detail & Related papers (2025-04-28T05:47:37Z) - Empowering AI to Generate Better AI Code: Guided Generation of Deep Learning Projects with LLMs [4.616570111453259]
Large language models (LLMs) struggle with generating entire deep learning projects.<n>We propose a novel planning-guided code generation method, DLCodeGen, tailored for generating deep learning projects.
arXiv Detail & Related papers (2025-04-21T13:09:25Z) - DocAgent: A Multi-Agent System for Automated Code Documentation Generation [7.653779364214401]
We introduce DocAgent, a novel multi-agent collaborative system using topological code processing for incremental context building.<n>Specialized agents (Reader, Searcher, Writer, Verifier, Orchestrator) then collaboratively generate documentation.<n>We also propose a multi-faceted evaluation framework assessing Completeness, Helpfulness, and Truthfulness.
arXiv Detail & Related papers (2025-04-11T17:50:08Z) - AdaCoder: An Adaptive Planning and Multi-Agent Framework for Function-Level Code Generation [17.020112052995334]
A typical multi-agent framework consists of Large Language Model (LLM)-based agents.<n>AdaCoder is a novel adaptive planning, multi-agent framework for function-level code generation.
arXiv Detail & Related papers (2025-04-05T16:14:01Z) - CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation [24.090719826360342]
We introduce CodeIF, the first benchmark designed to assess the abilities of Large Language Models (LLMs) to adhere to task-oriented instructions within code generation scenarios.<n>We conduct extensive experiments with LLMs, analyzing their strengths and limitations in meeting the demands of these tasks.
arXiv Detail & Related papers (2025-02-26T14:19:49Z) - EpiCoder: Encompassing Diversity and Complexity in Code Generation [66.43738008739555]
Existing methods for code generation use code snippets as seed data.<n>We introduce a novel feature tree-based synthesis framework, which revolves around hierarchical code features.<n>Our framework provides precise control over the complexity of the generated code, enabling functionalities that range from function-level operations to multi-file scenarios.
arXiv Detail & Related papers (2025-01-08T18:58:15Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z) - RepoAgent: An LLM-Powered Open-Source Framework for Repository-level
Code Documentation Generation [79.83270415843857]
We introduce RepoAgent, a large language model powered open-source framework aimed at proactively generating, maintaining, and updating code documentation.
We have validated the effectiveness of our approach, showing that RepoAgent excels in generating high-quality repository-level documentation.
arXiv Detail & Related papers (2024-02-26T15:39:52Z) - SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code
Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z) - A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware [13.27883339389175]
We propose a novel code generation framework, dubbed A3-CodGen, to harness information within the code repository to generate code with fewer potential logical errors.
Results demonstrate that by adopting the A3-CodGen framework, we successfully extract, fuse, and feed code repository information into the LLM, generating more accurate, efficient, and highly reusable code.
arXiv Detail & Related papers (2023-12-10T05:36:06Z) - RepoCoder: Repository-Level Code Completion Through Iterative Retrieval
and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z) - CodeRetriever: Unimodal and Bimodal Contrastive Learning [128.06072658302165]
We propose the CodeRetriever model, which combines the unimodal and bimodal contrastive learning to train function-level code semantic representations.
For unimodal contrastive learning, we design a semantic-guided method to build positive code pairs based on the documentation and function name.
For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build text-code pairs.
arXiv Detail & Related papers (2022-01-26T10:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.