Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
- URL: http://arxiv.org/abs/2504.17192v2
- Date: Sat, 26 Apr 2025 14:03:23 GMT
- Title: Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
- Authors: Minju Seo, Jinheon Baek, Seongyun Lee, Sung Ju Hwang,
- Abstract summary: We introduce PaperCoder, a framework that transforms machine learning papers into functional code repositories.<n>PaperCoder operates in three stages: planning, designs the system architecture with diagrams, identifies file dependencies, and generates configuration files.<n>We then evaluate PaperCoder on generating code implementations from machine learning papers based on both model-based and human evaluations.
- Score: 57.09163579304332
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the rapid growth of machine learning research, corresponding code implementations are often unavailable, making it slow and labor-intensive for researchers to reproduce results and build upon prior work. In the meantime, recent Large Language Models (LLMs) excel at understanding scientific documents and generating high-quality code. Inspired by this, we introduce PaperCoder, a multi-agent LLM framework that transforms machine learning papers into functional code repositories. PaperCoder operates in three stages: planning, where it constructs a high-level roadmap, designs the system architecture with diagrams, identifies file dependencies, and generates configuration files; analysis, which focuses on interpreting implementation-specific details; and generation, where modular, dependency-aware code is produced. Moreover, each phase is instantiated through a set of specialized agents designed to collaborate effectively across the pipeline. We then evaluate PaperCoder on generating code implementations from machine learning papers based on both model-based and human evaluations, specifically from the original paper authors, with author-released repositories as ground truth if available. Our results demonstrate the effectiveness of PaperCoder in creating high-quality, faithful implementations. Furthermore, it consistently shows strengths in the recently released PaperBench benchmark, surpassing strong baselines by substantial margins. Code is available at: https://github.com/going-doer/Paper2Code.
Related papers
- AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers [9.851681616116718]
We introduce Paper-to-Code'' (P2C), a novel task that transforms the multimodal content of scientific publications into fully executable code repositories.
We propose AutoP2C, a multi-agent framework based on large language models that processes both textual and visual content from research papers to generate complete code repositories.
arXiv Detail & Related papers (2025-04-28T05:47:37Z) - Empowering AI to Generate Better AI Code: Guided Generation of Deep Learning Projects with LLMs [4.616570111453259]
Large language models (LLMs) struggle with generating entire deep learning projects.<n>We propose a novel planning-guided code generation method, DLCodeGen, tailored for generating deep learning projects.
arXiv Detail & Related papers (2025-04-21T13:09:25Z) - DocAgent: A Multi-Agent System for Automated Code Documentation Generation [7.653779364214401]
We introduce DocAgent, a novel multi-agent collaborative system using topological code processing for incremental context building.<n>Specialized agents (Reader, Searcher, Writer, Verifier, Orchestrator) then collaboratively generate documentation.<n>We also propose a multi-faceted evaluation framework assessing Completeness, Helpfulness, and Truthfulness.
arXiv Detail & Related papers (2025-04-11T17:50:08Z) - AdaCoder: An Adaptive Planning and Multi-Agent Framework for Function-Level Code Generation [17.020112052995334]
A typical multi-agent framework consists of Large Language Model (LLM)-based agents.<n>AdaCoder is a novel adaptive planning, multi-agent framework for function-level code generation.
arXiv Detail & Related papers (2025-04-05T16:14:01Z) - CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation [24.090719826360342]
We introduce CodeIF, the first benchmark designed to assess the abilities of Large Language Models (LLMs) to adhere to task-oriented instructions within code generation scenarios.<n>We conduct extensive experiments with LLMs, analyzing their strengths and limitations in meeting the demands of these tasks.
arXiv Detail & Related papers (2025-02-26T14:19:49Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z) - RepoAgent: An LLM-Powered Open-Source Framework for Repository-level
Code Documentation Generation [79.83270415843857]
We introduce RepoAgent, a large language model powered open-source framework aimed at proactively generating, maintaining, and updating code documentation.
We have validated the effectiveness of our approach, showing that RepoAgent excels in generating high-quality repository-level documentation.
arXiv Detail & Related papers (2024-02-26T15:39:52Z) - SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code
Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z) - A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware [13.27883339389175]
We propose a novel code generation framework, dubbed A3-CodGen, to harness information within the code repository to generate code with fewer potential logical errors.
Results demonstrate that by adopting the A3-CodGen framework, we successfully extract, fuse, and feed code repository information into the LLM, generating more accurate, efficient, and highly reusable code.
arXiv Detail & Related papers (2023-12-10T05:36:06Z) - RepoCoder: Repository-Level Code Completion Through Iterative Retrieval
and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z) - CodeRetriever: Unimodal and Bimodal Contrastive Learning [128.06072658302165]
We propose the CodeRetriever model, which combines the unimodal and bimodal contrastive learning to train function-level code semantic representations.
For unimodal contrastive learning, we design a semantic-guided method to build positive code pairs based on the documentation and function name.
For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build text-code pairs.
arXiv Detail & Related papers (2022-01-26T10:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.