Turning the Tide: Repository-based Code Reflection
- URL: http://arxiv.org/abs/2507.09866v1
- Date: Mon, 14 Jul 2025 02:36:27 GMT
- Title: Turning the Tide: Repository-based Code Reflection
- Authors: Wei Zhang, Jian Yang, Jiaxi Yang, Ya Wang, Zhoujun Li, Zeyu Cui, Binyuan Hui, Junyang Lin,
- Abstract summary: We introduce LiveRepoReflection, a benchmark for evaluating code understanding and generation in multi-file repository contexts. It features 1,888 rigorously filtered test cases across 6 programming languages to ensure diversity, correctness, and high difficulty. We also create RepoReflection-Instruct, a large-scale, quality-filtered instruction-tuning dataset derived from diverse sources.
- Score: 52.13709676656648
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Code large language models (LLMs) enhance programming by understanding and generating code across languages, offering intelligent feedback, bug detection, and code updates through reflection, thereby improving development efficiency and accessibility. While benchmarks such as HumanEval and LiveCodeBench evaluate code generation and real-world relevance, prior work overlooks the scenario of modifying code within existing repositories. Given the remaining challenges of improving reflection capabilities and avoiding data contamination in dynamic benchmarks, we introduce LiveRepoReflection, a challenging benchmark for evaluating code understanding and generation in multi-file repository contexts, featuring 1,888 rigorously filtered test cases across 6 programming languages to ensure diversity, correctness, and high difficulty. We further create RepoReflection-Instruct, a large-scale, quality-filtered instruction-tuning dataset derived from diverse sources, which is used to train RepoReflectionCoder through a two-turn dialogue process involving code generation and error-driven repair. The accompanying leaderboard evaluates over 40 LLMs to characterize model performance on repository-based code reflection.
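The two-turn "generate, then repair from errors" dialogue described above can be pictured with a minimal sketch. The helper names below (`reflect_on_repo`, `run_repo_tests`, and the `model` callable) are hypothetical stand-ins under assumed interfaces, not the paper's released pipeline or prompts.

```python
# Minimal sketch of a two-turn generation-and-repair dialogue, assuming a
# `model` callable that maps a prompt to code and a `run_repo_tests` callable
# that returns (passed, error_log). These are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class TurnResult:
    code: str
    passed: bool
    error_log: str


def reflect_on_repo(task_prompt: str, repo_context: str, model, run_repo_tests) -> list[TurnResult]:
    """Turn 1: generate code for a repository task. Turn 2: repair it using the test errors."""
    history: list[TurnResult] = []

    # Turn 1: code generation conditioned on multi-file repository context.
    first_code = model(f"{repo_context}\n\n{task_prompt}")
    passed, error_log = run_repo_tests(first_code)
    history.append(TurnResult(first_code, passed, error_log))
    if passed:
        return history

    # Turn 2: error-driven repair, feeding the failure log back to the model.
    repair_prompt = (
        f"{repo_context}\n\n{task_prompt}\n\n"
        f"Your previous attempt failed with:\n{error_log}\n"
        "Reflect on the error and return a corrected version."
    )
    second_code = model(repair_prompt)
    passed, error_log = run_repo_tests(second_code)
    history.append(TurnResult(second_code, passed, error_log))
    return history
```

The two turns can be stored directly as an instruction-tuning dialogue (generation turn plus repair turn), which is the shape of training signal the abstract attributes to RepoReflection-Instruct.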
Related papers
- IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs. The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z)
- Knowledge Graph Based Repository-Level Code Generation [0.0]
This paper introduces a novel knowledge graph-based approach to improve code search and retrieval. The proposed approach represents code repositories as graphs, capturing structural and relational information for enhanced context-aware code generation. We benchmark the proposed approach on the Evolutionary Code Benchmark dataset, a repository-level code generation benchmark, and demonstrate that our method significantly outperforms the baseline approach.
arXiv Detail & Related papers (2025-05-20T14:13:59Z)
- RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation [13.75248879205993]
We propose Adaptive Critique Refinement (ACR), which enables the model to refine itself using self-generated code and external critique. ACR includes a composite scoring system with an LLM-as-a-Judge to evaluate the quality of code responses. We develop the RefineCoder series by iteratively applying ACR, achieving continuous performance improvements on multiple code generation benchmarks.
arXiv Detail & Related papers (2025-02-13T11:17:53Z)
- CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval [103.116634967815]
We introduce CodeXEmbed, a family of large-scale code embedding models ranging from 400M to 7B parameters.
Our novel training pipeline unifies multiple programming languages and transforms various code-related tasks into a common retrieval framework.
Our 7B model sets a new state-of-the-art (SOTA) in code retrieval, outperforming the previous leading model, Voyage-Code, by over 20% on the CoIR benchmark.
arXiv Detail & Related papers (2024-11-19T16:54:45Z)
- RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation [7.167252304211614]
RepoGenReflex is a generic, dynamic, and effective framework for optimizing the retrieval and generation process.
By leveraging Retrieval-Augmented Generation (RAG) enhanced with Verbal Reinforcement Learning (VRL), it dynamically chooses the optimal results for repository-level code completion.
RepoGenReflex consistently demonstrates superior performance and effectiveness across standard code completion tasks.
arXiv Detail & Related papers (2024-09-19T23:38:59Z)
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation. We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks. We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
- On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present RepoExec, a novel benchmark designed to evaluate repository-level code generation. We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
arXiv Detail & Related papers (2024-06-17T10:45:22Z)
- ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation [39.778073569406175]
We present ReflectionCoder, a novel approach to improve one-off code generation performance. We propose reflection self-distillation and dynamically masked distillation to effectively utilize reflection sequences. Our experiments demonstrate that models fine-tuned with our method achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-05-27T11:27:00Z)
- RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and references to semantically similar code obtained through retrieval (see the sketch after this entry).
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
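The RepoCoder and ReACC entries above share the same high-level pattern: retrieve similar code from the repository, prepend it to the prompt, and generate the completion. The following is a minimal sketch of that generic pattern only; the crude token-overlap scorer and the `generate` callable are illustrative assumptions, not either paper's released implementation.

```python
# Minimal sketch of retrieval-augmented code completion: retrieve similar
# snippets from the repository, prepend them to the prompt, then generate.
# The token-overlap scorer and the `generate` callable are assumptions made
# for illustration, not the ReACC or RepoCoder implementation.
from collections import Counter


def lexical_score(query: str, candidate: str) -> float:
    """Crude token-overlap similarity standing in for a real lexical/semantic retriever."""
    q, c = Counter(query.split()), Counter(candidate.split())
    overlap = sum((q & c).values())
    return overlap / (sum(q.values()) or 1)


def retrieve(query: str, repo_snippets: list[str], k: int = 3) -> list[str]:
    """Rank repository snippets by similarity to the unfinished code and keep the top k."""
    ranked = sorted(repo_snippets, key=lambda s: lexical_score(query, s), reverse=True)
    return ranked[:k]


def rag_complete(unfinished_code: str, repo_snippets: list[str], generate) -> str:
    """Retrieval-augmented completion: condition the generator on retrieved context."""
    context = "\n\n".join(retrieve(unfinished_code, repo_snippets))
    prompt = (
        f"# Retrieved repository context:\n{context}\n\n"
        f"# Complete the following code:\n{unfinished_code}"
    )
    return generate(prompt)
```

RepoCoder's iterative variant repeats this loop, feeding each intermediate completion back into retrieval; the single-pass sketch above shows only the shared retrieve-then-generate step.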
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.