Meta-RAG on Large Codebases Using Code Summarization
- URL: http://arxiv.org/abs/2508.02611v1
- Date: Mon, 04 Aug 2025 17:01:10 GMT
- Title: Meta-RAG on Large Codebases Using Code Summarization
- Authors: Vali Tawosi, Salwa Alamir, Xiaomo Liu, Manuela Veloso
- Abstract summary: Large Language Model (LLM) systems have been at the forefront of applied Artificial Intelligence (AI) research in a multitude of domains. We propose a multi-agent system to localize bugs in large pre-existing codebases using information retrieval and LLMs. Our system introduces a novel Retrieval Augmented Generation (RAG) approach, Meta-RAG, where we utilize summaries to condense codebases by an average of 79.8% into a compact, structured, natural language representation.
- Score: 11.415083231118142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Model (LLM) systems have been at the forefront of applied Artificial Intelligence (AI) research in a multitude of domains. One such domain is software development, where researchers have pushed the automation of a number of code tasks through LLM agents. Software development is a complex ecosystem that stretches far beyond code implementation and well into the realm of code maintenance. In this paper, we propose a multi-agent system to localize bugs in large pre-existing codebases using information retrieval and LLMs. Our system introduces a novel Retrieval Augmented Generation (RAG) approach, Meta-RAG, where we utilize summaries to condense codebases by an average of 79.8% into a compact, structured, natural language representation. We then use an LLM agent to determine which parts of the codebase are critical for bug resolution, i.e. bug localization. We demonstrate the usefulness of Meta-RAG through evaluation with the SWE-bench Lite dataset. Meta-RAG scores 84.67% and 53.0% for file-level and function-level correct localization rates, respectively, achieving state-of-the-art performance.
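To make the pipeline concrete, below is a minimal Python sketch of the two stages the abstract describes: condensing each file into a structured natural-language summary, then prompting an LLM with the condensed index to localize a bug. The prompts, model name, and helper functions (`summarize_file`, `build_meta_index`, `localize_bug`) are illustrative assumptions based only on the abstract, not the authors' implementation; it assumes an OpenAI-compatible chat client.

```python
# Minimal sketch of the two Meta-RAG stages described in the abstract.
# All prompts, the model name, and the helpers below are illustrative
# assumptions, not the authors' implementation.
from pathlib import Path

from openai import OpenAI  # assumes the `openai` package and an API key

client = OpenAI()      # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"  # placeholder model choice

def summarize_file(path: Path) -> str:
    """Stage 1: condense one source file into a short, structured summary."""
    source = path.read_text(errors="ignore")
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": (
            "Summarize this file for bug localization. List each class and "
            f"function with a one-line description.\n\nFILE: {path}\n\n{source}"
        )}],
    )
    return resp.choices[0].message.content

def build_meta_index(repo_root: str) -> str:
    """Concatenate per-file summaries into one compact codebase representation."""
    return "\n\n".join(
        f"## {p}\n{summarize_file(p)}"
        for p in sorted(Path(repo_root).rglob("*.py"))
    )

def localize_bug(meta_index: str, bug_report: str) -> str:
    """Stage 2: ask the LLM which files/functions are critical for the bug."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": (
            "Given this condensed codebase and a bug report, rank the files "
            "and functions most likely to need changes.\n\n"
            f"CODEBASE:\n{meta_index}\n\nBUG REPORT:\n{bug_report}"
        )}],
    )
    return resp.choices[0].message.content
```

The point of condensing first is that even a large repository fits inside the model's context window, so the localization prompt can see a representation of the whole codebase at once.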
Related papers
- Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery [5.326072982491534]
cmbagent is a system for the automation of scientific research tasks. The system is formed by about 30 Large Language Model (LLM) agents. It is deployed on HuggingFace and will be available on the cloud.
arXiv Detail & Related papers (2025-07-09T20:03:30Z)
- Context-Aware Code Wiring Recommendation with LLM-based Agent [4.34559879087055]
Code wiring involves substituting unresolved variables in pasted code with suitable ones from the surrounding context. We introduce WIRL, an agent for code wiring framed as a Retrieval-Augmented Generation (RAG) infilling task. We evaluate WIRL on a carefully curated, high-quality dataset of real-world code adaptation scenarios.
arXiv Detail & Related papers (2025-07-02T03:00:23Z)
- SweRank: Software Issue Localization with Code Ranking [109.3289316191729]
SweRank is an efficient retrieve-and-rerank framework for software issue localization (a minimal sketch of this pattern follows the list below). We construct SweLoc, a large-scale dataset curated from public GitHub repositories. We show that SweRank achieves state-of-the-art performance, outperforming both prior ranking models and costly agent-based systems.
arXiv Detail & Related papers (2025-05-07T19:44:09Z)
- CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation [69.684886175768]
Large language models (LLMs) have shown promising performance in automated code generation. In this paper, we propose CodeRAG, a retrieval-augmented code generation framework. Experiments show that CodeRAG achieves significant improvements compared to no-RAG scenarios.
arXiv Detail & Related papers (2025-04-14T09:51:23Z)
- LocAgent: Graph-Guided LLM Agents for Code Localization [25.395102705800916]
LocAgent is a framework that addresses code localization through graph-based representation. Our method with the fine-tuned Qwen-2.5-Coder-Instruct-32B model achieves results comparable to SOTA proprietary models at greatly reduced cost.
arXiv Detail & Related papers (2025-03-12T05:55:01Z)
- CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models [106.11371409170818]
Large language models (LLMs) can act as agents with capabilities to self-refine and improve generated code autonomously.
We propose CodeTree, a framework for LLM agents to efficiently explore the search space in different stages of the code generation process.
Specifically, we adopt a unified tree structure to explicitly explore different coding strategies, generate corresponding coding solutions, and subsequently refine the solutions.
arXiv Detail & Related papers (2024-11-07T00:09:54Z)
- DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation [48.11754113512047]
This study includes a code generation benchmark dataset DOMAINEVAL, encompassing six popular domains.
Our pipeline works in a fully automated manner, enabling push-button construction from code repositories into formatted subjects under study.
The contributions of this study include the DOMAINEVAL benchmark, a fully automated pipeline for constructing code benchmarks, and an identification of the limitations of LLMs in code generation tasks based on their performance on DOMAINEVAL.
arXiv Detail & Related papers (2024-08-23T16:33:58Z)
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation. We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks. We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
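Several of the entries above, most explicitly SweRank, describe a two-stage retrieve-and-rerank pattern for issue localization: a cheap retriever narrows the codebase to candidates, then a more precise model reorders them. Below is a minimal sketch of that generic pattern using the sentence-transformers library; the model choices, function signature, and two-stage split are assumptions for illustration, not any listed paper's actual pipeline.

```python
# Generic retrieve-and-rerank sketch for issue localization.
# Model choices and the localize() signature are illustrative assumptions.
from sentence_transformers import CrossEncoder, SentenceTransformer, util

retriever = SentenceTransformer("all-MiniLM-L6-v2")  # stage 1: fast bi-encoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # stage 2: precise cross-encoder

def localize(issue: str, functions: list[str], k: int = 20) -> list[str]:
    """Rank candidate function snippets by relevance to an issue report."""
    # Stage 1: retrieve the top-k candidates by embedding similarity.
    issue_emb = retriever.encode(issue, convert_to_tensor=True)
    func_embs = retriever.encode(functions, convert_to_tensor=True)
    hits = util.semantic_search(issue_emb, func_embs, top_k=k)[0]
    candidates = [functions[h["corpus_id"]] for h in hits]
    # Stage 2: rerank candidates by scoring (issue, code) pairs jointly.
    scores = reranker.predict([(issue, fn) for fn in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [fn for _, fn in ranked]
```

The design rationale is the usual accuracy/cost trade-off: the bi-encoder embeds the issue and code independently, so the whole corpus can be scanned cheaply, while the cross-encoder reads each (issue, code) pair jointly and is accurate but too slow to run over everything.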