Fugu-MT Paper Translation (Abstract): GrepRAG: An Empirical Study and Optimization of Grep-Like Retrieval for Code Completion

Paper overview: GrepRAG: An Empirical Study and Optimization of Grep-Like Retrieval for Code Completion

arxiv url: http://arxiv.org/abs/2601.23254v1
Date: Fri, 30 Jan 2026 18:22:15 GMT
Status: Translation complete
Last updated in system: 2026-02-02 18:28:15.609068
Title: GrepRAG: An Empirical Study and Optimization of Grep-Like Retrieval for Code Completion
Title (reference translation): GrepRAG: An Empirical Study and Optimization of Grep-Like Retrieval for Code Completion
Authors: Baoyi Wang, Xingliang Wang, Guochang Li, Chen Zhi, Junxiao Han, Xinkui Zhao, Nan Wang, Shuiguang Deng, Jianwei Yin
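Abstract summary: Repository-level code completion remains challenging for large language models. This paper investigates lightweight, index-free, intent-aware lexical retrieval. It introduces Naive GrepRAG, a baseline framework in which LLMs autonomously generate ripgrep commands to retrieve relevant context.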
Reference score (independently computed attention score): 32.17127975368661
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Repository-level code completion remains challenging for large language models (LLMs) due to cross-file dependencies and limited context windows. Prior work addresses this challenge using Retrieval-Augmented Generation (RAG) frameworks based on semantic indexing or structure-aware graph analysis, but these approaches incur substantial computational overhead for index construction and maintenance. Motivated by common developer workflows that rely on lightweight search utilities (e.g., ripgrep), we revisit a fundamental yet underexplored question: how far can simple, index-free lexical retrieval support repository-level code completion before more complex retrieval mechanisms become necessary? To answer this question, we systematically investigate lightweight, index-free, intent-aware lexical retrieval through extensive empirical analysis. We first introduce Naive GrepRAG, a baseline framework in which LLMs autonomously generate ripgrep commands to retrieve relevant context. Despite its simplicity, Naive GrepRAG achieves performance comparable to sophisticated graph-based baselines. Further analysis shows that its effectiveness stems from retrieving lexically precise code fragments that are spatially closer to the completion site. We also identify key limitations of lexical retrieval, including sensitivity to noisy matches from high-frequency ambiguous keywords and context fragmentation caused by rigid truncation boundaries. To address these issues, we propose GrepRAG, which augments lexical retrieval with a lightweight post-processing pipeline featuring identifier-weighted re-ranking and structure-aware deduplication. Extensive evaluation on CrossCodeEval and RepoEval-Updated demonstrates that GrepRAG consistently outperforms state-of-the-art (SOTA) methods, achieving 7.04-15.58 percent relative improvement in code exact match (EM) over the best baseline on CrossCodeEval.
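Abstract (reference translation): Repository-level code completion remains difficult for large language models (LLMs) because of cross-file dependencies and limited context windows. Prior work addresses this problem with Retrieval-Augmented Generation (RAG) frameworks based on semantic indexing or structure-aware graph analysis, but these approaches incur substantial computational overhead for building and maintaining the index. Motivated by common developer workflows that rely on lightweight search utilities (e.g., ripgrep), the authors revisit a fundamental but underexplored question: how far can simple, index-free lexical retrieval support repository-level code completion before more complex retrieval mechanisms become necessary? To answer it, they systematically study lightweight, index-free, intent-aware lexical retrieval through extensive empirical analysis. They first introduce Naive GrepRAG, a baseline framework in which LLMs autonomously generate ripgrep commands to retrieve relevant context. Despite its simplicity, Naive GrepRAG achieves performance comparable to sophisticated graph-based baselines. Further analysis shows that its effectiveness comes from retrieving lexically precise code fragments that are spatially closer to the completion site. The authors also identify key limitations of lexical retrieval, including sensitivity to noisy matches from high-frequency ambiguous keywords and context fragmentation caused by rigid truncation boundaries. To address these issues, they propose GrepRAG, which augments lexical retrieval with a lightweight post-processing pipeline featuring identifier-weighted re-ranking and structure-aware deduplication. Extensive evaluation on CrossCodeEval and RepoEval-Updated shows that GrepRAG consistently outperforms state-of-the-art (SOTA) methods, achieving a 7.04-15.58 percent relative improvement in code exact match (EM) over the best baseline on CrossCodeEval.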
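
Illustrative sketch (not from the paper): the abstract names two post-processing steps, identifier-weighted re-ranking and structure-aware deduplication, without giving their details. The Python below is one possible reading under loud assumptions: identifiers are weighted with an IDF-like formula so that rare names count more than common ones, and deduplication is approximated by merging retrieved fragments whose line ranges overlap in the same file. The names `Hit`, `identifier_weights`, `rerank`, and `merge_overlapping` are hypothetical, and the hits are hard-coded rather than produced by ripgrep.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass(frozen=True)
class Hit:
    """A retrieved code fragment (hypothetical structure, not the paper's)."""
    path: str
    start: int    # 1-based line number of the first line
    lines: tuple  # the fragment's lines, in order

    @property
    def end(self):
        return self.start + len(self.lines) - 1

    @property
    def text(self):
        return "\n".join(self.lines)


def merge_overlapping(hits):
    """Merge fragments of the same file whose line ranges overlap or touch.

    Assumption: a simple stand-in for structure-aware deduplication.
    """
    merged = []
    for hit in sorted(hits, key=lambda h: (h.path, h.start)):
        last = merged[-1] if merged else None
        if last and last.path == hit.path and hit.start <= last.end + 1:
            # Keep only the lines that extend past the previous fragment.
            extra = hit.lines[last.end + 1 - hit.start:]
            merged[-1] = Hit(last.path, last.start, last.lines + extra)
        else:
            merged.append(hit)
    return merged


def identifier_weights(texts):
    """IDF-like weights (assumed formula): rarer identifiers weigh more."""
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(set(IDENTIFIER.findall(text)))
    n = len(texts)
    return {tok: math.log((n + 1) / (df + 1)) + 1.0 for tok, df in doc_freq.items()}


def rerank(hits, query_identifiers, weights):
    """Sort hits by the summed weight of query identifiers they contain."""
    def score(hit):
        present = set(IDENTIFIER.findall(hit.text))
        return sum(weights.get(tok, 0.0) for tok in query_identifiers if tok in present)
    return sorted(hits, key=score, reverse=True)


# Toy data standing in for search results.
hits = [
    Hit("a.py", 1, ("def load_config(path):", "    return parse(path)")),
    Hit("a.py", 2, ("    return parse(path)", "", "def main():")),
    Hit("b.py", 10, ("x = main()",)),
]
merged = merge_overlapping(hits)
weights = identifier_weights([h.text for h in merged])
ranked = rerank(merged, ["load_config", "main"], weights)
```

In this toy run the two overlapping `a.py` fragments collapse into one, and that fragment ranks first because it contains the rarer identifier `load_config` as well as `main`.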
