Fugu-MT Paper Translation (Summary): Think Harder and Don't Overlook Your Options: Revisiting Issue-Commit Linking with LLM-Assisted Retrieval

Paper summary: Think Harder and Don't Overlook Your Options: Revisiting Issue-Commit Linking with LLM-Assisted Retrieval

arxiv url: http://arxiv.org/abs/2605.00447v1
Date: Fri, 01 May 2026 06:34:38 GMT
Status: Translation complete
Last updated in system: 2026-05-04 17:43:28.876463
Title: Think Harder and Don't Overlook Your Options: Revisiting Issue-Commit Linking with LLM-Assisted Retrieval
Authors: Cole Morgan, Muhammad Asaduzzaman, Shaiful Chowdhurry, Shaowei Wang
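Abstract summary: This paper examines established issue-commit link recovery techniques, including BTLink, EasyLink, FRLink, RCLinker, and Hybrid-Linker. The results show that dense retrieval methods outperform sparse retrieval methods at identifying relevant commits, and that traditional machine learning-based reranking techniques achieve higher performance than LLM-based approaches.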
Reference score (attention score computed in-house): 7.078973963849209
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Linking issue reports to the commits that resolve them is essential for software traceability, maintenance, and evolution. Accurate issue-commit links help developers to understand system changes and the rationale behind them. While numerous automated techniques have been proposed, ranging from heuristic and feature-based approaches to modern deep learning and large language model approaches, our goal is to evaluate these techniques to determine which are most effective and efficient. In this study, we revisit several established issue-commit link recovery techniques, including BTLink, EasyLink, FRLink, RCLinker, and Hybrid-Linker, and assess their performance for reranking issue-commit links. We first evaluate different retrieval methods (BM25, BM25L, SBERT-Semantic Search, ANNOY, LSH, HNSW) for their ability to efficiently retrieve relevant commits, reducing the candidate set that must be considered by more computationally expensive models. Using the best retrieval methods, we then investigate the reranking effectiveness of different machine learning-based techniques, including traditional machine learning models, a cross-encoder, and large language models (ChatGPT, Qwen, Gemma, Llama), to refine the reranking of candidate commits and improve precision. Finally, we compare the effectiveness of these techniques. Our results show that dense retrieval methods outperform sparse retrieval approaches in identifying relevant commits and that combining dense and sparse retrieval can improve recall. Additionally, we find that traditional machine learning-based reranking techniques achieve higher performance than LLM-based approaches. Our results highlight that retrieval-based pipelines remain a practical and effective solution for large-scale issue-commit linking, and that simpler models should be carefully considered before adopting computationally expensive LLM-based approaches.
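The abstract describes a two-stage pipeline: cheap retrieval narrows the candidate commits, and a more expensive model reranks only that shortlist. It also reports that combining dense and sparse retrieval can improve recall but does not say how the two were combined. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: sparse retrieval is a whitespace-tokenized Okapi BM25 (k1=1.5, b=0.75, with the non-negative IDF form used by Lucene), the dense ranking is a hypothetical stand-in for the output of an SBERT or ANN index such as HNSW, and the fusion step assumes reciprocal rank fusion (RRF) with k=60. The function names and toy commits are hypothetical.

```python
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank doc ids by Okapi BM25 (Lucene-style non-negative IDF), whitespace tokens."""
    tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n = len(tokens)
    avgdl = sum(len(t) for t in tokens.values()) / n
    # Document frequency: number of commits containing each term.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        length_norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # a term absent from this commit contributes nothing
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists (assumed RRF): score(d) = sum over lists of 1 / (k + rank)."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


def recall_at_k(ranking, relevant, k):
    """Fraction of truly linked commits that appear in the top-k candidates."""
    return len(set(ranking[:k]) & relevant) / len(relevant)


# Toy data (hypothetical): commit messages keyed by id, plus one issue report.
commits = {
    "c1": "fix null pointer in login handler",
    "c2": "update readme",
    "c3": "refactor session timeout logic",
    "c4": "add login retry on timeout",
}
issue = "login fails with null pointer after session timeout"

sparse = bm25_rank(issue, commits)
# Hypothetical dense ranking, standing in for an embedding nearest-neighbour search.
dense = ["c4", "c1", "c3", "c2"]
fused = reciprocal_rank_fusion([sparse, dense])
candidates = fused[:3]  # only this shortlist would be handed to a reranker
print("sparse:", sparse)
print("fused top-3:", candidates)
```

In this sketch the reranking stage (a traditional ML model, a cross-encoder, or an LLM, per the abstract) would score only `candidates` rather than every commit in the repository, which is what keeps the expensive model's cost bounded. RRF is used here because it combines rankings without needing BM25 scores and cosine similarities to be on a comparable scale.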

Related papers