Fugu-MT paper translation (abstract): Hallucinations in Code Change to Natural Language Generation: Prevalence and Evaluation of Detection Metrics

Paper abstract: Hallucinations in Code Change to Natural Language Generation: Prevalence and Evaluation of Detection Metrics

arxiv url: http://arxiv.org/abs/2508.08661v1
Date: Tue, 12 Aug 2025 05:59:33 GMT
Status: Translation complete
Last updated in system: 2025-08-13 21:07:34.320733
Title: Hallucinations in Code Change to Natural Language Generation: Prevalence and Evaluation of Detection Metrics
Title (reference translation): Hallucinations in code change to natural language generation: prevalence and evaluation of detection metrics
Authors: Chunhua Liu, Hong Yi Lin, Patanamon Thongtanunam
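Abstract summary: Hallucinations have been studied independently in natural language generation and in code generation. They also occur in two critical tasks involving code change to natural language generation: commit message generation and code review comment generation. The paper quantifies the prevalence of hallucinations in recent language models and explores metric-based approaches to detect them automatically.
Reference score (attention level, computed independently by the site): 2.990411348977783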
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Language models have shown strong capabilities across a wide range of software engineering tasks, such as code generation, yet they suffer from hallucinations. While hallucinations have been studied independently in natural language generation and in code generation, their occurrence in tasks involving code changes, which have a structurally complex and context-dependent format, remains largely unexplored. This paper presents the first comprehensive analysis of hallucinations in two critical tasks involving code change to natural language generation: commit message generation and code review comment generation. We quantify the prevalence of hallucinations in recent language models and explore a range of metric-based approaches to automatically detect them. Our findings reveal that approximately 50% of generated code reviews and 20% of generated commit messages contain hallucinations. Whilst commonly used metrics are weak detectors on their own, combining multiple metrics substantially improves performance. Notably, model confidence and feature attribution metrics effectively contribute to hallucination detection, showing promise for inference-time detection. (Footnote: All code and data will be released upon acceptance.)
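Abstract (reference translation): Language models have shown strong capabilities across a wide range of software engineering tasks, such as code generation, but they suffer from hallucinations. Hallucinations have been studied independently in natural language generation and in code generation, but their occurrence in tasks involving code changes, which have a structurally complex and context-dependent code format, remains largely unexplored. This paper presents a comprehensive analysis of hallucinations in two critical tasks involving code change to natural language generation, namely commit message generation and code review comment generation. It quantifies the prevalence of hallucinations in recent language models and explores metric-based approaches to detect them automatically. The results show that approximately 50% of generated code reviews and approximately 20% of generated commit messages contain hallucinations. Commonly used metrics are weak detectors on their own, but combining multiple metrics substantially improves performance. In particular, model confidence and feature attribution metrics contribute effectively to hallucination detection, showing promise for inference-time detection. (Footnote: All code and data will be released upon acceptance.)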
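
The abstract's observation that individual metrics are weak detectors while a combination performs better can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract says is not yet released. The function names, the lexical-overlap metric, the weights, and the sample inputs are all hypothetical, chosen only to show how a model-confidence signal and a second signal might be merged into one score.

```python
import math


def mean_token_logprob(token_probs):
    """Model-confidence proxy: average log-probability of the generated tokens.

    Values closer to 0 mean the model was more confident.
    """
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def token_overlap(generated, diff_text):
    """Hypothetical lexical metric: share of generated words that appear in the diff."""
    gen = set(generated.lower().split())
    src = set(diff_text.lower().split())
    if not gen:
        return 0.0
    return len(gen & src) / len(gen)


def combined_score(features, weights, bias=0.0):
    """Logistic combination of several metric values into one score in (0, 1).

    In practice the weights would be fitted on labelled examples;
    here they are placeholders supplied by the caller.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative inputs only (not taken from the paper).
diff_text = "rename parse_config to load_config in utils"
message = "rename parse_config to load_config"
probs = [0.9, 0.8, 0.95, 0.85, 0.9]  # hypothetical per-token probabilities

features = [mean_token_logprob(probs), token_overlap(message, diff_text)]
weights = [-1.0, -1.0]  # assumption: lower confidence and lower overlap raise the score
print(combined_score(features, weights))
```

A higher score here means the generated text looks more suspicious under these assumed weights. The logistic form is only one simple way to combine metrics, and the abstract does not say which combination method the paper uses.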


Related papers