Fugu-MT paper translation (abstract): CoHalLo: code hallucination localization via probing hidden layer vector

Paper overview: CoHalLo: code hallucination localization via probing hidden layer vector

arxiv url: http://arxiv.org/abs/2512.24183v1
Date: Tue, 30 Dec 2025 12:36:31 GMT
Status: translation complete
Last updated in system: 2026-01-01 23:27:28.378973
Title: CoHalLo: code hallucination localization via probing hidden layer vector
Title (reference translation): CoHalLo: Code Hallucination Localization by Probing Hidden-Layer Vectors
Authors: Nan Jia, Wangchao Sang, Pengfei Lin, Xiangping Chen, Yuan Huang, Yi Liu, Mingliang Li
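Abstract summary: This study proposes CoHalLo, a new method that probes hidden-layer vectors from a hallucination detection model. CoHalLo uncovers the key syntactic information driving the model's hallucination judgments and locates the hallucinated code lines accordingly. In the experiments, CoHalLo achieves a Top-1 accuracy of 0.4253, Top-3 accuracy of 0.6149, Top-5 accuracy of 0.7356, Top-10 accuracy of 0.8333, IFA of 5.73, Recall@1% Effort of 0.052721, and Effort@20% Recall of 0.155269, outperforming the baseline methods.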
Reference score (independently computed attention score): 8.468456925593072
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The localization of code hallucinations aims to identify specific lines of code containing hallucinations, helping developers to improve the reliability of AI-generated code more efficiently. Although recent studies have adopted several methods to detect code hallucinations, most of these approaches remain limited to coarse-grained detection and lack specialized techniques for fine-grained hallucination localization. This study introduces a novel method, called CoHalLo, which achieves line-level code hallucination localization by probing the hidden-layer vectors from hallucination detection models. CoHalLo uncovers the key syntactic information driving the model's hallucination judgments and locates the hallucinated code lines accordingly. Specifically, we first fine-tune the hallucination detection model on manually annotated datasets to ensure that it learns features pertinent to code syntactic information. Subsequently, we design a probe network that projects high-dimensional latent vectors onto a low-dimensional syntactic subspace, generating vector tuples and reconstructing the predicted abstract syntax tree (P-AST). By comparing P-AST with the original abstract syntax tree (O-AST) extracted from the input AI-generated code, we identify the key syntactic structures associated with hallucinations. This information is then used to pinpoint hallucinated code lines. To evaluate CoHalLo's performance, we manually collected a dataset of code hallucinations. The experimental results show that CoHalLo achieves a Top-1 accuracy of 0.4253, Top-3 accuracy of 0.6149, Top-5 accuracy of 0.7356, Top-10 accuracy of 0.8333, IFA of 5.73, Recall@1% Effort of 0.052721, and Effort@20% Recall of 0.155269, which outperforms the baseline methods.
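The O-AST versus P-AST comparison described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the edge tuple format `(parent_type, child_type, depth)`, the helper names `ast_tuples` and `rank_lines`, and the mismatch-count scoring are all hypothetical. The predicted tuple set is supplied by hand here, whereas in CoHalLo it would come from the probe network.

```python
import ast
from collections import Counter


def ast_tuples(source):
    """Return (line, (parent_type, child_type, depth)) for each AST edge.

    Hypothetical tuple format; only children that carry a line number are kept.
    """
    tree = ast.parse(source)
    edges = []

    def walk(node, depth):
        for child in ast.iter_child_nodes(node):
            line = getattr(child, "lineno", None)
            if line is not None:
                edge = (type(node).__name__, type(child).__name__, depth)
                edges.append((line, edge))
            walk(child, depth + 1)

    walk(tree, 0)
    return edges


def rank_lines(source, predicted_tuples):
    """Rank lines by how many of their O-AST edges are absent from the
    predicted tuple set (illustrative scoring, not the paper's)."""
    mismatches = Counter()
    for line, edge in ast_tuples(source):
        if edge not in predicted_tuples:
            mismatches[line] += 1
    # Most mismatches first; ties broken by line number.
    ordered = sorted(mismatches.items(), key=lambda kv: (-kv[1], kv[0]))
    return [line for line, _ in ordered]


code = "x = 1\ny = foo(x)\nprint(y)\n"
# Stand-in for a P-AST: every original edge except the Assign -> Call edge.
predicted = {edge for _, edge in ast_tuples(code)} - {("Assign", "Call", 1)}
print(rank_lines(code, predicted))
```

Only lines with at least one mismatching edge appear in the ranking, so a line whose structure is fully reproduced by the predicted set is never flagged.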
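Abstract (reference translation): Code hallucination localization aims to identify the specific lines of code that contain hallucinations, helping developers improve the reliability of AI-generated code more efficiently. Recent studies have adopted several methods for detecting code hallucinations, but most of them are limited to coarse-grained detection and lack specialized techniques for finer-grained hallucination localization. This study proposes CoHalLo, a method that localizes code hallucinations at the line level by probing hidden-layer vectors from a hallucination detection model. CoHalLo uncovers the key syntactic information driving the model's hallucination judgments and locates the hallucinated code lines accordingly. Specifically, the hallucination detection model is first fine-tuned on manually annotated datasets so that it reliably learns features related to code syntactic information. A probe network is then designed that projects high-dimensional latent vectors onto a low-dimensional syntactic subspace, generates vector tuples, and reconstructs the predicted abstract syntax tree (P-AST). The P-AST is compared with the original abstract syntax tree (O-AST) extracted from the input AI-generated code to identify the key syntactic structures associated with hallucinations. This information is used to pinpoint the hallucinated code lines. To evaluate CoHalLo's performance, a dataset of code hallucinations was collected manually. In the experiments, CoHalLo achieved a Top-1 accuracy of 0.4253, Top-3 accuracy of 0.6149, Top-5 accuracy of 0.7356, Top-10 accuracy of 0.8333, IFA of 5.73, Recall@1% Effort of 0.052721, and Effort@20% Recall of 0.155269, outperforming the baseline methods.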
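The ranking metrics named in the abstract can be made concrete with a short sketch. The abstract does not define them, so the definitions below are assumptions based on common usage in line-level localization work: Top-k accuracy is taken as the fraction of samples with at least one hallucinated line among the first k ranked lines, and IFA (initial false alarm) as the number of clean lines inspected before the first hallucinated one, averaged over samples. The function names and toy data are hypothetical.

```python
def top_k_hit(ranked, truth, k):
    """True if any ground-truth line appears among the first k ranked lines."""
    return any(line in truth for line in ranked[:k])


def initial_false_alarm(ranked, truth):
    """Number of lines inspected before the first ground-truth line.

    Returns len(ranked) if no ground-truth line is ranked at all.
    """
    for position, line in enumerate(ranked):
        if line in truth:
            return position
    return len(ranked)


def evaluate(samples, ks=(1, 3, 5, 10)):
    """Average Top-k accuracy and IFA over (ranked_lines, truth_set) pairs."""
    n = len(samples)
    report = {
        f"top{k}": sum(top_k_hit(r, t, k) for r, t in samples) / n for k in ks
    }
    report["ifa"] = sum(initial_false_alarm(r, t) for r, t in samples) / n
    return report


# Toy data: each sample is (ranked line numbers, set of hallucinated lines).
samples = [
    ([3, 1, 2], {3}),
    ([4, 2, 7, 1], {7}),
    ([5, 6], {6}),
    ([1, 2, 3, 4], {4}),
]
report = evaluate(samples)
print(report)
```

Under these assumed definitions, higher Top-k values and a lower IFA both indicate that hallucinated lines are ranked nearer the top.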


Related papers