Fugu-MT Paper Translation (Abstract): Beyond ROUGE: N-Gram Subspace Features for LLM Hallucination Detection

Paper abstract: Beyond ROUGE: N-Gram Subspace Features for LLM Hallucination Detection

arxiv url: http://arxiv.org/abs/2509.05360v1
Date: Wed, 03 Sep 2025 18:52:24 GMT
Status: Translation complete
Last updated in system: 2025-09-09 14:07:03.459071
Title: Beyond ROUGE: N-Gram Subspace Features for LLM Hallucination Detection
Title (reference translation): Beyond ROUGE: N-Gram Subspace Features for LLM Hallucination Detection
Authors: Jerry Li, Evangelos Papalexakis
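Abstract summary: Large language models (LLMs) have shown effectiveness across a variety of tasks involving natural language. The fundamental problem of hallucination still plagues these models, limiting their trustworthiness in generating consistent, truthful information. The paper proposes a novel ROUGE-inspired approach that constructs an N-gram frequency tensor from LLM-generated text. This tensor captures richer semantic structure by encoding co-occurrence patterns, enabling better differentiation between factual and hallucinated content.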
Reference score (independently computed attention score): 5.0106565473767075
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated effectiveness across a wide variety of tasks involving natural language; however, a fundamental problem of hallucinations still plagues these models, limiting their trustworthiness in generating consistent, truthful information. Detecting hallucinations has quickly become an important topic, with various methods such as uncertainty estimation, LLM Judges, retrieval augmented generation (RAG), and consistency checks showing promise. Many of these methods build upon foundational metrics, such as ROUGE, BERTScore, or Perplexity, which often lack the semantic depth necessary to detect hallucinations effectively. In this work, we propose a novel approach inspired by ROUGE that constructs an N-Gram frequency tensor from LLM-generated text. This tensor captures richer semantic structure by encoding co-occurrence patterns, enabling better differentiation between factual and hallucinated content. We demonstrate this by applying tensor decomposition methods to extract singular values from each mode and use these as input features to train a multi-layer perceptron (MLP) binary classifier for hallucinations. Our method is evaluated on the HaluEval dataset and demonstrates significant improvements over traditional baselines, as well as competitive performance against state-of-the-art LLM judges.
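Abstract (reference translation): Large language models (LLMs) have shown effectiveness across a variety of tasks involving natural language, but the fundamental problem of hallucination still plagues these models, limiting their trustworthiness in generating consistent, truthful information. Hallucination detection has quickly become an important topic, with various methods such as uncertainty estimation, LLM judges, retrieval-augmented generation (RAG), and consistency checks showing promise. Many of these methods build on foundational metrics such as ROUGE, BERTScore, and perplexity, which often lack the semantic depth needed to detect hallucinations effectively. This work proposes a novel ROUGE-inspired approach that constructs an N-gram frequency tensor from LLM-generated text. The tensor captures richer semantic structure by encoding co-occurrence patterns, enabling better differentiation between factual and hallucinated content. Tensor decomposition methods are applied to extract singular values from each mode, and these are used as input features to train a multi-layer perceptron (MLP) binary classifier for hallucinations. The method is evaluated on the HaluEval dataset and shows significant improvements over traditional baselines, as well as competitive performance against state-of-the-art LLM judges.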
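The feature pipeline described in the abstract (N-gram frequency tensor, then singular values from each mode) can be illustrated with a minimal sketch. The abstract does not specify how the tensor is built or which decomposition is used, so everything here is an assumption made for illustration: the tensor is a plain trigram count tensor over a small vocabulary, the per-mode singular values come from an SVD of each mode unfolding, and the helper names `trigram_tensor` and `mode_singular_values` and the `top_k` parameter are hypothetical. The MLP classifier stage is omitted.

```python
import numpy as np


def trigram_tensor(tokens, vocab):
    """Count tensor: T[i, j, k] = occurrences of trigram (vocab[i], vocab[j], vocab[k]).

    Assumption: the paper's tensor construction may differ from this simple count.
    """
    index = {word: i for i, word in enumerate(vocab)}
    size = len(vocab)
    tensor = np.zeros((size, size, size))
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        tensor[index[a], index[b], index[c]] += 1
    return tensor


def mode_singular_values(tensor, top_k):
    """Concatenate the top_k singular values of each mode unfolding (zero-padded)."""
    features = []
    for mode in range(tensor.ndim):
        # Mode unfolding: bring the chosen axis to the front, flatten the rest.
        unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
        singular = np.linalg.svd(unfolded, compute_uv=False)  # sorted descending
        padded = np.zeros(top_k)
        count = min(top_k, singular.size)
        padded[:count] = singular[:count]
        features.append(padded)
    return np.concatenate(features)


tokens = "the cat sat on the mat the cat sat".split()
vocab = sorted(set(tokens))
tensor = trigram_tensor(tokens, vocab)
features = mode_singular_values(tensor, top_k=3)
print(features.shape)
```

In a setup like the one the abstract outlines, a fixed-length vector such as `features` would be computed per generated text and passed to a binary classifier; the choice of `top_k` and of vocabulary handling are design decisions this sketch leaves open.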


Related papers