Fugu-MT Paper Translation (Abstract): SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs

Paper overview: SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs

arxiv url: http://arxiv.org/abs/2605.16650v1
Date: Fri, 15 May 2026 21:39:48 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:46.894796
Title: SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs
Title (reference translation): SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs
Authors: Avijit Shil, Suman Samui
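Abstract summary: SKG-Eval is a quasi-deterministic and interpretable framework that models dialogue as an evolving semantic knowledge graph. The authors report that SKG-Eval achieves higher correlation with human judgments and substantially improves detection of long-range inconsistencies in conversations.
Reference score (attention score computed independently by the site): 0.0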
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Evaluating multi-turn dialogue systems remains challenging because response quality depends not only on the current prompt, but also on previously established entities, claims, and conversational commitments. Existing automatic evaluators, including LLM-as-a-judge frameworks and embedding-based metrics, largely rely on flat or turn-isolated representations, making them less effective at detecting long-range issues such as contradiction, topic drift, and entity inconsistency. To address this, we propose SKG-Eval, a quasi-deterministic and interpretable framework that models dialogue as an evolving Semantic Knowledge Graph (SKG) of entities, relations, and commitments across turns. The framework incrementally updates the graph through structured triple extraction and computes three complementary signals: (i) local relevance, measuring alignment with the current prompt and optional reference; (ii) historical consistency, evaluating how newly introduced information connects to prior conversational context using graph-based and embedding-driven signals; and (iii) logical coherence, assessed by a geometric contradiction engine that detects cross-turn conflicts without relying on NLI models or LLM judges. These signals are adaptively fused and aggregated into a length-invariant session score via recency-weighted trend analysis. Across multiple benchmarks, SKG-Eval achieves higher correlation with human judgments and substantially improves detection of long-range inconsistencies in extended conversations. In addition, the framework produces explicit contradiction certificates and deterministic scores for fixed inputs, enabling reproducible and auditable evaluation. Overall, our results suggest that structured externalized state tracking through semantic knowledge graphs provides a scalable alternative to implicit reasoning in LLM-based dialogue evaluators.
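Abstract (reference translation): Evaluating multi-turn dialogue systems remains difficult because response quality depends not only on the current prompt but also on previously established entities, claims, and conversational commitments. Existing automatic evaluators, including LLM-as-a-judge frameworks and embedding-based metrics, rely mainly on flat or turn-isolated representations, which makes them less effective at detecting long-range problems such as contradiction, topic drift, and entity inconsistency. We therefore propose SKG-Eval, a quasi-deterministic and interpretable framework that models dialogue as an evolving Semantic Knowledge Graph (SKG) of entities, relations, and commitments across turns. The framework incrementally updates the graph through structured triple extraction and computes three complementary signals: (i) local relevance, which measures alignment with the current prompt and an optional reference; (ii) historical consistency, which uses graph-based and embedding-driven signals to evaluate how newly introduced information connects to the prior conversational context; and (iii) logical coherence, which is assessed by a geometric contradiction engine that detects cross-turn conflicts without relying on NLI models or LLM judges. These signals are adaptively fused and aggregated into a length-invariant session score via recency-weighted trend analysis. Across multiple benchmarks, SKG-Eval achieves higher correlation with human judgments and substantially improves detection of long-range inconsistencies in extended conversations. In addition, the framework produces explicit contradiction certificates and deterministic scores for fixed inputs, enabling reproducible and auditable evaluation. Overall, the results suggest that structured, externalized state tracking through semantic knowledge graphs may provide a scalable alternative to implicit reasoning in LLM-based dialogue evaluators.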
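
The abstract describes the pipeline only at a high level, so the following Python sketch is an illustration of the general idea, not the authors' implementation. It assumes triples have already been extracted, replaces the paper's geometric contradiction engine with a simple exact-match check on (subject, relation) pairs, uses fixed weights in place of adaptive fusion, and approximates recency-weighted aggregation with an exponentially weighted mean. All names (`DialogueGraph`, `fuse`, `session_score`, `decay`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object); assumed pre-extracted


@dataclass
class DialogueGraph:
    """Toy stand-in for an incrementally updated semantic knowledge graph."""
    # Maps (subject, relation) -> (object, turn index where it was last asserted).
    facts: Dict[Tuple[str, str], Tuple[str, int]] = field(default_factory=dict)

    def update(self, turn: int, triples: List[Triple]) -> List[str]:
        """Add one turn's triples and return readable notes for any conflicts.

        Assumption: every relation is single-valued, so a different object
        for the same (subject, relation) counts as a contradiction.
        """
        conflicts = []
        for subj, rel, obj in triples:
            key = (subj, rel)
            if key in self.facts and self.facts[key][0] != obj:
                old_obj, old_turn = self.facts[key]
                conflicts.append(
                    f"turn {turn}: ({subj}, {rel}, {obj}) conflicts with "
                    f"turn {old_turn}: ({subj}, {rel}, {old_obj})"
                )
            self.facts[key] = (obj, turn)
        return conflicts


def fuse(relevance: float, consistency: float, coherence: float,
         weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted mean of the three per-turn signals (fixed weights, not adaptive)."""
    signals = (relevance, consistency, coherence)
    return sum(w * s for w, s in zip(weights, signals)) / sum(weights)


def session_score(turn_scores: List[float], decay: float = 0.8) -> float:
    """Exponentially recency-weighted mean of per-turn scores.

    Normalising by the weight sum keeps the result in the range of the
    inputs regardless of how many turns the dialogue has.
    """
    if not turn_scores:
        raise ValueError("turn_scores must not be empty")
    n = len(turn_scores)
    weights = [decay ** (n - 1 - i) for i in range(n)]  # latest turn has weight 1
    return sum(w * s for w, s in zip(weights, turn_scores)) / sum(weights)


if __name__ == "__main__":
    graph = DialogueGraph()
    graph.update(0, [("Alice", "lives_in", "Paris")])
    for note in graph.update(3, [("Alice", "lives_in", "Berlin")]):
        print(note)  # a plain-text record of the cross-turn conflict
    per_turn = [fuse(0.9, 0.8, 1.0), fuse(0.7, 0.6, 0.0)]
    print(session_score(per_turn))
```

Because every step here is a pure function of the extracted triples and scores, the same inputs always yield the same output, which is the property the abstract refers to as deterministic scoring for fixed inputs. The conflict notes play a role loosely analogous to the contradiction certificates mentioned in the abstract.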

Related papers