Fugu-MT Paper Translation (Abstract): OSCAR: Orchestrated Self-verification and Cross-path Refinement

Paper summary: OSCAR: Orchestrated Self-verification and Cross-path Refinement

arxiv url: http://arxiv.org/abs/2604.01624v1
Date: Thu, 02 Apr 2026 05:02:22 GMT
Status: Translation complete
System update date: 2026-04-03 14:21:10.3678
Title: OSCAR: Orchestrated Self-verification and Cross-path Refinement
Title (reference translation): OSCAR: Orchestrated Self-verification and Cross-path Refinement
Authors: Yash Shah, Abhijit Chakraborty, Naresh Kumar Devulapally, Vishnu Lokhande, Vivek Gupta
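Abstract summary: Diffusion language models expose their denoising trajectories, offering a natural handle for inference-time control. We formulate commitment uncertainty localization: given a denoising trajectory, identify token positions whose cross-chain entropy exceeds an unsupervised threshold. We introduce OSCAR, a training-free inference-time framework that operationalizes this formulation.
Reference score (independently computed attention metric): 7.202012136912518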
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion language models (DLMs) expose their denoising trajectories, offering a natural handle for inference-time control; accordingly, an ideal hallucination mitigation framework should intervene during generation using this model-native signal rather than relying on an externally trained hallucination classifier. Toward this, we formulate commitment uncertainty localization: given a denoising trajectory, identify token positions whose cross-chain entropy exceeds an unsupervised threshold before factually unreliable commitments propagate into self-consistent but incorrect outputs. We introduce a suite of trajectory-level assessments, including a cross-chain divergence-at-hallucination (CDH) metric, for principled comparison of localization methods. We also introduce OSCAR, a training-free inference-time framework operationalizing this formulation. OSCAR runs N parallel denoising chains with randomized reveal orders, computes cross-chain Shannon entropy to detect high-uncertainty positions, and then performs targeted remasking conditioned on retrieved evidence. Ablations confirm that localization and correction contribute complementary gains, robust across N in {4, 8, 16}. On TriviaQA, HotpotQA, RAGTruth, and CommonsenseQA using LLaDA-8B and Dream-7B, OSCAR enhances generation quality by significantly reducing hallucinated content and improving factual accuracy through uncertainty-guided remasking, which also facilitates more effective integration of retrieved evidence. Its native entropy-based uncertainty signal surpasses that of specialized trained detectors, highlighting an inherent capacity of diffusion language models to identify factual uncertainty that is not present in the sequential token commitment structure of autoregressive models. We are releasing the codebase to support future research on localization and uncertainty-aware generation in DLMs.
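Abstract (reference translation): Diffusion language models (DLMs) expose their denoising trajectories, which offers a natural handle for inference-time control. To this end, we formulate commitment uncertainty localization: given a denoising trajectory, identify the token positions whose cross-chain entropy exceeds an unsupervised threshold, before factually unreliable commitments propagate into self-consistent but incorrect outputs. We introduce a suite of trajectory-level assessments, including the cross-chain divergence-at-hallucination (CDH) metric, for principled comparison of localization methods. We also introduce OSCAR, a training-free inference-time framework that operationalizes this formulation. OSCAR runs N parallel denoising chains with randomized reveal orders, computes cross-chain Shannon entropy to detect high-uncertainty positions, and performs targeted remasking conditioned on the retrieved evidence. Ablations confirm that localization and correction contribute complementary gains that are robust across N in {4, 8, 16}. On TriviaQA, HotpotQA, RAGTruth, and CommonsenseQA with LLaDA-8B and Dream-7B, OSCAR improves generation quality by substantially reducing hallucinated content and improving factual accuracy through uncertainty-guided remasking, and it also facilitates more effective integration of retrieved evidence. Its native entropy-based uncertainty signal surpasses specialized trained detectors, highlighting an inherent capacity of diffusion language models to identify factual uncertainty that is absent from the sequential token commitment structure of autoregressive models. We release the codebase to support future research on localization and uncertainty-aware generation in DLMs.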
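
The localization step described in the abstract (cross-chain Shannon entropy compared against an unsupervised threshold) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each chain is reduced to its final committed token per position, and it uses a hypothetical threshold of mean plus k standard deviations of the per-position entropies; the abstract does not specify either choice. The function names `cross_chain_entropy` and `positions_to_remask` are invented for illustration.

```python
import math
from collections import Counter


def cross_chain_entropy(chains):
    """Per-position Shannon entropy (bits) of the tokens committed by each chain.

    ASSUMPTION: each chain is a list of final tokens of equal length; the paper
    may instead use full token distributions along the denoising trajectory.
    """
    n_chains = len(chains)
    length = len(chains[0])
    if any(len(chain) != length for chain in chains):
        raise ValueError("all chains must have the same length")
    entropies = []
    for pos in range(length):
        counts = Counter(chain[pos] for chain in chains)
        # Empirical distribution over the tokens seen at this position.
        h = -sum((c / n_chains) * math.log2(c / n_chains) for c in counts.values())
        entropies.append(h)
    return entropies


def positions_to_remask(entropies, k=1.0):
    """Flag positions whose entropy exceeds mean + k * std.

    ASSUMPTION: this threshold rule is a stand-in for the paper's unsupervised
    threshold, which the abstract does not define.
    """
    mean = sum(entropies) / len(entropies)
    var = sum((h - mean) ** 2 for h in entropies) / len(entropies)
    threshold = mean + k * math.sqrt(var)
    return [i for i, h in enumerate(entropies) if h > threshold]


# Toy example with N = 4 chains that disagree only on the last token.
chains = [
    ["The", "capital", "of", "Australia", "is", "Canberra"],
    ["The", "capital", "of", "Australia", "is", "Sydney"],
    ["The", "capital", "of", "Australia", "is", "Canberra"],
    ["The", "capital", "of", "Australia", "is", "Melbourne"],
]
flagged = positions_to_remask(cross_chain_entropy(chains))
print(flagged)  # [5]
```

In this toy case only the final position has nonzero entropy (1.5 bits), so it is the only one flagged. In the framework described above, the flagged positions would then be remasked and regenerated conditioned on retrieved evidence, a step this sketch leaves out.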


Related papers