Fugu-MT Paper Translation (Summary): Bridging Reasoning and Action: Hybrid LLM-RL Framework for Efficient Cross-Domain Task-Oriented Dialogue

Paper summary: Bridging Reasoning and Action: Hybrid LLM-RL Framework for Efficient Cross-Domain Task-Oriented Dialogue

arxiv url: http://arxiv.org/abs/2604.23345v1
Date: Sat, 25 Apr 2026 15:07:46 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.28494
Title: Bridging Reasoning and Action: Hybrid LLM-RL Framework for Efficient Cross-Domain Task-Oriented Dialogue
Title (reference translation): Bridging Reasoning and Action: A Hybrid LLM-RL Framework for Efficient Cross-Domain Task-Oriented Dialogue
Authors: Yangyang Zhao, Linfan Dai, Li Cai, Bowen Xing, Libo Qin
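Abstract summary: Large language models (LLMs) can infer constraints but are unreliable over long horizons. Reinforcement learning (RL) optimizes long-horizon behavior but cannot recover constraints from raw dialogue. This paper proposes VLK-RL (Verified LLM-Knowledge empowered RL), a hybrid framework that makes LLM-derived constraint reasoning usable for RL.
Reference score (independently computed attention score): 23.90869525503871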
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cross-domain task-oriented dialogue requires reasoning over implicit and explicit feasibility constraints while planning long-horizon, multi-turn actions. Large language models (LLMs) can infer such constraints but are unreliable over long horizons, while reinforcement learning (RL) optimizes long-horizon behavior yet cannot recover constraints from raw dialogue. Naively coupling LLMs with RL is therefore brittle: unverified or unstructured LLM outputs can corrupt state representations and misguide policy learning. Motivated by this, we propose Verified LLM-Knowledge empowered RL (VLK-RL), a hybrid framework that makes LLM-derived constraint reasoning usable for RL. VLK-RL first elicits candidate constraints with an LLM and then verifies them via a dual-role cross-examination procedure to suppress hallucinations and cross-turn inconsistencies. The verified constraints are mapped into ontology-aligned slot-value representations, yielding a structured, constraint-aware state for RL policy optimization. Experiments across multiple benchmarks demonstrate that VLK-RL significantly improves generalization and robustness, outperforming strong single-model baselines on long-horizon tasks.
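Abstract (reference translation): Cross-domain task-oriented dialogue requires reasoning over implicit and explicit feasibility constraints while planning long-horizon, multi-turn actions. Large language models (LLMs) can infer such constraints but are unreliable over long horizons, while reinforcement learning (RL) optimizes long-horizon behavior but cannot recover constraints from raw dialogue. Naively coupling LLMs with RL is therefore brittle: unverified or unstructured LLM outputs can corrupt state representations and misguide policy learning. We therefore propose Verified LLM-Knowledge empowered RL (VLK-RL), a hybrid framework that makes LLM-derived constraint reasoning usable for RL. VLK-RL first elicits candidate constraints with an LLM, then verifies them through a dual-role cross-examination procedure to suppress hallucinations and cross-turn inconsistencies. The verified constraints are mapped into ontology-aligned slot-value representations, yielding a structured, constraint-aware state for RL policy optimization. Experiments on multiple benchmarks show that VLK-RL significantly improves generalization and robustness and outperforms strong single-model baselines on long-horizon tasks.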
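
The pipeline the abstract describes (elicit candidate constraints, verify them, map them to ontology-aligned slot-value pairs, build a structured state) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy `ONTOLOGY`, the `Constraint` type, and the two rule-based verifier roles are all hypothetical stand-ins, and the real method presumably uses LLM calls where this sketch uses string checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Constraint:
    domain: str
    slot: str
    value: str


# Hypothetical toy ontology: (domain, slot) -> allowed values.
ONTOLOGY: Dict[Tuple[str, str], List[str]] = {
    ("hotel", "area"): ["north", "centre"],
    ("restaurant", "area"): ["north", "centre"],
    ("taxi", "destination"): ["hotel", "restaurant"],
}

# A verifier role inspects one candidate against the dialogue so far.
Verifier = Callable[[Constraint, List[str]], bool]


def supported_by_dialogue(c: Constraint, dialogue: List[str]) -> bool:
    # Assumed role 1: the value must literally appear in some turn.
    return any(c.value in turn.lower() for turn in dialogue)


def consistent_across_turns(c: Constraint, dialogue: List[str]) -> bool:
    # Assumed role 2: reject a value that some turn explicitly negates.
    return not any(f"not {c.value}" in turn.lower() for turn in dialogue)


def verify(candidates: List[Constraint], dialogue: List[str],
           roles: List[Verifier]) -> List[Constraint]:
    # Keep a candidate only if every role accepts it.
    return [c for c in candidates if all(r(c, dialogue) for r in roles)]


def align_to_ontology(constraints: List[Constraint]) -> List[Constraint]:
    # Drop anything whose slot or value is not in the ontology.
    return [c for c in constraints
            if c.value in ONTOLOGY.get((c.domain, c.slot), [])]


def encode_state(constraints: List[Constraint]) -> List[int]:
    # One binary feature per (domain, slot, value) in a fixed order.
    active = {(c.domain, c.slot, c.value) for c in constraints}
    return [1 if (d, s, v) in active else 0
            for (d, s), values in sorted(ONTOLOGY.items())
            for v in values]


if __name__ == "__main__":
    dialogue = ["I need a hotel in the north.",
                "Book a restaurant, not centre please."]
    candidates = [Constraint("hotel", "area", "north"),
                  Constraint("restaurant", "area", "centre"),
                  Constraint("hotel", "stars", "5")]
    roles = [supported_by_dialogue, consistent_across_turns]
    verified = align_to_ontology(verify(candidates, dialogue, roles))
    print(encode_state(verified))
```

In this sketch the fixed-order binary vector is what an RL policy would consume as its constraint-aware state; requiring all roles to agree is one simple design choice for filtering hallucinated or contradicted candidates, chosen here only for illustration.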


Related papers