Fugu-MT Paper Translation (Abstract): Reasoning over Object Descriptions Improves Coreference Resolution in Task-Based Dialogue Systems

Paper summary: Reasoning over Object Descriptions Improves Coreference Resolution in Task-Based Dialogue Systems

arxiv url: http://arxiv.org/abs/2604.27850v1
Date: Thu, 30 Apr 2026 13:33:27 GMT
Status: Translation complete
Last updated in system: 2026-05-01 16:31:54.11072
Title: Reasoning over Object Descriptions Improves Coreference Resolution in Task-Based Dialogue Systems
Title (reference translation): Reasoning over Object Descriptions Improves Reference Resolution in Task-Based Dialogue Systems
Authors: Oier Ijurco, Oier Lopez de Lacalle
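Abstract summary: Task-based dialogue systems help users achieve specific goals, such as executing actions or retrieving information. Accurate coreference resolution is essential because it involves identifying object references within the dialogue. This work proposes a unimodal test-time reasoning approach that enables large language models to reason over detailed object metadata and dialogue history.
Reference score (independently computed attention score): 4.617917983223879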
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Task-based dialogue systems assist users in achieving specific goals, such as executing actions or retrieving information, through natural language interactions. Accurate coreference resolution is essential, as it involves identifying object references within the dialogue - a task that becomes increasingly challenging in visually grounded environments characterized by complex scenes and diverse object metadata. However, coreference resolution in task-based dialogue remains limited by poor generalization across domains and heavy reliance on supervised models that often overfit to dataset-specific artifacts. In this work, we propose a unimodal test-time reasoning approach that enables large language models (LLMs) to reason over detailed object metadata and dialogue history to improve coreference resolution. Empirical results on the SIMMC 2.1 dataset demonstrate that LLMs can generate step-by-step reasoning processes that effectively align dialogue context with objects present in the scene. Extensive experiments highlight the models' ability to link conversations and objects accurately. Moreover, we show that test-time reasoning under few-shot settings generalizes effectively to unseen scenarios and novel objects, outperforming encoder-based supervised methods in cross-domain evaluations. These findings underscore the critical role of structured metadata and careful prompt engineering in enhancing the robustness and generalization of task-oriented dialogue systems.
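Abstract (reference translation): Task-based dialogue systems help users achieve specific goals, such as executing actions or retrieving information, through natural language dialogue. Accurate coreference resolution is essential because it involves identifying object references within the dialogue, a task that becomes increasingly difficult in visually grounded environments characterized by complex scenes and diverse object metadata. However, coreference resolution in task-based dialogue is limited by poor generalization across domains and by heavy reliance on supervised models that often overfit to dataset-specific artifacts. In this work, we propose a unimodal test-time reasoning approach in which large language models (LLMs) analyze detailed object metadata and dialogue history to improve coreference resolution. Empirical results on the SIMMC 2.1 dataset show that LLMs generate step-by-step reasoning processes that effectively align the dialogue context with the objects present in the scene. Extensive experiments highlight the models' ability to link conversations and objects accurately. Furthermore, we show that test-time reasoning in few-shot settings generalizes effectively to unseen scenarios and novel objects and outperforms encoder-based supervised methods in cross-domain evaluation. These findings highlight the important role of structured metadata and careful prompt engineering in strengthening the robustness and generalization of task-oriented dialogue systems.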
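
The abstract describes reasoning over textual object metadata and dialogue history at test time, optionally with few-shot examples. The following is a minimal sketch of how such a prompt might be assembled and how a final answer might be read back. It is not the authors' prompt or code: the function names, the metadata fields (`type`, `color`, `price`), the `Answer:` convention, and the sample completion are all assumptions made for illustration, and no real LLM is called.

```python
import re


def build_prompt(objects, dialogue, examples=()):
    """Serialize few-shot examples, object metadata and dialogue history as text.

    All field names and the prompt wording are hypothetical, not from the paper.
    """
    lines = [
        "Resolve which scene objects the last user utterance refers to.",
        "Think step by step, then finish with 'Answer: <comma-separated object ids>'.",
        "",
    ]
    for example in examples:  # optional few-shot demonstrations
        lines.append(example.strip())
        lines.append("")
    lines.append("Objects in the scene:")
    for obj in objects:
        attrs = ", ".join(f"{k}={v}" for k, v in sorted(obj.items()) if k != "id")
        lines.append(f"- id {obj['id']}: {attrs}")
    lines.append("")
    lines.append("Dialogue history:")
    for speaker, utterance in dialogue:
        lines.append(f"{speaker}: {utterance}")
    lines.append("")
    lines.append("Reasoning:")
    return "\n".join(lines)


def parse_answer(completion, valid_ids):
    """Read object ids from the last 'Answer:' line, keeping only ids in the scene."""
    matches = re.findall(r"Answer:\s*([^\n]*)", completion)
    if not matches:
        return []
    ids = [int(token) for token in re.findall(r"\d+", matches[-1])]
    return [i for i in ids if i in valid_ids]


# Hypothetical scene and dialogue, invented for this sketch.
objects = [
    {"id": 1, "type": "jacket", "color": "red", "price": 59.99},
    {"id": 2, "type": "jacket", "color": "blue", "price": 79.99},
]
dialogue = [
    ("User", "Do you have any jackets?"),
    ("System", "Yes, there is a red one and a blue one."),
    ("User", "How much is the blue one?"),
]
prompt = build_prompt(objects, dialogue)

# Stand-in for a model response; a real system would send `prompt` to an LLM.
completion = "The user asks about the blue jacket, which is object 2.\nAnswer: 2"
print(parse_answer(completion, {o["id"] for o in objects}))  # prints [2]
```

Filtering the parsed ids against the ids actually present in the scene is one simple guard against a model naming an object that does not exist. Whether the paper applies any such check is not stated in the abstract.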

Related papers