Fugu-MT Paper Translation (Abstract): Open-Vocabulary 3D Instruction Ambiguity Detection

Paper overview: Open-Vocabulary 3D Instruction Ambiguity Detection

arxiv url: http://arxiv.org/abs/2601.05991v1
Date: Fri, 09 Jan 2026 18:17:11 GMT
Status: Translation complete
Last updated in system: 2026-01-12 17:41:50.068045
Title: Open-Vocabulary 3D Instruction Ambiguity Detection
Title (reference translation): Open-Vocabulary 3D Instruction Ambiguity Detection
Authors: Jiayu Ding, Haoran Tang, Ge Li
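Abstract summary: In safety-critical domains, linguistic ambiguity can have severe consequences. Most embodied AI research overlooks this, assuming instructions are clear and focusing on execution rather than confirmation. We are the first to define Open-Vocabulary 3D Instruction Ambiguity Detection.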
Reference score (independently computed attention score): 21.137149888707537
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In safety-critical domains, linguistic ambiguity can have severe consequences; a vague command like "Pass me the vial" in a surgical setting could lead to catastrophic errors. Yet, most embodied AI research overlooks this, assuming instructions are clear and focusing on execution rather than confirmation. To address this critical safety gap, we are the first to define Open-Vocabulary 3D Instruction Ambiguity Detection, a fundamental new task where a model must determine if a command has a single, unambiguous meaning within a given 3D scene. To support this research, we build Ambi3D, a large-scale benchmark for this task, featuring over 700 diverse 3D scenes and around 22k instructions. Our analysis reveals a surprising limitation: state-of-the-art 3D Large Language Models (LLMs) struggle to reliably determine if an instruction is ambiguous. To address this challenge, we propose AmbiVer, a two-stage framework that collects explicit visual evidence from multiple views and uses it to guide a vision-language model (VLM) in judging instruction ambiguity. Extensive experiments demonstrate the challenge of our task and the effectiveness of AmbiVer, paving the way for safer and more trustworthy embodied AI. Code and dataset available at https://jiayuding031020.github.io/ambi3d/.
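Abstract (reference translation): In safety-critical domains, linguistic ambiguity can have severe consequences. Yet most embodied AI research overlooks this, assuming instructions are clear and focusing on execution rather than confirmation. To address this critical safety gap, we are the first to define Open-Vocabulary 3D Instruction Ambiguity Detection. To support this research, we build Ambi3D, a large-scale benchmark featuring over 700 diverse 3D scenes and around 22k instructions. State-of-the-art 3D Large Language Models (LLMs) struggle to reliably determine whether an instruction is ambiguous. To address this challenge, we propose AmbiVer, a two-stage framework that collects explicit visual evidence from multiple views and uses a vision-language model (VLM) to judge instruction ambiguity. Extensive experiments demonstrate the challenge of our task and the effectiveness of AmbiVer, paving the way for safer and more trustworthy embodied AI. Code and dataset are available at https://jiayuding031020.github.io/ambi3d/.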

Related papers