Fugu-MT Paper Translation (Abstract): ArtiSG: Functional 3D Scene Graph Construction via Human-demonstrated Articulated Objects Manipulation

Paper overview: ArtiSG: Functional 3D Scene Graph Construction via Human-demonstrated Articulated Objects Manipulation

arxiv url: http://arxiv.org/abs/2512.24845v1
Date: Wed, 31 Dec 2025 13:10:40 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:40.59078
Title: ArtiSG: Functional 3D Scene Graph Construction via Human-demonstrated Articulated Objects Manipulation
Title (reference translation): ArtiSG: Functional 3D Scene Graph Construction via Human-Demonstrated Manipulation of Articulated Objects
Authors: Qiuyi Gu, Yuze Sheng, Jincheng Yu, Jiahao Tang, Xiaolong Shan, Zhaoyang Shen, Tinghao Yi, Xiaodan Liang, Xinlei Chen, Yu Wang
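Abstract summary: ArtiSG is a framework that constructs functional 3D scene graphs by encoding human demonstrations into structured robotic memory. The work shows that ArtiSG significantly outperforms baselines in functional element recall and articulation estimation precision.
Reference score (independently computed attention score): 51.54082859171464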
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D scene graphs have empowered robots with semantic understanding for navigation and planning, yet they often lack the functional information required for physical manipulation, particularly regarding articulated objects. Existing approaches for inferring articulation mechanisms from static observations are prone to visual ambiguity, while methods that estimate parameters from state changes typically rely on constrained settings such as fixed cameras and unobstructed views. Furthermore, fine-grained functional elements like small handles are frequently missed by general object detectors. To bridge this gap, we present ArtiSG, a framework that constructs functional 3D scene graphs by encoding human demonstrations into structured robotic memory. Our approach leverages a robust articulation data collection pipeline utilizing a portable setup to accurately estimate 6-DoF articulation trajectories and axes even under camera ego-motion. We integrate these kinematic priors into a hierarchical and open-vocabulary graph while utilizing interaction data to discover inconspicuous functional elements missed by visual perception. Extensive real-world experiments demonstrate that ArtiSG significantly outperforms baselines in functional element recall and articulation estimation precision. Moreover, we show that the constructed graph serves as a reliable functional memory that effectively guides robots to perform language-directed manipulation tasks in real-world environments containing diverse articulated objects.
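Abstract (reference translation): 3D scene graphs have given robots semantic understanding for navigation and planning, but they often lack the functional information needed for physical manipulation, particularly for articulated objects. Existing approaches that infer articulation mechanisms from static observations are prone to visual ambiguity, while methods that estimate parameters from state changes typically depend on constrained settings such as fixed cameras and unobstructed views. In addition, fine-grained functional elements such as small handles are often missed by general object detectors. To bridge this gap, we propose ArtiSG, a framework that constructs functional 3D scene graphs by encoding human demonstrations into structured robotic memory. The method uses a robust articulation data collection pipeline with a portable setup to accurately estimate 6-DoF articulation trajectories and axes even under camera ego-motion. We integrate these kinematic priors into a hierarchical, open-vocabulary graph and use interaction data to discover inconspicuous functional elements missed by visual perception. Extensive real-world experiments show that ArtiSG significantly outperforms baselines in functional element recall and articulation estimation precision. We further show that the constructed graph serves as a reliable functional memory that effectively guides robots in language-directed manipulation in real-world environments containing diverse articulated objects.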
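
To make the idea of a hierarchical graph with attached articulation parameters concrete, here is a minimal sketch in Python. It is not the authors' implementation: the class names (`Node`, `Articulation`), the three levels, the field layout, and the helper `find_articulated` are all hypothetical, and a plain substring match stands in for the open-vocabulary matching mentioned in the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Articulation:
    """Hypothetical container for an estimated articulation model."""
    joint_type: str        # assumed values: "revolute" or "prismatic"
    axis_origin: Vec3      # a point on the joint axis
    axis_direction: Vec3   # direction of the joint axis


@dataclass
class Node:
    """One node of an illustrative room -> object -> element hierarchy."""
    label: str
    level: str             # assumed values: "room", "object", "element"
    articulation: Optional[Articulation] = None
    children: List[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        """Attach a child node and return it for convenient chaining."""
        self.children.append(child)
        return child


def find_articulated(root: Node, query: str) -> List[Node]:
    """Depth-first search for articulated nodes whose label contains query.

    Substring matching is a stand-in for open-vocabulary retrieval.
    """
    hits: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.articulation is not None and query.lower() in node.label.lower():
            hits.append(node)
        stack.extend(node.children)
    return hits


# Illustrative data only; none of these values come from the paper.
kitchen = Node("kitchen", "room")
cabinet = kitchen.add(Node("cabinet", "object"))
handle = cabinet.add(
    Node(
        "cabinet door handle",
        "element",
        Articulation("revolute", (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    )
)

for hit in find_articulated(kitchen, "handle"):
    print(hit.label, hit.articulation.joint_type)
```

Storing the articulation model on the functional element itself means a language query that resolves to a handle also returns the axis a robot would need in order to act on it, which is the role the abstract assigns to the graph as functional memory.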

Related papers