Fugu-MT Paper Translation (Abstract): T-FunS3D: Task-Driven Hierarchical Open-Vocabulary 3D Functionality Segmentation

Paper abstract: T-FunS3D: Task-Driven Hierarchical Open-Vocabulary 3D Functionality Segmentation

arxiv url: http://arxiv.org/abs/2606.05975v1
Date: Thu, 04 Jun 2026 10:16:39 GMT
Status: Translation complete
System update date: 2026-06-05 22:39:44.719099
Title: T-FunS3D: Task-Driven Hierarchical Open-Vocabulary 3D Functionality Segmentation
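Title (reference translation): T-FunS3D: Task-Driven Hierarchical Open-Vocabulary 3D Functionality Segmentation
Authors: Jingkun Feng, Reza Sabzevari
Abstract summary: We introduce T-FunS3D, a task-driven hierarchical open-vocabulary 3D functionality segmentation method. We construct an open-vocabulary scene graph by extracting instances and their visual embeddings in the environment. Given a task description, T-FunS3D identifies the most relevant instances in the scene graph and locates their functional components.
Reference score (independently computed attention score): 0.0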
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open-vocabulary 3D functionality segmentation enables robots to localize functional object components in 3D scenes. It is a challenging task that requires spatial understanding and task interpretation. Current open-vocabulary 3D segmentation methods primarily focus on object-level recognition, while scene-wide part segmentation methods attempt to segment the entire scene exhaustively, making them highly resource-intensive and time-consuming. Balancing segmentation performance in terms of granularity, accuracy, and speed remains a challenge. As one step towards alleviating this, we introduce T-FunS3D, a task-driven hierarchical open-vocabulary 3D functionality segmentation method that provides actionable perception for robotic applications. Our method takes as input the 3D point cloud and posed RGB-D images of an indoor scene. We construct an open-vocabulary scene graph by extracting instances and their visual embeddings in the environment. Given a task description, T-FunS3D identifies the most relevant instances in the scene graph and locates their functional components by leveraging a vision-language model. Experiments on the SceneFun3D dataset demonstrate that T-FunS3D is comparable to the state of the art in open-vocabulary 3D functionality segmentation, while achieving faster runtime and reduced memory usage.
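Abstract (reference translation): Open-vocabulary 3D functionality segmentation allows robots to localize functional object components in 3D scenes. It is a challenging task that requires spatial understanding and task interpretation. Current open-vocabulary 3D segmentation methods focus mainly on object-level recognition, whereas scene-wide part segmentation methods try to segment the entire scene exhaustively, which makes them resource-intensive and time-consuming. Balancing segmentation performance in terms of granularity, accuracy, and speed remains a challenge. In this work, we introduce T-FunS3D, a task-driven hierarchical open-vocabulary 3D functionality segmentation method. The method takes as input the 3D point cloud and posed RGB-D images of an indoor scene. We construct an open-vocabulary scene graph by extracting instances and their visual embeddings in the environment. Given a task description, T-FunS3D identifies the most relevant instances in the scene graph and locates their functional components using a vision-language model. Experiments on the SceneFun3D dataset show that T-FunS3D is comparable to the state of the art in open-vocabulary 3D functionality segmentation while achieving faster runtime and reduced memory usage.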
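
The retrieval step described in the abstract (given a task description, pick the most relevant instances in the scene graph) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how relevance is scored, so cosine similarity between a task embedding and each instance's visual embedding is an assumption here, and the `Instance` class, the `rank_instances` function, and the 3-dimensional toy vectors are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Instance:
    """Toy scene-graph node: a name plus a visual embedding (hypothetical layout)."""
    name: str
    embedding: list[float]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_instances(task_embedding, instances, top_k=2):
    """Return the top_k (score, name) pairs, highest similarity first.

    Assumption: relevance is cosine similarity in a shared embedding space.
    """
    scored = [(cosine(task_embedding, inst.embedding), inst.name) for inst in instances]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up embeddings; a real system would obtain these from a vision-language encoder.
scene_graph = [
    Instance("cabinet", [0.9, 0.1, 0.0]),
    Instance("lamp", [0.0, 1.0, 0.0]),
    Instance("window", [0.1, 0.0, 0.9]),
]
task = [1.0, 0.2, 0.0]  # stand-in embedding for a task such as "open the cabinet"

top = rank_instances(task, scene_graph, top_k=2)
for score, name in top:
    print(f"{name}: {score:.3f}")
```

Only the shortlisted instances would then be passed on for functional-component localization, which is the part the abstract attributes to a vision-language model and which this sketch does not attempt.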


Related papers