Fugu-MT Paper Translation (Abstract): Open-vocabulary 3D scene perception in industrial environments

Paper summary: Open-vocabulary 3D scene perception in industrial environments

arxiv url: http://arxiv.org/abs/2602.19823v1
Date: Mon, 23 Feb 2026 13:22:51 GMT
Status: Translation complete
Last updated in system: 2026-02-24 17:42:02.826954
Title: Open-vocabulary 3D scene perception in industrial environments
Authors: Keno Moenck, Adrian Philip Florea, Julian Koch, Thorsten Schüppstuhl
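Abstract summary: Recent open-vocabulary methods that leverage 2D Vision-Language Foundation Models (VLFMs) target perception beyond a small, fixed set of classes, but often rely on class-agnostic segmentation models pre-trained on non-industrial datasets. We first demonstrate that such models fail to generalize and perform poorly on common industrial objects. We then propose a training-free, open-vocabulary 3D perception pipeline that overcomes this limitation.
Reference score (independently computed attention score): 0.0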
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Autonomous vision applications in production, intralogistics, or manufacturing environments require perception capabilities beyond a small, fixed set of classes. Recent open-vocabulary methods, leveraging 2D Vision-Language Foundation Models (VLFMs), target this task but often rely on class-agnostic segmentation models pre-trained on non-industrial datasets (e.g., household scenes). In this work, we first demonstrate that such models fail to generalize, performing poorly on common industrial objects. Therefore, we propose a training-free, open-vocabulary 3D perception pipeline that overcomes this limitation. Instead of using a pre-trained model to generate instance proposals, our method simply generates masks by merging pre-computed superpoints based on their semantic features. Next, we evaluate the domain-adapted VLFM "IndustrialCLIP" on a representative 3D industrial workshop scene for open-vocabulary querying. Our qualitative results demonstrate successful segmentation of industrial objects.
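Illustrative sketch (not from the paper): the abstract states only that masks are generated by merging pre-computed superpoints based on their semantic features, and that the result is queried with open-vocabulary text. The Python below shows one plausible reading of that idea. Everything specific in it is an assumption made for illustration: the function names `merge_superpoints` and `query_masks`, the adjacency list `edges`, the cosine-similarity criterion, the `threshold` value, and the use of a mean feature per mask. It also merges on the original per-superpoint features (single-linkage style) rather than on updated region features, which may differ from the authors' procedure.

```python
import numpy as np


def merge_superpoints(features, edges, threshold=0.9):
    """Merge adjacent superpoints whose semantic features are similar.

    features:  (n, d) array, one non-zero feature vector per superpoint
               (assumed to be pre-computed, e.g. by a vision-language model).
    edges:     iterable of (i, j) pairs of spatially adjacent superpoints.
    threshold: hypothetical cosine-similarity cutoff for merging.
    Returns an (n,) integer array of mask ids, one per superpoint.
    """
    features = np.asarray(features, dtype=float)
    # Normalize rows so that a dot product equals cosine similarity.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    parent = list(range(len(unit)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        if float(unit[i] @ unit[j]) >= threshold:
            parent[find(i)] = find(j)

    roots = np.array([find(i) for i in range(len(unit))])
    # Relabel the roots to consecutive mask ids 0..k-1.
    _, labels = np.unique(roots, return_inverse=True)
    return labels


def query_masks(features, labels, text_embedding):
    """Score each merged mask against a text embedding (cosine similarity)."""
    features = np.asarray(features, dtype=float)
    text = np.asarray(text_embedding, dtype=float)
    text = text / np.linalg.norm(text)
    scores = []
    for mask_id in range(int(labels.max()) + 1):
        # Represent a mask by the mean feature of its superpoints (assumption).
        mean = features[labels == mask_id].mean(axis=0)
        scores.append(float(mean @ text) / float(np.linalg.norm(mean)))
    return np.array(scores)


# Toy example: four superpoints in a chain, with two semantic groups.
toy_features = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
toy_edges = [(0, 1), (1, 2), (2, 3)]
toy_labels = merge_superpoints(toy_features, toy_edges)
toy_scores = query_masks(toy_features, toy_labels, text_embedding=[1.0, 0.0])
```

In a real pipeline the text embedding would come from the text encoder of the vision-language model that produced the superpoint features; here it is a hand-written vector so the sketch runs without any model.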

Related papers