Fugu-MT paper translation (abstract): Few-Shot Incremental 3D Object Detection in Dynamic Indoor Environments

Paper overview: Few-Shot Incremental 3D Object Detection in Dynamic Indoor Environments

arxiv url: http://arxiv.org/abs/2604.07997v1
Date: Thu, 09 Apr 2026 09:04:52 GMT
Status: Translation complete
Last updated in system: 2026-04-10 18:34:05.822416
Title: Few-Shot Incremental 3D Object Detection in Dynamic Indoor Environments
Title (reference translation): Few-shot incremental 3D object detection in dynamic indoor environments
Authors: Yun Zhu, Jianjun Qian, Jian Yang, Jin Xie, Na Zhao
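Abstract summary: FI3Det is a framework for efficient 3D perception using only a few novel samples. It uses vision-language models (VLMs) to learn knowledge of unseen categories. FI3Det achieves strong and consistent improvements over baseline methods.
Reference score (independently computed attention score): 43.43378522248249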
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Incremental 3D object perception is a critical step toward embodied intelligence in dynamic indoor environments. However, existing incremental 3D detection methods rely on extensive annotations of novel classes for satisfactory performance. To address this limitation, we propose FI3Det, a Few-shot Incremental 3D Detection framework that enables efficient 3D perception with only a few novel samples by leveraging vision-language models (VLMs) to learn knowledge of unseen categories. FI3Det introduces a VLM-guided unknown object learning module in the base stage to enhance perception of unseen categories. Specifically, it employs VLMs to mine unknown objects and extract comprehensive representations, including 2D semantic features and class-agnostic 3D bounding boxes. To mitigate noise in these representations, a weighting mechanism is further designed to re-weight the contributions of point- and box-level features based on their spatial locations and feature consistency within each box. Moreover, FI3Det proposes a gated multimodal prototype imprinting module, where category prototypes are constructed from aligned 2D semantic and 3D geometric features to compute classification scores, which are then fused via a multimodal gating mechanism for novel object detection. As the first framework for few-shot incremental 3D object detection, we establish both batch and sequential evaluation settings on two datasets, ScanNet V2 and SUN RGB-D, where FI3Det achieves strong and consistent improvements over baseline methods. Code is available at https://github.com/zyrant/FI3Det.
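Abstract (reference translation): Incremental 3D object perception is an important step toward embodied intelligence in dynamic indoor environments. However, existing incremental 3D detection methods depend on extensive annotations of novel classes to reach satisfactory performance. To address this limitation, we propose FI3Det, a Few-shot Incremental 3D Detection framework that enables efficient 3D perception from a small number of novel samples by using vision-language models (VLMs) to learn knowledge of unseen categories. FI3Det introduces a VLM-guided unknown object learning module in the base stage to improve perception of unseen categories. Specifically, it uses VLMs to mine unknown objects and extract comprehensive representations, including 2D semantic features and class-agnostic 3D bounding boxes. To reduce noise in these representations, a weighting mechanism re-weights the contributions of point-level and box-level features based on their spatial locations and feature consistency within each box. In addition, FI3Det proposes a gated multimodal prototype imprinting module that builds category prototypes from aligned 2D semantic and 3D geometric features, computes classification scores, and fuses them through a multimodal gating mechanism for novel object detection. Batch and sequential evaluation settings are established on two datasets, ScanNet V2 and SUN RGB-D, where FI3Det achieves strong and consistent improvements over baseline methods. Code is available at https://github.com/zyrant/FI3Det.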
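
Illustrative sketch (not the authors' code): the abstract describes building category prototypes from 2D semantic and 3D geometric features, scoring queries against them, and fusing the two score sets through a gate. The following is a minimal Python sketch of that general idea, under stated assumptions. The function names, the mean-of-normalized-features imprinting rule, the cosine scoring, and the peak-score sigmoid gate with its `temperature` parameter are all hypothetical choices made for illustration; the abstract does not specify how FI3Det computes its prototypes or its gate.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def imprint_prototypes(features_by_class):
    """Assumed imprinting rule: average each class's normalized few-shot
    features, then re-normalize the mean to get one prototype per class."""
    prototypes = {}
    for name, feats in features_by_class.items():
        normed = [l2_normalize(f) for f in feats]
        mean = [sum(col) / len(normed) for col in zip(*normed)]
        prototypes[name] = l2_normalize(mean)
    return prototypes


def cosine_scores(query, prototypes):
    """Classification score per class = cosine similarity to its prototype."""
    q = l2_normalize(query)
    return {name: sum(a * b for a, b in zip(q, p))
            for name, p in prototypes.items()}


def gated_fusion(scores_2d, scores_3d, temperature=0.1):
    """Hypothetical gate: a sigmoid on the difference between each modality's
    peak score, so the more confident modality receives the larger weight."""
    peak_2d = max(scores_2d.values())
    peak_3d = max(scores_3d.values())
    gate = 1.0 / (1.0 + math.exp((peak_3d - peak_2d) / temperature))
    fused = {c: gate * scores_2d[c] + (1.0 - gate) * scores_3d[c]
             for c in scores_2d}
    return fused, gate


# Toy usage with made-up features: two few-shot 2D samples for "chair",
# one sample each otherwise.
protos_2d = imprint_prototypes({"chair": [[1.0, 0.0], [0.8, 0.2]],
                                "lamp": [[0.0, 1.0]]})
protos_3d = imprint_prototypes({"chair": [[1.0, 0.1, 0.0]],
                                "lamp": [[0.0, 0.2, 1.0]]})
fused, gate = gated_fusion(cosine_scores([0.9, 0.1], protos_2d),
                           cosine_scores([0.5, 0.5, 0.5], protos_3d))
prediction = max(fused, key=fused.get)
```

In this toy case the 2D scores are far more decisive than the 3D scores, so the assumed gate leans toward the 2D modality. A learned gate, as the word "gating mechanism" in the abstract may imply, would replace the fixed sigmoid rule used here.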


Related papers