Fugu-MT paper translation (abstract): Instance-Level Moving Object Segmentation from a Single Image with Events

Paper overview: Instance-Level Moving Object Segmentation from a Single Image with Events

arxiv url: http://arxiv.org/abs/2502.12975v1
Date: Tue, 18 Feb 2025 15:56:46 GMT
Status: Translation complete
Last updated in system: 2025-02-19 20:12:09.044213
Title: Instance-Level Moving Object Segmentation from a Single Image with Events
Title (reference translation): Instance-level moving object segmentation from a single image with events
Authors: Zhexiong Wan, Bin Fan, Le Hui, Yuchao Dai, Gim Hee Lee
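Abstract summary: Moving object segmentation plays an important role in understanding dynamic scenes that contain multiple moving objects. Conventional methods have had difficulty distinguishing whether an object's pixel displacements are caused by camera motion or by object motion. Recent advances exploit the motion sensitivity of novel event cameras to counter the inadequate motion modeling capability of conventional images. The authors propose the first instance-level moving object segmentation framework that integrates complementary texture and motion cues.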
Reference score (independently computed attention level): 84.12761042512452
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Moving object segmentation plays a crucial role in understanding dynamic scenes involving multiple moving objects, while the difficulties lie in taking into account both spatial texture structures and temporal motion cues. Existing methods based on video frames encounter difficulties in distinguishing whether pixel displacements of an object are caused by camera motion or object motion due to the complexities of accurate image-based motion modeling. Recent advances exploit the motion sensitivity of novel event cameras to counter conventional images' inadequate motion modeling capabilities, but instead lead to challenges in segmenting pixel-level object masks due to the lack of dense texture structures in events. To address these two limitations imposed by unimodal settings, we propose the first instance-level moving object segmentation framework that integrates complementary texture and motion cues. Our model incorporates implicit cross-modal masked attention augmentation, explicit contrastive feature learning, and flow-guided motion enhancement to exploit dense texture information from a single image and rich motion information from events, respectively. By leveraging the augmented texture and motion features, we separate mask segmentation from motion classification to handle varying numbers of independently moving objects. Through extensive evaluations on multiple datasets, as well as ablation experiments with different input settings and real-time efficiency analysis of the proposed framework, we believe that our first attempt to incorporate image and event data for practical deployment can provide new insights for future work in event-based motion related works. The source code with model training and pre-trained weights is released at https://npucvr.github.io/EvInsMOS
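Abstract (reference translation): Moving object segmentation plays an important role in understanding dynamic scenes that contain multiple moving objects. Existing methods based on video frames have difficulty determining whether an object's pixel displacements are caused by camera motion or by object motion, owing to the complexity of accurate image-based motion modeling. Recent advances exploit the motion sensitivity of novel event cameras to counter the inadequate motion modeling capability of conventional images, but this in turn makes it challenging to segment pixel-level object masks, because events lack dense texture structures. To address these two limitations, we propose the first instance-level moving object segmentation framework that integrates complementary texture and motion cues. Our model incorporates implicit cross-modal masked attention augmentation, explicit contrastive feature learning, and flow-guided motion enhancement to exploit dense texture information from a single image and rich motion information from events. By leveraging the augmented texture and motion features, we separate mask segmentation from motion classification to handle varying numbers of independently moving objects. Through extensive evaluations on multiple datasets, ablation experiments with different input settings, and a real-time efficiency analysis of the proposed framework, we believe that our first attempt to incorporate image and event data for practical deployment can offer new insights for future event-based motion research. The source code, with model training and pre-trained weights, is available at https://npucvr.github.io/EvInsMOS.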

Related papers