On Moving Object Segmentation from Monocular Video with Transformers
- URL: http://arxiv.org/abs/2411.19141v1
- Date: Thu, 28 Nov 2024 13:42:35 GMT
- Title: On Moving Object Segmentation from Monocular Video with Transformers
- Authors: Christian Homeyer, Christoph Schnörr,
- Abstract summary: We present a novel fusion architecture for monocular motion segmentation - M3Former.
We analyze different 2D and 3D motion representations for this problem and their importance for segmentation performance.
- Score: 3.683202928838613
- License:
- Abstract: Moving object detection and segmentation from a single moving camera is a challenging task, requiring an understanding of recognition, motion and 3D geometry. Combining both recognition and reconstruction boils down to a fusion problem, where appearance and motion features need to be combined for classification and segmentation. In this paper, we present a novel fusion architecture for monocular motion segmentation - M3Former, which leverages the strong performance of transformers for segmentation and multi-modal fusion. As reconstructing motion from monocular video is ill-posed, we systematically analyze different 2D and 3D motion representations for this problem and their importance for segmentation performance. Finally, we analyze the effect of training data and show that diverse datasets are required to achieve SotA performance on Kitti and Davis.
Related papers
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z) - Motion Segmentation from a Moving Monocular Camera [3.115818438802931]
We take advantage of two popular branches of monocular motion segmentation approaches: point trajectory based and optical flow based methods.
We are able to model various complex object motions in different scene structures at once.
Our method shows state-of-the-art performance on the KT3DMoSeg dataset.
arXiv Detail & Related papers (2023-09-24T22:59:05Z) - Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z) - MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan
Synchronization [61.015704878681795]
We present a novel, end-to-end trainable multi-body motion segmentation and rigid registration framework for 3D point clouds.
The two non-trivial challenges posed by this multi-scan multibody setting are.
guaranteeing correspondence and segmentation consistency across multiple input point clouds and.
obtaining robust motion-based rigid body segmentation applicable to novel object categories.
arXiv Detail & Related papers (2021-01-17T06:36:28Z) - Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field.
It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations.
Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z) - Betrayed by Motion: Camouflaged Object Discovery via Motion Segmentation [93.22300146395536]
We design a computational architecture that discovers camouflaged objects in videos, specifically by exploiting motion information to perform object segmentation.
We collect the first large-scale Moving Camouflaged Animals (MoCA) video dataset, which consists of over 140 clips across a diverse range of animals.
We demonstrate the effectiveness of the proposed model on MoCA, and achieve competitive performance on the unsupervised segmentation protocol on DAVIS2016 by only relying on motion.
arXiv Detail & Related papers (2020-11-23T18:59:08Z) - Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply internative, allowing for closely hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.