Zero-Shot Monocular Motion Segmentation in the Wild by Combining Deep Learning with Geometric Motion Model Fusion
- URL: http://arxiv.org/abs/2405.01723v1
- Date: Thu, 2 May 2024 20:42:17 GMT
- Title: Zero-Shot Monocular Motion Segmentation in the Wild by Combining Deep Learning with Geometric Motion Model Fusion
- Authors: Yuxiang Huang, Yuhao Chen, John Zelek
- Abstract summary: We propose a novel monocular dense segmentation method that achieves state-of-the-art motion segmentation results in a zero-shot manner.
The proposed method synergistically combines the strengths of deep learning and geometric model fusion methods.
Experiments show that our method achieves competitive results on several motion segmentation datasets and even surpasses some state-of-the-art supervised methods on certain benchmarks.
- Score: 6.805017878728801
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting and segmenting moving objects from a moving monocular camera is challenging in the presence of unknown camera motion, diverse object motions and complex scene structures. Most existing methods rely on a single motion cue to perform motion segmentation, which is usually insufficient when facing different complex environments. While a few recent deep learning based methods are able to combine multiple motion cues to achieve improved accuracy, they depend heavily on vast datasets and extensive annotations, making them less adaptable to new scenarios. To address these limitations, we propose a novel monocular dense segmentation method that achieves state-of-the-art motion segmentation results in a zero-shot manner. The proposed method synergistically combines the strengths of deep learning and geometric model fusion methods by performing geometric model fusion on object proposals. Experiments show that our method achieves competitive results on several motion segmentation datasets and even surpasses some state-of-the-art supervised methods on certain benchmarks, while not being trained on any data. We also present an ablation study to show the effectiveness of combining different geometric models together for motion segmentation, highlighting the value of our geometric model fusion strategy.
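To make the core idea concrete, below is a minimal sketch of geometric model fusion on object proposals. It is not the authors' implementation: the function names, the choice of the two models (fundamental matrix and homography), and the thresholds are illustrative assumptions, and the proposals themselves would come from an off-the-shelf segmentation network.

```python
import numpy as np
import cv2


def sampson_error(F, pts1, pts2):
    """Sampson distance of correspondences (pts1 -> pts2) w.r.t. fundamental matrix F."""
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))]).T  # 3xN homogeneous points
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))]).T
    Fx1, Ftx2 = F @ x1, F.T @ x2
    num = np.sum(x2 * Fx1, axis=0) ** 2                # (x2^T F x1)^2 per match
    den = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    return num / (den + 1e-12)


def label_proposals(bg1, bg2, proposals, thresh=4.0):
    """Label each object proposal as moving or static by fusing two
    geometric motion models fitted to background correspondences.

    bg1, bg2  : Nx2 float32 arrays of background point matches (camera motion).
    proposals : list of (pts1, pts2) point matches inside each proposal mask.
    """
    # Model 1: epipolar geometry, valid for general camera motion.
    F, _ = cv2.findFundamentalMat(bg1, bg2, cv2.FM_RANSAC, 1.0, 0.999)
    # Model 2: homography, valid for planar scenes or rotation-dominant
    # motion, where the fundamental matrix becomes degenerate.
    H, _ = cv2.findHomography(bg1, bg2, cv2.RANSAC, 3.0)

    labels = []
    for pts1, pts2 in proposals:
        e_epi = np.median(sampson_error(F, pts1, pts2))
        warped = cv2.perspectiveTransform(pts1.reshape(-1, 1, 2), H).reshape(-1, 2)
        e_homo = np.median(np.linalg.norm(warped - pts2, axis=1))
        # Fusion rule: a proposal counts as moving only if it is
        # inconsistent with *both* camera-motion models.
        labels.append("moving" if min(e_epi, e_homo) > thresh else "static")
    return labels
```

The min() fusion is what makes combining models worthwhile: a static object that violates the epipolar model in a degenerate configuration will still fit the homography, and vice versa, so neither model alone has to cover every scene structure.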
Related papers
- Instance-Level Moving Object Segmentation from a Single Image with Events [84.12761042512452]
Moving object segmentation plays a crucial role in understanding dynamic scenes involving multiple moving objects.
Previous methods encounter difficulties in distinguishing whether pixel displacements of an object are caused by camera motion or object motion.
Recent advances exploit the motion sensitivity of novel event cameras to compensate for the inadequate motion modeling of conventional images.
We propose the first instance-level moving object segmentation framework that integrates complementary texture and motion cues.
arXiv Detail & Related papers (2025-02-18T15:56:46Z) - Learning semantical dynamics and spatiotemporal collaboration for human pose estimation in video [3.2195139886901813]
We present a novel framework that learns multi-level semantic dynamics for multi-frame human pose estimation.
Specifically, we first design a multi-masked context and pose reconstruction strategy.
This strategy stimulates the model to explore multi-temporal semantic relationships among frames by progressively masking the features of optical (patch) cubes and frames.
arXiv Detail & Related papers (2025-02-15T00:35:34Z) - On Moving Object Segmentation from Monocular Video with Transformers [3.683202928838613]
We present a novel fusion architecture for monocular motion segmentation - M3Former.
We analyze different 2D and 3D motion representations for this problem and their importance for segmentation performance.
arXiv Detail & Related papers (2024-11-28T13:42:35Z) - A Unified Model Selection Technique for Spectral Clustering Based Motion Segmentation [2.637467480598825]
We propose a unified model selection technique to automatically infer the number of motion groups for spectral clustering based motion segmentation methods.
We evaluate our method on the KT3DMoSeg dataset and achieve competitive results compared to the baseline; a generic model-selection sketch appears after this list.
arXiv Detail & Related papers (2024-03-03T20:16:14Z) - Motion Segmentation from a Moving Monocular Camera [3.115818438802931]
We take advantage of two popular branches of monocular motion segmentation approaches: point trajectory based and optical flow based methods.
We are able to model various complex object motions in different scene structures at once.
Our method shows state-of-the-art performance on the KT3DMoSeg dataset; a minimal cue-fusion sketch follows this list.
arXiv Detail & Related papers (2023-09-24T22:59:05Z) - ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow.
A novel neural network architecture is proposed for processing irregular point trajectory data.
Experiments on MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
arXiv Detail & Related papers (2022-07-19T09:19:45Z) - Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z) - Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z) - Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field.
It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations.
Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z) - Event-based Motion Segmentation with Spatio-Temporal Graph Cuts [51.17064599766138]
We have developed a method to identify independently moving objects in scenes acquired with an event-based camera.
The method performs on par or better than the state of the art without having to predetermine the number of expected moving objects.
arXiv Detail & Related papers (2020-12-16T04:06:02Z)
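The unified model selection entry above reports its result without the criterion itself, so as a stand-in here is the classic eigengap heuristic commonly used to pick the number of motion groups before spectral clustering. Treat it as an assumption-laden sketch, not the paper's actual technique; the function name and k_max cap are illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian


def estimate_num_motions(affinity, k_max=10):
    """Pick the number of clusters via the largest eigengap of the
    normalized graph Laplacian (classic heuristic, not the paper's
    unified criterion)."""
    L = laplacian(affinity, normed=True)
    eigvals = np.sort(np.linalg.eigvalsh(L))[: k_max + 1]
    gaps = np.diff(eigvals)          # gap between consecutive eigenvalues
    return int(np.argmax(gaps)) + 1  # k with the widest gap after lambda_k
```

The heuristic rests on the observation that for a graph with k well-separated clusters, the k smallest Laplacian eigenvalues are near zero and the (k+1)-th jumps up, so the widest gap marks the cluster count.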
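Similarly, for the entry combining point trajectories with optical flow, here is a minimal sketch of what fusing two motion cues into a single clustering problem can look like. The convex-combination weight and the use of scikit-learn's SpectralClustering are illustrative assumptions, not the paper's method.

```python
import numpy as np
from sklearn.cluster import SpectralClustering


def fuse_and_cluster(aff_traj, aff_flow, n_motions, alpha=0.5):
    """Combine a point-trajectory affinity and an optical-flow affinity
    over the same tracked points, then cluster into motion groups."""
    fused = alpha * aff_traj + (1.0 - alpha) * aff_flow  # convex combination
    sc = SpectralClustering(n_clusters=n_motions, affinity="precomputed")
    return sc.fit_predict(fused)  # one motion label per tracked point
```

In practice n_motions could itself come from a model-selection step such as estimate_num_motions above, tying the two sketches together into a zero-shot pipeline.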