A Deep Moving-camera Background Model
- URL: http://arxiv.org/abs/2209.07923v1
- Date: Fri, 16 Sep 2022 13:36:54 GMT
- Title: A Deep Moving-camera Background Model
- Authors: Guy Erez, Ron Shapira Weber, Oren Freifeld
- Abstract summary: We propose a new method for learning Moving-camera Background Models (MCBM) for video analysis.
DeepMCBM eliminates the problems associated with joint alignment and achieves state-of-the-art results.
We demonstrate DeepMCBM's utility on a variety of videos, including ones beyond the scope of other methods.
- Score: 5.564705758320338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In video analysis, background models have many applications such as
background/foreground separation, change detection, anomaly detection,
tracking, and more. However, while learning such a model in a video captured by
a static camera is a fairly-solved task, in the case of a Moving-camera
Background Model (MCBM), the success has been far more modest due to
algorithmic and scalability challenges that arise from the camera motion.
Thus, existing MCBMs are limited in their scope and their supported
camera-motion types. These hurdles also impeded the employment, in this
unsupervised task, of end-to-end solutions based on deep learning (DL).
Moreover, existing MCBMs usually model the background either on the domain of a
typically-large panoramic image or in an online fashion. Unfortunately, the
former creates several problems, including poor scalability, while the latter
prevents the recognition and leveraging of cases where the camera revisits
previously-seen parts of the scene. This paper proposes a new method, called
DeepMCBM, that eliminates all the aforementioned issues and achieves
state-of-the-art results. Concretely, first we identify the difficulties
associated with joint alignment of video frames in general and in a DL setting
in particular. Next, we propose a new strategy for joint alignment that lets us
use a spatial transformer net with neither a regularization nor any form of
specialized (and non-differentiable) initialization. Coupled with an
autoencoder conditioned on unwarped robust central moments (obtained from the
joint alignment), this yields an end-to-end regularization-free MCBM that
supports a broad range of camera motions and scales gracefully. We demonstrate
DeepMCBM's utility on a variety of videos, including ones beyond the scope of
other methods. Our code is available at https://github.com/BGU-CS-VIL/DeepMCBM .
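
The abstract's idea of computing robust statistics over jointly aligned frames can be illustrated with a small sketch. This is not the authors' implementation: it assumes the frames have already been warped into a common domain (the step DeepMCBM performs with a spatial transformer net), it uses a per-pixel median as a stand-in for the paper's robust central moments, and it replaces the conditioned autoencoder with plain thresholding. The names `robust_background`, `foreground_mask`, and `threshold` are hypothetical.

```python
import numpy as np


def robust_background(aligned, valid):
    """Per-pixel median over frames, skipping pixels a frame does not cover.

    aligned: (T, H, W) float array of frames already warped to a common domain
             (assumed given; the alignment itself is not shown here).
    valid:   (T, H, W) bool array, True where a warped frame holds real data.
    """
    masked = np.where(valid, aligned, np.nan)
    # nanmedian ignores the uncovered (NaN) entries along the time axis.
    return np.nanmedian(masked, axis=0)


def foreground_mask(frame, background, frame_valid, threshold=0.2):
    """Flag covered pixels that deviate from the background by more than threshold."""
    return frame_valid & (np.abs(frame - background) > threshold)


# Toy data: a constant background of 0.5 seen in five aligned frames.
frames = np.full((5, 4, 4), 0.5)
frames[2, 1, 1] = 1.0            # a foreground object appears in frame 2
valid = np.ones_like(frames, dtype=bool)
frames[0, 0, 0] = 0.0            # padding left by warping frame 0 ...
valid[0, 0, 0] = False           # ... marked as not covered

background = robust_background(frames, valid)
mask = foreground_mask(frames[2], background, valid[2])
```

In this toy case the median is unaffected by the single foreground pixel and by the uncovered padding, which is the property that makes a robust statistic attractive for background estimation.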
Related papers
- CPA: Camera-pose-awareness Diffusion Transformer for Video Generation [15.512186399114999]
CPA is a text-to-video generation approach that integrates the textual, visual, and spatial conditions.
Our method outperforms LDM-based methods for long video generation while achieving optimal performance in trajectory consistency and object consistency.
arXiv Detail & Related papers (2024-12-02T12:10:00Z)
- RoMo: Robust Motion Segmentation Improves Structure from Motion [46.77236343300953]
We propose a novel approach to video-based motion segmentation to identify the components of a scene that are moving w.r.t. a fixed world frame.
Our simple but effective iterative method, RoMo, combines optical flow and epipolar cues with a pre-trained video segmentation model.
More importantly, the combination of an off-the-shelf SfM pipeline with our segmentation masks establishes a new state-of-the-art on camera calibration for scenes with dynamic content, outperforming existing methods by a substantial margin.
arXiv Detail & Related papers (2024-11-27T01:09:56Z)
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
- UCMCTrack: Multi-Object Tracking with Uniform Camera Motion Compensation [9.333052173412158]
We introduce UCMCTrack, a novel motion model-based tracker robust to camera movements.
Unlike conventional CMC that computes compensation parameters frame-by-frame, UCMCTrack consistently applies the same compensation parameters throughout a video sequence.
It achieves state-of-the-art performance across a variety of challenging datasets, including MOT17, MOT20, DanceTrack and KITTI.
arXiv Detail & Related papers (2023-12-14T14:01:35Z)
- Co-attention Propagation Network for Zero-Shot Video Object Segmentation [91.71692262860323]
Zero-shot video object segmentation (ZS-VOS) aims to segment objects in a video sequence without prior knowledge of these objects.
Existing ZS-VOS methods often struggle to distinguish between foreground and background or to keep track of the foreground in complex scenarios.
We propose an encoder-decoder-based hierarchical co-attention propagation network (HCPN) capable of tracking and segmenting objects.
arXiv Detail & Related papers (2023-04-08T04:45:48Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- DyGLIP: A Dynamic Graph Model with Link Prediction for Accurate Multi-Camera Multiple Object Tracking [25.98400206361454]
Multi-Camera Multiple Object Tracking (MC-MOT) is a significant computer vision problem due to its emerging applicability in several real-world applications.
This work proposes a new Dynamic Graph Model with Link Prediction approach to solve the data association task.
Experimental results show that we outperform existing MC-MOT algorithms by a large margin on several practical datasets.
arXiv Detail & Related papers (2021-06-12T20:22:30Z)
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.