Saliency detection with moving camera via background model completion
- URL: http://arxiv.org/abs/2111.01681v1
- Date: Sat, 30 Oct 2021 11:17:58 GMT
- Title: Saliency detection with moving camera via background model completion
- Authors: Yupei Zhang, Kwok-Leung Chan
- Abstract summary: We propose a new framework called saliency detection via background model completion (SD-BMC).
It comprises a background modeler and a deep learning background/foreground segmentation network.
The adopted background/foreground segmenter, although pre-trained with a specific video dataset, can also detect saliency in unseen videos.
- Score: 0.5076419064097734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting saliency in video is a fundamental step in many computer vision
systems. Saliency refers to the significant target(s) in the video; the object of
interest is further analyzed for high-level applications. Saliency can be
segregated from the background when the two exhibit different visual cues.
Therefore, saliency detection is often formulated as background subtraction.
However, saliency detection is challenging. For instance, a dynamic background
can result in false positive errors, while camouflage leads to false negative
errors. With a moving camera, the captured scenes are even more complicated to
handle. We propose a new framework, called saliency detection via background
model completion (SD-BMC), which comprises a background modeler and a deep
learning background/foreground segmentation network. The
background modeler generates an initial clean background image from a short
image sequence. Based on the idea of video completion, a good background frame
can be synthesized with the co-existence of changing background and moving
objects. The adopted background/foreground segmenter, although pre-trained
with a specific video dataset, can also detect saliency in unseen videos. The
background modeler can adjust the background image dynamically when the
background/foreground segmenter output deteriorates during processing of a long
video. To the best of our knowledge, our framework is the first one to adopt
video completion for background modeling and saliency detection in videos
captured by a moving camera. The results, obtained on PTZ (pan-tilt-zoom)
videos, show that our proposed framework outperforms some deep learning-based
background subtraction models by 11% or more. On more challenging videos, our
framework also outperforms many high-ranking background subtraction methods by
more than 3%.
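As a rough illustration of the processing loop described in the abstract, here is a minimal, hypothetical sketch of how an SD-BMC-style pipeline could be organized. The function names (`complete_background`, `segment_foreground`, `output_deteriorated`), the temporal-median stand-in for the video-completion model, the simple difference threshold standing in for the segmentation network, and the deterioration heuristic are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Sketch of the pipeline described in the abstract:
# 1) a background modeler builds an initial clean background from a short
#    image sequence (the paper uses a video-completion model for this step),
# 2) a pre-trained background/foreground segmentation network compares each
#    frame against the background to produce a saliency (foreground) mask,
# 3) when the segmenter's output deteriorates, the background modeler is
#    re-run to refresh the background image.

def complete_background(frames):
    """Placeholder for the video-completion-based background modeler.
    A temporal median stands in for the learned completion model."""
    return np.median(np.stack(frames, axis=0), axis=0).astype(np.uint8)

def segment_foreground(frame, background, threshold=30):
    """Placeholder for the deep background/foreground segmentation network.
    A per-pixel difference threshold stands in for the network."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff.max(axis=-1) > threshold).astype(np.uint8)

def output_deteriorated(mask, max_foreground_ratio=0.5):
    """Heuristic quality check: if most of the frame is flagged as
    foreground, the background model is probably stale."""
    return mask.mean() > max_foreground_ratio

def sd_bmc(video, init_len=30):
    """Run the sketched SD-BMC loop over a list of frames."""
    frames = list(video)
    background = complete_background(frames[:init_len])
    masks = []
    for i, frame in enumerate(frames):
        mask = segment_foreground(frame, background)
        if output_deteriorated(mask):
            # Refresh the background from a recent window of frames.
            recent = frames[max(0, i - init_len):i + 1]
            background = complete_background(recent)
            mask = segment_foreground(frame, background)
        masks.append(mask)
    return masks

if __name__ == "__main__":
    # Synthetic demo: a bright square moving over a static, noisy background.
    rng = np.random.default_rng(0)
    video = []
    for t in range(60):
        frame = np.full((64, 64, 3), 90, dtype=np.uint8)
        frame += rng.integers(0, 5, frame.shape, dtype=np.uint8)
        x = 5 + t % 40
        frame[20:30, x:x + 10] = 220
        video.append(frame)
    masks = sd_bmc(video)
    print("foreground pixels in last frame:", int(masks[-1].sum()))
```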
Related papers
- Lester: rotoscope animation through video object segmentation and tracking [0.0]
Lester is a novel method to automatically synthesize retro-style 2D animations from videos.
Video frames are processed with the Segment Anything Model (SAM) and the resulting masks are tracked through subsequent frames with DeAOT.
Results show that the method exhibits excellent temporal consistency and can correctly process videos with different poses and appearances.
arXiv Detail & Related papers (2024-02-15T11:15:54Z)
- ActAnywhere: Subject-Aware Video Background Generation [62.57759679425924]
Generating video background that tailors to foreground subject motion is an important problem for the movie industry and visual effects community.
The task calls for a background that aligns with the motion and appearance of the foreground subject while also complying with the artist's creative intention.
We introduce ActAnywhere, a generative model that automates this process, which traditionally requires tedious manual effort.
arXiv Detail & Related papers (2024-01-19T17:16:16Z) - Weakly Supervised Realtime Dynamic Background Subtraction [8.75682288556859]
We propose a weakly supervised framework that can perform background subtraction without requiring per-pixel ground-truth labels.
Our framework is trained on a sequence of images free of moving objects and comprises two networks.
Our proposed method is online, real-time, efficient, and requires minimal frame-level annotation.
arXiv Detail & Related papers (2023-03-06T03:17:48Z)
- Frozen CLIP Models are Efficient Video Learners [86.73871814176795]
Video recognition has been dominated by the end-to-end learning paradigm.
Recent advances in Contrastive Vision-Language Pre-training pave the way for a new route for visual recognition tasks.
We present Efficient Video Learning -- an efficient framework for directly training high-quality video recognition models.
arXiv Detail & Related papers (2022-08-06T17:38:25Z)
- Autoencoder-based background reconstruction and foreground segmentation with background noise estimation [1.3706331473063877]
We propose in this paper to model the background of a video sequence as a low dimensional manifold using an autoencoder.
The main novelty of the proposed model is that the autoencoder is also trained to predict the background noise, which makes it possible to compute a pixel-dependent threshold for each frame.
Although the proposed model does not use any temporal or motion information, it exceeds the state of the art for unsupervised background subtraction on the CDnet 2014 and LASIESTA datasets.
arXiv Detail & Related papers (2021-12-15T09:51:00Z)
- PreViTS: Contrastive Pretraining with Video Tracking Supervision [53.73237606312024]
PreViTS is a self-supervised learning (SSL) framework that uses an unsupervised tracking signal to select clips containing the same object.
PreViTS spatially constrains the frame regions to learn from and trains the model to locate meaningful objects.
We train a momentum contrastive (MoCo) encoder on VGG-Sound and Kinetics-400 datasets with PreViTS.
arXiv Detail & Related papers (2021-12-01T19:49:57Z)
- NeuralDiff: Segmenting 3D objects that move in egocentric videos [92.95176458079047]
We study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground.
This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion.
In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them.
arXiv Detail & Related papers (2021-10-19T12:51:35Z)
- Motion-aware Self-supervised Video Representation Learning via Foreground-background Merging [19.311818681787845]
We propose Foreground-background Merging (FAME) to compose the foreground region of the selected video onto the background of others.
We show that FAME can significantly boost the performance in different downstream tasks with various backbones.
arXiv Detail & Related papers (2021-09-30T13:45:26Z)
- Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
arXiv Detail & Related papers (2021-03-26T20:37:55Z)
- Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning [105.42550534895828]
Self-supervised learning has shown great potential in improving the video representation ability of deep neural networks.
Some of the current methods tend to cheat from the background, i.e., the prediction is highly dependent on the video background instead of the motion.
We propose to remove the background impact by adding the background. That is, given a video, we randomly select a static frame and add it to every other frame to construct a distracting video sample.
Then we force the model to pull the feature of the distracting video and the feature of the original video closer, so that the model is explicitly restricted to
arXiv Detail & Related papers (2020-09-12T11:25:13Z)
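To make the background-adding augmentation in the last entry above concrete, here is a minimal, hypothetical sketch. The convex blend, the `alpha` mixing weight, and the choice of the injected frame are assumptions; the summary describes the operation only as adding a randomly selected static frame to every other frame, and the contrastive objective is indicated only in a comment.

```python
import numpy as np

def add_background_distractor(clip, alpha=0.5, rng=None):
    """Sketch of the 'adding the background' augmentation: blend one
    randomly chosen frame of the clip into every frame, creating a
    distracting sample whose static content is misleading.

    clip: float array of shape (T, H, W, C) with values in [0, 1].
    alpha: blending weight for the injected static frame (an assumption;
    the paper's exact mixing scheme may differ).
    """
    if rng is None:
        rng = np.random.default_rng()
    static = clip[rng.integers(len(clip))]  # the injected "background" frame
    return (1.0 - alpha) * clip + alpha * static

# In the self-supervised setup summarized above, the encoder features of
# `clip` and `add_background_distractor(clip)` would then be pulled together
# by a contrastive loss, so the representation cannot rely on the static
# background alone.
```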
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.