Implicit Motion Handling for Video Camouflaged Object Detection
- URL: http://arxiv.org/abs/2203.07363v2
- Date: Tue, 15 Mar 2022 13:44:01 GMT
- Title: Implicit Motion Handling for Video Camouflaged Object Detection
- Authors: Xuelian Cheng, Huan Xiong, Deng-Ping Fan, Yiran Zhong, Mehrtash
Harandi, Tom Drummond, Zongyuan Ge
- Abstract summary: We propose a new video camouflaged object detection (VCOD) framework.
It exploits both short-term dynamics and long-term temporal consistency to detect camouflaged objects from video frames.
- Score: 60.98467179649398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new video camouflaged object detection (VCOD) framework that can
exploit both short-term dynamics and long-term temporal consistency to detect
camouflaged objects from video frames. An essential property of camouflaged
objects is that they usually exhibit patterns similar to the background,
making them hard to identify in still images. Therefore, effectively
handling temporal dynamics in videos becomes the key for the VCOD task as the
camouflaged objects will be noticeable when they move. However, current VCOD
methods often leverage homography or optical flows to represent motions, where
the detection error may accumulate from both the motion estimation error and
the segmentation error. In contrast, our method unifies motion estimation
and object segmentation within a single optimization framework. Specifically,
we build a dense correlation volume to implicitly capture motions between
neighbouring frames (a minimal sketch follows the abstract) and use the final
segmentation supervision to optimize the implicit motion estimation and
segmentation jointly. Furthermore, to enforce temporal consistency within a
video sequence, we employ a spatio-temporal transformer to refine the
short-term predictions. Extensive
experiments on VCOD benchmarks demonstrate the effectiveness of
our approach. We also provide a large-scale VCOD dataset named MoCA-Mask with
pixel-level handcrafted ground-truth masks and construct a comprehensive VCOD
benchmark with previous methods to facilitate research in this direction.
Dataset Link: https://xueliancheng.github.io/SLT-Net-project.
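The core mechanism described above, a dense correlation volume trained end-to-end from segmentation supervision, can be sketched compactly. The PyTorch snippet below is a minimal illustration of the general all-pairs correlation technique (as used in optical-flow cost volumes), not the authors' SLT-Net implementation; the shapes and the scaling factor are assumptions.

```python
import torch

def dense_correlation_volume(feat_ref: torch.Tensor,
                             feat_nbr: torch.Tensor) -> torch.Tensor:
    """feat_ref, feat_nbr: (B, C, H, W) features of two neighbouring frames.
    Returns a (B, H, W, H, W) volume of all-pairs feature similarities."""
    b, c, h, w = feat_ref.shape
    ref = feat_ref.flatten(2)                      # (B, C, H*W)
    nbr = feat_nbr.flatten(2)                      # (B, C, H*W)
    corr = torch.einsum('bci,bcj->bij', ref, nbr)  # all-pairs dot products
    corr = corr / c ** 0.5                         # scale for stable training
    return corr.view(b, h, w, h, w)
```

Because every entry of the volume is differentiable with respect to both feature maps, a segmentation loss on the decoder output back-propagates through the volume, which is what lets motion estimation and segmentation be optimized jointly.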
Related papers
- Explicit Motion Handling and Interactive Prompting for Video Camouflaged Object Detection [23.059829327898818]
Existing video camouflaged object detection approaches take noisy motion estimation as input or model motion implicitly.
We propose a novel Explicit Motion handling and Interactive Prompting framework for VCOD, dubbed EMIP, which handles motion cues explicitly.
EMIP is characterized by a two-stream architecture for simultaneously conducting camouflaged segmentation and optical flow estimation; a sketch of such a two-stream layout follows this entry.
arXiv Detail & Related papers (2024-03-04T12:11:07Z)
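As a rough illustration of the two-stream idea named in the EMIP summary, the sketch below pairs a shared encoder with separate segmentation and flow heads. This is a hypothetical layout, not EMIP's actual architecture; the module names, channel sizes, and frame-concatenation input are all assumptions.

```python
import torch
import torch.nn as nn

class TwoStreamSketch(nn.Module):
    """Hypothetical two-stream layout: shared features feed a
    camouflaged-segmentation head and an optical-flow head."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Shared encoder over a concatenated pair of RGB frames (6 channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv2d(feat_dim, 1, 1)   # object mask logits
        self.flow_head = nn.Conv2d(feat_dim, 2, 1)  # per-pixel (u, v) motion

    def forward(self, frame_t: torch.Tensor, frame_t1: torch.Tensor):
        feats = self.encoder(torch.cat([frame_t, frame_t1], dim=1))
        return self.seg_head(feats), self.flow_head(feats)
```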
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Efficient Long-Short Temporal Attention Network for Unsupervised Video Object Segmentation [23.645412918420906]
Unsupervised Video Object Segmentation (VOS) aims at identifying the contours of primary foreground objects in videos without any prior knowledge.
Previous methods do not fully use spatial-temporal context and fail to tackle this challenging task in real-time.
This motivates us to develop an efficient Long-Short Temporal Attention network (termed LSTA) for the unsupervised VOS task from a holistic view; a sketch of the long-short attention idea follows this entry.
arXiv Detail & Related papers (2023-09-21T01:09:46Z)
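The long-short split in LSTA's name suggests attention over two temporal ranges. The sketch below is an assumed, generic rendering of that idea using standard cross-attention, not LSTA's published design; the token shapes and residual fusion are illustrative choices.

```python
import torch
import torch.nn as nn

class LongShortAttentionSketch(nn.Module):
    """Current-frame tokens attend to a neighbouring frame (short-term)
    and to tokens drawn from the whole clip (long-term)."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.short = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.long = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, cur, nbr, clip):
        # cur, nbr: (B, N, C) frame tokens; clip: (B, T*N, C) sequence tokens.
        s, _ = self.short(cur, nbr, nbr)    # short-term motion cues
        l, _ = self.long(cur, clip, clip)   # long-term appearance context
        return cur + s + l                  # residual fusion of both ranges
```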
- Motion-inductive Self-supervised Object Discovery in Videos [99.35664705038728]
We propose a model for processing consecutive RGB frames, and infer the optical flow between any pair of frames using a layered representation.
We demonstrate superior performance over previous state-of-the-art methods on three public video segmentation datasets.
arXiv Detail & Related papers (2022-10-01T08:38:28Z)
- Motion-aware Memory Network for Fast Video Salient Object Detection [15.967509480432266]
We design a space-time memory (STM)-based network, which extracts useful temporal information of the current frame from adjacent frames as the temporal branch of VSOD.
In the encoding stage, we generate high-level temporal features by using high-level features from the current and its adjacent frames.
In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches.
The proposed model does not require optical flow or other preprocessing and can reach a speed of nearly 100 FPS during inference; a sketch of the underlying memory read follows this entry.
arXiv Detail & Related papers (2022-08-01T15:56:19Z)
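Space-time memory (STM) networks, which the summary above builds on, aggregate temporal information by matching the current frame's keys against keys stored from adjacent frames. The sketch below shows that standard read operation in isolation; the shapes and softmax scaling are assumptions rather than this paper's exact configuration.

```python
import torch

def memory_read(query_key: torch.Tensor,
                mem_key: torch.Tensor,
                mem_val: torch.Tensor) -> torch.Tensor:
    """query_key: (B, C, K) keys of the current frame (K = H*W);
    mem_key: (B, C, M) keys from adjacent frames (M = T*H*W);
    mem_val: (B, D, M) the corresponding memory values.
    Returns (B, D, K): temporal features gathered per query location."""
    affinity = torch.einsum('bck,bcm->bkm', query_key, mem_key)
    affinity = torch.softmax(affinity / query_key.shape[1] ** 0.5, dim=-1)
    # Each query location takes a convex combination of memory values.
    return torch.einsum('bdm,bkm->bdk', mem_val, affinity)
```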
- ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow.
A novel neural network architecture is proposed for processing irregular point trajectory data.
Experiments on MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
arXiv Detail & Related papers (2022-07-19T09:19:45Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation and performs favorably against state-of-the-art approaches; a co-attention sketch follows this entry.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
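The co-attention formulation mentioned above can be pictured as two feature maps re-weighting each other through a shared affinity matrix. The function below is a generic sketch of that pattern, with an assumed dot-product affinity and concatenation fusion; it is not the paper's exact formulation.

```python
import torch

def co_attention_fuse(low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
    """low, high: (B, C, H, W) low-/high-level maps. Returns (B, 2C, H, W)."""
    b, c, h, w = low.shape
    l = low.flatten(2)                                     # (B, C, H*W)
    g = high.flatten(2)                                    # (B, C, H*W)
    affinity = torch.einsum('bci,bcj->bij', l, g) / c ** 0.5
    # Low-level positions gather high-level context, and vice versa.
    low_att = torch.einsum('bcj,bij->bci', g, torch.softmax(affinity, dim=2))
    high_att = torch.einsum('bci,bij->bcj', l, torch.softmax(affinity, dim=1))
    return torch.cat([low_att.view(b, c, h, w),
                      high_att.view(b, c, h, w)], dim=1)
```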
- Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching [67.02962970820505]
We introduce "tracking-by-detection" into Video Object (VOS)
We propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
We achieve new state-of-the-art performance in both speed and accuracy on the DAVIS benchmark without complicated bells and whistles, running at 0.14 seconds per frame with a J&F measure of 75.9%; a template-matching sketch follows this entry.
arXiv Detail & Related papers (2020-07-11T05:44:16Z)
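Template matching of the kind named above is commonly implemented as cross-correlation between a template feature and a search-region feature, with the template updated over time. Both functions below are assumed, generic renderings; the running-average update in particular is an illustrative stand-in, not the paper's time-evolving mechanism.

```python
import torch
import torch.nn.functional as F

def xcorr(search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
    """search: (B, C, H, W); template: (B, C, h, w) with h <= H, w <= W.
    Returns a (B, 1, H-h+1, W-w+1) response map per sample."""
    b, c, hs, ws = search.shape
    h, w = template.shape[-2:]
    # Depthwise trick: match each sample against its own template.
    resp = F.conv2d(search.reshape(1, b * c, hs, ws),
                    template.reshape(b * c, 1, h, w), groups=b * c)
    return resp.reshape(b, c, hs - h + 1, ws - w + 1).sum(dim=1, keepdim=True)

def update_template(template: torch.Tensor, new_feat: torch.Tensor,
                    momentum: float = 0.9) -> torch.Tensor:
    # Time-evolving template as an assumed running average of frame features.
    return momentum * template + (1.0 - momentum) * new_feat
```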
- Joint Detection and Tracking in Videos with Identification Features [36.55599286568541]
We propose the first joint optimization of detection, tracking and re-identification features for videos.
Our method reaches the state of the art on MOT; it ranks 1st in the UA-DETRAC'18 tracking challenge among online trackers and 3rd overall.
arXiv Detail & Related papers (2020-05-21T21:06:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.