Explicit Motion Handling and Interactive Prompting for Video Camouflaged
Object Detection
- URL: http://arxiv.org/abs/2403.01968v1
- Date: Mon, 4 Mar 2024 12:11:07 GMT
- Title: Explicit Motion Handling and Interactive Prompting for Video Camouflaged
Object Detection
- Authors: Xin Zhang, Tao Xiao, Gepeng Ji, Xuan Wu, Keren Fu, Qijun Zhao
- Abstract summary: Existing video camouflaged object detection approaches take noisy motion estimation as input or model motion implicitly.
We propose a novel Explicit Motion handling and Interactive Prompting framework for VCOD, dubbed EMIP, which handles motion cues explicitly.
EMIP is characterized by a two-stream architecture for simultaneously conducting camouflaged segmentation and optical flow estimation.
- Score: 23.059829327898818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camouflage poses challenges in distinguishing a static target, whereas any
movement of the target can break this disguise. Existing video camouflaged
object detection (VCOD) approaches take noisy motion estimation as input or
model motion implicitly, restricting detection performance in complex dynamic
scenes. In this paper, we propose a novel Explicit Motion handling and
Interactive Prompting framework for VCOD, dubbed EMIP, which handles motion
cues explicitly using a frozen pre-trained optical flow foundation model. EMIP
is characterized by a two-stream architecture for simultaneously conducting
camouflaged segmentation and optical flow estimation. Interactions across the
dual streams are realized in an interactive prompting way that is inspired by
emerging visual prompt learning. Two learnable modules, i.e., the camouflaged
feeder and the motion collector, are designed to incorporate segmentation-to-motion
and motion-to-segmentation prompts, respectively, and to enhance the outputs of
both streams. The prompt fed to the motion stream is learned by supervising
optical flow in a self-supervised manner. Furthermore, we show that long-term
historical information can also be incorporated as a prompt into EMIP and
achieve more robust results with temporal consistency. Experimental results
demonstrate that our EMIP achieves new state-of-the-art records on popular VCOD
benchmarks. The code will be publicly available.
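Since the code is not yet released, the dual-stream interactive prompting described above can be pictured with a minimal sketch. Everything below is an assumption for illustration (module names, the toy stand-in for the frozen flow model, feature sizes, and where each prompt is injected), not the authors' implementation:

```python
import torch
import torch.nn as nn

class CamouflagedFeeder(nn.Module):
    """Segmentation-to-motion prompt (hypothetical design)."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, seg_feat):
        return self.proj(seg_feat)

class MotionCollector(nn.Module):
    """Motion-to-segmentation prompt (hypothetical design)."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, flow_feat, seg_feat):
        return seg_feat + self.proj(flow_feat)

class TinyFlowNet(nn.Module):
    """Stand-in for the frozen pre-trained optical flow model; a real
    system would load pretrained weights (e.g. a RAFT-style network)."""
    def __init__(self, dim=64):
        super().__init__()
        self.enc = nn.Conv2d(6, dim, 3, padding=1)
        self.head = nn.Conv2d(dim, 2, 3, padding=1)

    def forward(self, f0, f1, prompt=None):
        feat = self.enc(torch.cat([f0, f1], dim=1))
        if prompt is not None:            # inject segmentation-to-motion prompt
            feat = feat + prompt
        return self.head(feat), feat      # flow field and intermediate features

class EMIPSketch(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.seg_enc = nn.Conv2d(3, dim, 3, padding=1)   # segmentation stream
        self.flow_net = TinyFlowNet(dim)                 # motion stream (frozen)
        for p in self.flow_net.parameters():
            p.requires_grad = False
        self.feeder = CamouflagedFeeder(dim)
        self.collector = MotionCollector(dim)
        self.mask_head = nn.Conv2d(dim, 1, 1)

    def forward(self, f0, f1):
        seg_feat = self.seg_enc(f0)
        # segmentation-to-motion prompt into the frozen flow stream
        flow, flow_feat = self.flow_net(f0, f1, prompt=self.feeder(seg_feat))
        # motion-to-segmentation prompt back into the segmentation stream
        seg_feat = self.collector(flow_feat, seg_feat)
        return self.mask_head(seg_feat), flow

model = EMIPSketch()
mask, flow = model(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(mask.shape, flow.shape)  # (1, 1, 64, 64) and (1, 2, 64, 64)
```

Note that although the flow stream's parameters are frozen, gradients still pass through its operations, which is what would let the camouflaged feeder's prompt be learned end to end.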
Related papers
- Generalizable Implicit Motion Modeling for Video Frame Interpolation [51.966062283735596]
Motion is critical in flow-based Video Frame Interpolation (VFI).
Generalizable Implicit Motion Modeling (GIMM) is a novel and effective approach to motion modeling for VFI.
Our GIMM can be smoothly integrated with existing flow-based VFI works without further modifications.
arXiv Detail & Related papers (2024-07-11T17:13:15Z)
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
- Out of the Room: Generalizing Event-Based Dynamic Motion Segmentation for Complex Scenes [10.936350433952668]
Rapid and reliable identification of dynamic scene parts, also known as motion segmentation, is a key challenge for mobile sensors.
Event cameras have the potential to overcome these limitations, but corresponding methods have only been demonstrated in smaller-scale indoor environments.
This work presents an event-based method for class-agnostic motion segmentation that can also be deployed successfully in complex, large-scale outdoor environments.
arXiv Detail & Related papers (2024-03-07T14:59:34Z)
- ProgressiveMotionSeg: Mutually Reinforced Framework for Event-Based Motion Segmentation [101.19290845597918]
This paper presents a Motion Estimation (ME) module and an Event Denoising (ED) module jointly optimized in a mutually reinforced manner.
Taking temporal correlation as guidance, the ED module calculates the confidence that each event belongs to real activity and transmits it to the ME module, which updates the energy function of motion segmentation for noise suppression.
arXiv Detail & Related papers (2022-03-22T13:40:26Z)
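A rough sketch of the mutual-reinforcement idea in the entry above, under our own assumptions (the correlation measure, time constant, and energy form are illustrative, not the paper's):

```python
import torch

def event_confidence(ts, neighbor_ts, tau=1e-3):
    """ED-step sketch: score each event by its temporal correlation with
    spatial neighbours; temporally isolated events are likely noise.
    `tau` is an assumed time constant."""
    dt = (ts[:, None] - neighbor_ts).abs()     # (n_events, n_neighbors)
    return torch.exp(-dt / tau).mean(dim=1)    # confidence in (0, 1]

def weighted_motion_energy(pred_flow, event_flow, conf):
    """ME-step sketch: a confidence-weighted energy, so low-confidence
    (noisy) events barely influence the motion-model fit."""
    residual = (pred_flow - event_flow).pow(2).sum(dim=-1)
    return (conf * residual).sum() / conf.sum().clamp(min=1e-8)
```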
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections, but still suffer from a limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point similarity between adjacent feature maps.
arXiv Detail & Related papers (2021-03-23T03:06:26Z)
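The channel-wise gate described in the entry above might look roughly like this; the per-channel motion statistic and layer shapes are our assumptions rather than the paper's exact CME design:

```python
import torch
import torch.nn as nn

class ChannelMotionGate(nn.Module):
    """Rough CME-style gate: derive a per-channel gate vector from a
    motion statistic and re-weight channels carrying dynamic information."""
    def __init__(self, channels=64):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, feat_t, feat_t1):
        motion_stat = (feat_t1 - feat_t).abs().mean(dim=(2, 3))  # (B, C)
        gate = self.fc(motion_stat)                 # channel-wise gate vector
        return feat_t * gate[:, :, None, None]      # emphasize dynamic channels
```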
- Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply interleaved, allowing for close hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)
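Finally, the asymmetry in the MAT block above can be sketched as attention in which motion features form the queries and appearance features the keys and values, so motion steers appearance but not the reverse; all layer choices here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MotionAttentiveTransition(nn.Module):
    """Sketch of an asymmetric attention block: motion queries attend to
    appearance keys/values within a two-stream encoder."""
    def __init__(self, dim=64):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)  # queries from the motion stream
        self.k = nn.Conv2d(dim, dim, 1)  # keys from the appearance stream

    def forward(self, motion, appearance):
        b, c, h, w = appearance.shape
        q = self.q(motion).flatten(2)                  # (B, C, HW)
        k = self.k(appearance).flatten(2)              # (B, C, HW)
        attn = torch.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)
        v = appearance.flatten(2)                      # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return appearance + out                        # motion-attended appearance
```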
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.