Explicit Motion Handling and Interactive Prompting for Video Camouflaged
Object Detection
- URL: http://arxiv.org/abs/2403.01968v1
- Date: Mon, 4 Mar 2024 12:11:07 GMT
- Title: Explicit Motion Handling and Interactive Prompting for Video Camouflaged
Object Detection
- Authors: Xin Zhang, Tao Xiao, Gepeng Ji, Xuan Wu, Keren Fu, Qijun Zhao
- Abstract summary: Existing video camouflaged object detection approaches take noisy motion estimation as input or model motion implicitly.
We propose a novel Explicit Motion handling and Interactive Prompting framework for VCOD, dubbed EMIP, which handles motion cues explicitly.
EMIP is characterized by a two-stream architecture for simultaneously conducting camouflaged segmentation and optical flow estimation.
- Score: 23.059829327898818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camouflage poses challenges in distinguishing a static target, whereas any
movement of the target can break this disguise. Existing video camouflaged
object detection (VCOD) approaches take noisy motion estimation as input or
model motion implicitly, restricting detection performance in complex dynamic
scenes. In this paper, we propose a novel Explicit Motion handling and
Interactive Prompting framework for VCOD, dubbed EMIP, which handles motion
cues explicitly using a frozen pre-trained optical flow foundation model. EMIP
is characterized by a two-stream architecture for simultaneously conducting
camouflaged segmentation and optical flow estimation. Interactions across the
dual streams are realized in an interactive prompting way that is inspired by
emerging visual prompt learning. Two learnable modules, i.e., the camouflaged
feeder and the motion collector, are designed to incorporate segmentation-to-motion
and motion-to-segmentation prompts, respectively, and to enhance the outputs of
both streams. The prompt fed to the motion stream is learned by supervising
optical flow in a self-supervised manner. Furthermore, we show that long-term
historical information can also be incorporated as a prompt into EMIP and
achieve more robust results with temporal consistency. Experimental results
demonstrate that our EMIP achieves new state-of-the-art records on popular VCOD
benchmarks. The code will be publicly available.
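Since the code is not yet released, the dual-stream interactive prompting described above can be pictured with a minimal sketch. Everything below is an assumption for illustration (module names, the toy stand-in for the frozen flow model, feature sizes, and where each prompt is injected), not the authors' implementation:

```python
import torch
import torch.nn as nn

class CamouflagedFeeder(nn.Module):
    """Segmentation-to-motion prompt (hypothetical design)."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, seg_feat):
        return self.proj(seg_feat)

class MotionCollector(nn.Module):
    """Motion-to-segmentation prompt (hypothetical design)."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, flow_feat, seg_feat):
        return seg_feat + self.proj(flow_feat)

class TinyFlowNet(nn.Module):
    """Stand-in for the frozen pre-trained optical flow model; a real
    system would load pretrained weights (e.g. a RAFT-style network)."""
    def __init__(self, dim=64):
        super().__init__()
        self.enc = nn.Conv2d(6, dim, 3, padding=1)
        self.head = nn.Conv2d(dim, 2, 3, padding=1)

    def forward(self, f0, f1, prompt=None):
        feat = self.enc(torch.cat([f0, f1], dim=1))
        if prompt is not None:            # inject segmentation-to-motion prompt
            feat = feat + prompt
        return self.head(feat), feat      # flow field and intermediate features

class EMIPSketch(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.seg_enc = nn.Conv2d(3, dim, 3, padding=1)   # segmentation stream
        self.flow_net = TinyFlowNet(dim)                 # motion stream (frozen)
        for p in self.flow_net.parameters():
            p.requires_grad = False
        self.feeder = CamouflagedFeeder(dim)
        self.collector = MotionCollector(dim)
        self.mask_head = nn.Conv2d(dim, 1, 1)

    def forward(self, f0, f1):
        seg_feat = self.seg_enc(f0)
        # segmentation-to-motion prompt into the frozen flow stream
        flow, flow_feat = self.flow_net(f0, f1, prompt=self.feeder(seg_feat))
        # motion-to-segmentation prompt back into the segmentation stream
        seg_feat = self.collector(flow_feat, seg_feat)
        return self.mask_head(seg_feat), flow

model = EMIPSketch()
mask, flow = model(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(mask.shape, flow.shape)  # (1, 1, 64, 64) and (1, 2, 64, 64)
```

Note that although the flow stream's parameters are frozen, gradients still pass through its operations, which is what would let the camouflaged feeder's prompt be learned end to end.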
Related papers
- Generalizable Implicit Motion Modeling for Video Frame Interpolation [51.966062283735596]
Motion is critical in flow-based Video Frame Interpolation (VFI).
Generalizable Implicit Motion Modeling (GIMM) is a novel and effective approach to motion modeling for VFI.
Our GIMM can be smoothly integrated with existing flow-based VFI works without further modifications.
arXiv Detail & Related papers (2024-07-11T17:13:15Z)
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
- Out of the Room: Generalizing Event-Based Dynamic Motion Segmentation for Complex Scenes [10.936350433952668]
Rapid and reliable identification of dynamic scene parts, also known as motion segmentation, is a key challenge for mobile sensors.
Event cameras have the potential to overcome these limitations, but corresponding methods have only been demonstrated in smaller-scale indoor environments.
This work presents an event-based method for class-agnostic motion segmentation that can also be deployed successfully in complex, large-scale outdoor environments.
arXiv Detail & Related papers (2024-03-07T14:59:34Z)
- ProgressiveMotionSeg: Mutually Reinforced Framework for Event-Based Motion Segmentation [101.19290845597918]
This paper presents a Motion Estimation (ME) module and an Event Denoising (ED) module jointly optimized in a mutually reinforced manner.
Taking temporal correlation as guidance, the ED module calculates the confidence that each event belongs to real activity and transmits it to the ME module, which updates the energy function of motion segmentation for noise suppression.
arXiv Detail & Related papers (2022-03-22T13:40:26Z)
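A rough sketch of the mutual-reinforcement idea in the entry above, under our own assumptions (the correlation measure, time constant, and energy form are illustrative, not the paper's):

```python
import torch

def event_confidence(ts, neighbor_ts, tau=1e-3):
    """ED-step sketch: score each event by its temporal correlation with
    spatial neighbours; temporally isolated events are likely noise.
    `tau` is an assumed time constant."""
    dt = (ts[:, None] - neighbor_ts).abs()     # (n_events, n_neighbors)
    return torch.exp(-dt / tau).mean(dim=1)    # confidence in (0, 1]

def weighted_motion_energy(pred_flow, event_flow, conf):
    """ME-step sketch: a confidence-weighted energy, so low-confidence
    (noisy) events barely influence the motion-model fit."""
    residual = (pred_flow - event_flow).pow(2).sum(dim=-1)
    return (conf * residual).sum() / conf.sum().clamp(min=1e-8)
```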
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections, but still suffer from a limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point similarity between adjacent feature maps.
arXiv Detail & Related papers (2021-03-23T03:06:26Z)
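The channel-wise gate described in the entry above might look roughly like this; the per-channel motion statistic and layer shapes are our assumptions rather than the paper's exact CME design:

```python
import torch
import torch.nn as nn

class ChannelMotionGate(nn.Module):
    """Rough CME-style gate: derive a per-channel gate vector from a
    motion statistic and re-weight channels carrying dynamic information."""
    def __init__(self, channels=64):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, feat_t, feat_t1):
        motion_stat = (feat_t1 - feat_t).abs().mean(dim=(2, 3))  # (B, C)
        gate = self.fc(motion_stat)                 # channel-wise gate vector
        return feat_t * gate[:, :, None, None]      # emphasize dynamic channels
```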
- Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply interleaved, allowing for close hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)
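Finally, the asymmetry in the MAT block above can be sketched as attention in which motion features form the queries and appearance features the keys and values, so motion steers appearance but not the reverse; all layer choices here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MotionAttentiveTransition(nn.Module):
    """Sketch of an asymmetric attention block: motion queries attend to
    appearance keys/values within a two-stream encoder."""
    def __init__(self, dim=64):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)  # queries from the motion stream
        self.k = nn.Conv2d(dim, dim, 1)  # keys from the appearance stream

    def forward(self, motion, appearance):
        b, c, h, w = appearance.shape
        q = self.q(motion).flatten(2)                  # (B, C, HW)
        k = self.k(appearance).flatten(2)              # (B, C, HW)
        attn = torch.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)
        v = appearance.flatten(2)                      # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return appearance + out                        # motion-attended appearance
```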
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.