Co-attention Propagation Network for Zero-Shot Video Object Segmentation
- URL: http://arxiv.org/abs/2304.03910v1
- Date: Sat, 8 Apr 2023 04:45:48 GMT
- Title: Co-attention Propagation Network for Zero-Shot Video Object Segmentation
- Authors: Gensheng Pei, Yazhou Yao, Fumin Shen, Dan Huang, Xingguo Huang, and
Heng-Tao Shen
- Abstract summary: Zero-shot video object segmentation (ZS-VOS) aims to segment objects in a video sequence without prior knowledge of these objects.
Existing ZS-VOS methods often struggle to distinguish between foreground and background or to keep track of the foreground in complex scenarios.
We propose an encoder-decoder-based hierarchical co-attention propagation network (HCPN) capable of tracking and segmenting objects.
- Score: 91.71692262860323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot video object segmentation (ZS-VOS) aims to segment foreground
objects in a video sequence without prior knowledge of these objects. However,
existing ZS-VOS methods often struggle to distinguish between foreground and
background or to keep track of the foreground in complex scenarios. The common
practice of introducing motion information, such as optical flow, can lead to
overreliance on optical flow estimation. To address these challenges, we
propose an encoder-decoder-based hierarchical co-attention propagation network
(HCPN) capable of tracking and segmenting objects. Specifically, our model is
built upon multiple collaborative evolutions of the parallel co-attention
module (PCM) and the cross co-attention module (CCM). PCM captures common
foreground regions among adjacent appearance and motion features, while CCM
further exploits and fuses cross-modal motion features returned by PCM. Our
method is progressively trained to achieve hierarchical spatio-temporal feature
propagation across the entire video. Experimental results demonstrate that our
HCPN outperforms all previous methods on public benchmarks, showcasing its
effectiveness for ZS-VOS.
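As a concrete illustration of the co-attention at the heart of HCPN, the sketch below implements a generic parallel appearance-motion co-attention in PyTorch. The module structure, shapes, and residual fusion are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of PCM-style parallel co-attention between appearance
# and motion features. Shapes and the residual fusion are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelCoAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Learnable bilinear weight for the cross-modal affinity.
        self.weight = nn.Parameter(torch.eye(channels))

    def forward(self, app: torch.Tensor, mot: torch.Tensor):
        # app, mot: (B, C, H, W) appearance and motion feature maps.
        B, C, H, W = app.shape
        fa = app.flatten(2)                              # (B, C, HW)
        fm = mot.flatten(2)                              # (B, C, HW)
        # Affinity between every appearance and motion location.
        S = fa.transpose(1, 2) @ self.weight @ fm        # (B, HW, HW)
        att_am = F.softmax(S, dim=2)   # appearance attends over motion
        att_ma = F.softmax(S, dim=1)   # motion attends over appearance
        app_enh = (fm @ att_am.transpose(1, 2)).view(B, C, H, W)
        mot_enh = (fa @ att_ma).view(B, C, H, W)
        # Residual fusion keeps the original features.
        return app + app_enh, mot + mot_enh

# Usage: mutually enhance appearance and motion features of a frame pair.
pcm = ParallelCoAttention(channels=256)
app = torch.randn(2, 256, 30, 30)
mot = torch.randn(2, 256, 30, 30)
app_out, mot_out = pcm(app, mot)
```

In a PCM/CCM-style design, blocks like this would be stacked and evolved collaboratively across adjacent frames; the sketch shows only a single mutual-attention exchange.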
Related papers
- MCA: Moment Channel Attention Networks [10.780493635885225]
We investigate the statistical moments of feature maps within a neural network.
Our findings highlight the critical role of high-order moments in enhancing model capacity.
We propose the Moment Channel Attention (MCA) framework, which efficiently incorporates multiple levels of moment-based information.
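A rough sketch of what moment-based channel attention can look like; the layer sizes and the choice of the first three moments are assumptions for illustration, not the MCA paper's exact design:

```python
# Channel-wise statistical moments (mean, variance, skewness) replace
# the single average-pooled statistic of SE-style channel attention.
import torch
import torch.nn as nn

class MomentChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(3 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        flat = x.flatten(2)                            # (B, C, HW)
        mean = flat.mean(dim=2)                        # 1st moment
        var = flat.var(dim=2, unbiased=False)          # 2nd central moment
        centered = flat - mean.unsqueeze(2)
        skew = (centered ** 3).mean(dim=2) / (var + 1e-5) ** 1.5  # 3rd
        stats = torch.cat([mean, var, skew], dim=1)    # (B, 3C)
        w = self.fc(stats).view(B, C, 1, 1)            # channel weights
        return x * w
```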
arXiv Detail & Related papers (2024-03-04T04:02:59Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Implicit Motion-Compensated Network for Unsupervised Video Object Segmentation [25.41427065435164]
Unsupervised video object segmentation (UVOS) aims at automatically separating the primary foreground object(s) from the background in a video sequence.
Existing UVOS methods either lack robustness when there are visually similar surroundings (appearance-based) or suffer degraded prediction quality due to dynamic backgrounds and inaccurate flow (flow-based).
We propose an implicit motion-compensated network (IMCNet) that combines complementary cues (i.e., appearance and motion) by aligning motion information from adjacent frames to the current frame at the feature level.
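A minimal sketch of implicit, affinity-based feature alignment, one common way to warp adjacent-frame features to the current frame without explicit optical flow; IMCNet's actual alignment module may be structured differently:

```python
# Adjacent-frame features are aligned to the current frame by feature
# similarity (attention) rather than by an explicit flow field.
import torch
import torch.nn.functional as F

def align_adjacent(cur: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """cur, adj: (B, C, H, W) features of the current and adjacent frames."""
    B, C, H, W = cur.shape
    q = cur.flatten(2).transpose(1, 2)             # (B, HW, C)
    k = adj.flatten(2)                             # (B, C, HW)
    affinity = F.softmax(q @ k / C ** 0.5, dim=2)  # (B, HW, HW)
    v = adj.flatten(2).transpose(1, 2)             # (B, HW, C)
    aligned = (affinity @ v).transpose(1, 2).view(B, C, H, W)
    return aligned

cur = torch.randn(1, 64, 32, 32)
adj = torch.randn(1, 64, 32, 32)
aligned = align_adjacent(cur, adj)  # adjacent features in current-frame layout
```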
arXiv Detail & Related papers (2022-04-06T13:03:59Z)
- Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation [58.74791043631219]
We propose STswinCL, a novel framework that explores complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, the EndoVis18 Challenge and the CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- Event and Activity Recognition in Video Surveillance for Cyber-Physical Systems [0.0]
We show that long-term motion patterns alone play a pivotal role in recognizing an event.
Only the temporal features are exploited using a hybrid Convolutional Neural Network (CNN) + Recurrent Neural Network (RNN) architecture.
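A minimal hedged sketch of the hybrid CNN + RNN pattern this summary describes, with an assumed tiny backbone and sizes chosen only for illustration:

```python
# A CNN encodes each frame independently; a GRU aggregates the sequence
# of frame features into a clip-level event prediction.
import torch
import torch.nn as nn

class CNNRNNEventRecognizer(nn.Module):
    def __init__(self, num_events: int, feat_dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(               # tiny per-frame encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, num_events)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, T, 3, H, W) video frames.
        B, T = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(B, T, -1)
        _, h = self.rnn(feats)                  # final hidden state
        return self.head(h[-1])                 # event logits

model = CNNRNNEventRecognizer(num_events=10)
logits = model(torch.randn(2, 16, 3, 64, 64))   # -> (2, 10)
```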
arXiv Detail & Related papers (2021-11-03T08:30:38Z)
- Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
The Full-Duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS).
Our FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously before the fusion and decoding stage.
We show that our FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.
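A hedged sketch of what simultaneous ("full-duplex") cross-modal feature passing can look like; the gating form and fusion are illustrative assumptions rather than FSNet's exact relational module:

```python
# Appearance (RGB) and motion (flow) branches exchange information in
# both directions at once, then the enhanced features are fused.
import torch
import torch.nn as nn

class FullDuplexExchange(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.to_motion = nn.Conv2d(channels, channels, 1)  # RGB -> flow gate
        self.to_appear = nn.Conv2d(channels, channels, 1)  # flow -> RGB gate
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # Both directions are computed from the *input* features, so the
        # exchange is simultaneous rather than sequential.
        flow_enh = flow * torch.sigmoid(self.to_motion(rgb))
        rgb_enh = rgb * torch.sigmoid(self.to_appear(flow))
        return self.fuse(torch.cat([rgb_enh, flow_enh], dim=1))

block = FullDuplexExchange(channels=64)
fused = block(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```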
arXiv Detail & Related papers (2021-08-06T14:50:50Z)
- See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks [184.4379622593225]
We introduce a novel network, called CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task.
We emphasize the importance of inherent correlation among video frames and incorporate a global co-attention mechanism.
We propose a unified and end-to-end trainable framework where different co-attention variants can be derived for mining the rich context within videos.
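The vanilla co-attention such frameworks build on can be summarized compactly. Assuming frame features $F_a, F_b \in \mathbb{R}^{C \times HW}$ and a learnable weight $W$ (a common textbook formulation, not necessarily the paper's exact notation):

$$S = F_a^{\top} W F_b, \qquad Z_a = F_b\,\mathrm{softmax}_{\mathrm{row}}(S)^{\top}, \qquad Z_b = F_a\,\mathrm{softmax}_{\mathrm{col}}(S),$$

where each row of $S$ scores one location of $F_a$ against all locations of $F_b$, and $Z_a$, $Z_b$ are the attention summaries each frame receives from the other; gated and symmetric variants change how $S$ is normalized and weighted.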
arXiv Detail & Related papers (2020-01-19T11:10:39Z)