Dual Temporal Memory Network for Efficient Video Object Segmentation
- URL: http://arxiv.org/abs/2003.06125v1
- Date: Fri, 13 Mar 2020 06:07:45 GMT
- Title: Dual Temporal Memory Network for Efficient Video Object Segmentation
- Authors: Kaihua Zhang, Long Wang, Dong Liu, Bo Liu, Qingshan Liu and Zhu Li
- Abstract summary: One of the fundamental challenges in Video Object Segmentation (VOS) is how to make the best use of temporal information to boost performance.
We present an end-to-end network which stores short- and long-term video sequence information preceding the current frame as temporal memories.
Our network consists of two temporal sub-networks: a short-term memory sub-network and a long-term memory sub-network.
- Score: 42.05305410986511
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video Object Segmentation (VOS) is typically formulated in a semi-supervised
setting. Given the ground-truth segmentation mask for the first frame, the task
of VOS is to track and segment the single or multiple objects of interest in
the remaining frames of the video at the pixel level. One of the fundamental
challenges in VOS is how to make the best use of temporal information to
boost performance. We present an end-to-end network which stores short- and
long-term video sequence information preceding the current frame as
temporal memories to address temporal modeling in VOS. Our network consists
of two temporal sub-networks: a short-term memory sub-network and a
long-term memory sub-network. The short-term memory sub-network models the
fine-grained spatial-temporal interactions between local regions across
neighboring frames via a graph-based learning framework, which
preserves the visual consistency of local regions over time. The long-term
memory sub-network models the long-range evolution of objects via a
Simplified-Gated Recurrent Unit (S-GRU), making the segmentation robust
against occlusions and drift errors. In our experiments, we show that our
proposed method achieves favorable and competitive performance, in terms of
both speed and accuracy, on three frequently used VOS datasets: DAVIS 2016,
DAVIS 2017, and YouTube-VOS.
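The abstract names a Simplified-Gated Recurrent Unit (S-GRU) for the long-term memory but does not give its equations here, so the sketch below is only a rough, assumed instance of that idea: a single-gate recurrent cell applied to convolutional feature maps. The class name, gating form, and channel sizes are illustrative choices, not the authors' implementation.
```python
import torch
import torch.nn as nn

class SimplifiedGRUCell(nn.Module):
    """Single-gate recurrent cell over feature maps (1x1 convs keep it spatial).

    Assumption for this sketch: the paper's S-GRU is approximated by one
    update gate and one candidate branch.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.update_gate = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.candidate = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h_prev], dim=1)          # fuse current features and memory
        z = torch.sigmoid(self.update_gate(xh))     # how much of the memory to rewrite
        h_tilde = torch.tanh(self.candidate(xh))    # candidate long-term memory
        return (1.0 - z) * h_prev + z * h_tilde     # blended long-term memory


# Toy usage: propagate a long-term memory over a short clip of per-frame features.
cell = SimplifiedGRUCell(channels=64)
features = [torch.randn(1, 64, 30, 54) for _ in range(5)]  # assumed backbone feature maps
h = torch.zeros_like(features[0])
for f in features:
    h = cell(f, h)   # h summarizes the long-range evolution of the object
```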
Related papers
- Temporally Consistent Referring Video Object Segmentation with Hybrid Memory [98.80249255577304]
We propose an end-to-end R-VOS paradigm that explicitly models temporal consistency alongside the referring segmentation.
Features of frames with automatically generated high-quality reference masks are propagated to segment remaining frames.
Extensive experiments demonstrate that our approach enhances temporal consistency by a significant margin.
arXiv Detail & Related papers (2024-03-28T13:32:49Z) - Efficient Long-Short Temporal Attention Network for Unsupervised Video
Object Segmentation [23.645412918420906]
Unsupervised Video Object Segmentation (VOS) aims at identifying the contours of primary foreground objects in videos without any prior knowledge.
Previous methods do not fully use spatial-temporal context and fail to tackle this challenging task in real time.
This motivates us to develop an efficient Long-Short Temporal Attention network (termed LSTA) for the unsupervised VOS task from a holistic view.
arXiv Detail & Related papers (2023-09-21T01:09:46Z) - Region Aware Video Object Segmentation with Deep Motion Modeling [56.95836951559529]
Region Aware Video Object Segmentation (RAVOS) is a method that predicts regions of interest (ROIs) for efficient object segmentation and memory storage.
For efficient segmentation, object features are extracted according to the ROIs, and an object decoder is designed for object-level segmentation.
For efficient memory storage, we propose motion path memory to filter out redundant context by memorizing the features within the motion path of objects between two frames (see the sketch below).
arXiv Detail & Related papers (2022-07-21T01:44:40Z) - Local Memory Attention for Fast Video Semantic Segmentation [157.7618884769969]
- Local Memory Attention for Fast Video Semantic Segmentation [157.7618884769969]
We propose a novel neural network module that transforms an existing single-frame semantic segmentation model into a video semantic segmentation pipeline.
Our approach aggregates a rich representation of the semantic information in past frames into a memory module.
We observe improvements in segmentation performance on Cityscapes of 1.7% and 2.1% mIoU, while increasing the inference time of ERFNet by only 1.5 ms (see the sketch below).
arXiv Detail & Related papers (2021-01-05T18:57:09Z) - Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised
Video Object Segmentation [27.559093073097483]
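A minimal sketch of the memory read-out referenced in the Local Memory Attention entry above: the current frame's features attend over features stored from past frames, and the fused result can be handed to the single-frame segmentation head. The plain dot-product attention, the function name `read_memory`, and the shapes are assumptions; the paper's actual module is not reproduced here.
```python
import torch
import torch.nn.functional as F

def read_memory(query_feat, memory_feats):
    """Attention read-out over a memory of past-frame features.

    query_feat:   (C, H, W) features of the current frame
    memory_feats: (T, C, H, W) features of T past frames kept in memory
    """
    c, h, w = query_feat.shape
    q = query_feat.reshape(c, h * w).t()                       # (HW, C) queries
    kv = memory_feats.permute(1, 0, 2, 3).reshape(c, -1).t()   # (T*HW, C) keys/values
    attn = F.softmax(q @ kv.t() / c ** 0.5, dim=-1)            # (HW, T*HW) attention weights
    out = attn @ kv                                            # (HW, C) aggregated memory
    return out.t().reshape(c, h, w)                            # fused, video-aware features

current = torch.randn(128, 16, 32)
memory = torch.stack([torch.randn(128, 16, 32) for _ in range(3)])
fused = read_memory(current, memory)   # feed to the single-frame segmentation head
```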
- Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation [27.559093073097483]
Current approaches for Semi-supervised Video Object Segmentation (Semi-VOS) propagate information from previous frames to generate a segmentation mask for the current frame.
We exploit this observation by using temporal information to quickly identify frames with minimal change.
We propose a novel dynamic network that estimates the change across frames and decides which path to take: running the full network or reusing the previous frame's features (see the sketch below).
arXiv Detail & Related papers (2020-12-21T19:40:17Z) - Spatiotemporal Graph Neural Network based Mask Reconstruction for Video
Object Segmentation [70.97625552643493]
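A minimal sketch of the gating decision described in the entry above, assuming a cheap frame-difference score in place of the learned reuse gate: if the current frame barely changed, reuse the previous frame's features; otherwise run the full network. The mean-absolute-difference score and the threshold value are placeholders, not the paper's gate.
```python
import torch

def choose_path(prev_frame, curr_frame, threshold=0.05):
    """Decide whether to run the full segmentation network or to reuse the
    previous frame's features, based on a cheap frame-difference score.
    """
    change = (curr_frame - prev_frame).abs().mean().item()
    return "full_network" if change > threshold else "reuse_previous_features"

prev = torch.rand(3, 240, 427)
curr = prev + 0.01 * torch.randn_like(prev)    # nearly static frame
print(choose_path(prev, curr))                 # likely "reuse_previous_features"
```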
- Spatiotemporal Graph Neural Network based Mask Reconstruction for Video Object Segmentation [70.97625552643493]
This paper addresses the task of segmenting class-agnostic objects in a semi-supervised setting.
We propose a novel graph neural network (TG-Net) which captures local contexts by utilizing all proposals.
arXiv Detail & Related papers (2020-12-10T07:57:44Z) - Video Object Segmentation with Episodic Graph Memory Networks [198.74780033475724]
A graph memory network is developed to address the novel idea of "learning to update the segmentation model".
We exploit an episodic memory network, organized as a fully connected graph, to store frames as nodes and capture cross-frame correlations by edges.
The proposed graph memory network yields a neat yet principled framework that generalizes well to both one-shot and zero-shot video object segmentation tasks (see the sketch below).
arXiv Detail & Related papers (2020-07-14T13:19:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.