Efficient Regional Memory Network for Video Object Segmentation
- URL: http://arxiv.org/abs/2103.12934v1
- Date: Wed, 24 Mar 2021 02:08:46 GMT
- Title: Efficient Regional Memory Network for Video Object Segmentation
- Authors: Haozhe Xie, Hongxun Yao, Shangchen Zhou, Shengping Zhang, Wenxiu Sun
- Abstract summary: We propose a novel local-to-local matching solution for semi-supervised video object segmentation (VOS), namely Regional Memory Network (RMNet).
The proposed RMNet effectively alleviates the ambiguity of similar objects in both memory and query frames.
Experimental results indicate that the proposed RMNet performs favorably against state-of-the-art methods on the DAVIS and YouTube-VOS datasets.
- Score: 56.587541750729045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, several Space-Time Memory based networks have shown that the object
cues (e.g. video frames as well as the segmented object masks) from the past
frames are useful for segmenting objects in the current frame. However, these
methods exploit the information from the memory by global-to-global matching
between the current and past frames, which leads to mismatches with similar
objects and to high computational complexity. To address these problems, we
propose a novel local-to-local matching solution for semi-supervised VOS,
namely Regional Memory Network (RMNet). In RMNet, the precise regional memory
is constructed by memorizing local regions where the target objects appear in
the past frames. For the current query frame, the query regions are tracked and
predicted based on the optical flow estimated from the previous frame. The
proposed local-to-local matching effectively alleviates the ambiguity of
similar objects in both memory and query frames, which allows the information
to be passed from the regional memory to the query region efficiently and
effectively. Experimental results indicate that the proposed RMNet performs
favorably against state-of-the-art methods on the DAVIS and YouTube-VOS
datasets.
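The abstract contrasts global-to-global matching with RMNet's local-to-local matching, in which the memory is restricted to the region where the object appeared and the query region is predicted by moving the previous mask with optical flow. The pure-Python sketch below illustrates that contrast under simplifying assumptions that are ours, not the paper's. Feature maps are nested lists of shape H x W x C. Regions are axis-aligned bounding boxes of a binary mask. The flow is given as integer (dy, dx) per pixel rather than estimated, and it forward-warps the previous mask. Matching is dot-product attention with a softmax. All function names are hypothetical. Passing the whole frame as both boxes recovers global-to-global matching.

```python
import math
from typing import List, Optional, Tuple

Mask = List[List[int]]                # H x W binary mask
FeatureMap = List[List[List[float]]]  # H x W x C features
Flow = List[List[Tuple[int, int]]]    # H x W integer (dy, dx) displacements (assumed given)
Box = Tuple[int, int, int, int]       # (y0, x0, y1, x1), inclusive corners


def bounding_box(mask: Mask, margin: int = 0) -> Optional[Box]:
    """Box around the foreground pixels, padded by `margin` and clipped to the frame."""
    h, w = len(mask), len(mask[0])
    pts = [(y, x) for y in range(h) for x in range(w) if mask[y][x]]
    if not pts:
        return None  # object not visible: no region to match
    ys = [p[0] for p in pts]
    xs = [p[1] for p in pts]
    return (max(min(ys) - margin, 0), max(min(xs) - margin, 0),
            min(max(ys) + margin, h - 1), min(max(xs) + margin, w - 1))


def warp_mask(mask: Mask, flow: Flow) -> Mask:
    """Forward-warp the previous frame's mask into the current frame with integer flow."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                dy, dx = flow[y][x]
                ty, tx = y + dy, x + dx
                if 0 <= ty < h and 0 <= tx < w:  # drop pixels that leave the frame
                    out[ty][tx] = 1
    return out


def region_read(query_key: FeatureMap, mem_key: FeatureMap, mem_val: FeatureMap,
                query_box: Box, mem_box: Box) -> FeatureMap:
    """Local-to-local matching: query pixels inside `query_box` attend only to
    memory pixels inside `mem_box`; every other query pixel reads zeros."""
    h, w = len(query_key), len(query_key[0])
    dim = len(mem_val[0][0])
    out = [[[0.0] * dim for _ in range(w)] for _ in range(h)]
    my0, mx0, my1, mx1 = mem_box
    memory = [(mem_key[y][x], mem_val[y][x])
              for y in range(my0, my1 + 1) for x in range(mx0, mx1 + 1)]
    qy0, qx0, qy1, qx1 = query_box
    for y in range(qy0, qy1 + 1):
        for x in range(qx0, qx1 + 1):
            q = query_key[y][x]
            scores = [sum(a * b for a, b in zip(q, k)) for k, _ in memory]
            top = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            out[y][x] = [sum(wt * v[c] for wt, (_, v) in zip(weights, memory)) / total
                         for c in range(dim)]
    return out


# Toy 4x4 example: one object at (1, 1) in the past frame, moving one pixel right.
prev_mask = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
flow = [[(0, 1)] * 4 for _ in range(4)]
query_box = bounding_box(warp_mask(prev_mask, flow))  # tracked query region
mem_box = bounding_box(prev_mask)                     # regional memory
keys = [[[1.0] for _ in range(4)] for _ in range(4)]  # every pixel looks alike
vals = [[[float(4 * y + x)] for x in range(4)] for y in range(4)]
local = region_read(keys, keys, vals, query_box, mem_box)
global_ = region_read(keys, keys, vals, (0, 0, 3, 3), (0, 0, 3, 3))
```

Because all keys are identical here, as with several similar-looking objects, global matching blends every memory pixel (7.5 at the object's new position), while local matching reads only the tracked object's value (5.0). The attention cost also drops from all query-memory pixel pairs to the pairs inside the two boxes.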
Related papers
- Local Compressed Video Stream Learning for Generic Event Boundary Detection [25.37983456118522]
Event boundary detection aims to localize the generic, taxonomy-free event boundaries that segment videos into chunks.
Existing methods typically require video frames to be decoded before being fed into the network.
We propose a novel event boundary detection method that is fully end-to-end, leveraging rich information in the compressed domain.
arXiv Detail & Related papers (2023-09-27T06:49:40Z)
- Region Aware Video Object Segmentation with Deep Motion Modeling [56.95836951559529]
Region Aware Video Object Segmentation (RAVOS) is a method that predicts regions of interest for efficient object segmentation and memory storage.
For efficient segmentation, object features are extracted according to the ROIs, and an object decoder is designed for object-level segmentation.
For efficient memory storage, we propose motion path memory to filter out redundant context by memorizing the features within the motion path of objects between two frames.
arXiv Detail & Related papers (2022-07-21T01:44:40Z)
- Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation [68.45737688496654]
We establish correspondences directly between frames without re-encoding the mask features for every object.
With the correspondences, every node in the current query frame is inferred by aggregating features from the past in an associative fashion.
We validated that every memory node now has a chance to contribute, and experimentally showed that such diversified voting is beneficial to both memory efficiency and inference accuracy.
arXiv Detail & Related papers (2021-06-09T16:50:57Z)
- Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS$_{17}$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods in both segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z)
- Spatiotemporal Graph Neural Network based Mask Reconstruction for Video Object Segmentation [70.97625552643493]
This paper addresses the task of segmenting class-agnostic objects in the semi-supervised setting.
We propose a novel graph neural network (TG-Net) that captures local contexts by utilizing all proposals.
arXiv Detail & Related papers (2020-12-10T07:57:44Z)
- Video Object Segmentation with Episodic Graph Memory Networks [198.74780033475724]
A graph memory network is developed to address the novel idea of "learning to update the segmentation model".
We exploit an episodic memory network, organized as a fully connected graph, to store frames as nodes and capture cross-frame correlations by edges.
The proposed graph memory network yields a neat yet principled framework that generalizes well to both one-shot and zero-shot video object segmentation tasks.
arXiv Detail & Related papers (2020-07-14T13:19:19Z)
- Dual Temporal Memory Network for Efficient Video Object Segmentation [42.05305410986511]
One of the fundamental challenges in Video Object Segmentation (VOS) is how to make the most of temporal information to boost performance.
We present an end-to-end network which stores short- and long-term video sequence information preceding the current frame as the temporal memories.
Our network consists of two temporal sub-networks including a short-term memory sub-network and a long-term memory sub-network.
arXiv Detail & Related papers (2020-03-13T06:07:45Z)
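The Rethinking Space-Time Networks entry above describes establishing correspondences once between frames and then inferring every query node by aggregating features from the past. The pure-Python sketch below illustrates that idea under assumptions of our own. Keys are flat vectors computed from frames only. Similarity is the negative squared Euclidean distance, and a softmax over memory nodes turns it into weights. The names `affinity` and `readout` are hypothetical and are not that paper's code. The point it shows is that the affinity never touches the masks, so it is computed once per frame pair and reused for every object's values.

```python
import math
from typing import List

Vec = List[float]


def affinity(mem_keys: List[Vec], query_keys: List[Vec]) -> List[List[float]]:
    """Row j holds the softmax weights of every memory node for query node j.

    Similarity is the negative squared Euclidean distance between keys
    (an assumption of this sketch). Keys come from frames, never from masks.
    """
    rows = []
    for q in query_keys:
        sims = [-sum((a - b) ** 2 for a, b in zip(k, q)) for k in mem_keys]
        top = max(sims)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in sims]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def readout(weights: List[List[float]], mem_values: List[Vec]) -> List[Vec]:
    """Aggregate memory values for each query node as a weighted vote of memory nodes."""
    dim = len(mem_values[0])
    return [
        [sum(w * v[c] for w, v in zip(row, mem_values)) for c in range(dim)]
        for row in weights
    ]


# The affinity is computed once and shared by all objects in the frame.
mem_keys = [[0.0], [10.0]]
query_keys = [[0.5], [9.5]]
W = affinity(mem_keys, query_keys)
obj_a = readout(W, [[1.0], [0.0]])  # object A occupied memory node 0
obj_b = readout(W, [[0.0], [1.0]])  # object B occupied memory node 1
```

Adding another object only costs one more `readout` call; the quadratic key comparison is not repeated.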
This list is automatically generated from the titles and abstracts of the papers on this site.