Video Object Segmentation with Episodic Graph Memory Networks
- URL: http://arxiv.org/abs/2007.07020v4
- Date: Wed, 9 Dec 2020 09:58:23 GMT
- Title: Video Object Segmentation with Episodic Graph Memory Networks
- Authors: Xiankai Lu, Wenguan Wang, Martin Danelljan, Tianfei Zhou, Jianbing
Shen and Luc Van Gool
- Abstract summary: A graph memory network is developed to address the novel idea of "learning to update the segmentation model".
We exploit an episodic memory network, organized as a fully connected graph, to store frames as nodes and capture cross-frame correlations by edges.
The proposed graph memory network yields a neat yet principled framework that generalizes well to both one-shot and zero-shot video object segmentation tasks.
- Score: 198.74780033475724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to make a segmentation model efficiently adapt to a specific video, and to online variations in target appearance, is a fundamental problem in the
field of video object segmentation. In this work, a graph memory network is
developed to address the novel idea of "learning to update the segmentation
model". Specifically, we exploit an episodic memory network, organized as a
fully connected graph, to store frames as nodes and capture cross-frame
correlations by edges. Further, learnable controllers are embedded to ease
memory reading and writing, as well as maintain a fixed memory scale. The
structured, external memory design enables our model to comprehensively mine
and quickly store new knowledge, even with limited visual information, and the
differentiable memory controllers slowly learn, via gradient descent, an
abstract method for storing useful representations in the memory and for later
using these representations for prediction. In addition, the proposed graph
memory network yields a neat yet principled framework that generalizes well to
both one-shot and zero-shot video object segmentation tasks. Extensive experiments
on four challenging benchmark datasets verify that our graph memory network is
able to facilitate the adaptation of the segmentation network for case-by-case
video object segmentation.
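To make the mechanism above concrete, here is a minimal PyTorch sketch of an episodic graph memory: a fixed number of frame nodes, attention-based edges among them, an attention read, and a GRU-style write controller that keeps the memory scale constant. All names, sizes, and the ring-buffer write policy are illustrative assumptions, not the authors' implementation.

```python
# A toy episodic graph memory: a fixed set of frame "nodes" read with attention
# and updated by a learned (GRU-style) write controller. Module names, sizes,
# and the ring-buffer write policy are illustrative, not the authors' code.
import torch
import torch.nn as nn


class GraphMemory(nn.Module):
    def __init__(self, dim=256, num_nodes=5):
        super().__init__()
        self.num_nodes = num_nodes
        self.ptr = 0                                   # next node to overwrite
        self.nodes = torch.zeros(num_nodes, dim)       # one node per stored frame
        self.edge_proj = nn.Linear(dim, dim)           # edges: cross-node messages
        self.read_proj = nn.Linear(dim, dim)           # query projection for reading
        self.write_ctrl = nn.GRUCell(dim, dim)         # differentiable write controller

    def message_passing(self):
        # Fully connected graph: every node attends to every other node.
        att = torch.softmax(self.nodes @ self.edge_proj(self.nodes).t(), dim=-1)
        return att @ self.nodes

    def read(self, query):
        # query: (dim,) embedding of the current frame; returns a memory summary.
        nodes = self.message_passing()
        weights = torch.softmax(nodes @ self.read_proj(query), dim=0)
        return (weights.unsqueeze(-1) * nodes).sum(dim=0)

    def write(self, frame_feat):
        # Overwrite the oldest node through the GRU controller so that the
        # memory keeps a fixed scale regardless of video length.
        updated = self.write_ctrl(frame_feat.unsqueeze(0),
                                  self.nodes[self.ptr].unsqueeze(0)).squeeze(0)
        nodes = self.nodes.clone()
        nodes[self.ptr] = updated
        self.nodes = nodes
        self.ptr = (self.ptr + 1) % self.num_nodes


mem = GraphMemory()
mem.write(torch.randn(256))                 # store the first (e.g. annotated) frame
summary = mem.read(torch.randn(256))        # read with the current frame as query
print(summary.shape)                        # torch.Size([256])
```

In a full model, the read-out would condition the segmentation decoder on the memory summary, and a write would follow each processed frame.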
Related papers
- Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation [156.4142424784322]
Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video that belong to the same category as defined by a few annotated support images.
We propose to leverage multi-grained temporal guidance information to handle the temporally correlated nature of video data.
Our proposed video IPMT model significantly outperforms previous models on two benchmark datasets.
arXiv Detail & Related papers (2023-09-20T09:16:34Z)
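As a rough illustration of prototype-based few-shot segmentation (the general setting of the entry above, not the multi-grained IPMT model itself), the sketch below builds a class prototype from the annotated support images by masked average pooling and scores query-video pixels by cosine similarity; all names and thresholds are assumptions.

```python
# Generic prototype matching for few-shot segmentation: a class prototype from
# the support mask via masked average pooling, then cosine similarity against
# query-frame features. Illustrative only; not the IPMT model.
import torch
import torch.nn.functional as F


def masked_average_pooling(feat, mask):
    # feat: (C, H, W) support features, mask: (H, W) binary object mask.
    mask = mask.unsqueeze(0)                                           # (1, H, W)
    return (feat * mask).sum(dim=(1, 2)) / mask.sum().clamp(min=1.0)  # (C,)


def segment_query_frames(query_feats, prototype, threshold=0.7):
    # query_feats: (T, C, H, W) features of the query video frames.
    proto = F.normalize(prototype, dim=0).view(1, -1, 1, 1)
    sim = (F.normalize(query_feats, dim=1) * proto).sum(dim=1)  # (T, H, W) cosine map
    return (sim > threshold).float()                             # coarse masks


support_feat = torch.randn(64, 32, 32)
support_mask = (torch.rand(32, 32) > 0.5).float()
query = torch.randn(4, 64, 32, 32)
masks = segment_query_frames(query, masked_average_pooling(support_feat, support_mask))
print(masks.shape)  # torch.Size([4, 32, 32])
```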
- Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation [52.11279360934703]
Current prevailing Video Object Segmentation (VOS) methods usually perform dense matching between the current and reference frames after extracting features.
We propose a unified VOS framework, coined as JointFormer, for joint modeling of the three elements of feature, correspondence, and a compressed memory.
arXiv Detail & Related papers (2023-08-25T17:30:08Z)
- Region Aware Video Object Segmentation with Deep Motion Modeling [56.95836951559529]
Region Aware Video Object Segmentation (RAVOS) predicts regions of interest (ROIs) for efficient object segmentation and memory storage.
For efficient segmentation, object features are extracted according to the ROIs, and an object decoder is designed for object-level segmentation.
For efficient memory storage, we propose motion path memory to filter out redundant context by memorizing the features within the motion path of objects between two frames.
arXiv Detail & Related papers (2022-07-21T01:44:40Z)
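A hedged sketch of the motion-path-memory idea from the RAVOS entry above: only features inside the region swept by the object between two frames (approximated here by the union of its bounding boxes) are kept for memory. The box computation and crop are illustrative, not the paper's implementation.

```python
# Motion-path memory sketch: memorize only the features inside the region the
# object sweeps between two frames, here approximated by the union of its
# bounding boxes. Illustrative assumptions, not the RAVOS code.
import torch


def mask_to_box(mask):
    # mask: (H, W) binary mask -> (y0, y1, x0, x1) bounding box.
    ys, xs = torch.nonzero(mask, as_tuple=True)
    return ys.min().item(), ys.max().item() + 1, xs.min().item(), xs.max().item() + 1


def motion_path_features(feat, mask_prev, mask_curr):
    # feat: (C, H, W) current-frame features; masks locate the object in two frames.
    y0a, y1a, x0a, x1a = mask_to_box(mask_prev)
    y0b, y1b, x0b, x1b = mask_to_box(mask_curr)
    y0, y1 = min(y0a, y0b), max(y1a, y1b)   # union box covers the motion path
    x0, x1 = min(x0a, x0b), max(x1a, x1b)
    return feat[:, y0:y1, x0:x1]            # only this crop is written to memory


feat = torch.randn(64, 48, 48)
prev = torch.zeros(48, 48); prev[5:15, 5:15] = 1
curr = torch.zeros(48, 48); curr[20:30, 18:28] = 1
print(motion_path_features(feat, prev, curr).shape)  # torch.Size([64, 25, 23])
```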
- Learning Quality-aware Dynamic Memory for Video Object Segmentation [32.06309833058726]
We propose a Quality-aware Dynamic Memory Network (QDMN) to evaluate the segmentation quality of each frame.
Our QDMN achieves new state-of-the-art performance on both DAVIS and YouTube-VOS benchmarks.
arXiv Detail & Related papers (2022-07-16T12:18:04Z)
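The following sketch illustrates quality-aware memory writing as described in the QDMN entry above: each frame's predicted mask is scored and only high-quality frames are stored. Here the score is simply mean mask confidence as a stand-in; the actual method learns a dedicated quality-assessment module, and all class and parameter names are assumptions.

```python
# Quality-gated memory writing sketch: estimate how reliable the current
# frame's predicted mask is and only store high-quality frames. Mean mask
# confidence is a stand-in for QDMN's learned quality module.
import torch


class QualityGatedMemory:
    def __init__(self, threshold=0.8, capacity=20):
        self.threshold = threshold
        self.capacity = capacity
        self.frames = []  # list of (feature, mask) pairs

    @staticmethod
    def quality_score(mask_probs):
        # mask_probs: (K, H, W) per-class probabilities; confident, peaked
        # predictions are treated as a proxy for good segmentation quality.
        return mask_probs.max(dim=0).values.mean().item()

    def maybe_write(self, feature, mask_probs):
        if self.quality_score(mask_probs) < self.threshold:
            return False                       # skip low-quality frames
        if len(self.frames) >= self.capacity:
            self.frames.pop(0)                 # keep the memory bounded
        self.frames.append((feature, mask_probs.argmax(dim=0)))
        return True


memory = QualityGatedMemory()
probs = torch.softmax(torch.randn(2, 64, 64) * 5.0, dim=0)   # a confident fake mask
print(memory.maybe_write(torch.randn(256, 64, 64), probs))   # True if confident enough
```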
- Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple, end-to-end trainable bottom-up approach that produces instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method achieves competitive results on the YouTube-VIS and DAVIS-19 datasets, with minimal run-time compared to other contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z)
- Adaptive Memory Management for Video Object Segmentation [6.282068591820945]
A matching-based network stores every k-th frame in an external memory bank for future inference.
The size of the memory bank gradually increases with the length of the video, which slows down inference speed and makes it impractical to handle arbitrary length videos.
This paper proposes an adaptive memory bank strategy for matching-based networks for semi-supervised video object segmentation (VOS) that can handle videos of arbitrary length by discarding obsolete features.
arXiv Detail & Related papers (2022-04-13T19:59:07Z)
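Below is a small sketch of the fixed-budget memory bank idea from the adaptive memory management entry above: the bank never grows beyond its capacity and discards the entry that has gone unmatched the longest. The specific discard rule and read operation are simplifications, not the paper's criterion.

```python
# Fixed-budget memory bank sketch: the bank discards obsolete entries instead
# of growing with video length. "Obsolete" here means least-recently matched
# during reading; the actual paper's discard criterion may differ.
import torch


class BoundedMemoryBank:
    def __init__(self, capacity=10):
        self.capacity = capacity
        self.keys, self.values, self.last_used = [], [], []
        self.step = 0

    def write(self, key, value):
        if len(self.keys) >= self.capacity:
            # Drop the entry that has gone unmatched the longest.
            stale = min(range(len(self.last_used)), key=self.last_used.__getitem__)
            for buf in (self.keys, self.values, self.last_used):
                buf.pop(stale)
        self.keys.append(key); self.values.append(value); self.last_used.append(self.step)

    def read(self, query):
        # Soft-attention read; bump the usage time of the best-matching entry.
        self.step += 1
        keys = torch.stack(self.keys)                     # (N, C)
        weights = torch.softmax(keys @ query, dim=0)      # (N,)
        self.last_used[int(weights.argmax())] = self.step
        return (weights.unsqueeze(-1) * torch.stack(self.values)).sum(0)


bank = BoundedMemoryBank(capacity=3)
for _ in range(5):                              # more writes than capacity
    bank.write(torch.randn(64), torch.randn(64))
print(len(bank.keys), bank.read(torch.randn(64)).shape)   # 3 torch.Size([64])
```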
- Efficient Multi-Organ Segmentation Using SpatialConfiguration-Net with Low GPU Memory Requirements [8.967700713755281]
In this work, we employ a multi-organ segmentation model based on the SpatialConfiguration-Net (SCN).
We modified the architecture of the segmentation model to reduce its memory footprint without drastically impacting the quality of the predictions.
Lastly, we implemented a minimal inference script, optimized for both execution time and required GPU memory.
arXiv Detail & Related papers (2021-11-26T17:47:10Z)
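As a generic illustration of the memory-conscious inference mentioned in the entry above, the sketch below runs a placeholder 3D network over a volume slab by slab under torch.no_grad(), keeping peak GPU memory small. The model, slab size, and function names are assumptions; the actual SCN inference script is not reproduced here.

```python
# Generic memory-conscious inference loop: autograd disabled and the volume
# processed in slabs so that only a fraction of it is resident at once.
# The model below is a stand-in, not the SpatialConfiguration-Net.
import torch
import torch.nn as nn


@torch.no_grad()
def low_memory_inference(model, volume, slab=16):
    # volume: (1, 1, D, H, W); in practice slabs would overlap to avoid
    # border artifacts from padding at slab boundaries.
    model.eval()
    outputs = []
    for z in range(0, volume.shape[2], slab):
        chunk = volume[:, :, z:z + slab]
        outputs.append(model(chunk).cpu())        # keep results off the GPU
    return torch.cat(outputs, dim=2)


model = nn.Conv3d(1, 8, kernel_size=3, padding=1)  # placeholder segmentation network
logits = low_memory_inference(model, torch.randn(1, 1, 64, 96, 96))
print(logits.shape)  # torch.Size([1, 8, 64, 96, 96])
```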
- Memory-based Semantic Segmentation for Off-road Unstructured Natural Environments [29.498304237783763]
We propose a built-in memory module for semantic segmentation.
The memory module stores significant representations of training images as memory items.
We conduct experiments on the Robot Unstructured Ground Driving dataset and RELLIS dataset.
arXiv Detail & Related papers (2021-08-12T10:04:47Z)
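The sketch below illustrates the memory-item idea from the entry above: a small set of learned prototype vectors acts as a built-in memory, each pixel reads from it by attention, and the read-out is fused back into the features. The number of items, dimensions, and fusion layer are illustrative assumptions.

```python
# Built-in memory of "memory items": learnable prototype vectors distilled from
# training images. Per-pixel features read from the items by attention and the
# readout is fused back in. Sizes and fusion are assumptions.
import torch
import torch.nn as nn


class MemoryItemModule(nn.Module):
    def __init__(self, num_items=32, dim=128):
        super().__init__()
        self.items = nn.Parameter(torch.randn(num_items, dim))  # learned memory items
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, feat):
        # feat: (B, C, H, W) features of the current image.
        b, c, h, w = feat.shape
        q = feat.permute(0, 2, 3, 1).reshape(-1, c)              # (B*H*W, C) queries
        att = torch.softmax(q @ self.items.t(), dim=-1)          # match against items
        readout = (att @ self.items).reshape(b, h, w, c).permute(0, 3, 1, 2)
        return self.fuse(torch.cat([feat, readout], dim=1))      # memory-augmented features


module = MemoryItemModule()
out = module(torch.randn(2, 128, 32, 32))
print(out.shape)  # torch.Size([2, 128, 32, 32])
```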
- Local Memory Attention for Fast Video Semantic Segmentation [157.7618884769969]
We propose a novel neural network module that transforms an existing single-frame semantic segmentation model into a video semantic segmentation pipeline.
Our approach aggregates a rich representation of the semantic information in past frames into a memory module.
We observe improvements in segmentation performance on Cityscapes of 1.7% and 2.1% mIoU, while increasing the inference time of ERFNet by only 1.5 ms.
arXiv Detail & Related papers (2021-01-05T18:57:09Z)
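A minimal sketch of attending over a small memory of past-frame features, in the spirit of the local memory attention entry above: recent frame features are kept in a FIFO store and the current frame aggregates them via attention before the segmentation head. Memory size, feature shapes, and the attention form are assumptions, not the paper's exact module.

```python
# Attend over a small FIFO memory of past-frame features to enrich the current
# frame before segmentation. Memory size and attention form are illustrative.
import torch


class LocalFrameMemory:
    def __init__(self, max_frames=3):
        self.max_frames = max_frames
        self.past = []   # list of (C, H, W) feature maps from previous frames

    def update(self, feat):
        self.past.append(feat)
        if len(self.past) > self.max_frames:
            self.past.pop(0)                      # keep only recent frames

    def attend(self, feat):
        if not self.past:
            return feat
        c, h, w = feat.shape
        mem = torch.stack(self.past).reshape(-1, c, h * w)        # (T, C, HW)
        mem = mem.permute(0, 2, 1).reshape(-1, c)                 # (T*HW, C) keys/values
        q = feat.reshape(c, h * w).t()                            # (HW, C) queries
        att = torch.softmax(q @ mem.t() / c ** 0.5, dim=-1)       # (HW, T*HW)
        agg = (att @ mem).t().reshape(c, h, w)
        return feat + agg                                         # temporally enriched features


memory = LocalFrameMemory()
for t in range(4):
    frame_feat = torch.randn(64, 16, 16)
    enriched = memory.attend(frame_feat)   # use enriched features for segmentation
    memory.update(frame_feat)
print(enriched.shape)                      # torch.Size([64, 16, 16])
```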
- Dual Temporal Memory Network for Efficient Video Object Segmentation [42.05305410986511]
One of the fundamental challenges in Video Object Segmentation (VOS) is how to make the most of temporal information to boost performance.
We present an end-to-end network which stores short- and long-term video sequence information preceding the current frame as the temporal memories.
Our network consists of two temporal sub-networks including a short-term memory sub-network and a long-term memory sub-network.
arXiv Detail & Related papers (2020-03-13T06:07:45Z)
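Finally, a sketch of the short-term / long-term split described in the dual temporal memory entry above: recent frames populate a small FIFO short-term store, every k-th frame is also kept long-term, and the current frame reads from both. The sampling rule and read operation are simplified assumptions, not the paper's two sub-networks.

```python
# Short-term / long-term memory sketch: recent frames in a FIFO short-term
# store, every k-th frame also kept long-term; the current frame reads from
# both. Simplified assumptions only.
import torch


class DualTemporalMemory:
    def __init__(self, short_size=2, long_stride=5):
        self.short, self.long = [], []
        self.short_size, self.long_stride = short_size, long_stride
        self.t = 0

    def update(self, feat):
        self.short.append(feat)
        if len(self.short) > self.short_size:
            self.short.pop(0)                       # short-term: only recent frames
        if self.t % self.long_stride == 0:
            self.long.append(feat)                  # long-term: sparse history
        self.t += 1

    def read(self, query):
        def attend(store):
            keys = torch.stack([f.mean(dim=(1, 2)) for f in store])   # (N, C) frame keys
            w = torch.softmax(keys @ query.mean(dim=(1, 2)), dim=0)
            return sum(wi * f for wi, f in zip(w, store))
        short = attend(self.short) if self.short else query
        long = attend(self.long) if self.long else query
        return torch.cat([query, short, long], dim=0)    # fused cue for the decoder


memory = DualTemporalMemory()
for t in range(8):
    feat = torch.randn(64, 16, 16)
    fused = memory.read(feat)
    memory.update(feat)
print(fused.shape)  # torch.Size([192, 16, 16])
```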