Hierarchical Memory Matching Network for Video Object Segmentation
- URL: http://arxiv.org/abs/2109.11404v1
- Date: Thu, 23 Sep 2021 14:36:43 GMT
- Title: Hierarchical Memory Matching Network for Video Object Segmentation
- Authors: Hongje Seong, Seoung Wug Oh, Joon-Young Lee, Seongwon Lee, Suhyeon
Lee, Euntai Kim
- Abstract summary: We propose two advanced memory read modules that enable us to perform memory reading at multiple scales while exploiting temporal smoothness.
We first propose a kernel guided memory matching module that replaces the non-local dense memory read, commonly adopted in previous memory-based methods.
We introduce a hierarchical memory matching scheme and propose a top-k guided memory matching module in which memory read on a fine-scale is guided by that on a coarse-scale.
- Score: 38.24999776705497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Hierarchical Memory Matching Network (HMMN) for semi-supervised
video object segmentation. Based on a recent memory-based method [33], we
propose two advanced memory read modules that enable us to perform memory
reading in multiple scales while exploiting temporal smoothness. We first
propose a kernel guided memory matching module that replaces the non-local
dense memory read, commonly adopted in previous memory-based methods. The
module imposes the temporal smoothness constraint in the memory read, leading
to accurate memory retrieval. More importantly, we introduce a hierarchical
memory matching scheme and propose a top-k guided memory matching module in
which memory read on a fine-scale is guided by that on a coarse-scale. With the
module, we perform memory read in multiple scales efficiently and leverage both
high-level semantic and low-level fine-grained memory features to predict
detailed object masks. Our network achieves state-of-the-art performance on the
validation sets of DAVIS 2016/2017 (90.8% and 84.7%) and YouTube-VOS 2018/2019
(82.6% and 82.5%), and test-dev set of DAVIS 2017 (78.6%). The source code and
model are available online: https://github.com/Hongje/HMMN.
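As a rough illustration of the hierarchical read described above, the snippet below sketches a top-k guided, coarse-to-fine memory read in PyTorch. It is a minimal sketch, not the authors' released implementation: the function name, the assumption that the fine scale has exactly twice the coarse resolution, the plain dot-product affinity, and the default k=32 are illustrative choices; consult the linked repository for the actual HMMN code.

```python
import torch

def topk_guided_memory_read(q_coarse, m_coarse, q_fine, m_fine, v_fine, k=32):
    # q_coarse: (C,  Hc, Wc)     query features, coarse scale
    # m_coarse: (C,  T, Hc, Wc)  memory features, coarse scale
    # q_fine:   (Cf, Hf, Wf)     query features, fine scale (Hf = 2*Hc, Wf = 2*Wc assumed)
    # m_fine:   (Cf, T, Hf, Wf)  memory features, fine scale
    # v_fine:   (Cv, T, Hf, Wf)  memory values, fine scale
    # returns:  (Cv, Hf, Wf)     fine-scale memory readout
    C, Hc, Wc = q_coarse.shape
    Cf, T, Hf, Wf = m_fine.shape
    Cv = v_fine.shape[0]

    # 1) Dense (non-local) affinity at the coarse scale only.
    aff_c = m_coarse.flatten(1).t() @ q_coarse.flatten(1) / C ** 0.5  # (T*Hc*Wc, Hc*Wc)

    # 2) Keep the top-k memory locations for every coarse query position.
    topk_idx = aff_c.topk(k, dim=0).indices                           # (k, Hc*Wc)

    # 3) Expand each selected coarse memory cell to its 2x2 block of fine cells.
    t_idx = topk_idx // (Hc * Wc)
    y_idx = (topk_idx % (Hc * Wc)) // Wc
    x_idx = topk_idx % Wc
    ys = torch.stack([2 * y_idx, 2 * y_idx, 2 * y_idx + 1, 2 * y_idx + 1], 0)
    xs = torch.stack([2 * x_idx, 2 * x_idx + 1, 2 * x_idx, 2 * x_idx + 1], 0)
    cand = (t_idx * Hf * Wf + ys * Wf + xs).reshape(4 * k, Hc, Wc)

    # 4) Every fine query position inherits the candidates of its parent coarse cell.
    cand = cand.repeat_interleave(2, dim=1).repeat_interleave(2, dim=2).flatten(1)  # (4k, Hf*Wf)

    # 5) Fine-scale matching restricted to the guided candidate set.
    qf = q_fine.flatten(1)                                  # (Cf, Hf*Wf)
    mf = m_fine.flatten(1)[:, cand]                         # (Cf, 4k, Hf*Wf)
    vf = v_fine.flatten(1)[:, cand]                         # (Cv, 4k, Hf*Wf)
    aff_f = (mf * qf.unsqueeze(1)).sum(0) / Cf ** 0.5       # (4k, Hf*Wf)
    w = aff_f.softmax(dim=0)
    return (vf * w.unsqueeze(0)).sum(1).reshape(Cv, Hf, Wf)
```

Restricting the fine-scale softmax to 4k candidates per query position is what keeps multi-scale reading tractable compared with a dense non-local read at the fine resolution.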
Related papers
- LiVOS: Light Video Object Segmentation with Gated Linear Matching [116.58237547253935]
LiVOS is a lightweight memory network that employs linear matching via linear attention.
For longer and higher-resolution videos, it matches STM-based methods while using 53% less GPU memory, and supports 4096p inference on a 32 GB consumer-grade GPU (a generic linear-attention sketch follows the related-paper list below).
arXiv Detail & Related papers (2024-11-05T05:36:17Z)
- Robust and Efficient Memory Network for Video Object Segmentation [6.7995672846437305]
This paper proposes a Robust and Efficient Memory Network, or REMN, for semi-supervised video object segmentation (VOS).
We introduce a local attention mechanism that tackles the background distraction by enhancing the features of foreground objects with the previous mask.
Experiments demonstrate that our REMN achieves state-of-the-art results on DAVIS 2017, with a $\mathcal{J}\&\mathcal{F}$ score of 86.3%, and on YouTube-VOS 2018, with a $\mathcal{G}$ mean of 85.5%.
arXiv Detail & Related papers (2023-04-24T06:19:21Z)
- Learning Quality-aware Dynamic Memory for Video Object Segmentation [32.06309833058726]
We propose a Quality-aware Dynamic Memory Network (QDMN) to evaluate the segmentation quality of each frame.
Our QDMN achieves new state-of-the-art performance on both DAVIS and YouTube-VOS benchmarks.
arXiv Detail & Related papers (2022-07-16T12:18:04Z)
- XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model [137.50614198301733]
We present XMem, a video object segmentation architecture for long videos with unified feature memory stores.
We develop an architecture that incorporates multiple independent yet deeply-connected feature memory stores.
XMem greatly exceeds state-of-the-art performance on long-video datasets.
arXiv Detail & Related papers (2022-07-14T17:59:37Z)
- LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z)
- Kanerva++: Extending the Kanerva Machine with differentiable, locally block allocated latent memory [75.65949969000596]
Episodic and semantic memory are critical components of the human memory model.
We develop a new principled Bayesian memory allocation scheme that bridges the gap between episodic and semantic memory.
We demonstrate that this allocation scheme improves performance in memory conditional image generation.
arXiv Detail & Related papers (2021-02-20T18:40:40Z)
- Video Object Segmentation with Episodic Graph Memory Networks [198.74780033475724]
A graph memory network is developed to address the novel idea of "learning to update the segmentation model".
We exploit an episodic memory network, organized as a fully connected graph, to store frames as nodes and capture cross-frame correlations by edges.
The proposed graph memory network yields a neat yet principled framework, which can generalize well both one-shot and zero-shot video object segmentation tasks.
arXiv Detail & Related papers (2020-07-14T13:19:19Z)
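Referenced from the LiVOS entry above: a minimal sketch of a softmax-free, linear-attention memory read of the kind that entry mentions. This is generic kernelized linear attention under an assumed ELU+1 feature map; the function name is hypothetical, and it omits LiVOS's gating and any other details specific to that paper.

```python
import torch
import torch.nn.functional as F

def linear_attention_memory_read(q, k, v, eps=1e-6):
    # q: (Nq, C)   query-frame features
    # k: (Nm, C)   memory keys from all stored frames, flattened
    # v: (Nm, Cv)  memory values
    # returns: (Nq, Cv) memory readout
    # Non-negative feature map standing in for the softmax kernel (a common choice).
    qf = F.elu(q) + 1
    kf = F.elu(k) + 1
    # Associativity lets the memory be summarized once; the Nq x Nm attention
    # matrix is never materialized, so cost grows linearly with memory size.
    kv = kf.t() @ v                          # (C, Cv) memory summary
    z = kf.sum(dim=0)                        # (C,) normalizer
    return (qf @ kv) / (qf @ z + eps).unsqueeze(-1)
```

Because the memory is compressed into a C x Cv summary, the read cost no longer scales with the number of stored frames, which is the kind of saving the LiVOS summary highlights.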
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.