XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin
Memory Model
- URL: http://arxiv.org/abs/2207.07115v2
- Date: Mon, 18 Jul 2022 17:56:53 GMT
- Title: XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin
Memory Model
- Authors: Ho Kei Cheng and Alexander G. Schwing
- Abstract summary: We present XMem, a video object segmentation architecture for long videos with unified feature memory stores.
We develop an architecture that incorporates multiple independent yet deeply-connected feature memory stores.
XMem greatly exceeds state-of-the-art performance on long-video datasets.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present XMem, a video object segmentation architecture for long videos
with unified feature memory stores inspired by the Atkinson-Shiffrin memory
model. Prior work on video object segmentation typically only uses one type of
feature memory. For videos longer than a minute, a single feature memory model
tightly links memory consumption and accuracy. In contrast, following the
Atkinson-Shiffrin model, we develop an architecture that incorporates multiple
independent yet deeply-connected feature memory stores: a rapidly updated
sensory memory, a high-resolution working memory, and a compact and thus sustained
long-term memory. Crucially, we develop a memory potentiation algorithm that
routinely consolidates actively used working memory elements into the long-term
memory, which avoids memory explosion and minimizes performance decay for
long-term prediction. Combined with a new memory reading mechanism, XMem
greatly exceeds state-of-the-art performance on long-video datasets while being
on par with state-of-the-art methods (that do not work on long videos) on
short-video datasets. Code is available at https://hkchengrex.github.io/XMem
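To make the three-store design concrete, here is a minimal sketch, assuming a simplified usage heuristic; it is not the released XMem code. The class name `MultiStoreMemory`, the capacities, and the eviction rule are illustrative assumptions: the actual method tracks usage through attention weights and compresses the long-term store with prototype selection, which this sketch only approximates.

```python
# A minimal sketch (not the official XMem code) of an Atkinson-Shiffrin-style
# feature memory. All names and the usage heuristic are illustrative.
import numpy as np

class MultiStoreMemory:
    def __init__(self, dim, work_cap=8, lt_cap=64):
        self.dim = dim
        self.work_cap = work_cap            # high-resolution working memory size
        self.lt_cap = lt_cap                # bound on the compact long-term store
        self.sensory = np.zeros(dim)        # rapidly updated sensory memory
        self.work_keys, self.work_vals, self.usage = [], [], []
        self.lt_keys, self.lt_vals = [], []

    def update_sensory(self, feat, decay=0.5):
        # Sensory memory: a cheap recurrent blend, refreshed every frame.
        self.sensory = decay * self.sensory + (1 - decay) * feat

    def write_working(self, key, val):
        self.work_keys.append(key)
        self.work_vals.append(val)
        self.usage.append(0.0)
        if len(self.work_keys) > self.work_cap:
            self.consolidate()

    def consolidate(self):
        # "Memory potentiation": move the most actively used working-memory
        # element into the long-term store instead of discarding it.
        i = int(np.argmax(self.usage))
        self.lt_keys.append(self.work_keys.pop(i))
        self.lt_vals.append(self.work_vals.pop(i))
        self.usage.pop(i)
        if len(self.lt_keys) > self.lt_cap:   # keep long-term memory bounded
            self.lt_keys.pop(0)
            self.lt_vals.pop(0)

    def read(self, query):
        # Attention read over working + long-term memory. The attention mass
        # on each working entry accumulates as its usage, which later decides
        # what gets consolidated.
        keys = self.work_keys + self.lt_keys
        if not keys:
            return self.sensory
        keys = np.array(keys)
        vals = np.array(self.work_vals + self.lt_vals)
        logits = keys @ query / np.sqrt(self.dim)
        w = np.exp(logits - logits.max())
        w /= w.sum()
        for j in range(len(self.work_keys)):
            self.usage[j] += w[j]
        return w @ vals + self.sensory
```

The key design point is that the working memory stays small and high-resolution while the long-term store is compact and bounded, so memory consumption is decoupled from video length.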
Related papers
- Efficient Video Object Segmentation via Modulated Cross-Attention Memory (2024-03-26)
We propose a transformer-based approach, named MAVOS, to model temporal smoothness without requiring frequent memory expansion.
Our MAVOS achieves a J&F score of 63.3% while operating at 37 frames per second (FPS) on a single V100 GPU.
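As a rough illustration of why a non-growing memory gives constant per-frame cost, consider a plain cross-attention read over a fixed set of slots. This is only a sketch under assumptions: the summary above does not describe MAVOS's actual modulation scheme, so the function names and the blend-style update are hypothetical stand-ins.

```python
import numpy as np

def read_fixed_memory(queries, mem_keys, mem_vals):
    # Cross-attention over a memory of constant size m: per-frame cost is
    # O(n * m) no matter how long the video gets, because m never grows.
    d = queries.shape[1]
    logits = queries @ mem_keys.T / np.sqrt(d)     # (n, m)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ mem_vals                         # (n, d)

def update_fixed_memory(mem, new_feats, rate=0.1):
    # Hypothetical update rule: blend new frame features into the existing
    # slots instead of appending them, keeping the footprint constant.
    return (1 - rate) * mem + rate * new_feats
```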
- Augmenting Language Models with Long-Term Memory (2023-06-12)
Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit.
We propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history.
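A common way to realize this, sketched below under assumptions, is to cache attention key-value pairs from past context and retrieve the nearest entries for each new query. The class name and the top-k fusion here are hypothetical, not LongMem's exact design, which reads the cached memory through a separate retrieval module.

```python
import numpy as np

class LongTermKVCache:
    # Hypothetical sketch: cache attention key-value pairs from history and
    # attend to the k most similar cached entries, so the usable context is
    # no longer bounded by the model's input window.
    def __init__(self):
        self.keys, self.vals = [], []

    def write(self, key, val):
        self.keys.append(key)
        self.vals.append(val)

    def retrieve(self, query, topk=4):
        keys = np.array(self.keys)
        sims = keys @ query                    # similarity to each cached key
        idx = np.argsort(sims)[-topk:]         # k nearest entries
        w = np.exp(sims[idx] - sims[idx].max())
        w /= w.sum()
        return w @ np.array(self.vals)[idx]    # fused memory readout
```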
- READMem: Robust Embedding Association for a Diverse Memory in Unconstrained Video Object Segmentation (2023-05-22)
We present READMem, a modular framework for sVOS methods to handle unconstrained videos.
We propose a robust association of the embeddings stored in the memory with query embeddings during the update process.
Our approach achieves competitive results on the Long-time Video dataset (LV1) while not hindering performance on short sequences.
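The summary does not spell out the update rule, so the sketch below substitutes a simple diversity criterion (reject near-duplicates, evict the entry most similar to the incoming one) as a hedged stand-in for READMem's actual embedding association; every name and threshold is an assumption.

```python
import numpy as np

def maybe_insert(memory, emb, capacity=6, sim_thresh=0.9):
    # Illustrative diversity rule, not READMem's exact criterion: admit a
    # new embedding only if it is sufficiently dissimilar from everything
    # stored, so the memory stays diverse rather than filling with
    # near-duplicate frames. Stored embeddings are assumed unit-norm.
    emb = emb / np.linalg.norm(emb)
    if not memory:
        return memory + [emb]
    sims = np.array([m @ emb for m in memory])   # cosine similarities
    if sims.max() >= sim_thresh:
        return memory                            # near-duplicate: skip it
    memory = memory[:]
    if len(memory) >= capacity:
        memory.pop(int(sims.argmax()))           # evict the closest entry
    return memory + [emb]
```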
- Recurrent Dynamic Embedding for Video Object Segmentation (2022-05-08)
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes the spatio-temporal aggregation module (SAM) more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
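A constant-size bank implies some recurrent rule that folds each new frame into the existing slots. The gated blend below is a hypothetical sketch of that idea, not the paper's SAM: `W_gate` is an assumed learned parameter and the update form is illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recurrent_bank_update(bank, frame_feats, W_gate):
    # Hypothetical gated update: the constant-size bank (slots, d) absorbs
    # each new frame's features (slots, d) through a learned gate instead
    # of growing, so memory cost stays flat over arbitrarily long videos.
    gate = sigmoid((bank * frame_feats) @ W_gate)   # (slots, 1) per-slot gate
    return gate * frame_feats + (1 - gate) * bank
```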
- LaMemo: Language Modeling with Look-Ahead Memory (2022-04-15)
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
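As a rough picture of "attending to the right-side tokens", the sketch below lets memory states from the previous segment attend once to the newly revealed tokens and interpolate the result in; the fixed 0.5 interpolation is a stand-in for LaMemo's incremental reweighting, so treat every detail as an assumption. The attention matrix is mem_len by new_len, consistent with an overhead that grows only linearly with the memory length.

```python
import numpy as np

def look_ahead_refresh(memory, new_tokens):
    # Hypothetical refresh: old memory states attend to the tokens that
    # just became visible on their right, so the memory also reflects newer
    # context. The extra cost is one (mem_len x new_len) attention.
    d = memory.shape[1]
    logits = memory @ new_tokens.T / np.sqrt(d)    # (mem_len, new_len)
    logits -= logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    refreshed = attn @ new_tokens                  # right-side summary
    return 0.5 * memory + 0.5 * refreshed          # stand-in interpolation
```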
- Hierarchical Memory Matching Network for Video Object Segmentation (2021-09-23)
We propose two advanced memory read modules that enable memory reading at multiple scales while exploiting temporal smoothness.
We first propose a guided memory matching module that replaces the non-local dense memory read, commonly adopted in previous memory-based methods.
We introduce a hierarchical memory matching scheme and propose a top-k guided memory matching module in which memory read on a fine-scale is guided by that on a coarse-scale.
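The top-k guidance can be pictured as follows: coarse-scale matching scores nominate k memory positions, and the fine-scale read attends only to those survivors instead of densely over all of memory. This is a simplified sketch; in particular, the shared indexing between the two scales is an assumption.

```python
import numpy as np

def topk_guided_read(q_fine, k_fine, v_fine, coarse_scores, k=4):
    # Sketch of a top-k guided read: coarse-scale scores pick k candidate
    # memory positions; the fine-scale attention runs only over them,
    # replacing a dense non-local read over the full memory.
    idx = np.argsort(coarse_scores)[-k:]           # survivors of coarse matching
    d = q_fine.shape[0]
    logits = k_fine[idx] @ q_fine / np.sqrt(d)     # fine match on survivors only
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ v_fine[idx]
```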
- Space Time Recurrent Memory Network (2021-09-14)
We propose a novel visual memory network architecture for learning and inference in the spatial-temporal domain.
This architecture is benchmarked on the video object segmentation and video prediction problems.
We show that our memory architecture achieves results competitive with the state of the art while maintaining constant memory capacity.
- Memformer: A Memory-Augmented Transformer for Sequence Modeling (2020-10-14)
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
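Constant space with linear time typically comes from a fixed set of memory slots that each segment reads from and then rewrites. The sketch below is a hedged illustration under that assumption, not Memformer's actual architecture; the read/write functions and the residual readout are hypothetical.

```python
import numpy as np

def attn(q, k, v):
    # Basic scaled dot-product attention, used for both read and write.
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def process_segment(tokens, memory):
    # Hypothetical memory-augmented step: the segment's tokens read from a
    # fixed set of memory slots, then the slots are rewritten from the
    # segment. The slot count never changes, so space stays constant and
    # total time grows linearly with the number of segments.
    readout = tokens + attn(tokens, memory, memory)   # memory read
    memory = attn(memory, readout, readout)           # memory write-back
    return readout, memory
```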
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.