Space Time Recurrent Memory Network
- URL: http://arxiv.org/abs/2109.06474v1
- Date: Tue, 14 Sep 2021 06:53:51 GMT
- Title: Space Time Recurrent Memory Network
- Authors: Hung Nguyen and Fuxin Li
- Abstract summary: We propose a novel visual memory network architecture for the learning and inference problem in the spatial-temporal domain.
This architecture is benchmarked on the video object segmentation and video prediction problems.
We show that our memory architecture can achieve competitive results with state-of-the-art while maintaining constant memory capacity.
- Score: 35.06536468525509
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel visual memory network architecture for the learning and
inference problem in the spatial-temporal domain. Different from the popular
transformers, we maintain a fixed set of memory slots in our memory network and
explore designs to input new information into the memory, combine the
information in different memory slots and decide when to discard old memory
slots. Finally, this architecture is benchmarked on the video object
segmentation and video prediction problems. Through the experiments, we show
that our memory architecture can achieve competitive results with
state-of-the-art while maintaining constant memory capacity.
Related papers
- Learning Quality-aware Dynamic Memory for Video Object Segmentation [32.06309833058726]
We propose a Quality-aware Dynamic Memory Network (QDMN) to evaluate the segmentation quality of each frame.
Our QDMN achieves new state-of-the-art performance on both DAVIS and YouTube-VOS benchmarks.
arXiv Detail & Related papers (2022-07-16T12:18:04Z) - XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin
Memory Model [137.50614198301733]
We present XMem, a video object segmentation architecture for long videos with unified feature memory stores.
We develop an architecture that incorporates multiple independent yet deeply-connected feature memory stores.
XMem greatly exceeds state-of-the-art performance on long-video datasets.
arXiv Detail & Related papers (2022-07-14T17:59:37Z) - Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z) - LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z) - CNN with large memory layers [2.368995563245609]
This work is centred around the recently proposed product key memory structure citelarge_memory, implemented for a number of computer vision applications.
The memory structure can be regarded as a simple computation primitive suitable to be augmented to nearly all neural network architectures.
arXiv Detail & Related papers (2021-01-27T20:58:20Z) - Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z) - Video Object Segmentation with Episodic Graph Memory Networks [198.74780033475724]
A graph memory network is developed to address the novel idea of "learning to update the segmentation model"
We exploit an episodic memory network, organized as a fully connected graph, to store frames as nodes and capture cross-frame correlations by edges.
The proposed graph memory network yields a neat yet principled framework, which can generalize well both one-shot and zero-shot video object segmentation tasks.
arXiv Detail & Related papers (2020-07-14T13:19:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.