Recurrent Relational Memory Network for Unsupervised Image Captioning
- URL: http://arxiv.org/abs/2006.13611v1
- Date: Wed, 24 Jun 2020 10:44:35 GMT
- Title: Recurrent Relational Memory Network for Unsupervised Image Captioning
- Authors: Dan Guo, Yang Wang, Peipei Song, Meng Wang
- Abstract summary: Unsupervised image captioning with no annotations is a challenge in computer vision.
In this paper, we propose a novel memory-based network rather than the GAN models used in prior work.
Our solution has fewer learnable parameters and higher computational efficiency than GAN-based methods.
- Score: 26.802700428311745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where existing approaches usually adopt GAN (Generative Adversarial Network) models. In this paper, we propose a novel memory-based network rather than a GAN, named the Recurrent Relational Memory Network ($R^2M$). Unlike complicated and sensitive adversarial learning, which performs poorly for long sentence generation, $R^2M$ implements a concepts-to-sentence memory translator through a two-stage memory mechanism, fusion and recurrent memories, which captures long-range relational reasoning between common visual concepts and the generated words. $R^2M$ encodes visual context through unsupervised training on images, while enabling the memory to learn from an unrelated textual corpus in a supervised fashion. Our solution has fewer learnable parameters and higher computational efficiency than GAN-based methods, which suffer from heavy parameter sensitivity. We experimentally validate the superiority of $R^2M$ over state-of-the-art methods on all benchmark datasets.
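To make the two-stage idea concrete, here is a minimal sketch of a concepts-to-sentence memory translator: a fusion step pools detected concept embeddings into an initial memory state, and a recurrent memory (a plain LSTM cell standing in for the paper's relational memory cell) unrolls the sentence word by word. All module choices and sizes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a concepts-to-sentence memory translator in the spirit of R^2M.
# The fusion and recurrent cells are ordinary PyTorch modules standing in for the
# paper's fusion / recurrent relational memories; names and sizes are assumptions.
import torch
import torch.nn as nn

class ConceptsToSentence(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, max_len=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Fusion memory: pool concept embeddings into one initial memory vector.
        self.fusion = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.Tanh())
        # Recurrent memory: carries the fused state while words are generated.
        self.rnn = nn.LSTMCell(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)
        self.max_len = max_len

    def forward(self, concept_ids, bos_id=1):
        # concept_ids: (batch, n_concepts) indices of detected visual concepts.
        concepts = self.embed(concept_ids)                  # (B, N, E)
        h = self.fusion(concepts.mean(dim=1))               # fused memory state (B, H)
        c = torch.zeros_like(h)
        word = torch.full((concept_ids.size(0),), bos_id,
                          dtype=torch.long, device=concept_ids.device)
        logits = []
        for _ in range(self.max_len):
            h, c = self.rnn(self.embed(word), (h, c))       # recurrent memory update
            step = self.out(h)                              # vocabulary scores
            logits.append(step)
            word = step.argmax(dim=-1)                      # greedy decoding for the sketch
        return torch.stack(logits, dim=1)                   # (B, T, vocab)

# Usage: translate three concept tokens per image into a short sentence.
model = ConceptsToSentence(vocab_size=1000)
print(model(torch.randint(0, 1000, (2, 3))).shape)          # torch.Size([2, 20, 1000])
```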
Related papers
- $\text{Memory}^3$: Language Modeling with Explicit Memory [22.572376536612015]
We equip large language models (LLMs) with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG).
As a preliminary proof of concept, we train from scratch a 2.4B LLM, which achieves better performance than much larger LLMs and RAG models.
We introduce a memory circuitry theory to support the externalization of knowledge, and present novel techniques including a memory sparsification mechanism that makes storage tractable.
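As a loose illustration of why sparsification matters for an explicit memory, the toy function below keeps only the top-k entries of each stored vector before writing it out; this sketches the storage-saving idea generically and is not the paper's mechanism.

```python
# Generic sketch of memory sparsification: keep only the top-k entries of each
# stored vector so the explicit memory stays cheap to hold on disk or in RAM.
# Illustrates the storage-saving idea only; not the paper's actual mechanism.
import torch

def sparsify_memory(memory: torch.Tensor, k: int) -> torch.Tensor:
    # memory: (num_entries, dim) dense memory vectors.
    indices = memory.abs().topk(k, dim=-1).indices            # largest-magnitude positions
    mask = torch.zeros_like(memory).scatter_(-1, indices, 1.0)
    return (memory * mask).to_sparse()                        # compact storage format

mem = torch.randn(4, 16)
sparse = sparsify_memory(mem, k=4)
print(sparse._nnz())                                          # 16 of 64 entries kept
```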
arXiv Detail & Related papers (2024-07-01T11:07:23Z)
- In-context Autoencoder for Context Compression in a Large Language Model [70.7621953091318]
We propose the In-context Autoencoder (ICAE) to compress a long context into short compact memory slots.
ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data.
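A rough sketch of the compression idea, assuming a set of learnable slot queries that cross-attend to the long context and return a handful of memory vectors; the module shapes are illustrative, not ICAE's actual configuration.

```python
# Sketch of compressing a long context into a few memory slots: learnable slot
# queries cross-attend to the token sequence and their outputs become the compact
# memory. Dimensions and module choices here are assumptions.
import torch
import torch.nn as nn

class ContextCompressor(nn.Module):
    def __init__(self, dim=256, num_slots=8, num_heads=4):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(num_slots, dim) * 0.02)  # learnable memory slots
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, seq_len, dim) hidden states of the long input.
        queries = self.slots.unsqueeze(0).expand(context.size(0), -1, -1)
        compressed, _ = self.attn(queries, context, context)  # (batch, num_slots, dim)
        return compressed

# 512 context tokens are squeezed into 8 slot vectors a decoder could condition on.
ctx = torch.randn(2, 512, 256)
print(ContextCompressor()(ctx).shape)                          # torch.Size([2, 8, 256])
```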
arXiv Detail & Related papers (2023-07-13T17:59:21Z)
- Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos [67.12603318660689]
We propose a novel Hierarchical Visual- and Semantic-Aware Reasoning Network (HVSARN).
HVSARN enables both visual- and semantic-aware query reasoning from object-level to frame-level.
Experiments on three datasets demonstrate that our HVSARN achieves a new state-of-the-art performance.
arXiv Detail & Related papers (2023-03-02T08:00:22Z)
- Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute [23.85786594315147]
Fusion-in-Decoder (FiD) models are powerful, setting the state of the art on a variety of knowledge-intensive tasks, but they are expensive because retrieved passages must be encoded for every query.
Some work avoids this cost by pre-encoding a text corpus into a memory and retrieving dense representations directly.
We propose LUMEN, a hybrid between these two extremes, pre-computing the majority of the retrieval representation and completing the encoding on the fly.
We show that LUMEN significantly outperforms pure memory on multiple question-answering tasks while being much cheaper than FiD, and outperforms both for any given compute budget.
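The pre-compute / finish-on-the-fly split can be sketched with two encoders, a large one run offline over the corpus and a small one run per query; the layer split and sizes below are assumptions, not the published LUMEN configuration.

```python
# Sketch of the pre-compute / finish-on-the-fly split: a large frozen encoder runs
# over the corpus offline, and a small live encoder finishes the job at query time
# conditioned on the question. Layer split and sizes are illustrative assumptions.
import torch
import torch.nn as nn

dim, heads = 256, 4
memory_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, heads, batch_first=True), num_layers=4)  # big, offline
live_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, heads, batch_first=True), num_layers=1)  # small, per query

# Offline: pre-encode every passage once and cache the result.
passages = torch.randn(100, 64, dim)                 # (num_passages, passage_len, dim)
with torch.no_grad():
    memory = memory_encoder(passages)                # cached dense memory

# Online: prepend the encoded question and finish encoding only the retrieved passages.
question = torch.randn(1, 16, dim)
retrieved = memory[:5]                               # pretend these 5 were retrieved
joined = torch.cat([question.expand(5, -1, -1), retrieved], dim=1)
fused = live_encoder(joined)                         # cheap per-query computation
print(fused.shape)                                   # torch.Size([5, 80, 256])
```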
arXiv Detail & Related papers (2023-01-25T07:55:45Z)
- LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
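A minimal sketch of the look-ahead refresh, assuming a single attention layer in which cached memory states query the tokens to their right; LaMemo's interpolation and normalization details are omitted.

```python
# Sketch of a look-ahead memory refresh: cached memory states attend to the tokens
# on their right (the current segment) so the memory is no longer a stale, purely
# left-to-right cache. One attention layer is used here purely for illustration.
import torch
import torch.nn as nn

dim, heads = 256, 4
look_ahead = nn.MultiheadAttention(dim, heads, batch_first=True)

def refresh_memory(memory: torch.Tensor, segment: torch.Tensor) -> torch.Tensor:
    # memory:  (batch, mem_len, dim) hidden states cached from previous segments
    # segment: (batch, seg_len, dim) hidden states of the current segment
    updated, _ = look_ahead(memory, segment, segment)   # memory queries look rightwards
    return memory + updated                             # residual keeps the old content

mem, seg = torch.randn(2, 128, dim), torch.randn(2, 64, dim)
print(refresh_memory(mem, seg).shape)                   # torch.Size([2, 128, 256])
```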
arXiv Detail & Related papers (2022-04-15T06:11:25Z)
- A model of semantic completion in generative episodic memory [0.6690874707758508]
We propose a computational model for generative episodic memory.
The model is able to complete missing parts of a memory trace in a semantically plausible way.
We also model an episodic memory experiment and reproduce the finding that semantically congruent contexts are always recalled better than incongruent ones.
arXiv Detail & Related papers (2021-11-26T15:14:17Z)
- Kanerva++: extending The Kanerva Machine with differentiable, locally block allocated latent memory [75.65949969000596]
Episodic and semantic memory are critical components of the human memory model.
We develop a new principled Bayesian memory allocation scheme that bridges the gap between episodic and semantic memory.
We demonstrate that this allocation scheme improves performance in memory conditional image generation.
arXiv Detail & Related papers (2021-02-20T18:40:40Z)
- Distributed Associative Memory Network with Memory Refreshing Loss [5.5792083698526405]
We introduce a novel Distributed Associative Memory architecture (DAM) with Memory Refreshing Loss (MRL).
Inspired by how the human brain works, our framework encodes data with distributed representation across multiple memory blocks.
MRL enables the memory-augmented neural network (MANN) to reinforce the association between input data and task objective by reproducing the input data from stored memory contents.
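The refreshing idea can be sketched as an auxiliary reconstruction term: in the toy model below, the network must reproduce its input from what it wrote to memory, in addition to solving the task. Module shapes and the loss weight are assumptions.

```python
# Sketch of a memory refreshing loss: alongside the task objective, the model is
# asked to reconstruct its own input from what it wrote to memory, which pushes
# the memory to keep a faithful copy of the data. Shapes and weights are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryModel(nn.Module):
    def __init__(self, in_dim=64, mem_dim=128, out_dim=10):
        super().__init__()
        self.write = nn.Linear(in_dim, mem_dim)      # stand-in for a distributed memory write
        self.task_head = nn.Linear(mem_dim, out_dim)
        self.reconstruct = nn.Linear(mem_dim, in_dim)

    def forward(self, x):
        memory = torch.tanh(self.write(x))           # contents stored in memory
        return self.task_head(memory), self.reconstruct(memory)

model = MemoryModel()
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
task_logits, recon = model(x)
loss = F.cross_entropy(task_logits, y) + 0.5 * F.mse_loss(recon, x)  # task + refreshing term
loss.backward()
print(float(loss))
```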
arXiv Detail & Related papers (2020-07-21T07:34:33Z)
- IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval [105.77562776008459]
Existing methods leverage the attention mechanism to explore the correspondence between image regions and words in a fine-grained manner.
However, it may be difficult for existing methods to optimally capture such sophisticated correspondences.
We propose an Iterative Matching with Recurrent Attention Memory (IMRAM) method, in which correspondences are captured with multiple steps of alignments.
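A small sketch of the iterative matching loop, assuming a gated update that refines the text query after each cross-modal attention step; the gating and scoring choices are illustrative rather than IMRAM's exact formulation.

```python
# Sketch of iterative cross-modal matching: at each step the text query attends
# over image regions, and a gated memory update refines the query before the next
# alignment step. Mirrors the multi-step idea only; gating details are assumed.
import torch
import torch.nn as nn

class IterativeMatcher(nn.Module):
    def __init__(self, dim=256, heads=4, steps=3):
        super().__init__()
        self.attend = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)
        self.steps = steps

    def forward(self, text: torch.Tensor, regions: torch.Tensor) -> torch.Tensor:
        # text: (B, T, D) word features; regions: (B, R, D) image region features.
        query = text
        for _ in range(self.steps):
            aligned, _ = self.attend(query, regions, regions)        # alignment step
            gate = torch.sigmoid(self.gate(torch.cat([query, aligned], dim=-1)))
            query = gate * aligned + (1 - gate) * query              # recurrent memory update
        # Toy similarity score: cosine between mean-pooled refined text and regions.
        return nn.functional.cosine_similarity(query.mean(1), regions.mean(1), dim=-1)

match = IterativeMatcher()
print(match(torch.randn(2, 12, 256), torch.randn(2, 36, 256)).shape)  # torch.Size([2])
```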
arXiv Detail & Related papers (2020-03-08T12:24:41Z)
- Self-Attentive Associative Memory [69.40038844695917]
We propose to separate the storage of individual experiences (item memory) and their occurring relationships (relational memory).
We achieve competitive results with our proposed two-memory model in a diversity of machine learning tasks.
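A toy sketch of this separation, assuming an item memory that stores raw vectors and a relational memory that accumulates outer products between stored items; it illustrates the two-memory split only, not the paper's operators.

```python
# Toy illustration of keeping two memories: an item memory storing individual
# experience vectors, and a relational memory storing outer products that capture
# how pairs of items relate. Not the Self-Attentive Associative Memory operators.
import torch

class TwoMemories:
    def __init__(self, dim=32):
        self.item = torch.zeros(0, dim)          # item memory: one row per experience
        self.relation = torch.zeros(dim, dim)    # relational memory: accumulated outer products

    def write(self, x: torch.Tensor):
        for prev in self.item:                   # relate the new item to stored ones
            self.relation += torch.outer(prev, x)
        self.item = torch.cat([self.item, x.unsqueeze(0)], dim=0)

    def read_relation(self, query: torch.Tensor) -> torch.Tensor:
        return self.relation @ query             # retrieve what the query relates to

mem = TwoMemories()
a, b = torch.randn(32), torch.randn(32)
mem.write(a); mem.write(b)
print(mem.read_relation(b).shape)                # torch.Size([32])
```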
arXiv Detail & Related papers (2020-02-10T03:27:48Z)
- MEMO: A Deep Network for Flexible Combination of Episodic Memories [16.362284088767456]
MEMO is an architecture endowed with the capacity to reason over longer distances.
First, it introduces a separation between the facts stored in external memory and the items that comprise these facts.
Second, it makes use of an adaptive retrieval mechanism, allowing a variable number of "memory hops" before the answer is produced.
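The adaptive hop count can be sketched with an ACT-style halting rule: keep attending over memory and accumulate a halting probability until it crosses a threshold. The threshold and module sizes below are assumptions.

```python
# Sketch of adaptive "memory hops": the controller keeps attending over memory and
# emits a halting probability each hop, stopping once enough probability has
# accumulated (an ACT-style rule). Thresholds and module sizes are assumptions.
import torch
import torch.nn as nn

class AdaptiveHopper(nn.Module):
    def __init__(self, dim=128, max_hops=6, threshold=0.9):
        super().__init__()
        self.attend = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.halt = nn.Linear(dim, 1)
        self.max_hops, self.threshold = max_hops, threshold

    def forward(self, query: torch.Tensor, memory: torch.Tensor):
        # query: (B, 1, D) question state; memory: (B, N, D) stored facts.
        total_halt = torch.zeros(query.size(0))
        hops = 0
        for hops in range(1, self.max_hops + 1):
            read, _ = self.attend(query, memory, memory)
            query = query + read                               # integrate the retrieved facts
            total_halt += torch.sigmoid(self.halt(query)).squeeze(-1).squeeze(-1)
            if bool((total_halt > self.threshold).all()):      # whole batch has halted
                break
        return query, hops

hopper = AdaptiveHopper()
answer, used = hopper(torch.randn(2, 1, 128), torch.randn(2, 8, 128))
print(answer.shape, used)                                       # torch.Size([2, 1, 128]) and hop count
```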
arXiv Detail & Related papers (2020-01-29T15:56:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.