Extending Memory for Language Modelling
- URL: http://arxiv.org/abs/2305.11462v1
- Date: Fri, 19 May 2023 06:30:19 GMT
- Title: Extending Memory for Language Modelling
- Authors: Anupiya Nugaliyadde
- Abstract summary: We introduce Long Term Memory network (LTM) to learn from infinitely long sequences.
LTM gives priority to the current inputs, allowing them to have a high impact.
We compare LTM with other language models that require long-term memory.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Breakthroughs in deep learning and memory networks have driven major advances
in natural language understanding. Language is sequential, and the information
carried through a sequence can be captured by memory networks. Learning
the sequence is one of the key aspects of learning a language. However,
memory networks are not capable of holding infinitely long sequences in their
memories and are limited by constraints such as the vanishing or
exploding gradient problem. Natural language understanding models are therefore
affected when presented with long sequential text. We introduce the Long Term
Memory network (LTM) to learn from infinitely long sequences. LTM gives
priority to the current inputs, allowing them to have a high impact on the memory.
Language modeling is an important component of natural language understanding and
requires long-term memory, so LTM was tested on language modeling, using the
Penn Treebank, Google Billion Word, and WikiText-2 datasets. We
compare LTM with other language models that require long-term memory.
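The abstract describes the mechanism only at a high level: a recurrent memory that spans arbitrarily long sequences while weighting the current input heavily. The sketch below is a generic gated-memory illustration of that idea under assumed names and an assumed gating form; it is not the LTM formulation from the paper.

```python
import numpy as np

class SimpleMemoryCell:
    """Illustrative recurrent memory that prioritises the current input.

    This is NOT the paper's LTM architecture; it is a generic gated update
    m_t = (1 - g_t) * m_{t-1} + g_t * tanh(W x_t + U m_{t-1}),
    where the gate g_t is biased towards the current input x_t.
    """

    def __init__(self, input_dim: int, memory_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (memory_dim, input_dim))   # input projection
        self.U = rng.normal(0.0, 0.1, (memory_dim, memory_dim))  # memory projection
        self.Wg = rng.normal(0.0, 0.1, (memory_dim, input_dim))  # gate from input
        self.bg = np.full(memory_dim, 1.0)  # positive bias -> gate favours the new input

    def step(self, m_prev: np.ndarray, x_t: np.ndarray) -> np.ndarray:
        candidate = np.tanh(self.W @ x_t + self.U @ m_prev)
        gate = 1.0 / (1.0 + np.exp(-(self.Wg @ x_t + self.bg)))  # sigmoid gate
        # Convex combination: the gate decides how strongly the current input
        # overwrites the accumulated memory.
        return (1.0 - gate) * m_prev + gate * candidate

    def run(self, xs: np.ndarray) -> np.ndarray:
        m = np.zeros(self.W.shape[0])
        for x_t in xs:             # the memory is carried across the whole sequence,
            m = self.step(m, x_t)  # so sequence length is not architecturally bounded
        return m

# Usage: fold a long sequence of 16-dimensional token embeddings into one memory vector.
cell = SimpleMemoryCell(input_dim=16, memory_dim=32)
sequence = np.random.default_rng(1).normal(size=(1000, 16))
memory = cell.run(sequence)
print(memory.shape)  # (32,)
```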
Related papers
- MemLong: Memory-Augmented Retrieval for Long Text Modeling [37.49036666949963]
This work introduces MemLong: Memory-Augmented Retrieval for Long Text Generation.
MemLong combines a non-differentiable "ret-mem" module with a partially trainable decoder-only language model.
Comprehensive evaluations on multiple long-context language modeling benchmarks demonstrate that MemLong consistently outperforms other state-of-the-art LLMs.
arXiv Detail & Related papers (2024-08-30T02:01:56Z)
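The MemLong entry above names only a non-differentiable "ret-mem" module paired with a partially trainable decoder. As a rough, hedged illustration of that general retrieve-from-memory pattern (not MemLong's actual implementation, which retrieves key-value states for a decoder-only model), the sketch below stores embeddings of past text chunks in a frozen memory and returns the most similar chunks for the current context; the class name and the hash-based stand-in embedding are placeholders.

```python
import numpy as np

class ChunkMemory:
    """Toy non-differentiable retrieval memory: stores (embedding, text) pairs
    for past chunks and returns the top-k most similar chunks for a query."""

    def __init__(self):
        self.embeddings: list[np.ndarray] = []
        self.chunks: list[str] = []

    def add(self, chunk: str, embedding: np.ndarray) -> None:
        # Unit-normalise so a dot product equals cosine similarity.
        self.embeddings.append(embedding / (np.linalg.norm(embedding) + 1e-8))
        self.chunks.append(chunk)

    def retrieve(self, query_embedding: np.ndarray, k: int = 2) -> list[str]:
        if not self.chunks:
            return []
        q = query_embedding / (np.linalg.norm(query_embedding) + 1e-8)
        scores = np.stack(self.embeddings) @ q
        top = np.argsort(scores)[::-1][:k]
        return [self.chunks[i] for i in top]

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in hash-based bag-of-words embedding; a real system would use the
    language model's own hidden states or a trained encoder."""
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    return v

# Usage: old context goes into the memory; retrieved chunks would be attended to
# (or prepended) when the model processes new input.
memory = ChunkMemory()
for chunk in ["the LTM network handles long sequences",
              "vocabulary trimming reduces memory usage",
              "dialogue summaries act as long-term memory"]:
    memory.add(chunk, embed(chunk))
print(memory.retrieve(embed("how to handle long sequences"), k=1))
```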
- SirLLM: Streaming Infinite Retentive LLM [74.40196814292426]
The ability of Large Language Models (LLMs) to process inputs of any length and maintain a degree of memory is becoming essential.
Recent efforts have employed streaming inputs to alleviate the pressure of excessively long text inputs.
We introduce Streaming Infinite Retentive LLM (SirLLM), which allows LLMs to maintain longer memory during infinite-length dialogues.
arXiv Detail & Related papers (2024-05-21T06:37:03Z)
- HMT: Hierarchical Memory Transformer for Long Context Language Processing [35.730941605490194]
Hierarchical Memory Transformer (HMT) is a novel framework that enables and improves models' long-context processing ability.
We show that HMT steadily improves the long-context processing ability of context-constrained and long-context models.
arXiv Detail & Related papers (2024-05-09T19:32:49Z)
- The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics [74.99898531299148]
This research examines vocabulary trimming (VT), which restricts embedding entries to the language of interest to improve time and memory efficiency.
Two heuristics are used to trim the full vocabulary - Unicode-based script filtering and corpus-based selection - applied across different language families and model sizes.
It is found that VT reduces the memory usage of small models by nearly 50% and yields at most a 25% improvement in generation speed (a minimal sketch of script-based trimming appears after this list).
arXiv Detail & Related papers (2023-11-16T09:35:50Z)
- Aspects of human memory and Large Language Models [0.0]
Large Language Models (LLMs) are huge artificial neural networks which primarily serve to generate text.
We find surprising similarities with key characteristics of human memory.
arXiv Detail & Related papers (2023-11-07T09:39:12Z)
- Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models [75.98775135321355]
Given a long conversation, large language models (LLMs) fail to recall past information and tend to generate inconsistent responses.
We propose to generate summaries/memory using large language models (LLMs) to enhance their long-term memory ability.
arXiv Detail & Related papers (2023-08-29T04:59:53Z)
- Augmenting Language Models with Long-Term Memory [142.04940250657637]
Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit.
We propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history.
arXiv Detail & Related papers (2023-06-12T15:13:39Z)
- LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z)
- Neural Machine Translation with Monolingual Translation Memory [58.98657907678992]
We propose a new framework that uses monolingual memory and performs learnable memory retrieval in a cross-lingual manner.
Experiments show that the proposed method obtains substantial improvements.
arXiv Detail & Related papers (2021-05-24T13:35:19Z)
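For the vocabulary-trimming entry above, here is the promised minimal sketch of the Unicode-script filtering idea: keep only tokens whose characters belong to the target script (plus special tokens) and slice the embedding matrix accordingly. This is an illustration of the general technique under assumed function names, not the paper's code, and the per-character Unicode-name check is a deliberate simplification.

```python
import unicodedata
import numpy as np

def is_target_script(token: str, script_prefix: str = "LATIN") -> bool:
    """Crude Unicode-based script filter: a token passes if every alphabetic
    character's Unicode name starts with the target script prefix."""
    letters = [ch for ch in token if ch.isalpha()]
    if not letters:
        return True  # keep punctuation, digits, and special tokens
    return all(unicodedata.name(ch, "").startswith(script_prefix) for ch in letters)

def trim_vocabulary(vocab: dict, embeddings: np.ndarray, script_prefix: str = "LATIN"):
    """Return a trimmed vocabulary and the matching rows of the embedding matrix.

    Shrinking the embedding (and output softmax) matrix is where the reported
    memory savings for small models come from."""
    kept = [(tok, idx) for tok, idx in sorted(vocab.items(), key=lambda kv: kv[1])
            if is_target_script(tok, script_prefix)]
    new_vocab = {tok: new_idx for new_idx, (tok, _) in enumerate(kept)}
    new_embeddings = embeddings[[old_idx for _, old_idx in kept]]
    return new_vocab, new_embeddings

# Usage with a toy vocabulary and random embeddings.
vocab = {"hello": 0, "世界": 1, "world": 2, "мир": 3, "!": 4}
emb = np.random.default_rng(0).normal(size=(len(vocab), 8))
small_vocab, small_emb = trim_vocabulary(vocab, emb)
print(small_vocab)      # {'hello': 0, 'world': 1, '!': 2}
print(small_emb.shape)  # (3, 8)
```

The corpus-based alternative mentioned in the same abstract would instead keep the tokens that actually occur in a monolingual corpus of the target language; the slicing step is the same.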