HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing
- URL: http://arxiv.org/abs/2405.06067v3
- Date: Thu, 06 Feb 2025 17:35:02 GMT
- Title: HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing
- Authors: Zifan He, Yingqi Cao, Zongyue Qin, Neha Prakriya, Yizhou Sun, Jason Cong
- Abstract summary: Hierarchical Memory Transformer (HMT) is a novel framework that facilitates a model's long-context processing ability.
HMT consistently improves the long-context processing ability of existing models.
- Score: 33.720656946186885
- Abstract: Transformer-based large language models (LLMs) have been widely used in language processing applications. However, due to device memory constraints, most of them restrict the context window. Even though recurrent models in previous works can memorize past tokens to enable unlimited context and maintain effectiveness, they have "flat" memory architectures, which limit their ability to select and filter information. Since humans are good at learning and self-adjustment, we believe that imitating the brain's memory hierarchy is beneficial for model memorization. Thus, we propose the Hierarchical Memory Transformer (HMT), a novel framework that improves a model's long-context processing ability by imitating human memorization behavior. Leveraging memory-augmented segment-level recurrence, we organize the memory hierarchy by preserving tokens from early input segments, passing memory embeddings along the sequence, and recalling relevant information from history. Evaluated on general language modeling, question-answering, and summarization tasks, HMT consistently improves the long-context processing ability of existing models. Furthermore, HMT achieves comparable or superior generation quality to long-context LLMs with 2-57× fewer parameters and 2.5-116× less inference memory, significantly outperforming previous memory-augmented models. Code on GitHub: https://github.com/OswaldHe/HMT-pytorch.
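As a rough, hypothetical illustration of the memory-augmented segment-level recurrence described in the abstract, the sketch below splits the input into fixed-length segments, carries a memory embedding from one segment to the next, and recalls relevant history by attending over a bank of cached segment memories. The module names, sizes, and single-memory-token recall are simplifying assumptions, not the authors' implementation; see the linked GitHub repository for the actual code.

```python
# Minimal sketch of memory-augmented segment-level recurrence with recall
# over cached segment memories. Illustrative only; the real HMT code lives
# at https://github.com/OswaldHe/HMT-pytorch and differs from this.
import torch
import torch.nn as nn


class SegmentRecurrentLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.recall = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mem_token = nn.Parameter(torch.randn(1, 1, d_model))
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids, segment_len=512):
        batch = token_ids.size(0)
        mem = self.mem_token.expand(batch, 1, -1)   # memory carried across segments
        bank = []                                   # cached memories of past segments
        logits = []
        for seg in token_ids.split(segment_len, dim=1):
            x = self.embed(seg)
            if bank:                                # recall: query the memory bank
                history = torch.cat(bank, dim=1)
                recalled, _ = self.recall(mem, history, history)
                mem = mem + recalled
            # Prepend the memory embedding so the segment can read from it.
            h = self.backbone(torch.cat([mem, x], dim=1))
            mem, seg_h = h[:, :1], h[:, 1:]
            bank.append(mem.detach())               # store a summary for later recall
            logits.append(self.lm_head(seg_h))
        return torch.cat(logits, dim=1)


# Usage: process a 2048-token input segment by segment.
model = SegmentRecurrentLM()
ids = torch.randint(0, 32000, (1, 2048))
out = model(ids)                                    # shape: (1, 2048, vocab_size)
```

Detaching cached memories keeps the bank from growing the backward graph in this sketch; how HMT itself preserves and recalls hierarchical memory is described in the paper, not here.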
Related papers
- Titans: Learning to Memorize at Test Time [20.12643072017223]
We present a new neural long-term memory module that learns to memorize historical context.
We show that this neural memory has the advantage of fast parallelizable training while maintaining fast inference.
We introduce a new family of architectures, called Titans, and present three variants to address how one can effectively incorporate memory into this architecture.
arXiv Detail & Related papers (2024-12-31T22:32:03Z) - B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z) - Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [61.40047491337793]
We present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the context limits of pre-trained large language models (LLMs).
HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks.
A token reduction technique precedes each merging, keeping memory usage efficient (a rough sketch of this idea follows this list).
arXiv Detail & Related papers (2024-04-16T06:34:08Z) - Augmenting Language Models with Long-Term Memory [142.04940250657637]
Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit.
We propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history.
arXiv Detail & Related papers (2023-06-12T15:13:39Z) - Extending Memory for Language Modelling [0.0]
We introduce Long Term Memory network (LTM) to learn from infinitely long sequences.
LTM gives priority to the current inputs, allowing them to have a high impact.
We compare LTM with other language models which require long term memory.
arXiv Detail & Related papers (2023-05-19T06:30:19Z) - A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model with a limited memory budget that adapts to new classes without forgetting old ones.
We show that when counting the model size into the total budget and comparing methods with aligned memory size, saving models does not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z) - Training Language Models with Memory Augmentation [28.4608705738799]
We present a novel training approach designed for training language models with memory augmentation.
Our approach uses a training objective that directly takes in-batch examples as accessible memory.
We demonstrate significant gains over previous memory-augmented approaches.
arXiv Detail & Related papers (2022-05-25T11:37:29Z) - LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z) - Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
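The divide-and-conquer merging with token reduction mentioned in the Hierarchical Context Merging (HOMER) entry above can be pictured with the hypothetical sketch below. The norm-based importance score and the `keep` budget are arbitrary stand-ins chosen for illustration; the abstract does not specify HOMER's actual reduction criterion.

```python
# Hypothetical sketch of hierarchical chunk merging with token reduction:
# adjacent chunks are pruned to their highest-scoring token embeddings and
# concatenated, halving the number of chunks at every level.
import torch


def reduce_tokens(chunk: torch.Tensor, keep: int) -> torch.Tensor:
    """Keep the `keep` highest-scoring rows of a (num_tokens, dim) chunk."""
    scores = chunk.norm(dim=-1)                             # stand-in importance score
    idx = scores.topk(min(keep, chunk.size(0))).indices
    return chunk[idx.sort().values]                         # preserve original order


def hierarchical_merge(chunks: list, keep: int) -> torch.Tensor:
    """Pairwise-merge chunks, pruning tokens before each merge."""
    while len(chunks) > 1:
        merged = []
        for i in range(0, len(chunks), 2):
            pair = [reduce_tokens(c, keep) for c in chunks[i:i + 2]]
            merged.append(torch.cat(pair, dim=0))           # concatenate along tokens
        chunks = merged
    return chunks[0]


# Usage: 8 chunks of 512 token embeddings collapse into one 512-token context.
chunks = [torch.randn(512, 64) for _ in range(8)]
context = hierarchical_merge(chunks, keep=256)
print(context.shape)                                        # torch.Size([512, 64])
```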