Related papers: SirLLM: Streaming Infinite Retentive LLM

URL: http://arxiv.org/abs/2405.12528v1
Date: Tue, 21 May 2024 06:37:03 GMT
Title: SirLLM: Streaming Infinite Retentive LLM
Authors: Yao Yao, Zuchao Li, Hai Zhao
Abstract summary: Large Language Models (LLMs) process inputs of any length and maintain a degree of memory. Recent efforts have employed streaming inputs to alleviate the pressure of excessively long text inputs. We introduce Streaming Infinite Retentive LLM (SirLLM), which allows LLMs to maintain longer memory during infinite-length dialogues.
Score: 74.40196814292426
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As Large Language Models (LLMs) become increasingly prevalent in various domains, their ability to process inputs of any length and maintain a degree of memory becomes essential. However, the one-off input of overly long texts is limited, as studies have shown that when input lengths exceed the LLMs' pre-trained text length, there is a dramatic decline in text generation capabilities. Moreover, simply extending the length of pre-training texts is impractical due to the difficulty in obtaining long text data and the substantial memory consumption costs this would entail for LLMs. Recent efforts have employed streaming inputs to alleviate the pressure of excessively long text inputs, but this approach can significantly impair the model's long-term memory capabilities. Motivated by this challenge, we introduce Streaming Infinite Retentive LLM (SirLLM), which allows LLMs to maintain longer memory during infinite-length dialogues without the need for fine-tuning. SirLLM utilizes the Token Entropy metric and a memory decay mechanism to filter key phrases, endowing LLMs with both long-lasting and flexible memory. We designed three distinct tasks and constructed three datasets to measure the effectiveness of SirLLM from various angles: (1) DailyDialog; (2) Grocery Shopping; (3) Rock-Paper-Scissors. Our experimental results robustly demonstrate that SirLLM can achieve stable and significant improvements across different LLMs and tasks, compellingly proving its effectiveness. When having a conversation, "A sir could forget himself," but SirLLM never does! Our code is publicly available at https://github.com/Zoeyyao27/SirLLM
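
Illustrative sketch (not the authors' code): the abstract describes filtering key phrases with a Token Entropy metric plus a memory decay mechanism. The toy Python below shows one way such a filter could work. It is a sketch under assumptions: each token is scored by its negative log-probability, older scores are multiplied by a hypothetical `decay` factor each turn, and only the `capacity` highest-scoring tokens are kept. The function names, the scoring rule, and the parameter values are invented for illustration and are not taken from the SirLLM repository.

```python
import math


def token_scores(probs):
    """Score each token by -log(p); rarer tokens score higher (assumed metric)."""
    return [-math.log(p) for p in probs]


def update_memory(memory, new_tokens, new_probs, decay=0.9, capacity=2):
    """Decay old scores, add the new tokens, keep the top `capacity` entries.

    `memory` is a list of (token, score) pairs in arrival order.
    `decay` and `capacity` are hypothetical knobs, not values from the paper.
    """
    decayed = [(tok, score * decay) for tok, score in memory]
    fresh = list(zip(new_tokens, token_scores(new_probs)))
    combined = decayed + fresh
    # Rank entries by score, keep the best ones, then restore arrival order.
    ranked = sorted(range(len(combined)), key=lambda i: combined[i][1], reverse=True)
    keep = sorted(ranked[:capacity])
    return [combined[i] for i in keep]


# Turn 1: the low-probability (more informative) tokens survive.
memory = update_memory([], ["I", "want", "two", "apples"], [0.9, 0.5, 0.1, 0.05])
turn1_tokens = [tok for tok, _ in memory]

# Turn 2: older entries are decayed before competing with the new tokens.
memory = update_memory(memory, ["and", "milk"], [0.8, 0.02])
turn2_tokens = [tok for tok, _ in memory]
```

In this toy setup a token stays in memory only while its decayed score still beats those of incoming tokens, which is one simple way to obtain memory that is long-lasting for salient items yet flexible over time.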

Related papers

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU [48.105361428245736]
We introduce InfiniteHiP, an inference framework for large language models (LLMs). We dynamically eliminate irrelevant context tokens through a modular hierarchical token pruning algorithm. Our framework achieves an 18.95x speedup in attention decoding for a 1 million token context without requiring additional training.
arXiv Detail & Related papers (2025-02-13T02:52:01Z)
Length Controlled Generation for Black-box LLMs [70.57649832433451]
Large language models (LLMs) have demonstrated impressive instruction following capabilities, but struggle to accurately manage the length of generated text. We propose a novel iterative sampling framework for text length control, integrating the Metropolis-Hastings algorithm with an importance sampling acceleration strategy. Our framework achieves almost 100% success rates of length control on Llama3.1 for tasks such as length-controlled abstractive summarization.
arXiv Detail & Related papers (2024-12-19T09:07:38Z)
Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly [6.685692482347038]
Large Language Models (LLMs) have demonstrated remarkable capabilities in comprehending and analyzing lengthy sequential inputs. This paper uncovers a surprising limitation: LLMs fall short when handling long input sequences. We propose and evaluate ad-hoc solutions that substantially enhance LLMs' performance on long input sequences by up to 50%.
arXiv Detail & Related papers (2024-08-03T21:31:34Z)
Needle in the Haystack for Memory Based Large Language Models [31.885539843977472]
Current large language models (LLMs) often perform poorly on simple fact retrieval tasks. We investigate if coupling a dynamically adaptable external memory to a LLM can alleviate this problem. We demonstrate that the external memory of Larimar, which allows fast write and read of an episode of text samples, can be used at test time to handle contexts much longer than those seen during training.
arXiv Detail & Related papers (2024-07-01T16:32:16Z)
MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing knowledge capabilities by integrating a structured and explicit read-and-write memory module. Our experiments indicate that MemLLM enhances performance and interpretability, in language modeling in general and in knowledge-intensive tasks in particular. We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
arXiv Detail & Related papers (2024-04-17T18:13:16Z)
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory [93.20588235940453]
In this paper, we introduce a training-free memory-based method, InfLLM. InfLLM stores distant contexts into additional memory units and employs an efficient mechanism to lookup token-relevant units for attention. Even when the sequence length is scaled to 1,024K, InfLLM still effectively captures long-distance dependencies.
arXiv Detail & Related papers (2024-02-07T06:50:42Z)
LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models [83.98062659664785]
Large language models (LLMs) typically train on short text segments (e.g., 4K tokens) due to the quadratic complexity of their Transformer architectures. This work identifies three major factors contributing to this length generalization failure. We propose LM-Infinite, a simple and effective method for enhancing LLMs' capabilities of handling long contexts.
arXiv Detail & Related papers (2023-08-30T16:47:51Z)
Augmenting Language Models with Long-Term Memory [142.04940250657637]
Existing large language models (LLMs) can only afford fix-sized inputs due to the input length limit. We propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history.
arXiv Detail & Related papers (2023-06-12T15:13:39Z)
Enhancing Large Language Model with Self-Controlled Memory Framework [56.38025154501917]
Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information. We propose the Self-Controlled Memory (SCM) framework to enhance the ability of LLMs to maintain long-term memory and recall relevant information.
arXiv Detail & Related papers (2023-04-26T07:25:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.