In-context Autoencoder for Context Compression in a Large Language Model
- URL: http://arxiv.org/abs/2307.06945v4
- Date: Wed, 8 May 2024 18:16:09 GMT
- Title: In-context Autoencoder for Context Compression in a Large Language Model
- Authors: Tao Ge, Jing Hu, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei
- Abstract summary: We propose the In-context Autoencoder (ICAE) to compress a long context into short compact memory slots.
ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data.
- Score: 70.7621953091318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose the In-context Autoencoder (ICAE), leveraging the power of a large language model (LLM) to compress a long context into short compact memory slots that can be directly conditioned on by the LLM for various purposes. ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context. Then, it is fine-tuned on instruction data for producing desirable responses to various prompts. Experiments demonstrate that our lightweight ICAE, introducing about 1% additional parameters, effectively achieves $4\times$ context compression based on Llama, offering advantages in both latency and GPU memory cost during inference, and showing an interesting insight into memorization as well as potential for scalability. These promising results imply a novel perspective on the connection between working memory in cognitive science and representation learning in LLMs, revealing ICAE's significant implications in addressing the long context problem and suggesting further research in LLM context management. Our data, code and models are available at https://github.com/getao/icae.
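To make the mechanism concrete, here is a minimal, self-contained sketch of the ICAE idea as described in the abstract. The toy encoder, module sizes, and the name `ICAESketch` are illustrative assumptions, not the released implementation (the actual code is in the linked repository): an encoder reads the long context plus k learnable memory tokens, and the hidden states at those positions become the k compact memory slots a decoder LLM can condition on.

```python
# Minimal sketch of the ICAE idea (not the authors' code): hidden states at
# appended learnable memory-token positions serve as compact memory slots.
import torch
import torch.nn as nn

class ICAESketch(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_mem_slots=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable memory-token embeddings appended after the context.
        self.mem_tokens = nn.Parameter(torch.randn(n_mem_slots, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.n_mem_slots = n_mem_slots

    def forward(self, context_ids):                  # (batch, ctx_len)
        ctx = self.embed(context_ids)                # (batch, ctx_len, d)
        mem = self.mem_tokens.expand(ctx.size(0), -1, -1)
        h = self.encoder(torch.cat([ctx, mem], dim=1))
        # The last n_mem_slots hidden states are the compressed memory slots.
        return h[:, -self.n_mem_slots:, :]           # (batch, k, d)

encoder = ICAESketch()
slots = encoder(torch.randint(0, 32000, (1, 128)))   # 128 tokens -> 32 slots
print(slots.shape)                                   # torch.Size([1, 32, 512])
```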
Related papers
- Language Models Encode Collaborative Signals in Recommendation [57.90679739598295]
We show that item representations, when linearly mapped from advanced LM representations, yield superior recommendation performance.
Motivated by these findings, we propose a simple yet effective collaborative filtering (CF) model named AlphaRec.
AlphaRec comprises three main components: a multilayer perceptron (MLP), graph convolution, and a contrastive learning (CL) loss function.
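As a rough illustration of how those three components could fit together (toy shapes and a random interaction graph; not the authors' code, and the row-normalized propagation is a simplification of LightGCN-style symmetric normalization):

```python
import torch
import torch.nn.functional as F

n_users, n_items, d = 8, 100, 64
# 1) MLP: map frozen LM embeddings of item text into the CF space.
lm_item_emb = torch.randn(n_items, 768)            # pretrained LM representations
mlp = torch.nn.Sequential(torch.nn.Linear(768, d), torch.nn.ReLU(),
                          torch.nn.Linear(d, d))
item_emb = mlp(lm_item_emb)
user_emb = torch.randn(n_users, d, requires_grad=True)

# 2) Graph convolution: propagate embeddings over the (toy, random)
#    normalized user-item interaction graph.
interactions = (torch.rand(n_users, n_items) < 0.1).float()
emb = torch.cat([user_emb, item_emb], dim=0)
adj = torch.zeros(n_users + n_items, n_users + n_items)
adj[:n_users, n_users:] = interactions
adj[n_users:, :n_users] = interactions.T
deg = adj.sum(1, keepdim=True).clamp(min=1)
for _ in range(2):
    emb = (adj / deg) @ emb
u_final, i_final = emb[:n_users], emb[n_users:]

# 3) Contrastive (InfoNCE-style) loss with in-batch negatives.
pos_items = interactions.argmax(dim=1)             # one positive per user (toy)
u = F.normalize(u_final, dim=-1)
i = F.normalize(i_final[pos_items], dim=-1)
loss = F.cross_entropy((u @ i.T) / 0.2, torch.arange(n_users))
```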
arXiv Detail & Related papers (2024-07-07T17:05:24Z)
- Needle in the Haystack for Memory Based Large Language Models [31.885539843977472]
Current large language models (LLMs) often perform poorly on simple fact retrieval tasks.
We investigate if coupling a dynamically adaptable external memory to an LLM can alleviate this problem.
We demonstrate that the external memory of Larimar, which allows fast write and read of an episode of text samples, can be used at test time to handle contexts much longer than those seen during training.
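A minimal sketch of the general idea of a test-time read/write episodic memory (the key-value store and cosine-similarity read here are illustrative assumptions, not Larimar's actual memory update rules):

```python
# Toy episodic memory: writes are cheap appends, reads are similarity lookups;
# the LLM itself receives no gradient updates at test time.
import torch
import torch.nn.functional as F

class EpisodicMemory:
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, key_vec, episode_text):
        # Fast test-time write: just store the encoded episode.
        self.keys.append(key_vec)
        self.values.append(episode_text)

    def read(self, query_vec, top_k=1):
        keys = torch.stack(self.keys)
        sims = F.cosine_similarity(keys, query_vec.unsqueeze(0), dim=-1)
        idx = sims.topk(min(top_k, len(self.values))).indices
        return [self.values[i] for i in idx.tolist()]

embed = lambda text: torch.randn(64)   # stand-in for a real text encoder
mem = EpisodicMemory()
mem.write(embed("The meeting was moved to Friday."), "meeting -> Friday")
print(mem.read(embed("When is the meeting?")))
```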
arXiv Detail & Related papers (2024-07-01T16:32:16Z)
- AI-native Memory: A Pathway from LLMs Towards AGI [26.697790256031322]
Large language models (LLMs) have shown the world sparks of artificial general intelligence (AGI).
We envision a pathway from LLMs to AGI through the integration of memory.
As an intermediate stage, the memory will likely be in the form of natural language descriptions.
arXiv Detail & Related papers (2024-06-26T12:51:37Z)
- Soft Prompting for Unlearning in Large Language Models [11.504012974208466]
This work investigates machine unlearning for large language models, motivated by data protection regulations.
We propose a framework, Soft Prompting for Unlearning (SPUL), that learns prompt tokens that can be appended to an arbitrary query to induce unlearning.
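A minimal sketch of the soft-prompting setup this implies, assuming learned prompt embeddings appended to query embeddings while the LLM stays frozen (shapes and the training objective are hypothetical; SPUL's actual losses are defined in the paper):

```python
# Only a few prompt-token embeddings are trained; the LLM stays frozen, and at
# inference the learned tokens are appended to any query's embeddings.
import torch

d_model, n_prompt_tokens = 512, 8
soft_prompt = torch.nn.Parameter(torch.randn(n_prompt_tokens, d_model) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)  # only the prompt trains

def with_soft_prompt(query_embeddings):               # (batch, seq, d_model)
    prompt = soft_prompt.expand(query_embeddings.size(0), -1, -1)
    return torch.cat([query_embeddings, prompt], dim=1)

# An unlearning objective would then, e.g., raise the frozen LLM's loss on the
# forget set while keeping it low on a retain set.
query = torch.randn(2, 16, d_model)
inputs = with_soft_prompt(query)                      # (2, 24, 512)
```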
arXiv Detail & Related papers (2024-06-17T19:11:40Z)
- Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL).
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to the SOTA while being nearly 220 times faster in computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z)
- Enhancing Contextual Understanding in Large Language Models through Contrastive Decoding [9.2433070542025]
Large language models (LLMs) tend to inadequately integrate input context during text generation.
We introduce a novel approach integrating contrastive decoding with adversarial irrelevant passages as negative samples.
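A toy sketch of the contrastive-decoding idea with an irrelevant passage as the negative (the paper's exact scoring and passage construction may differ):

```python
# Next-token logits conditioned on the relevant context are contrasted against
# logits conditioned on an adversarial irrelevant passage, amplifying what the
# true context contributes.
import torch

def contrastive_logits(logits_with_context, logits_with_irrelevant, alpha=1.0):
    # Reward tokens the real context supports more than the distractor does.
    return logits_with_context - alpha * logits_with_irrelevant

vocab = 100
logits_ctx = torch.randn(vocab)        # LLM(prompt + relevant context)
logits_neg = torch.randn(vocab)        # LLM(prompt + irrelevant passage)
next_token = contrastive_logits(logits_ctx, logits_neg).argmax()
```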
arXiv Detail & Related papers (2024-05-04T20:38:41Z)
- MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing knowledge capabilities by integrating a structured and explicit read-and-write memory module.
Our experiments indicate that MemLLM enhances performance and interpretability, in language modeling in general and knowledge-intensive tasks in particular.
We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
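A toy sketch of what an explicit read-write memory interface could look like (the triple store and call names here are illustrative assumptions; MemLLM's actual memory API and finetuning data are defined in the paper):

```python
# The LLM is finetuned to emit explicit memory-write and memory-read calls
# over a structured store; here, a simple set of (subject, relation, object)
# triples with wildcard reads.
class TripleMemory:
    def __init__(self):
        self.triples = set()

    def write(self, subj, rel, obj):
        self.triples.add((subj, rel, obj))

    def read(self, subj=None, rel=None, obj=None):
        return [t for t in self.triples
                if (subj in (None, t[0]) and rel in (None, t[1])
                    and obj in (None, t[2]))]

memory = TripleMemory()
memory.write("Marie Curie", "born_in", "Warsaw")       # from generated text
print(memory.read(subj="Marie Curie", rel="born_in"))  # during generation
```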
arXiv Detail & Related papers (2024-04-17T18:13:16Z)
- Online Adaptation of Language Models with a Memory of Amortized Contexts [86.91360597169563]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We propose an amortized feature extraction and memory-augmentation approach to compress and extract information from new documents.
Our experiment demonstrates the superiority of MAC in multiple aspects, including online adaptation performance, time, and memory efficiency.
arXiv Detail & Related papers (2024-03-07T08:34:57Z)
- Generative Context-aware Fine-tuning of Self-supervised Speech Models [54.389711404209415]
We study the use of context information generated by generative large language models (LLMs).
We propose an approach to distill the generated information during fine-tuning of self-supervised speech models.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
arXiv Detail & Related papers (2023-12-15T15:46:02Z)
- Graph Neural Prompting with Large Language Models [32.97391910476073]
Graph Neural Prompting (GNP) is a novel plug-and-play method to assist pre-trained language models in learning beneficial knowledge from knowledge graphs.
Extensive experiments on multiple datasets demonstrate the superiority of GNP on both commonsense and biomedical reasoning tasks.
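A minimal sketch of the graph-neural-prompting idea, assuming a toy GNN over a retrieved knowledge-graph subgraph whose pooled embedding is prepended to the LLM input as a soft prompt (GNP's actual encoder, pooling, and objectives are in the paper):

```python
# A small GNN encodes a retrieved KG subgraph; the pooled result becomes a
# soft-prompt embedding for the frozen LLM. Shapes and the shared-weight,
# mean-aggregation GNN are illustrative simplifications.
import torch

d = 512
node_feat = torch.randn(6, d)                  # entity embeddings of a subgraph
adj = (torch.rand(6, 6) < 0.3).float()         # toy subgraph edges
gnn_layer = torch.nn.Linear(d, d)

h = node_feat
for _ in range(2):                             # two message-passing hops
    deg = adj.sum(1, keepdim=True).clamp(min=1)
    h = torch.relu(gnn_layer((adj / deg) @ h))

graph_prompt = h.mean(dim=0, keepdim=True)     # (1, d) soft prompt embedding
text_emb = torch.randn(16, d)                  # frozen LLM's token embeddings
llm_input = torch.cat([graph_prompt, text_emb], dim=0)
```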
arXiv Detail & Related papers (2023-09-27T06:33:29Z)