Adapting Language Models to Compress Contexts
- URL: http://arxiv.org/abs/2305.14788v2
- Date: Sat, 4 Nov 2023 04:09:43 GMT
- Title: Adapting Language Models to Compress Contexts
- Authors: Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen
- Abstract summary: Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window.
We propose to adapt pre-trained LMs into AutoCompressors, which are capable of compressing long contexts into compact summary vectors.
We fine-tune OPT and Llama-2 models on sequences of up to 30,720 tokens and show that AutoCompressors can utilize long contexts to improve perplexity.
- Score: 71.98287002918941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based language models (LMs) are powerful and widely-applicable
tools, but their usefulness is constrained by a finite context window and the
expensive computational cost of processing long text documents. We propose to
adapt pre-trained LMs into AutoCompressors. These language models are capable
of compressing long contexts into compact summary vectors, which are then
accessible to the model as soft prompts. Summary vectors are trained with an
unsupervised objective, whereby long documents are processed in segments, and
summary vectors from all previous segments are used in language modeling. We
fine-tune OPT and Llama-2 models on sequences of up to 30,720 tokens and show
that AutoCompressors can utilize long contexts to improve perplexity. We
evaluate AutoCompressors on in-context learning by compressing task
demonstrations and find that summary vectors are good substitutes for
plain-text demonstrations, increasing accuracy while reducing inference costs.
Finally, we explore the benefits of pre-computing summary vectors for large
corpora by applying summary vectors to retrieval-augmented language modeling and
a passage re-ranking task. Overall, AutoCompressors emerge as a simple and
inexpensive solution to extend the context window of LMs while speeding up
inference over long contexts.
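The abstract describes the mechanism at a high level. Below is a minimal sketch of how such segment-wise compression could look, assuming a Hugging Face-style causal LM; the model checkpoint, segment length, number of summary tokens, and the way summary vectors are read off the final hidden states are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of the training step described in the abstract: a long document is
# processed in segments, summary vectors from all previous segments are prepended
# as soft prompts, and new summary vectors are read from appended summary tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"   # stand-in for the fine-tuned OPT / Llama-2 models
num_summary_tokens = 50            # assumed number of summary vectors per segment
segment_len = 512                  # assumed segment length

tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
embed = model.get_input_embeddings()

# Learnable embeddings for the special summary tokens appended to every segment.
summary_embeds = torch.nn.Parameter(
    torch.randn(num_summary_tokens, model.config.hidden_size) * 0.02
)

def compress_step(segment_ids, prev_summaries):
    """One segment: condition on previous summary vectors, return LM loss and new summaries."""
    tok_embeds = embed(segment_ids)                                   # (1, L, H)
    parts = prev_summaries + [tok_embeds, summary_embeds.unsqueeze(0)]
    inputs_embeds = torch.cat(parts, dim=1)

    # Only the real segment tokens contribute to the language-modeling loss.
    labels = torch.full(inputs_embeds.shape[:2], -100, dtype=torch.long)
    offset = inputs_embeds.shape[1] - segment_ids.shape[1] - num_summary_tokens
    labels[:, offset:offset + segment_ids.shape[1]] = segment_ids

    out = model(inputs_embeds=inputs_embeds, labels=labels, output_hidden_states=True)
    new_summary = out.hidden_states[-1][:, -num_summary_tokens:, :]   # summaries for later segments
    return out.loss, new_summary

# Usage: stream a long document through the model segment by segment.
ids = tok("a very long document ...", return_tensors="pt").input_ids
summaries = []
for start in range(0, ids.shape[1], segment_len):
    loss, new_summary = compress_step(ids[:, start:start + segment_len], summaries)
    summaries.append(new_summary.detach())  # detached here for simplicity; full training
                                            # may also propagate gradients through summaries
```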
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
- CEV-LM: Controlled Edit Vector Language Model for Shaping Natural Language Generations [5.148810760938979]
We introduce CEV-LM - a lightweight, semi-autoregressive language model that utilizes constrained edit vectors to control three complementary metrics.
We study an extensive set of state-of-the-art CTG models and find that CEV-LM provides significantly more targeted and precise control of these three metrics.
arXiv Detail & Related papers (2024-02-22T05:07:31Z)
- Generative Context-aware Fine-tuning of Self-supervised Speech Models [54.389711404209415]
We study the use of context information generated by large language models (LLMs).
We propose an approach to distill the generated information during fine-tuning of self-supervised speech models.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
arXiv Detail & Related papers (2023-12-15T15:46:02Z)
- READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling [31.745255364708864]
We introduce lightweight adapters to the pre-trained model and only update them at fine-tuning time.
We propose a novel REcurrent ADapter (READ) that employs recurrent computation to enable temporal modeling capability.
We validate our READ framework through extensive experiments where READ significantly outperforms all existing fine-tuning strategies.
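As a rough illustration of the adapter idea summarized above (not the READ architecture itself), the sketch below attaches a small GRU-based module to a frozen backbone layer and updates only the adapter at fine-tuning time; the module structure and sizes are assumptions.

```python
# Recurrent adapter sketch: a trainable bottleneck with a GRU for temporal
# modeling, added residually on top of frozen pre-trained features.
import torch
import torch.nn as nn

class RecurrentAdapter(nn.Module):
    def __init__(self, hidden_size, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)                # project into a small bottleneck
        self.gru = nn.GRU(bottleneck, bottleneck, batch_first=True)   # recurrence across time steps
        self.up = nn.Linear(bottleneck, hidden_size)                  # project back, add residually

    def forward(self, hidden_states):                                 # (B, T, H) frozen-layer features
        x, _ = self.gru(torch.relu(self.down(hidden_states)))
        return hidden_states + self.up(x)

# Usage: freeze the backbone and train only the adapter parameters.
backbone = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
for p in backbone.parameters():
    p.requires_grad = False
adapter = RecurrentAdapter(hidden_size=768)
features = adapter(backbone(torch.randn(2, 16, 768)))                 # (batch, time, hidden)
```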
arXiv Detail & Related papers (2023-12-12T03:09:30Z)
- RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation [61.53695868960846]
We propose compressing retrieved documents into textual summaries prior to in-context integration.
This not only reduces the computational costs but also relieves the burden of LMs to identify relevant information in long retrieved documents.
We show that our compressors trained for one LM can transfer to other LMs on the language modeling task and provide summaries largely faithful to the retrieved documents.
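The compress-then-prepend idea can be sketched with off-the-shelf components; the summarizer and generator checkpoints below are generic stand-ins, not the compressors trained in the paper.

```python
# Retrieved documents are condensed into a short textual summary before being
# placed in the LM's context, reducing prompt length and distracting content.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
generator = pipeline("text-generation", model="gpt2")

def answer_with_compressed_context(question, retrieved_docs, max_summary_tokens=80):
    """Summarize retrieved documents, then condition the LM on the summary only."""
    summary = summarizer(
        " ".join(retrieved_docs),
        max_length=max_summary_tokens, min_length=20, truncation=True,
    )[0]["summary_text"]
    prompt = f"Context: {summary}\nQuestion: {question}\nAnswer:"
    return generator(prompt, max_new_tokens=50)[0]["generated_text"]
```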
arXiv Detail & Related papers (2023-10-06T17:55:36Z)
- Large Language Models as General Pattern Machines [64.75501424160748]
We show that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences.
Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary.
In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics.
arXiv Detail & Related papers (2023-07-10T17:32:13Z)
- Language Models Implement Simple Word2Vec-style Vector Arithmetic [32.2976613483151]
A primary criticism towards language models (LMs) is their inscrutability.
This paper presents evidence that, despite their size and complexity, LMs sometimes exploit a simple vector arithmetic style mechanism to solve some relational tasks.
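As a loose, word2vec-style illustration of the kind of vector arithmetic referred to above, one can probe a model's input embedding matrix directly; this is a simplification of the paper's analysis (which concerns hidden states inside the forward pass), and the model and token choices here are assumptions.

```python
# Probe a relation ("capital of") as a single offset vector in GPT-2's token
# embedding space and look at nearest-neighbour tokens. Purely illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
E = model.get_input_embeddings().weight.detach()        # (vocab_size, hidden)

def vec(word):
    ids = tok.encode(" " + word)                         # leading space: single GPT-2 token for these words
    assert len(ids) == 1, f"'{word}' is not a single token"
    return E[ids[0]]

offset = vec("Paris") - vec("France")                    # relation as an offset vector
query = vec("Italy") + offset
sims = torch.nn.functional.cosine_similarity(query.unsqueeze(0), E)
# Nearest neighbours over raw input embeddings are only suggestive; the paper's
# evidence comes from how hidden states are updated inside the transformer.
print(tok.decode(sims.topk(5).indices.tolist()))
```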
arXiv Detail & Related papers (2023-05-25T15:04:01Z)
- Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval [37.22592489907125]
We study how sparse language models can be used for dense retrieval to improve inference efficiency.
We find that sparse language models can be used as direct replacements with little to no drop in accuracy and up to 4.3x improved inference speeds.
arXiv Detail & Related papers (2023-03-31T20:21:32Z)
- Context-aware Fine-tuning of Self-supervised Speech Models [56.95389222319555]
We study the use of context, i.e., surrounding segments, during fine-tuning.
We propose a new approach called context-aware fine-tuning.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks.
arXiv Detail & Related papers (2022-12-16T15:46:15Z)