HistAlign: Improving Context Dependency in Language Generation by
Aligning with History
- URL: http://arxiv.org/abs/2305.04782v2
- Date: Sun, 3 Dec 2023 19:31:39 GMT
- Title: HistAlign: Improving Context Dependency in Language Generation by
Aligning with History
- Authors: David Wan, Shiyue Zhang, Mohit Bansal
- Abstract summary: Language models (LMs) can generate hallucinations and incoherent outputs, which highlights their weak context dependency.
Cache-LMs, which augment LMs with a memory of recent history, can increase context dependency.
We present HistAlign, a new training approach to ensure good cache alignment.
- Score: 96.35214682008701
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) can generate hallucinations and incoherent outputs,
which highlights their weak context dependency. Cache-LMs, which augment LMs
with a memory of recent history, can increase context dependency and have shown
remarkable performance in diverse language generation tasks. However, we find
that even with training, the performance gain stemming from the cache component
of current cache-LMs is suboptimal due to the misalignment between the current
hidden states and those stored in the memory. In this work, we present
HistAlign, a new training approach to ensure good cache alignment such that the
model receives useful signals from the history. We first prove our concept on a
simple and synthetic task where the memory is essential for correct
predictions, and we show that the cache component of HistAlign is better
aligned and improves overall performance. Next, we evaluate HistAlign on
diverse downstream language generation tasks, including prompt continuation,
abstractive summarization, and data-to-text. We demonstrate that HistAlign
improves text coherence and faithfulness in open-ended and conditional
generation settings respectively. HistAlign is also generalizable across
different model families, showcasing its strength in improving context
dependency of LMs in diverse scenarios. Our code is publicly available at
https://github.com/meetdavidwan/histalign
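The abstract's core mechanism, a cache over recent hidden states whose distribution is interpolated with the standard softmax, plus a training signal that aligns the current hidden state with useful cache entries, can be sketched roughly as follows. This is a minimal illustration with assumed names, an assumed interpolation weight, and a simple margin-based alignment loss; it is not the paper's exact objective (see the linked repository for the actual implementation).

```python
# Minimal sketch of a continuous-cache LM head plus an alignment-style loss.
# Names, the interpolation weight `lam`, and the margin objective are assumptions
# made for illustration; this is not HistAlign's exact formulation.
import torch
import torch.nn.functional as F

def cache_lm_logprobs(hidden, cache_keys, cache_tokens, vocab_proj, lam=0.1, temp=1.0):
    """hidden: (d,) current hidden state; cache_keys: (m, d) stored past states;
    cache_tokens: (m,) long tensor of the token that followed each stored state;
    vocab_proj: nn.Linear(d, vocab_size) output head."""
    p_lm = F.softmax(vocab_proj(hidden), dim=-1)        # standard softmax over the vocabulary
    sims = cache_keys @ hidden / temp                   # similarity of the current state to each cache entry
    w = F.softmax(sims, dim=-1)                         # cache attention weights
    p_cache = hidden.new_zeros(vocab_proj.out_features).scatter_add(0, cache_tokens, w)  # mass per token id
    return torch.log(lam * p_cache + (1 - lam) * p_lm + 1e-9)  # interpolate cache and LM distributions

def alignment_loss(hidden, cache_keys, cache_tokens, target, margin=1.0):
    """Margin loss pushing cache entries whose token matches the target above the
    rest, so the current state is aligned with useful history (one plausible
    instantiation of 'cache alignment', not the paper's exact loss)."""
    sims = cache_keys @ hidden
    pos, neg = sims[cache_tokens == target], sims[cache_tokens != target]
    if pos.numel() == 0 or neg.numel() == 0:
        return hidden.new_zeros(())
    return F.relu(margin - pos.max() + neg.max())
```

During training, the negative log of the interpolated distribution at the gold token would serve as the LM loss, with the alignment term added so that similarities for cache entries carrying the correct continuation dominate the cache distribution.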
Related papers
- PICASO: Permutation-Invariant Context Composition with State Space Models [98.91198288025117]
State Space Models (SSMs) offer a promising solution by allowing a database of contexts to be mapped onto fixed-dimensional states.
We propose a simple mathematical relation derived from SSM dynamics to compose multiple states into one that efficiently approximates the effect of concatenating raw context tokens.
We evaluate our resulting method on WikiText and MSMARCO in both zero-shot and fine-tuned settings, and show that we can match the strongest performing baseline while enjoying on average 5.4x speedup.
arXiv Detail & Related papers (2025-02-24T19:48:00Z)
- Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption [66.97998742151918]
Large Language Models (LLMs) have revolutionized various industries with their advanced language comprehension.
However, their efficiency is challenged by the Transformer architecture's struggle with handling long texts.
KV Cache has emerged as a pivotal solution, converting the time complexity of token generation from quadratic to linear (a minimal sketch of the mechanism appears after this list).
arXiv Detail & Related papers (2024-07-25T12:56:22Z)
- HMT: Hierarchical Memory Transformer for Long Context Language Processing [35.730941605490194]
Hierarchical Memory Transformer (HMT) is a novel framework that enables and improves models' long-context processing ability.
We show that HMT steadily improves the long-context processing ability of context-constrained and long-context models.
arXiv Detail & Related papers (2024-05-09T19:32:49Z)
- FlashBack:Efficient Retrieval-Augmented Language Modeling for Long Context Inference [47.03691582405274]
Retrieval-Augmented Language Modeling (RALM), which integrates large language models (LLMs) with relevant documents from an external corpus, is a proven method for generating information.
Previous work that utilizes retrieved content by simply prepending it to the input incurs a high runtime cost.
We propose FlashBack, a modular RALM designed to improve the inference efficiency of RALM by using an appending-context pattern.
arXiv Detail & Related papers (2024-05-07T07:14:38Z)
- MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing knowledge capabilities by integrating a structured and explicit read-and-write memory module.
Our experiments indicate that MemLLM enhances performance and interpretability in language modeling.
We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
arXiv Detail & Related papers (2024-04-17T18:13:16Z)
- Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with, and improve the performance of, popular alternatives such as retrieval-augmented generation.
arXiv Detail & Related papers (2024-03-07T08:34:57Z)
- Vocabulary-Defined Semantics: Latent Space Clustering for Improving In-Context Learning [32.178931149612644]
In-context learning enables language models to adapt to downstream data or incorporate new tasks using a few samples as demonstrations within the prompt.
However, the performance of in-context learning can be unstable depending on the quality, format, or order of demonstrations.
We propose a novel approach, "vocabulary-defined semantics".
arXiv Detail & Related papers (2024-01-29T14:29:48Z)
- In-context Autoencoder for Context Compression in a Large Language Model [70.7621953091318]
We propose the In-context Autoencoder (ICAE) to compress a long context into short compact memory slots.
ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data.
arXiv Detail & Related papers (2023-07-13T17:59:21Z)
- RET-LLM: Towards a General Read-Write Memory for Large Language Models [53.288356721954514]
RET-LLM is a novel framework that equips large language models with a general write-read memory unit.
Inspired by Davidsonian semantics theory, we extract and save knowledge in the form of triplets (a toy sketch of such a triplet memory appears after this list).
Our framework exhibits robust performance in handling temporal-based question answering tasks.
arXiv Detail & Related papers (2023-05-23T17:53:38Z)
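The KV-cache review above notes that caching keys and values turns per-token generation cost from quadratic to linear in the prefix length. Below is a toy single-head decoder step illustrating that mechanism; the names, dimensions, and random weights are assumptions for illustration, not any particular library's API.

```python
# Minimal single-head attention decoder step with a KV cache (illustrative only).
import numpy as np

d = 16
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(x_t, cache):
    """x_t: (d,) embedding of the newest token.
    cache: dict with growing 'K' and 'V' arrays of shape (t, d).
    Keys/values for past tokens are reused, so each step costs O(t)
    instead of recomputing attention over the whole prefix from scratch."""
    q = x_t @ W_q
    cache["K"] = np.vstack([cache["K"], x_t @ W_k]) if cache["K"] is not None else (x_t @ W_k)[None]
    cache["V"] = np.vstack([cache["V"], x_t @ W_v]) if cache["V"] is not None else (x_t @ W_v)[None]
    attn = softmax(cache["K"] @ q / np.sqrt(d))   # attention weights over all cached positions, shape (t,)
    return attn @ cache["V"]                      # (d,) attended output for the new token

cache = {"K": None, "V": None}
for _ in range(5):                                # decode five toy tokens
    out = decode_step(rng.normal(size=d), cache)
print(cache["K"].shape)                           # (5, 16): one cached key per generated token
```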
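RET-LLM and MemLLM both describe equipping an LLM with an explicit read-write memory, with RET-LLM storing knowledge as triplets. The following is a deliberately simplified illustration of such a memory interface; the class and method names are hypothetical and are not either paper's actual API.

```python
# Hypothetical triplet read-write memory in the spirit of RET-LLM / MemLLM;
# names and API are illustrative assumptions, not the papers' implementations.
from typing import List, Optional, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object)

class TripletMemory:
    def __init__(self) -> None:
        self.store: List[Triplet] = []

    def write(self, subject: str, relation: str, obj: str) -> None:
        """Save a piece of extracted knowledge as a triplet."""
        self.store.append((subject, relation, obj))

    def read(self, subject: Optional[str] = None,
             relation: Optional[str] = None) -> List[Triplet]:
        """Return every stored triplet matching the partial query."""
        return [t for t in self.store
                if (subject is None or t[0] == subject)
                and (relation is None or t[1] == relation)]

mem = TripletMemory()
mem.write("HistAlign", "proposed_by", "Wan et al.")
print(mem.read(subject="HistAlign"))  # [('HistAlign', 'proposed_by', 'Wan et al.')]
```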