HistAlign: Improving Context Dependency in Language Generation by
Aligning with History
- URL: http://arxiv.org/abs/2305.04782v2
- Date: Sun, 3 Dec 2023 19:31:39 GMT
- Title: HistAlign: Improving Context Dependency in Language Generation by
Aligning with History
- Authors: David Wan, Shiyue Zhang, Mohit Bansal
- Abstract summary: Language models (LMs) can generate hallucinations and incoherent outputs, which highlights their weak context dependency.
Cache-LMs, which augment LMs with a memory of recent history, can increase context dependency.
We present HistAlign, a new training approach to ensure good cache alignment.
- Score: 96.35214682008701
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) can generate hallucinations and incoherent outputs,
which highlights their weak context dependency. Cache-LMs, which augment LMs
with a memory of recent history, can increase context dependency and have shown
remarkable performance in diverse language generation tasks. However, we find
that even with training, the performance gain stemming from the cache component
of current cache-LMs is suboptimal due to the misalignment between the current
hidden states and those stored in the memory. In this work, we present
HistAlign, a new training approach to ensure good cache alignment such that the
model receives useful signals from the history. We first prove our concept on a
simple and synthetic task where the memory is essential for correct
predictions, and we show that the cache component of HistAlign is better
aligned and improves overall performance. Next, we evaluate HistAlign on
diverse downstream language generation tasks, including prompt continuation,
abstractive summarization, and data-to-text. We demonstrate that HistAlign
improves text coherence and faithfulness in open-ended and conditional
generation settings respectively. HistAlign is also generalizable across
different model families, showcasing its strength in improving context
dependency of LMs in diverse scenarios. Our code is publicly available at
https://github.com/meetdavidwan/histalign
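The abstract above centers on the cache-LM idea: the next-token distribution mixes the ordinary softmax with a distribution computed from similarity between the current hidden state and hidden states stored in a memory of recent history, and HistAlign trains the two spaces to be well aligned. The sketch below is a minimal illustration of that setup, not the authors' implementation: the dot-product cache, the fixed interpolation weight `lam`, and the hinge-style alignment loss with a `margin` are assumptions; the actual objective and cache construction are in the paper and the linked repository.

```python
# Minimal sketch of a continuous-cache LM head plus a simple alignment loss.
# Illustrative only: the interpolation weight `lam`, the dot-product cache,
# and the margin-based alignment objective are assumptions, not HistAlign's exact recipe.
import torch
import torch.nn.functional as F


def cache_lm_logprobs(h_t, cache_keys, cache_token_ids, vocab_logits, vocab_size, lam=0.3):
    """Interpolate the ordinary softmax with a distribution over cached tokens.

    h_t:             (d,)   current hidden state
    cache_keys:      (m, d) hidden states stored for the previous m positions
    cache_token_ids: (m,)   token id that followed each cached position
    vocab_logits:    (V,)   ordinary LM-head logits for the current step
    """
    p_vocab = F.softmax(vocab_logits, dim=-1)                         # (V,)
    sims = cache_keys @ h_t                                           # (m,) similarity to the cache
    p_over_cache = F.softmax(sims, dim=-1)                            # distribution over cache slots
    # Scatter cache probabilities onto the vocabulary ids they point to.
    p_cache = torch.zeros(vocab_size).index_add_(0, cache_token_ids, p_over_cache)
    return torch.log((1 - lam) * p_vocab + lam * p_cache + 1e-12)


def alignment_loss(h_t, cache_keys, cache_token_ids, target_id, margin=0.5):
    """Pull h_t toward cache entries whose next token matches the current target."""
    sims = F.cosine_similarity(cache_keys, h_t.unsqueeze(0), dim=-1)  # (m,)
    pos = cache_token_ids == target_id
    if not pos.any() or pos.all():
        return torch.tensor(0.0)                                      # nothing to contrast against
    # Hinge: the best matching (positive) entry should beat the best non-matching one by `margin`.
    return F.relu(margin - sims[pos].max() + sims[~pos].max())


if __name__ == "__main__":
    d, m, V = 16, 8, 100
    h_t, cache_keys = torch.randn(d), torch.randn(m, d)
    cache_token_ids, vocab_logits = torch.randint(0, V, (m,)), torch.randn(V)
    logp = cache_lm_logprobs(h_t, cache_keys, cache_token_ids, vocab_logits, V)
    print(logp.shape, alignment_loss(h_t, cache_keys, cache_token_ids, cache_token_ids[0]))
```

The point of the alignment term is the one the abstract makes: if the cached hidden states live in a different region of the space than the current state, the cache distribution contributes little to the mixture, however relevant the history is.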
Related papers
- Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models.
We propose an importance-driven cache merging strategy to prune redundant caches.
For instruction encoding, we use frequency to evaluate the importance of caches.
Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z)
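The Elastic Cache entry above describes scoring cached entries by importance (using frequency for instruction encoding) and pruning or merging the redundant ones. As a generic, hedged illustration of importance-driven cache compression, and not the paper's actual algorithm, the sketch below scores entries by accumulated attention, keeps the top entries under a budget, and merges the rest into their most similar kept neighbor; the scoring and merge rules are assumptions.

```python
# Generic sketch of importance-driven KV-cache pruning with merging.
# Not Elastic Cache itself: the attention-sum importance score and the
# "merge dropped entries into their most similar kept key" rule are assumptions.
import torch


def prune_kv_cache(keys, values, attn_history, budget):
    """keys/values: (n, d); attn_history: (n,) accumulated attention each entry has received."""
    n = keys.size(0)
    if n <= budget:
        return keys, values
    keep = torch.topk(attn_history, budget).indices.sort().values     # most-attended entries, in order
    drop = torch.tensor([i for i in range(n) if i not in set(keep.tolist())])

    kept_k, kept_v = keys[keep].clone(), values[keep].clone()
    # Instead of discarding dropped entries outright, fold each into its most similar kept key.
    nearest = (keys[drop] @ kept_k.T).argmax(dim=1)                    # (n_drop,)
    for j, tgt in enumerate(nearest.tolist()):
        w = attn_history[drop[j]] / (attn_history[drop[j]] + attn_history[keep[tgt]] + 1e-9)
        kept_k[tgt] = (1 - w) * kept_k[tgt] + w * keys[drop[j]]
        kept_v[tgt] = (1 - w) * kept_v[tgt] + w * values[drop[j]]
    return kept_k, kept_v


if __name__ == "__main__":
    n, d = 12, 8
    k, v, attn = torch.randn(n, d), torch.randn(n, d), torch.rand(n)
    pk, pv = prune_kv_cache(k, v, attn, budget=6)
    print(pk.shape, pv.shape)   # torch.Size([6, 8]) torch.Size([6, 8])
```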
- HMT: Hierarchical Memory Transformer for Long Context Language Processing [35.730941605490194]
Hierarchical Memory Transformer (HMT) is a novel framework that enables and improves models' long-context processing ability.
We show that HMT steadily improves the long-context processing ability of context-constrained and long-context models.
arXiv Detail & Related papers (2024-05-09T19:32:49Z)
- FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference [47.03691582405274]
Retrieval-Augmented Language Modeling (RALM), which integrates large language models (LLMs) with relevant documents from an external corpus, is a proven method for generating information.
Previous work uses retrieved content by simply prepending it to the input, which incurs a high runtime cost.
We propose FlashBack, a modular RALM designed to improve the inference efficiency of RALM with an appending-context pattern.
arXiv Detail & Related papers (2024-05-07T07:14:38Z)
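The FlashBack entry above contrasts prepending retrieved passages with an appending-context pattern. The toy sketch below only illustrates the layout difference and, in comments, the runtime intuition that a stable prefix lets its key-value cache be reused; the prompt templates and the cache-reuse framing are assumptions rather than FlashBack's exact mechanism.

```python
# Sketch of the difference between prepending and appending retrieved passages.
# The prompt templates are illustrative assumptions; FlashBack's actual marker
# tokens and fine-tuning recipe are described in the paper.

def prepend_style(context: str, retrieved: list) -> str:
    # Retrieved text comes first, so whenever retrieval changes, the whole
    # prompt prefix changes and previously computed KV caches cannot be reused.
    return "\n".join(retrieved) + "\n" + context

def append_style(context: str, retrieved: list) -> str:
    # The ongoing context stays at the front, so its KV cache remains valid;
    # only the freshly appended retrieved span needs new computation.
    return context + "\n" + "\n".join(retrieved)

if __name__ == "__main__":
    ctx = "The user is asking about cache reuse."
    docs = ["Doc A ...", "Doc B ..."]
    print(prepend_style(ctx, docs))
    print(append_style(ctx, docs))
```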
- MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method for enhancing the knowledge capabilities of LLMs by integrating a structured and explicit read-and-write memory module.
Our experiments indicate that MemLLM enhances performance and interpretability, in language modeling in general and in knowledge-intensive tasks in particular.
We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
arXiv Detail & Related papers (2024-04-17T18:13:16Z)
- Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with and improve the performance of popular alternatives such as retrieval-augmented generation.
arXiv Detail & Related papers (2024-03-07T08:34:57Z)
- Vocabulary-Defined Semantics: Latent Space Clustering for Improving In-Context Learning [32.178931149612644]
In-context learning enables language models to adapt to downstream data or incorporate tasks using a few samples as demonstrations within the prompt.
However, the performance of in-context learning can be unstable depending on the quality, format, or order of demonstrations.
We propose a novel approach, "vocabulary-defined semantics."
arXiv Detail & Related papers (2024-01-29T14:29:48Z)
- In-context Autoencoder for Context Compression in a Large Language Model [70.7621953091318]
We propose the In-context Autoencoder (ICAE) to compress a long context into short compact memory slots.
ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data.
arXiv Detail & Related papers (2023-07-13T17:59:21Z)
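The ICAE entry above describes compressing a long context into short, compact memory slots. The sketch below is a stand-in for that idea, not the ICAE implementation: a generic Transformer encoder replaces the LoRA-adapted LLM encoder, and the slot count and dimensions are arbitrary choices. Learnable memory tokens are appended to the context, and their final hidden states serve as the compressed slots.

```python
# Minimal stand-in for compressing a long context into k memory slots.
# Not the ICAE implementation: a generic TransformerEncoder replaces the
# LoRA-adapted LLM encoder, and the slot count / sizes are arbitrary assumptions.
import torch
import torch.nn as nn


class ContextCompressor(nn.Module):
    def __init__(self, d_model=64, n_slots=4):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)  # learnable memory tokens
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, context_embs):                       # (batch, seq_len, d_model)
        b = context_embs.size(0)
        slots = self.slots.unsqueeze(0).expand(b, -1, -1)  # append memory tokens to the context
        hidden = self.encoder(torch.cat([context_embs, slots], dim=1))
        return hidden[:, -self.slots.size(0):, :]          # final states of the memory tokens


if __name__ == "__main__":
    comp = ContextCompressor()
    long_context = torch.randn(2, 128, 64)                 # pretend these are token embeddings
    memory_slots = comp(long_context)
    print(memory_slots.shape)                              # torch.Size([2, 4, 64])
```

Per the summary above, ICAE pretrains such slots with both autoencoding and language-modeling objectives, so the original context can be reconstructed or continued from the compressed memory.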
- RET-LLM: Towards a General Read-Write Memory for Large Language Models [53.288356721954514]
RET-LLM is a novel framework that equips large language models with a general write-read memory unit.
Inspired by Davidsonian semantics theory, we extract and save knowledge in the form of triplets.
Our framework exhibits robust performance in handling temporal-based question answering tasks.
arXiv Detail & Related papers (2023-05-23T17:53:38Z)
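The MemLLM and RET-LLM entries above both pair an LLM with an explicit read-write memory, with RET-LLM storing knowledge as triplets. The sketch below is a rough illustration of what such a memory's read/write surface could look like; the class name, exact-match retrieval rule, and storage layout are assumptions, not either paper's interface.

```python
# Hedged sketch of an explicit read-write triplet memory that an LLM could call.
# The API names and the exact-match retrieval rule are illustrative assumptions.
from collections import defaultdict
from typing import Optional


class TripletMemory:
    """Stores (subject, relation, object) triplets and answers simple lookups."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def write(self, subject: str, relation: str, obj: str) -> None:
        self._by_subject[subject.lower()].append((subject, relation, obj))

    def read(self, subject: str, relation: Optional[str] = None) -> list:
        hits = self._by_subject.get(subject.lower(), [])
        return hits if relation is None else [t for t in hits if t[1] == relation]


if __name__ == "__main__":
    mem = TripletMemory()
    # A memory-augmented model would emit write calls like these while reading text...
    mem.write("HistAlign", "proposed_by", "Wan et al.")
    mem.write("HistAlign", "improves", "context dependency")
    # ...and read calls whose results are fed back into the prompt at answer time.
    print(mem.read("HistAlign", "improves"))
```

MemLLM, per its title, finetunes the model to issue such memory calls itself during generation; the calls in `__main__` only stand in for that behavior.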
This list is automatically generated from the titles and abstracts of the papers in this site.