Exploring Context Window of Large Language Models via Decomposed Positional Vectors
- URL: http://arxiv.org/abs/2405.18009v1
- Date: Tue, 28 May 2024 09:50:46 GMT
- Title: Exploring Context Window of Large Language Models via Decomposed Positional Vectors
- Authors: Zican Dong, Junyi Li, Xin Men, Wayne Xin Zhao, Bingbing Wang, Zhen Tian, Weipeng Chen, Ji-Rong Wen
- Abstract summary: Transformer-based large language models (LLMs) typically have a limited context window.
In this study, we explore the positional information within and beyond the context window.
- Score: 107.19556541244654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based large language models (LLMs) typically have a limited context window, resulting in significant performance degradation when processing text beyond that length. Numerous approaches have been proposed to extend the context window and achieve length extrapolation of LLMs, but an in-depth interpretation of these approaches is still lacking. In this study, we explore the positional information within and beyond the context window to decipher the underlying mechanism of LLMs. Using a mean-based decomposition method, we disentangle positional vectors from the hidden states of LLMs and analyze their formation and their effect on attention. Furthermore, when texts exceed the context window, we analyze how positional vectors change in two settings, i.e., direct extrapolation and context window extension. Based on our findings, we design two training-free context window extension methods, positional vector replacement and attention window extension. Experimental results show that our methods can effectively extend the context window length.
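To make the mean-based decomposition concrete, here is a minimal PyTorch sketch of disentangling positional vectors from hidden states by averaging over many input sequences; the function name, tensor shapes, and the cosine-similarity usage note are illustrative assumptions, not the authors' released code.

```python
import torch

def decompose_positional_vectors(hidden_states: torch.Tensor):
    """Mean-based decomposition of hidden states (illustrative sketch).

    hidden_states: [num_sequences, seq_len, hidden_dim] activations from one
    layer of an LLM, collected over many different input sequences.

    Averaging over sequences tends to cancel content-specific information, so
    the per-position mean is treated as the positional vector; the residual is
    the content-dependent component.
    """
    positional_vectors = hidden_states.mean(dim=0)          # [seq_len, hidden_dim]
    semantic_residual = hidden_states - positional_vectors  # broadcast over sequences
    return positional_vectors, semantic_residual

# Usage: with activations cached from texts inside and beyond the context
# window, positional_vectors[i] can be compared across positions i (e.g., via
# cosine similarity) to study how positional information forms and drifts.
```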
Related papers
- Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models [62.698520962933195]
Large Vision-Language Models (LVLMs) excel in cross-modal tasks but experience performance declines in long-context reasoning.
We propose a novel training-free context pruning method that selectively removes less critical textual information.
arXiv Detail & Related papers (2024-10-25T17:59:09Z)
- Vector-ICL: In-context Learning with Continuous Vector Representations [75.96920867382859]
Large language models (LLMs) have shown remarkable in-context learning capabilities on textual data.
We explore whether these capabilities can be extended to continuous vectors from diverse domains, obtained from black-box pretrained encoders.
In particular, we find that pretraining projectors with general language modeling objectives enables Vector-ICL.
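As a rough illustration of the projector idea summarized above, the sketch below maps vectors from a frozen, black-box encoder into an LLM's token-embedding space so they can be spliced into the input sequence; the class name and dimensions are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class VectorProjector(nn.Module):
    """Maps continuous vectors from a black-box encoder into the LLM's
    token-embedding space (illustrative sketch)."""

    def __init__(self, encoder_dim: int, llm_embed_dim: int):
        super().__init__()
        self.proj = nn.Linear(encoder_dim, llm_embed_dim)

    def forward(self, encoder_vectors: torch.Tensor) -> torch.Tensor:
        # encoder_vectors: [batch, num_vectors, encoder_dim]
        # returns:         [batch, num_vectors, llm_embed_dim]
        return self.proj(encoder_vectors)

# The projected vectors would be concatenated with ordinary token embeddings
# before being fed to the (frozen) LLM; per the summary above, the projector
# can be pretrained with a general language-modeling objective.
```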
arXiv Detail & Related papers (2024-10-08T02:25:38Z)
- Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding [11.5386284281652]
We introduce a novel approach that re-imagines information retrieval through dynamic in-context editing.
By treating lengthy contexts as malleable external knowledge, our method interactively gathers and integrates relevant information.
Experimental results demonstrate that our method effectively empowers context-limited LLMs to engage in multi-hop reasoning with improved performance.
arXiv Detail & Related papers (2024-06-18T06:54:28Z)
- Program Decomposition and Translation with Static Analysis [0.0]
We assess the effect of method-level program decomposition on the context window of Large Language Models (LLMs).
We investigate how this approach can enable translation of very large files that previously could not be translated because they exceeded the context window.
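As a hedged illustration of method-level decomposition, the sketch below uses Python's ast module to group a file's top-level definitions into chunks under a size budget standing in for the context window; the budget, function name, and the use of Python rather than the paper's target languages are assumptions, not the paper's tooling.

```python
import ast

def method_level_chunks(source: str, max_chars: int = 8000) -> list[str]:
    """Group the top-level definitions of a Python file into chunks that each
    stay under a rough size budget (a stand-in for the LLM's context window)."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for node in tree.body:
        snippet = "\n".join(lines[node.lineno - 1:node.end_lineno])
        if current and current_len + len(snippet) > max_chars:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(snippet)
        current_len += len(snippet)
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Each chunk could then be translated independently and the outputs
# recombined, which is the spirit of the decomposition described above.
```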
arXiv Detail & Related papers (2024-01-22T23:49:32Z)
- Extending LLMs' Context Window with 100 Samples [42.52554295241792]
Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window.
Recent studies have sought to extend the context window by modifying rotary position embedding (RoPE).
We introduce a novel extension to RoPE which combines adjusting RoPE's base frequency and scaling the attention logits to help LLMs efficiently adapt to a larger context window.
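For intuition, here is a hedged sketch of the two ingredients named above: adjusting RoPE's base frequency and scaling attention logits for longer inputs. The specific formulas (an NTK-style base adjustment and a log-length logit scale) are common choices in the length-extrapolation literature and are used only for illustration; they are not necessarily this paper's exact recipe.

```python
import math
import torch

def rope_inverse_frequencies(head_dim: int, base: float = 10000.0,
                             scale_factor: float = 1.0) -> torch.Tensor:
    """RoPE inverse frequencies with an NTK-style base adjustment (sketch).

    Raising the base via scale_factor stretches the rotation wavelengths so
    that positions beyond the pre-trained window stay in a familiar range.
    """
    adjusted_base = base * scale_factor ** (head_dim / (head_dim - 2))
    exponents = torch.arange(0, head_dim, 2).float() / head_dim
    return 1.0 / (adjusted_base ** exponents)

def scaled_attention_logits(q: torch.Tensor, k: torch.Tensor,
                            train_len: int, cur_len: int) -> torch.Tensor:
    """Scale attention logits when the input exceeds the training window, a
    common heuristic for keeping attention entropy close to training-time
    behavior (the paper's exact scaling may differ)."""
    logits = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if cur_len > train_len:
        logits = logits * (math.log(cur_len) / math.log(train_len))
    return logits
```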
arXiv Detail & Related papers (2024-01-13T07:57:01Z)
- Generative Context-aware Fine-tuning of Self-supervised Speech Models [54.389711404209415]
We study the use of context information generated by large language models (LLMs).
We propose an approach to distill the generated information during fine-tuning of self-supervised speech models.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
arXiv Detail & Related papers (2023-12-15T15:46:02Z)
- Retrieval meets Long Context Large Language Models [59.431200671427064]
Extending the context window of large language models (LLMs) has recently become popular.
Retrieval-augmentation versus long context window, which one is better for downstream tasks?
Can both methods be combined to get the best of both worlds?
Our best model, retrieval-augmented Llama2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on nine long context tasks.
arXiv Detail & Related papers (2023-10-04T17:59:41Z)
- Parallel Context Windows for Large Language Models [52.965170346907904]
We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training.
Our main results test the PCW approach on in-context learning with models ranging in size from 750 million to 178 billion parameters.
We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents.
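A minimal sketch of the core mechanism behind parallel context windows as summarized above: every context chunk reuses the same range of position ids, an attention mask keeps chunks from attending to each other, and the task tokens at the end attend to all chunks. The mask construction and function name below are simplified assumptions, not the authors' implementation.

```python
import torch

def pcw_positions_and_mask(chunk_lens: list[int], task_len: int):
    """Build position ids and a boolean attention mask (True = allowed) for a
    simplified Parallel Context Windows setup."""
    total_ctx = sum(chunk_lens)
    n = total_ctx + task_len
    position_ids = torch.zeros(n, dtype=torch.long)
    mask = torch.zeros(n, n, dtype=torch.bool)

    offset = 0
    for L in chunk_lens:
        # each chunk restarts positions at 0, so no chunk exceeds the window
        position_ids[offset:offset + L] = torch.arange(L)
        # causal attention within the chunk only
        mask[offset:offset + L, offset:offset + L] = torch.tril(torch.ones(L, L)).bool()
        offset += L

    # task tokens continue the position sequence after the longest chunk and
    # attend to every context token plus causally to each other
    max_chunk = max(chunk_lens)
    position_ids[total_ctx:] = torch.arange(max_chunk, max_chunk + task_len)
    mask[total_ctx:, :total_ctx] = True
    mask[total_ctx:, total_ctx:] = torch.tril(torch.ones(task_len, task_len)).bool()
    return position_ids, mask

# Example: three retrieved documents of 5, 7, and 6 tokens plus a 4-token query.
pos, attn = pcw_positions_and_mask([5, 7, 6], task_len=4)
```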
arXiv Detail & Related papers (2022-12-21T11:38:51Z)
- Revisiting the Context Window for Cross-lingual Word Embeddings [32.27333420000134]
Existing approaches to mapping-based cross-lingual word embeddings are based on the assumption that the source and target embedding spaces are structurally similar.
This work provides a thorough evaluation, in various languages, domains, and tasks, of bilingual embeddings trained with different context windows.
The highlight of our findings is that increasing both the source and target window sizes improves the performance of bilingual lexicon induction.
arXiv Detail & Related papers (2020-04-22T19:29:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.