Found in the Middle: How Language Models Use Long Contexts Better via
Plug-and-Play Positional Encoding
- URL: http://arxiv.org/abs/2403.04797v1
- Date: Tue, 5 Mar 2024 04:58:37 GMT
- Title: Found in the Middle: How Language Models Use Long Contexts Better via
Plug-and-Play Positional Encoding
- Authors: Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase,
Beidi Chen, Xiaoxia Wu, Zhangyang Wang
- Abstract summary: This paper introduces Multi-scale Positional.
(Ms-PoE) which is a simple yet effective plug-and-play approach to enhance the capacity of.
LLMs to handle relevant information located in the middle of the context.
- Score: 78.36702055076456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to overcome the "lost-in-the-middle" challenge of large
language models (LLMs). While recent advancements have successfully enabled
LLMs to perform stable language modeling with up to 4 million tokens, the
persistent difficulty faced by most LLMs in identifying relevant information
situated in the middle of the context has not been adequately tackled. To
address this problem, this paper introduces Multi-scale Positional Encoding
(Ms-PoE) which is a simple yet effective plug-and-play approach to enhance the
capacity of LLMs to handle the relevant information located in the middle of
the context, without fine-tuning or introducing any additional overhead. Ms-PoE
leverages the position indice rescaling to relieve the long-term decay effect
introduced by RoPE, while meticulously assigning distinct scaling ratios to
different attention heads to preserve essential knowledge learned during the
pre-training step, forming a multi-scale context fusion from short to long
distance. Extensive experiments with a wide range of LLMs demonstrate the
efficacy of our approach. Notably, Ms-PoE achieves an average accuracy gain of
up to 3.8 on the Zero-SCROLLS benchmark over the original LLMs. Code are
available at https://github.com/VITA-Group/Ms-PoE.
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - Small Language Model Is a Good Guide for Large Language Model in Chinese
Entity Relation Extraction [13.344709924683471]
In this paper, we propose SLCoLM, a model collaboration framework, to mitigate the data long-tail problem.
We use the textitTraining-Guide-Predict'' strategy to combine the strengths of pre-trained language models (PLMs) and large language models (LLMs)
Our experiments on a RE dataset rich in relation types show that the approach in this paper facilitates RE of long-tail relation types.
arXiv Detail & Related papers (2024-02-22T08:26:56Z) - Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion [70.9767518332692]
Multimodal Large Language Models (MLLMs) that incorporate LLMs with pre-trained vision models have recently demonstrated impressive performance across diverse vision-language tasks.
However, they fall short to comprehend context involving multiple images.
We propose a two phase paradigm, browse-and-concentrate, to enable in-depth multimodal context fusion.
arXiv Detail & Related papers (2024-02-19T14:59:07Z) - Extending LLMs' Context Window with 100 Samples [42.52554295241792]
Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window.
Recent studies have sought to extend the context window by modifying rotary position embedding (RoPE)
We introduce a novel extension to RoPE which combines adjusting RoPE's base frequency and scaling the attention logits to help LLMs efficiently adapt to a larger context window.
arXiv Detail & Related papers (2024-01-13T07:57:01Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs)
We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python.
It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z) - InfMLLM: A Unified Framework for Visual-Language Tasks [44.29407348046122]
multimodal large language models (MLLMs) have attracted growing interest.
This work delves into enabling LLMs to tackle more vision-language-related tasks.
InfMLLM achieves either state-of-the-art (SOTA) performance or performance comparable to recent MLLMs.
arXiv Detail & Related papers (2023-11-12T09:58:16Z) - LooGLE: Can Long-Context Language Models Understand Long Contexts? [50.408957515411096]
LooGLE is a benchmark for large language models' long context understanding.
It features relatively new documents post-2022, with over 24,000 tokens per document and 6,000 newly generated questions spanning diverse domains.
The evaluation of eight state-of-the-art LLMs on LooGLE revealed key findings.
arXiv Detail & Related papers (2023-11-08T01:45:37Z) - Generative Multimodal Entity Linking [24.322540112710918]
Multimodal Entity Linking (MEL) is the task of mapping mentions with multimodal contexts to referent entities from a knowledge base.
Existing MEL methods mainly focus on designing complex multimodal interaction mechanisms and require fine-tuning all model parameters.
We propose GEMEL, a Generative Multimodal Entity Linking framework based on Large Language Models (LLMs)
Our framework is compatible with any off-the-shelf language model, paving the way towards an efficient and general solution.
arXiv Detail & Related papers (2023-06-22T07:57:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.