Related papers: Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

URL: http://arxiv.org/abs/2403.04797v1
Date: Tue, 5 Mar 2024 04:58:37 GMT
Title: Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
Authors: Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang
Abstract summary: This paper introduces Multi-scale Positional. (Ms-PoE) which is a simple yet effective plug-and-play approach to enhance the capacity of. LLMs to handle relevant information located in the middle of the context.
Score: 78.36702055076456
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper aims to overcome the "lost-in-the-middle" challenge of large language models (LLMs). While recent advancements have successfully enabled LLMs to perform stable language modeling with up to 4 million tokens, the persistent difficulty faced by most LLMs in identifying relevant information situated in the middle of the context has not been adequately tackled. To address this problem, this paper introduces Multi-scale Positional Encoding (Ms-PoE) which is a simple yet effective plug-and-play approach to enhance the capacity of LLMs to handle the relevant information located in the middle of the context, without fine-tuning or introducing any additional overhead. Ms-PoE leverages the position indice rescaling to relieve the long-term decay effect introduced by RoPE, while meticulously assigning distinct scaling ratios to different attention heads to preserve essential knowledge learned during the pre-training step, forming a multi-scale context fusion from short to long distance. Extensive experiments with a wide range of LLMs demonstrate the efficacy of our approach. Notably, Ms-PoE achieves an average accuracy gain of up to 3.8 on the Zero-SCROLLS benchmark over the original LLMs. Code are available at https://github.com/VITA-Group/Ms-PoE.

Related papers

Protecting multimodal large language models against misleading visualizations [94.71976205962527]
We introduce the first inference-time methods to improve performance on misleading visualizations. We find that MLLM question-answering accuracy drops on average to the level of a random baseline.
arXiv Detail & Related papers (2025-02-27T20:22:34Z)
LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression. LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model. Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z)
Large Language Models Can Self-Improve in Long-context Reasoning [100.52886241070907]
Large language models (LLMs) have achieved substantial progress in processing long contexts but still struggle with long-context reasoning. We propose ours, an approach specifically designed for this purpose. ours achieves superior performance compared to prior approaches that depend on data produced by human experts or advanced models.
arXiv Detail & Related papers (2024-11-12T19:53:00Z)
Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization ( CLS) aims to generate a summary for the source text in a different target language. Currently, instruction-tuned large language models (LLMs) excel at various English tasks. Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
FltLM: An Intergrated Long-Context Large Language Model for Effective Context Filtering and Understanding [32.197113821638936]
We propose a novel integrated Long-Context Large Language Model (FltLM) FltLM incorporates a context filter with a soft mask mechanism, identifying and dynamically excluding irrelevant content to concentrate on pertinent information. Experimental results demonstrate that FltLM significantly outperforms supervised fine-tuning and retrieval-based methods in complex QA scenarios.
arXiv Detail & Related papers (2024-10-09T13:47:50Z)
Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization [0.27624021966289597]
This paper introduces EYEGLAXS, a framework that leverages Large Language Models (LLMs) for extractive summarization. EYEGLAXS focuses on extractive summarization to ensure factual and grammatical integrity. The system sets new performance benchmarks on well-known datasets like PubMed and ArXiv.
arXiv Detail & Related papers (2024-08-28T13:52:19Z)
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM. We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
Small Language Model Is a Good Guide for Large Language Model in Chinese Entity Relation Extraction [13.344709924683471]
In this paper, we propose SLCoLM, a model collaboration framework, to mitigate the data long-tail problem. We use the textitTraining-Guide-Predict'' strategy to combine the strengths of pre-trained language models (PLMs) and large language models (LLMs) Our experiments on a RE dataset rich in relation types show that the approach in this paper facilitates RE of long-tail relation types.
arXiv Detail & Related papers (2024-02-22T08:26:56Z)
Extending LLMs' Context Window with 100 Samples [42.52554295241792]
Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window. Recent studies have sought to extend the context window by modifying rotary position embedding (RoPE) We introduce a novel extension to RoPE which combines adjusting RoPE's base frequency and scaling the attention logits to help LLMs efficiently adapt to a larger context window.
arXiv Detail & Related papers (2024-01-13T07:57:01Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs) We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python. It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z)
Generative Multimodal Entity Linking [24.322540112710918]
Multimodal Entity Linking (MEL) is the task of mapping mentions with multimodal contexts to referent entities from a knowledge base. Existing MEL methods mainly focus on designing complex multimodal interaction mechanisms and require fine-tuning all model parameters. We propose GEMEL, a Generative Multimodal Entity Linking framework based on Large Language Models (LLMs) Our framework is compatible with any off-the-shelf language model, paving the way towards an efficient and general solution.
arXiv Detail & Related papers (2023-06-22T07:57:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.