Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
- URL: http://arxiv.org/abs/2410.15859v3
- Date: Thu, 24 Oct 2024 10:29:15 GMT
- Title: Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
- Authors: Xin Ma, Yang Liu, Jingjing Liu, Xiaoxu Ma
- Abstract summary: Large language models (LLMs) still suffer from the challenging extrapolation problem.
We conduct a theoretical analysis to better understand why No Position Encoding (NoPE) fails outside its effective range, and examine the power of Position Encoding (PE) in this context.
We introduce a novel weave PE method, Mesa-Extrapolation, which utilizes a chunk-based triangular attention matrix and applies Stair PE to manage the final chunk.
- Score: 12.250524667536606
- License:
- Abstract: Large language models (LLMs), although having revolutionized many fields, still suffer from the challenging extrapolation problem, where the inference ability of LLMs sharply declines beyond their max training lengths. In this work, we conduct a theoretical analysis to better understand why No Position Encoding (NoPE) fails outside its effective range, as well as examining the power of Position Encoding (PE) in this context. Our findings reveal that with meticulous weave position, PE can indeed be extended beyond effective range. Our theorems establish that LLMs equipped with weave PE can achieve improved extrapolation performance without additional cost. Furthermore, we introduce a novel weave PE method, Mesa-Extrapolation, which utilizes a chunk-based triangular attention matrix and applies Stair PE to manage the final chunk. This method not only retains competitive performance but also offers substantial benefits such as significantly reduced memory demand and faster inference speed. Extensive experiments validate the effectiveness of Mesa-Extrapolation, demonstrating its potential as a scalable solution to enhancing LLMs' applicative reach. Our code is available at https://github.com/soacker/Mesa-Extrapolation.
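The abstract names two ingredients: a chunk-based triangular attention matrix and a stair-shaped position encoding for the final chunk. The sketch below is one possible reading of those two ideas, not the authors' implementation. The function names, the rule that only the final chunk attends across chunks, and the `window` and `step` parameters are all assumptions made for illustration; the paper and its repository define the actual method.

```python
def chunked_causal_mask(seq_len, chunk_size):
    """Return mask[i][j] == True where query i may attend to key j.

    Assumption (illustrative only): every chunk except the last is causal
    within itself, and the final chunk is causal over the whole sequence.
    """
    if seq_len <= 0 or chunk_size <= 0:
        raise ValueError("seq_len and chunk_size must be positive")
    last_chunk = (seq_len - 1) // chunk_size
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        query_chunk = i // chunk_size
        for j in range(i + 1):  # causal: keys at or before the query
            same_chunk = (j // chunk_size) == query_chunk
            if query_chunk == last_chunk or same_chunk:
                mask[i][j] = True
    return mask


def stair_relative_position(distance, window, step):
    """Map a query-key distance to a stair-shaped relative position.

    Assumption (illustrative only): distances below `window` are kept
    exact; beyond it, the position grows by 1 every `step` tokens, so
    far-away keys reuse a small range of position indices.
    """
    if distance < 0 or window <= 0 or step <= 0:
        raise ValueError("distance must be >= 0; window and step positive")
    if distance < window:
        return distance
    return window + (distance - window) // step


if __name__ == "__main__":
    # Three chunks of two tokens; '#' marks an allowed query-key pair.
    for row in chunked_causal_mask(6, 2):
        print("".join("#" if allowed else "." for allowed in row))
    print([stair_relative_position(d, window=4, step=2) for d in range(11)])
```

Under these assumptions, only the final chunk's rows span the full sequence, which is consistent with the abstract's claim of reduced memory demand: earlier chunks never materialize attention over distant keys.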
Related papers
- ReAttention: Training-Free Infinite Context with Finite Attention Scope [65.91272939057592]
The long-context capability of Large Language Models (LLMs) has seen significant breakthroughs, but the maximum supported context length remains a critical bottleneck limiting their practical applications.
We propose ReAttention, a training-free approach enabling LLMs based on the self-attention mechanism to support an infinite context with a finite attention scope under sufficient memory resources.
We validate the performance of ReAttention on LongBench, L-Eval, and InfiniteBench and demonstrate that it is on par with traditional methods.
arXiv Detail & Related papers (2024-07-21T14:23:37Z)
- SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models [53.638791265113625]
SPP is a sparsity-preserved, parameter-efficient fine-tuning method for large language models.
Code will be made available at https://github.com/Lucky-Lance/SPP.
arXiv Detail & Related papers (2024-05-25T04:55:27Z)
- Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems.
Experiments on Over-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z)
- Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding [78.36702055076456]
This paper introduces Multi-scale Positional Encoding (Ms-PoE), a simple yet effective plug-and-play approach to enhance the capacity of LLMs to handle relevant information located in the middle of the context.
arXiv Detail & Related papers (2024-03-05T04:58:37Z)
- CLEX: Continuous Length Extrapolation for Large Language Models [68.43814043853347]
We propose Continuous Length EXtrapolation (CLEX) for Large Language Models (LLMs).
CLEX extends the context window to over 4x or almost 8x training length, with no deterioration in performance.
Our model trained on a 4k length exhibits competitive performance against state-of-the-art open-source models trained on context lengths up to 32k.
arXiv Detail & Related papers (2023-10-25T08:13:02Z)
- One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models [42.95555008229016]
We propose a method based on Hessian sensitivity-aware mixed sparsity pruning to prune LLMs to at least 50% sparsity without the need of any retraining.
The advantages of the proposed method are even more pronounced when the sparsity is extremely high.
arXiv Detail & Related papers (2023-10-14T05:43:09Z)
- FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning [70.38817963253034]
This paper first discusses the challenges of federated fine-tuning of LLMs, and introduces our package FS-LLM as a main contribution.
We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios.
We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)
- Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models [12.708117108874083]
Large Language Models (LLMs) generate code snippets given natural language intents in zero-shot, i.e., without the need for specific fine-tuning.
Previous research explored In-Context Learning (ICL) as a strategy to guide the LLM generative process with task-specific prompt examples.
In this paper, we deliver a comprehensive study of Parameter-Efficient Fine-Tuning (PEFT) techniques for LLMs under the automated code generation scenario.
arXiv Detail & Related papers (2023-08-21T04:31:06Z)
- Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline [22.08897444328099]
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks.
In this paper, we propose an efficient LLM inference pipeline that harnesses the power of LLMs.
arXiv Detail & Related papers (2023-05-22T15:36:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.