Related papers: Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs

URL: http://arxiv.org/abs/2410.15859v3
Date: Thu, 24 Oct 2024 10:29:15 GMT
Title: Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
Authors: Xin Ma, Yang Liu, Jingjing Liu, Xiaoxu Ma
Abstract summary: Large language models (LLMs) still suffer from the challenging extrapolation problem. We conduct a theoretical analysis to better understand why No Position Encoding (NoPE) fails outside its effective range, as well as examining the power of Position Encoding (PE) in this context. We introduce a novel weave PE method, Mesa-Extrapolation, which utilizes a chunk-based triangular attention matrix and applies Stair PE to manage the final chunk.
Score: 12.250524667536606
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs), although having revolutionized many fields, still suffer from the challenging extrapolation problem, where the inference ability of LLMs sharply declines beyond their max training lengths. In this work, we conduct a theoretical analysis to better understand why No Position Encoding (NoPE) fails outside its effective range, as well as examining the power of Position Encoding (PE) in this context. Our findings reveal that with meticulous weave position, PE can indeed be extended beyond effective range. Our theorems establish that LLMs equipped with weave PE can achieve improved extrapolation performance without additional cost. Furthermore, we introduce a novel weave PE method, Mesa-Extrapolation, which utilizes a chunk-based triangular attention matrix and applies Stair PE to manage the final chunk. This method not only retains competitive performance but also offers substantial benefits such as significantly reduced memory demand and faster inference speed. Extensive experiments validate the effectiveness of Mesa-Extrapolation, demonstrating its potential as a scalable solution to enhancing LLMs' applicative reach. Our code is available at https://github.com/soacker/Mesa-Extrapolation.

Related papers

Efficient LLMs with AMP: Attention Heads and MLP Pruning [1.3785656730024138]
We introduce AMP: Attention Heads and MLP Pruning, a novel structured pruning method that efficiently compresses Large Language Models (LLMs). By projecting the input data onto weights, AMP assesses structural importance and overcomes the limitations of existing techniques. AMP surpasses the current state-of-the-art on commonsense reasoning tasks by up to 1.49 percentage points.
arXiv Detail & Related papers (2025-04-29T20:50:08Z)
Privacy-preserved LLM Cascade via CoT-enhanced Policy Learning [14.51198171282123]
Large Language Models (LLMs) have gained significant attention in on-device applications due to their remarkable performance across real-world tasks. We propose a novel Chain-of-Thought (CoT)-enhanced policy learning framework for privacy-preserved deferral decision-making.
arXiv Detail & Related papers (2024-10-10T15:09:52Z)
ReAttention: Training-Free Infinite Context with Finite Attention Scope [65.91272939057592]
The long-context capability of Large Language Models (LLMs) has made significant breakthroughs, but the maximum supported context length remains a critical bottleneck limiting their practical applications. We propose ReAttention, a training-free approach enabling LLMs based on the self-attention mechanism to support an infinite context with a finite attention scope under sufficient memory resources. We validate the performance of ReAttention on LongBench, L-Eval, and InfiniteBench and demonstrate that it is on par with traditional methods.
arXiv Detail & Related papers (2024-07-21T14:23:37Z)
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models [53.638791265113625]
SPP is a Sparsity-Preserved Parameter-efficient fine-tuning method for large language models. Code will be made available at https://github.com/Lucky-Lance/SPP.
arXiv Detail & Related papers (2024-05-25T04:55:27Z)
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z)
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding [78.36702055076456]
This paper introduces Multi-scale Positional Encoding (Ms-PoE), a simple yet effective plug-and-play approach to enhance the capacity of LLMs to handle relevant information located in the middle of the context.
arXiv Detail & Related papers (2024-03-05T04:58:37Z)
CLEX: Continuous Length Extrapolation for Large Language Models [68.43814043853347]
We propose Continuous Length EXtrapolation (CLEX) for Large Language Models (LLMs). CLEX extends the context window to over 4x or almost 8x training length, with no deterioration in performance. Our model trained on a 4k length exhibits competitive performance against state-of-the-art open-source models trained on context lengths up to 32k.
arXiv Detail & Related papers (2023-10-25T08:13:02Z)
One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models [42.95555008229016]
We propose a method based on Hessian sensitivity-aware mixed sparsity pruning to prune LLMs to at least 50% sparsity without the need for any retraining. The advantages of the proposed method are even more pronounced when the sparsity is extremely high.
arXiv Detail & Related papers (2023-10-14T05:43:09Z)
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning [70.38817963253034]
This paper first discusses the challenges of federated fine-tuning of LLMs, and introduces our package FS-LLM as a main contribution. We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios. We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)
Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models [12.708117108874083]
Large Language Models (LLMs) generate code snippets given natural language intents in zero-shot, i.e., without the need for specific fine-tuning. Previous research explored In-Context Learning (ICL) as a strategy to guide the LLM generative process with task-specific prompt examples. In this paper, we deliver a comprehensive study of Parameter-Efficient Fine-Tuning (PEFT) techniques for LLMs under the automated code generation scenario.
arXiv Detail & Related papers (2023-08-21T04:31:06Z)
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline [22.08897444328099]
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks. In this paper, we propose an efficient LLM inference pipeline that harnesses the power of LLMs.
arXiv Detail & Related papers (2023-05-22T15:36:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.