SELF: Self-Extend the Context Length With Logistic Growth Function
- URL: http://arxiv.org/abs/2505.17296v1
- Date: Thu, 22 May 2025 21:23:20 GMT
- Title: SELF: Self-Extend the Context Length With Logistic Growth Function
- Authors: Phat Thanh Dang, Saahil Thoppay, Wang Yang, Qifan Wang, Vipin Chaudhary, Xiaotian Han,
- Abstract summary: We propose SELF: a solution of grouping tokens at varying group sizes using a logistic capacity equation combined with a constant group size at smaller relative distances.<n>Our model had an increase in performance of up to 12% compared to the LongLM extension method in LEval.<n>On reading comprehension tasks from LEval, our model performed up to 5.4% better than the LongLM.
- Score: 24.523942828913405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models suffer issues when operated on long contexts that are larger than their training context length due to the standard position encoding for tokens in the attention layer. Tokens a long distance apart will rarely have an effect on each other and long prompts yield unexpected results. To solve this problem, we propose SELF (Self-Extend the Context Length With Logistic Growth Function): a solution of grouping consecutive tokens at varying group sizes using a logistic capacity equation combined with a constant group size at smaller relative distances. Our model had an increase in performance of up to 12% compared to the LongLM extension method in LEval (specifically on the Qwen model). On summarization related tasks in LongBench, our model performed up to 6.4% better than LongLM (specifically on the Llama-2-7b model). On reading comprehension tasks from LEval, our model performed up to 5.4% better than the LongLM. Our code is available at https://github.com/alexeipc/SELF-LLM.
Related papers
- Too Long, Didn't Model: Decomposing LLM Long-Context Understanding With Novels [3.537369004801589]
We release the Too Long, Didn't Model benchmark.<n>It tests a model's ability to report plot summary, storyworld configuration, and elapsed narrative time.<n>We find that none of seven tested frontier LLMs retain stable understanding beyond 64k tokens.
arXiv Detail & Related papers (2025-05-20T21:21:09Z) - LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation [74.89981179257194]
LongProc (Long Procedural Generation) is a new benchmark for evaluating long-context language models (LCLMs)<n>LongProc consists of six diverse procedural generation tasks, such as extracting structured information from HTML pages into a TSV format and executing complex search procedures to create travel plans.<n>We evaluate 23 LCLMs, including instruction-tuned models and recent reasoning models, on LongProc at three difficulty levels, with the maximum number of output tokens set at 500, 2K, and 8K.
arXiv Detail & Related papers (2025-01-09T18:16:55Z) - ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities [53.97515452727115]
ChatQA 2 is a Llama 3.0-based model with a 128K context window.<n>We present a training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens.<n>We find that the performance of strong long-context LLMs using RAG improves when retrieving a larger number of chunks.
arXiv Detail & Related papers (2024-07-19T17:35:47Z) - Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks [76.43527940649939]
We introduce Ada-LEval, a benchmark for evaluating the long-context understanding of large language models (LLMs)
Ada-LEval includes two challenging subsets, TSort and BestAnswer, which enable a more reliable evaluation of LLMs' long context capabilities.
We evaluate 4 state-of-the-art closed-source API models and 6 open-source models with Ada-LEval.
arXiv Detail & Related papers (2024-04-09T17:30:48Z) - LongAlign: A Recipe for Long Context Alignment of Large Language Models [61.85923382850057]
LongAlign is a recipe of the instruction data, training, and evaluation for long context alignment.
We construct a long instruction-following dataset using Self-Instruct.
We adopt the packing and sorted strategies to speed up supervised fine-tuning on data with varied length distributions.
arXiv Detail & Related papers (2024-01-31T18:29:39Z) - LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning [67.39585115936329]
We argue that LLMs have inherent capabilities to handle long contexts without fine-tuning.
We propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information.
We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length.
arXiv Detail & Related papers (2024-01-02T18:30:51Z) - LongQLoRA: Efficient and Effective Method to Extend Context Length of
Large Language Models [2.4366811507669124]
LongQLoRA is a method to extend context length of large language models with less training resources.
With a single 32GB V100 GPU, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192 and even to 12k within 1000 finetuning steps.
LongQLoRA achieves competitive perplexity performance on PG19 and Proof-pile datasets.
arXiv Detail & Related papers (2023-11-08T18:33:06Z) - LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding [58.20031627237889]
LongBench is the first bilingual, multi-task benchmark for long context understanding.
It comprises 21 datasets across 6 task categories in both English and Chinese, with an average length of 6,711 words (English) and 13,386 characters (Chinese)
arXiv Detail & Related papers (2023-08-28T11:53:40Z) - Giraffe: Adventures in Expanding Context Lengths in LLMs [7.8327063299618]
We show that linear scaling is the best method for extending context length.
We also discover promising extrapolation capabilities in the truncated basis.
To support further research in this area, we release three new 13B parameter long-context models.
arXiv Detail & Related papers (2023-08-21T17:30:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.