Related papers: FocusLLM: Scaling LLM's Context by Parallel Decoding

URL: http://arxiv.org/abs/2408.11745v1
Date: Wed, 21 Aug 2024 16:11:59 GMT
Title: FocusLLM: Scaling LLM's Context by Parallel Decoding
Authors: Zhenyu Li, Yike Zhang, Tengyu Pan, Yutao Sun, Zhichao Duan, Junjie Fang, Rong Han, Zixuan Wang, Jianyong Wang
Abstract summary: FocusLLM is a framework designed to extend the context length of any decoder-only LLM. FocusLLM processes long text inputs by dividing them into chunks based on the model's original context length. It appends the local context to each chunk as a prompt to extract essential information from each chunk based on a novel parallel decoding mechanism.
Score: 16.642675785000176
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Empowering LLMs with the ability to utilize useful information from a long context is crucial for many downstream applications. However, achieving long context lengths with the conventional transformer architecture requires substantial training and inference resources. In this paper, we present FocusLLM, a framework designed to extend the context length of any decoder-only LLM, enabling the model to focus on relevant information from very long sequences. FocusLLM processes long text inputs by dividing them into chunks based on the model's original context length to alleviate the issue of attention distraction. Then, it appends the local context to each chunk as a prompt to extract essential information from each chunk based on a novel parallel decoding mechanism, and ultimately integrates the extracted information into the local context. FocusLLM stands out for great training efficiency and versatility: trained with an 8K input length with much less training cost than previous methods, FocusLLM exhibits superior performance across downstream long-context tasks and maintains strong language modeling ability when handling extensive long texts, even up to 400K tokens. Our code is available at https://github.com/leezythu/FocusLLM.
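The chunking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_chunk_prompts` and the parameters `chunk_len` and `local_len` are hypothetical, token IDs are stood in by plain integers, and the chunk size is simply assumed to be derived from the model's original context length. The parallel decoding and the integration of extracted information back into the local context are not modeled here.

```python
from typing import List


def build_chunk_prompts(tokens: List[int], chunk_len: int, local_len: int) -> List[List[int]]:
    """Split a long token sequence into chunks and append the local context to each.

    Hypothetical helper for illustration only:
    - the last `local_len` tokens are treated as the local context,
    - everything before them is split into chunks of at most `chunk_len` tokens,
    - each chunk is followed by the local context, which acts as the prompt.
    """
    if chunk_len <= 0 or local_len <= 0:
        raise ValueError("chunk_len and local_len must be positive")

    local_context = tokens[-local_len:]   # most recent tokens
    long_prefix = tokens[:-local_len]     # earlier text to be chunked

    chunks = [
        long_prefix[start:start + chunk_len]
        for start in range(0, len(long_prefix), chunk_len)
    ]
    # Each prompt could be processed independently, which is what makes
    # a parallel pass over the chunks possible.
    return [chunk + local_context for chunk in chunks]


prompts = build_chunk_prompts(list(range(10)), chunk_len=4, local_len=2)
```

With ten tokens, a chunk length of 4, and a local context of 2 tokens, the sketch yields two prompts, each ending with the same two local-context tokens.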

Related papers

From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models [54.44375226381814]
Long-context capabilities are essential for a wide range of applications, including document and video understanding, in-context learning, and inference-time scaling. We introduce an efficient training recipe for building ultra-long context LLMs from an aligned instruct model, pushing the boundaries of context lengths from 128K to 1M, 2M, and 4M tokens. Our approach achieves state-of-the-art performance across a diverse set of long-context benchmarks.
arXiv Detail & Related papers (2025-04-08T16:58:58Z)
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training [51.23520027773028]
Extending context window sizes allows large language models to process longer sequences and handle more complex tasks. We observe that using RoPE with BFloat16 format results in numerical issues, causing it to deviate from its intended relative positional encoding. We develop AnchorAttention, a plug-and-play attention method that alleviates numerical issues caused by BFloat16.
arXiv Detail & Related papers (2024-11-20T17:22:31Z)
Reducing Distraction in Long-Context Language Models by Focused Learning [6.803882766744194]
We propose a novel training method that enhances Large Language Models' ability to discern relevant information. During fine-tuning with long contexts, we employ a retriever to extract the most relevant segments. We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned.
arXiv Detail & Related papers (2024-11-08T19:27:42Z)
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models [72.71150585370147]
LongRecipe is an efficient training strategy for extending the context window of large language models. It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies. LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and reduces computational training resources by over 85% compared to full sequence training.
arXiv Detail & Related papers (2024-08-31T17:19:30Z)
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? [37.64593022203498]
NeedleBench is a framework consisting of progressively more challenging tasks for assessing bilingual long-context capabilities. We use the framework to assess how well the leading open-source models can identify key information relevant to the question. We propose the Ancestral Trace Challenge to mimic the complexity of logical reasoning challenges that are likely to be present in real-world long-context tasks.
arXiv Detail & Related papers (2024-07-16T17:59:06Z)
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches [52.02764371205856]
Long context capability is a crucial competency for large language models (LLMs). This work provides a taxonomy of current methods and evaluates 10+ state-of-the-art approaches across seven categories of long context tasks.
arXiv Detail & Related papers (2024-07-01T17:59:47Z)
Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens [21.61634020256455]
Transformer-based large language models (LLMs) suffer a performance degradation when modeling long-term contexts. We propose a simple yet effective method to enable LLMs to take a deep breath, encouraging them to summarize information contained within discrete text chunks.
arXiv Detail & Related papers (2024-06-16T15:50:10Z)
From Text to Pixel: Advancing Long-Context Understanding in MLLMs [70.78454154014989]
We introduce SEEKER, a multimodal large language model designed to tackle this issue. SEEKER aims to optimize the compact encoding of long text by compressing the text sequence into the visual pixel space via images. Our experiments on six long-context multimodal tasks demonstrate that SEEKER can leverage fewer image tokens to convey the same amount of textual information compared with the OCR-based approach.
arXiv Detail & Related papers (2024-05-23T06:17:23Z)
Make Your LLM Fully Utilize the Context [70.89099306100155]
We show that FILM-7B can robustly retrieve information from different positions in its 32K context window. FILM-7B significantly improves the performance on real-world long-context tasks.
arXiv Detail & Related papers (2024-04-25T17:55:14Z)
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding [78.36702055076456]
This paper introduces Multi-scale Positional Encoding (Ms-PoE), a simple yet effective plug-and-play approach to enhance the capacity of LLMs to handle relevant information located in the middle of the context.
arXiv Detail & Related papers (2024-03-05T04:58:37Z)
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding [58.20031627237889]
LongBench is the first bilingual, multi-task benchmark for long context understanding. It comprises 21 datasets across 6 task categories in both English and Chinese, with an average length of 6,711 words (English) and 13,386 characters (Chinese).
arXiv Detail & Related papers (2023-08-28T11:53:40Z)
Focused Transformer: Contrastive Training for Context Scaling [31.44508996359732]
We introduce the Focused Transformer (FoT), a technique that employs a training process inspired by contrastive learning. FoT enhances the structure of the (key, value) space, enabling an extension of the context length. Our method allows for fine-tuning pre-existing, large-scale models to lengthen their effective context.
arXiv Detail & Related papers (2023-07-06T17:52:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.