Structured Packing in LLM Training Improves Long Context Utilization
- URL: http://arxiv.org/abs/2312.17296v7
- Date: Mon, 24 Jun 2024 16:10:18 GMT
- Title: Structured Packing in LLM Training Improves Long Context Utilization
- Authors: Konrad Staniszewski, Szymon Tworkowski, Sebastian Jaszczur, Yu Zhao, Henryk Michalewski, Łukasz Kuciński, Piotr Miłoś
- Abstract summary: This study investigates structuring training data to enhance semantic interdependence.
We introduce the Structured Packing for Long Context (SPLiCe) method.
We validate SPLiCe empirically across models of varying sizes.
- Score: 11.484631908171465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in long-context large language models have attracted significant attention, yet their practical applications often suffer from suboptimal context utilization. This study investigates structuring training data to enhance semantic interdependence, demonstrating that this approach effectively improves context utilization. To this end, we introduce the Structured Packing for Long Context (SPLiCe) method, which utilizes retrieval to collate mutually relevant documents into long and coherent training examples. We validate SPLiCe empirically across models of varying sizes -- 3B, 7B, and 13B -- achieving improved performance in long-context tasks, such as Qasper and HotpotQA. Remarkably, even brief fine-tuning with SPLiCe is sufficient to realize these benefits. Additionally, SPLiCe effectively mitigates the lost-in-middle phenomenon often observed in large models. Our comprehensive analysis of SPLiCe explores its design choices and reveals intriguing transfer effects; for instance, training on programming code enhances performance on natural language tasks.
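The abstract describes SPLiCe only at a high level, so the following is a minimal, hedged sketch of the general idea: retrieve mutually relevant documents and greedily pack them into a single long training example until a token budget is filled. The TF-IDF retriever, the greedy nearest-neighbour ordering, the whitespace word count standing in for a tokenizer, and the name `pack_related_documents` are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of retrieval-based example packing in the spirit of SPLiCe (assumptions:
# TF-IDF similarity as the retriever, greedy chaining, word count ~ token count).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def pack_related_documents(docs, max_tokens=8192):
    """Group document indices into packed examples of mutually similar documents."""
    sims = cosine_similarity(TfidfVectorizer().fit_transform(docs))
    unused = set(range(len(docs)))
    examples = []
    while unused:
        current = unused.pop()                      # seed a new packed example
        example = [current]
        budget = max_tokens - len(docs[current].split())
        while unused and budget > 0:
            # Greedily append the unused document most similar to the last one added.
            nxt = max(unused, key=lambda j: sims[current, j])
            cost = len(docs[nxt].split())
            if cost > budget:
                break
            unused.remove(nxt)
            example.append(nxt)
            current, budget = nxt, budget - cost
        examples.append(example)
    return examples


if __name__ == "__main__":
    corpus = [
        "Binary search trees support ordered lookups.",
        "Self-balancing trees such as AVL trees bound lookup depth.",
        "Sourdough bread needs a long fermentation.",
        "Hydration affects the crumb of sourdough loaves.",
    ]
    for example in pack_related_documents(corpus, max_tokens=32):
        print([corpus[i] for i in example])
```

Each packed group would then be concatenated and tokenized as one long training sequence; the paper's actual retriever and packing order may differ from this greedy chain.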
Related papers
- Reducing Distraction in Long-Context Language Models by Focused Learning [6.803882766744194]
We propose a novel training method that enhances Large Language Models' ability to discern relevant information.
During fine-tuning with long contexts, we employ a retriever to extract the most relevant segments.
We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned.
arXiv Detail & Related papers (2024-11-08T19:27:42Z)
- What is Wrong with Perplexity for Long-context Language Modeling? [71.34933096461124]
Long-context inputs are crucial for large language models (LLMs) in tasks such as extended conversations, document summarization, and many-shot in-context learning.
Perplexity (PPL) has proven unreliable for assessing long-context capabilities.
We propose LongPPL, a novel metric that focuses on key tokens by employing a long-short context contrastive method to identify them (a sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-10-31T09:39:28Z)
- A Controlled Study on Long Context Extension and Generalization in LLMs [85.4758128256142]
Broad textual understanding and in-context learning require language models that utilize full document contexts.
Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts.
We implement a controlled protocol for extension methods with a standardized evaluation, utilizing consistent base models and extension data.
arXiv Detail & Related papers (2024-09-18T17:53:17Z)
- Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models [21.90388980448712]
Training models to handle long contexts presents significant challenges.
We introduce Untie the Knots (UtK), a novel data augmentation strategy employed during the continued pre-training phase.
We conduct extensive experiments on models with 7B and 72B parameters, trained on 20 billion tokens, demonstrating that UtK achieves 75% and 84.5% accuracy on RULER at 128K context length.
arXiv Detail & Related papers (2024-09-07T09:28:55Z)
- KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches [52.02764371205856]
Long context capability is a crucial competency for large language models (LLMs).
This work provides a taxonomy of current methods and evaluates 10+ state-of-the-art approaches across seven categories of long-context tasks.
arXiv Detail & Related papers (2024-07-01T17:59:47Z)
- CELA: Cost-Efficient Language Model Alignment for CTR Prediction [71.85120354973073]
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems.
Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs).
We propose Cost-Efficient Language Model Alignment (CELA) for CTR prediction.
arXiv Detail & Related papers (2024-05-17T07:43:25Z)
- Long Context Alignment with Short Instructions and Synthesized Positions [56.1267385315404]
This paper introduces Step-Skipping Alignment (SkipAlign), a new technique designed to enhance the long-context capabilities of Large Language Models (LLMs).
With a careful selection of the base model and alignment datasets, SkipAlign with only 6B parameters achieves its best performance, comparable with strong baselines like GPT-3.5-Turbo-16K on LongBench.
arXiv Detail & Related papers (2024-05-07T01:56:22Z)
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
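As noted in the LongPPL entry above, here is a minimal sketch of one plausible reading of a long-short context contrastive metric: tokens whose log-probability improves substantially when the long context is available are treated as key tokens, and perplexity is computed only over them. The model-agnostic interface, the gain threshold, the fallback behaviour, and the name `long_ppl` are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of a LongPPL-style metric: restrict perplexity to "key" tokens, i.e.
# tokens whose log-probability gains at least `gain_threshold` nats when the
# long context is provided instead of a short one. Values are illustrative.
import math


def long_ppl(logp_long, logp_short, gain_threshold=2.0):
    """Perplexity over the tokens that benefit most from the long context."""
    key = [lp for lp, ls in zip(logp_long, logp_short)
           if lp - ls >= gain_threshold]
    if not key:                 # no token cleared the threshold: fall back to all
        key = list(logp_long)
    return math.exp(-sum(key) / len(key))


if __name__ == "__main__":
    # Toy per-token log-probabilities; the third token is far easier to predict
    # once the long context is visible, so it alone drives the metric here.
    logp_long = [-2.1, -0.9, -0.3, -1.5]
    logp_short = [-2.2, -1.0, -4.0, -1.6]
    print(f"LongPPL ~ {long_ppl(logp_long, logp_short):.2f}")
```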