Training With "Paraphrasing the Original Text'' Improves Long-Context Performance
- URL: http://arxiv.org/abs/2312.11193v8
- Date: Thu, 11 Apr 2024 03:29:20 GMT
- Title: Training With "Paraphrasing the Original Text" Improves Long-Context Performance
- Authors: Yijiong Yu
- Abstract summary: As Large Language Models (LLMs) continue to evolve, more are being designed to handle long-context inputs.
This paper identifies the root of these issues as a deficiency in retrieval capabilities, exacerbated by the sparsity of key information in long contexts.
We introduce a novel approach called "Paraphrasing the Original Text", aimed at augmenting LLMs' proficiency in extracting information from long contexts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As Large Language Models (LLMs) continue to evolve, more are being designed to handle long-context inputs. Despite this advancement, many models struggle to achieve high precision on long-context tasks, often exhibiting a "lost in the middle" issue. This paper identifies the root of these issues as a deficiency in retrieval capabilities, exacerbated by the sparsity of key information in long contexts. To tackle this challenge, we introduce a novel approach called "Paraphrasing the Original Text", aimed at augmenting LLMs' proficiency in extracting information from long contexts. This enhancement is achieved through a specialized supervised fine-tuning stage that incorporates paraphrasing information into training samples, thereby improving the model's retrieval capabilities for long-context scenarios. Tested on LongBench and the NaturalQuestions Multi-document QA dataset, our method demonstrates significant improvements in managing long-context tasks, effectively addressing the "lost in the middle" dilemma. Specifically, we observed average performance increases of 6.4% and 5.9% on these datasets, respectively. Moreover, our approach is efficient, requiring minimal overhead: fine-tuning is needed on just 19k samples. The model and training data have been made available on HuggingFace (https://huggingface.co/yuyijiong/Qwen-14b-chat-yarn-32k).
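The method's key ingredient is constructing supervised fine-tuning samples whose targets restate the relevant original passage before answering. A rough sketch of how such a sample might be assembled is shown below, assuming a simple prompt/target template; the exact templates are defined by the released training data linked above, not reproduced here.

```python
# Hypothetical sketch of how a "paraphrasing" SFT sample could be assembled.
# The prompt and target wording here are illustrative assumptions, not the
# paper's released templates.

def build_paraphrase_sample(documents, question, relevant_idx, relevant_span, answer):
    """Build one fine-tuning example whose target first restates the key
    passage from the long context, then answers the question."""
    context = "\n\n".join(
        f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(documents)
    )
    prompt = (
        f"{context}\n\n"
        f"Question: {question}\n"
        "First quote or paraphrase the original sentences that contain the "
        "answer, then answer the question."
    )
    # The target teaches the model to retrieve before it answers.
    target = (
        f"According to Document {relevant_idx + 1}, the original text says: "
        f"\"{relevant_span}\"\n"
        f"Therefore, the answer is: {answer}"
    )
    return {"prompt": prompt, "response": target}


example = build_paraphrase_sample(
    documents=["...long distractor document...", "Paris is the capital of France."],
    question="What is the capital of France?",
    relevant_idx=1,
    relevant_span="Paris is the capital of France.",
    answer="Paris",
)
```

Targets of this shape push the model to locate and restate the key passage before answering, which is the retrieval behaviour the fine-tuning stage is meant to strengthen.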
Related papers
- From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models [54.44375226381814]
Long-context capabilities are essential for a wide range of applications, including document and video understanding, in-context learning, and inference-time scaling.
We introduce an efficient training recipe for building ultra-long-context LLMs from an aligned instruct model, pushing context lengths from 128K to 1M, 2M, and 4M tokens.
Our approach achieves state-of-the-art performance across a diverse set of long-context benchmarks.
arXiv Detail & Related papers (2025-04-08T16:58:58Z) - Generalizing From Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning [103.65680870130839]
We investigate how to design instruction data for the post-training phase of a long context pre-trained model.
Our controlled study reveals that models instruction-tuned on short contexts can effectively generalize to longer ones.
Based on these findings, we propose context synthesis, a novel data synthesis framework.
arXiv Detail & Related papers (2025-02-21T17:02:40Z) - Language Models can Self-Lengthen to Generate Long Texts [74.96074422345806]
This paper introduces an innovative iterative training framework called Self-Lengthen.
It leverages only the intrinsic knowledge and skills of Large Language Models without the need for auxiliary data or proprietary models.
Experiments on benchmarks and human evaluations show that Self-Lengthen outperforms existing methods in long-text generation.
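As a loose illustration of such an iterative self-extension loop, assuming only a generic `generate(prompt) -> str` handle on the model, a sketch follows; the prompts and round count are placeholders, not the paper's recipe.

```python
from typing import Callable

def self_lengthen(instruction: str, generate: Callable[[str], str], rounds: int = 2) -> str:
    """Iteratively lengthen the model's own answer to `instruction`."""
    # 1) Draft a normal-length response with the model itself.
    response = generate(instruction)
    for _ in range(rounds):
        # 2) Ask the same model to expand its own draft, so no auxiliary data
        #    or stronger proprietary model is needed.
        response = generate(
            "Rewrite the following answer so it is substantially longer and "
            "more detailed, without changing its meaning:\n\n" + response
        )
    # The resulting (instruction, response) pairs can then be used as
    # fine-tuning targets for long-text generation.
    return response
```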
arXiv Detail & Related papers (2024-10-31T13:47:10Z) - How to Train Long-Context Language Models (Effectively) [75.5418485597276]
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information.
ProLong-8B, which is initialized from Llama-3 and trained on 40B tokens, demonstrates state-of-the-art long-context performance among similarly sized models at a length of 128K.
arXiv Detail & Related papers (2024-10-03T16:46:52Z) - Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models [21.90388980448712]
Training models to handle long contexts presents significant challenges.
We introduce Untie the Knots (UtK), a novel data augmentation strategy employed during the continued pre-training phase.
We conduct extensive experiments on models with 7B and 72B parameters, trained on 20 billion tokens, demonstrating that UtK achieves 75% and 84.5% accuracy on RULER at 128K context length.
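A toy sketch of a chunk-and-shuffle augmentation in this spirit is shown below; the chunk size, tags, and reconstruction target are assumptions for illustration, not the paper's exact format.

```python
import random

def chunk(text: str, size: int = 200) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_utk_style_sample(documents: list[str], target_doc: int) -> dict:
    """Interleave shuffled chunks of several documents ("tie the knots") and
    ask the model to reconnect one document's chunks in their original order."""
    tagged = []
    for d, doc in enumerate(documents):
        for c, piece in enumerate(chunk(doc)):
            tagged.append((f"<doc{d}-part{c}>", piece))
    random.shuffle(tagged)  # chunks from different documents end up interleaved

    context = "\n".join(f"{tag} {piece}" for tag, piece in tagged)
    prompt = (
        f"{context}\n\n"
        f"List, in their original order, the tags of all chunks that belong "
        f"to document {target_doc}."
    )
    answer = " ".join(
        tag for tag, _ in sorted(
            ((t, p) for t, p in tagged if t.startswith(f"<doc{target_doc}-")),
            key=lambda tp: int(tp[0].split("part")[1].rstrip(">")),
        )
    )
    return {"prompt": prompt, "response": answer}
```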
arXiv Detail & Related papers (2024-09-07T09:28:55Z) - Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning [68.43706033424378]
This study introduces an innovative method designed to efficiently increase in-context text length in multi-modal large language models (MLLMs).
We present Visualized In-Context Text Processing (VisInContext), which processes long in-context text using visual tokens.
This technique significantly reduces GPU memory usage and floating point operations (FLOPs) for both the training and inference stages.
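A minimal sketch of the underlying idea of moving long text into the visual channel is given below, assuming a simple text-to-image rendering step; the actual pipeline, fonts, and visual encoder are not reproduced here.

```python
from PIL import Image, ImageDraw

def render_text_as_image(text: str, width: int = 800, line_len: int = 80) -> Image.Image:
    """Render a long passage as an image so that a vision encoder, rather than
    the text tokenizer, carries it into the model as visual tokens."""
    lines = [text[i:i + line_len] for i in range(0, len(text), line_len)] or [""]
    img = Image.new("RGB", (width, 20 * len(lines) + 20), "white")
    draw = ImageDraw.Draw(img)
    for row, line in enumerate(lines):
        draw.text((10, 10 + 20 * row), line, fill="black")
    return img  # this image would then be fed to the model's vision encoder
```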
arXiv Detail & Related papers (2024-06-04T17:59:25Z) - LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models [61.12177317970258]
LongSkywork is a long-context Large Language Model capable of processing up to 200,000 tokens.
We develop two novel methods for creating synthetic data.
LongSkywork achieves outstanding performance on a variety of long-context benchmarks.
arXiv Detail & Related papers (2024-06-02T03:34:41Z) - Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models [13.091271774417867]
Long-context modeling capabilities are important for large language models (LLMs) in various applications.
We propose a data mining framework, ProLong, that assigns each training sample a long-dependency score.
Comprehensive experiments on multiple benchmarks indicate that ProLong effectively identifies documents that carry long dependencies.
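One plausible way to approximate such a long-dependency score is sketched below, under the assumption that dependency can be measured as the drop in language-model loss on a document's tail once its distant head is supplied; this is an illustration, not necessarily the paper's exact metric.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small model for illustration; use a long-context LM for real documents
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def nll(target: str, prefix: str = "") -> float:
    """Mean negative log-likelihood of `target`, optionally conditioned on `prefix`."""
    target_ids = tok(target, return_tensors="pt").input_ids
    if prefix:
        prefix_ids = tok(prefix, return_tensors="pt").input_ids
        ids = torch.cat([prefix_ids, target_ids], dim=1)
    else:
        prefix_ids = target_ids[:, :0]  # empty prefix
        ids = target_ids
    labels = ids.clone()
    labels[:, : prefix_ids.shape[1]] = -100  # score only the target tokens
    return lm(input_ids=ids, labels=labels).loss.item()

def long_dependency_score(document: str, tail_chars: int = 500) -> float:
    """Higher = the end of the document depends more on its distant beginning."""
    head, tail = document[:-tail_chars], document[-tail_chars:]
    return nll(tail) - nll(tail, prefix=head)  # loss drop when the head is given
```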
arXiv Detail & Related papers (2024-05-28T07:36:56Z) - Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z) - Stabilized In-Context Learning with Pre-trained Language Models for Few-Shot Dialogue State Tracking [57.92608483099916]
Large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks.
For more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial.
We introduce a saliency model to limit dialogue text length, allowing us to include more exemplars per query.
arXiv Detail & Related papers (2023-02-12T15:05:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.