Related papers: P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for data pruning in LLM Training

P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for data pruning in LLM Training

URL: http://arxiv.org/abs/2408.05541v2
Date: Fri, 18 Oct 2024 04:52:38 GMT
Title: P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for data pruning in LLM Training
Authors: Yingxuan Yang, Huayi Wang, Muning Wen, Xiaoyun Mo, Qiuying Peng, Jun Wang, Weinan Zhang,
Abstract summary: This paper introduces P3, an adaptive framework aimed at optimizing the task-specific fine-tuning process through iterative data pruning. P3 consists of three key components: Policy-driven Difficulty Measurement, Pace-Adaptive Selection, and Diversity Promotion. We validate P3 on the reasoning scenarios, APPS and MATH, demonstrating significant improvements over traditional data pruning methods.
Score: 22.61313628957683
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the rapidly advancing field of Large Language Models (LLMs), effectively leveraging existing datasets during fine-tuning to maximize the model's potential is of paramount importance. This paper introduces P3, an adaptive framework aimed at optimizing the task-specific fine-tuning process through iterative data pruning. P3 consists of three key components: (1) Policy-driven Difficulty Measurement, which dynamically assesses data difficulty based on the model's real-time performance, replacing static metrics with adaptable evaluations; (2) Pace-Adaptive Selection, leveraging self-paced learning to progressively introduce more challenging data, thereby enhancing model capability; (3) Diversity Promotion, incorporating Determinantal Point Process (DPP) to ensure data diversity across epochs, enriching the learning process. We validate P3 on the reasoning scenarios, APPS and MATH, demonstrating significant improvements over traditional data pruning methods. By advancing dynamic data selection and utilization strategies, P3 contributes both a theoretical framework and concrete approach to fully exploit existing data for LLMs' performance improvement, offering utility across diverse tasks.

Related papers

Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models [7.61977883644433]
We propose PRRC to evaluate data quality across Professionalism, Readability, Reasoning, and Cleanliness. We introduce Meta-rater, a multi-dimensional data selection method that integrates these dimensions with existing quality metrics through learned optimal weightings. Experiments demonstrate that Meta-rater doubles convergence speed for 1.3B parameter models and improves downstream task performance by 3.23, with scalable benefits observed in 3.3B models trained on 100B tokens.
arXiv Detail & Related papers (2025-04-19T06:12:33Z)
PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection [28.442470930703337]
PRISM is a training-free approach for efficient multimodal data selection. It uses Pearson correlation analysis to quantify the intrinsic visual encoding properties of MLLMs. It reduces the overall time required for visual instruction tuning and data selection to just 30% of conventional methods.
arXiv Detail & Related papers (2025-02-17T18:43:41Z)
DELIFT: Data Efficient Language model Instruction Fine Tuning [13.538140114667772]
We introduce DELIFT, a novel algorithm that systematically optimize data selection across the three key stages of fine-tuning. Experiments across various tasks and model scales demonstrate that DELIFT can reduce the fine-tuning data size by up to 70% without compromising performance.
arXiv Detail & Related papers (2024-11-07T04:38:29Z)
Optimizing LLMs with Direct Preferences: A Data Efficiency Perspective [4.548047308860141]
This study investigates the impact of different type of preference data on model performance. It aims to reduce their dependency on extensive amounts of preference data, which is expensive to collect.
arXiv Detail & Related papers (2024-10-22T00:11:41Z)
Data Selection via Optimal Control for Language Models [134.67665351539725]
This work investigates the selection of high-quality pre-training data from massive corpora to enhance LMs' capabilities for downstream usage. We introduce PMP-based Data Selection (PDS), a framework that approximates optimal data selection by solving the PMP conditions. The benefits of PDS extend to 400B models trained on 10T tokens, as evidenced by the extrapolation of the test loss curves according to the Scaling Laws.
arXiv Detail & Related papers (2024-10-09T17:06:57Z)
Offline Reinforcement Learning with Behavioral Supervisor Tuning [0.0]
We present TD3-BST, an algorithm that trains an uncertainty model and uses it to guide the policy to select actions within the dataset support. TD3-BST can learn more effective policies from offline datasets compared to previous methods and achieves the best performance across challenging benchmarks without requiring per-dataset tuning.
arXiv Detail & Related papers (2024-04-25T08:22:47Z)
LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection. We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks. Our method goes beyond surface form cues to identify data that the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
Guiding Language Model Reasoning with Planning Tokens [122.43639723387516]
Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks. We propose a hierarchical generation scheme to encourage a more structural generation of chain-of-thought steps. Our approach requires a negligible increase in trainable parameters (0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme.
arXiv Detail & Related papers (2023-10-09T13:29:37Z)
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training [44.790636524264]
Point Prompt Training is a novel framework for multi-dataset synergistic learning in the context of 3D representation learning. It can overcome the negative transfer associated with synergistic learning and produce generalizable representations. It achieves state-of-the-art performance on each dataset using a single weight-shared model with supervised multi-dataset training.
arXiv Detail & Related papers (2023-08-18T17:59:57Z)
ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP) ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective. We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones. We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
Data Augmentation through Expert-guided Symmetry Detection to Improve Performance in Offline Reinforcement Learning [0.0]
offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task. Recent works showed that an expert-guided pipeline relying on Density Estimation methods effectively detects this structure in deterministic environments. We show that the former results lead to a performance improvement when solving the learned MDP and then applying the optimized policy in the real environment.
arXiv Detail & Related papers (2021-12-18T14:32:32Z)
On Effective Scheduling of Model-based Reinforcement Learning [53.027698625496015]
We propose a framework named AutoMBPO to automatically schedule the real data ratio. In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance.
arXiv Detail & Related papers (2021-11-16T15:24:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.