Measuring Data Quality for Dataset Selection in Offline Reinforcement
  Learning
        - URL: http://arxiv.org/abs/2111.13461v1
 - Date: Fri, 26 Nov 2021 12:22:55 GMT
 - Title: Measuring Data Quality for Dataset Selection in Offline Reinforcement
  Learning
 - Authors: Phillip Swazinna, Steffen Udluft, Thomas Runkler
 - Abstract summary: Recently developed offline reinforcement learning algorithms have made it possible to learn policies directly from pre-collected datasets.
Since the performance these algorithms deliver depends greatly on the dataset that is presented to them, practitioners need to pick the right dataset from those available.
We discuss ideas on how to select promising datasets and propose three very simple indicators: estimated relative return improvement (ERI), estimated action stochasticity (EAS), and a combination of the two (COI).
 - Score: 2.3333090554192615
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   Recently developed offline reinforcement learning algorithms have made it
possible to learn policies directly from pre-collected datasets, giving rise to
a new dilemma for practitioners: Since the performance the algorithms are able
to deliver depends greatly on the dataset that is presented to them,
practitioners need to pick the right dataset among the available ones. This
problem has so far not been discussed in the corresponding literature. We
discuss ideas on how to select promising datasets and propose three very simple
indicators: Estimated relative return improvement (ERI) and estimated action
stochasticity (EAS), as well as a combination of the two (COI), and empirically
show that despite their simplicity they can be very effectively used for
dataset selection.
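
The paper defines the exact estimators behind ERI, EAS, and COI; the snippet below is only a minimal, illustrative sketch of how such dataset-level indicators could be computed from logged trajectories. The proxy formulas (best-versus-mean episode return for ERI, action spread among nearest-neighbour states for EAS, a convex combination for COI), as well as all function names and parameters, are assumptions made for illustration, not the paper's definitions.

```python
# Illustrative sketch only: the formulas below are simplified stand-ins for the
# indicators named in the abstract (ERI, EAS, COI), not the paper's estimators.
import numpy as np


def estimated_relative_return_improvement(episode_returns, eps=1e-8):
    """ERI proxy: how far the best observed episode return exceeds the average
    one, relative to the average (assumed formula)."""
    returns = np.asarray(episode_returns, dtype=float)
    return float((returns.max() - returns.mean()) / (abs(returns.mean()) + eps))


def estimated_action_stochasticity(states, actions, k=10):
    """EAS proxy: average spread of the actions taken in similar states, using a
    brute-force k-nearest-neighbour grouping (quadratic in dataset size, so only
    suitable for small datasets or subsamples)."""
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=float)
    if states.ndim == 1:
        states = states[:, None]
    k = min(k, len(states))
    spreads = []
    for i in range(len(states)):
        dists = np.linalg.norm(states - states[i], axis=1)
        neighbours = np.argsort(dists)[:k]          # indices of similar states
        spreads.append(actions[neighbours].std(axis=0).mean())
    return float(np.mean(spreads))


def combined_indicator(eri, eas, weight=0.5):
    """COI proxy: a simple convex combination of the two indicators."""
    return weight * eri + (1.0 - weight) * eas
```

Under these assumptions, one would compute the indicators once per candidate dataset and rank the datasets by them, matching the selection use case described in the abstract.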
 
       
      
        Related papers
        - RL-Selector: Reinforcement Learning-Guided Data Selection via Redundancy   Assessment [10.284993431741377]
We introduce the concept of epsilon-sample cover, which quantifies sample redundancy based on inter-sample relationships. We reformulate data selection as a reinforcement learning process and propose RL-Selector. Our method consistently outperforms existing state-of-the-art baselines.
arXiv  Detail & Related papers  (2025-06-26T06:28:56Z) - Unifying and Optimizing Data Values for Selection via   Sequential-Decision-Making [5.755427480127593]
We show that data values applied for selection can be reformulated as a sequential-decision-making problem.
We propose an efficient approximation scheme using learned bipartite graphs as surrogate utility models.
arXiv  Detail & Related papers  (2025-02-06T23:03:10Z) - Capturing the Temporal Dependence of Training Data Influence [100.91355498124527]
We formalize the concept of trajectory-specific leave-one-out influence, which quantifies the impact of removing a data point during training.
We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO.
As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
arXiv  Detail & Related papers  (2024-12-12T18:28:55Z) - Reward-Augmented Data Enhances Direct Preference Alignment of LLMs [56.24431208419858]
We introduce reward-conditioned Large Language Models (LLMs) that learn from the entire spectrum of response quality within the dataset.
We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset.
arXiv  Detail & Related papers  (2024-10-10T16:01:51Z) - Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning [3.623224034411137]
Offline multi-agent reinforcement learning (MARL) is an exciting direction of research that uses static datasets to find optimal control policies for multi-agent systems.
Though the field is by definition data-driven, efforts have thus far neglected data in their drive to achieve state-of-the-art results.
We show how the majority of works generate their own datasets without consistent methodology and provide sparse information about the characteristics of these datasets.
arXiv  Detail & Related papers  (2024-09-18T14:13:24Z) - Towards Data-Centric RLHF: Simple Metrics for Preference Dataset   Comparison [9.324894567200582]
We systematically study preference datasets through three perspectives: scale, label noise, and information content.
Our work is a first step towards a data-centric approach to alignment by providing perspectives that aid in training efficiency and iterative data collection for RLHF.
arXiv  Detail & Related papers  (2024-09-15T03:55:03Z) - Take the essence and discard the dross: A Rethinking on Data Selection   for Fine-Tuning Large Language Models [36.22392593103493]
Data selection for fine-tuning large language models (LLMs) aims to choose a high-quality subset from existing datasets.
Existing surveys overlook an in-depth exploration of the fine-tuning phase.
We introduce a novel three-stage scheme - comprising feature extraction, criteria design, and selector evaluation - to systematically categorize and evaluate these methods.
arXiv  Detail & Related papers  (2024-06-20T08:58:58Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application (an illustrative sketch of gradient-similarity-based selection in this spirit appears after this list).
arXiv  Detail & Related papers  (2024-02-06T19:18:04Z) - One-Shot Learning as Instruction Data Prospector for Large Language   Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv  Detail & Related papers  (2023-12-16T03:33:12Z) - Exploring Data Redundancy in Real-world Image Classification through
  Data Selection [20.389636181891515]
Deep learning models often require large amounts of data for training, leading to increased costs.
We present two data valuation metrics based on Synaptic Intelligence and gradient norms, respectively, to study redundancy in real-world image data.
Online and offline data selection algorithms are then proposed via clustering and grouping based on the examined data values.
arXiv  Detail & Related papers  (2023-06-25T03:31:05Z) - Online Coreset Selection for Rehearsal-based Continual Learning [65.85595842458882]
In continual learning, we store a subset of training examples (coreset) to be replayed later to alleviate catastrophic forgetting.
We propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration.
Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting.
arXiv  Detail & Related papers  (2021-06-02T11:39:25Z) - S^3-Rec: Self-Supervised Learning for Sequential Recommendation with
  Mutual Information Maximization [104.87483578308526]
We propose the model S3-Rec, which stands for Self-Supervised learning for Sequential Recommendation.
For our task, we devise four auxiliary self-supervised objectives to learn the correlations among attribute, item, subsequence, and sequence.
Extensive experiments conducted on six real-world datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods.
arXiv  Detail & Related papers  (2020-08-18T11:44:10Z) - Improving Multi-Turn Response Selection Models with Complementary
  Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments in two public datasets and obtain significant improvement in both datasets.
arXiv  Detail & Related papers  (2020-02-18T06:29:01Z) 
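
As a companion to the LESS entry above, here is a toy, hedged sketch of the general idea of gradient-similarity-based data selection: rank training examples by how similar their (randomly projected) gradients are to a target-task gradient and keep the top fraction. The published method operates on LoRA gradient features of a language model; here per-example and target-task gradients are simply assumed to be given as vectors, and the function name, arguments, and plain cosine scoring are illustrative assumptions rather than the authors' implementation.

```python
# Toy sketch of gradient-similarity-based data selection (in the spirit of LESS).
# Per-example training gradients and target-task gradients are assumed to be
# available as dense vectors; the real method derives them from LoRA features.
import numpy as np


def select_by_gradient_similarity(train_grads, target_grads, fraction=0.05,
                                  proj_dim=512, seed=0):
    """Return indices of the training examples whose projected gradients are
    most cosine-similar to the mean target-task gradient."""
    train_grads = np.asarray(train_grads, dtype=float)
    target_grads = np.asarray(target_grads, dtype=float)
    rng = np.random.default_rng(seed)

    d = train_grads.shape[1]
    proj = rng.normal(size=(d, proj_dim)) / np.sqrt(proj_dim)  # random projection

    z_train = train_grads @ proj                   # (n_train, proj_dim)
    z_target = (target_grads @ proj).mean(axis=0)  # (proj_dim,)

    sims = (z_train @ z_target) / (
        np.linalg.norm(z_train, axis=1) * np.linalg.norm(z_target) + 1e-8)
    k = max(1, int(fraction * len(sims)))
    return np.argsort(-sims)[:k]                   # indices of selected examples
```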
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.