Measuring Data Quality for Dataset Selection in Offline Reinforcement
Learning
- URL: http://arxiv.org/abs/2111.13461v1
- Date: Fri, 26 Nov 2021 12:22:55 GMT
- Title: Measuring Data Quality for Dataset Selection in Offline Reinforcement
Learning
- Authors: Phillip Swazinna, Steffen Udluft, Thomas Runkler
- Abstract summary: Recently developed offline reinforcement learning algorithms have made it possible to learn policies directly from pre-collected datasets.
Since the performance these algorithms can deliver depends greatly on the dataset presented to them, practitioners need to pick the right dataset among the available ones.
We discuss ideas on how to select promising datasets and propose three very simple indicators: estimated relative return improvement (ERI), estimated action stochasticity (EAS), and a combination of the two (COI).
- Score: 2.3333090554192615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently developed offline reinforcement learning algorithms have made it
possible to learn policies directly from pre-collected datasets, giving rise to
a new dilemma for practitioners: Since the performance the algorithms are able
to deliver depends greatly on the dataset that is presented to them,
practitioners need to pick the right dataset among the available ones. This
problem has so far not been discussed in the corresponding literature. We
discuss ideas on how to select promising datasets and propose three very simple
indicators: Estimated relative return improvement (ERI) and estimated action
stochasticity (EAS), as well as a combination of the two (COI), and empirically
show that despite their simplicity they can be very effectively used for
dataset selection.
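The abstract does not define the indicators precisely, so the following is a minimal numpy sketch of one plausible reading: ERI compares an estimated achievable return against the average return already present in the dataset, EAS measures how dispersed the logged actions are around similar states, and COI mixes the two. All function names, the nearest-neighbor construction, and the convex weighting are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

def estimated_relative_return_improvement(episode_returns, value_estimates):
    """ERI sketch (assumption): headroom an estimated achievable return
    leaves over the average return the behavior policy already attains."""
    behavior_return = np.mean(episode_returns)
    estimated_best = np.max(value_estimates)  # e.g. from a fitted value model
    return (estimated_best - behavior_return) / (abs(behavior_return) + 1e-8)

def estimated_action_stochasticity(states, actions, k=10):
    """EAS sketch (assumption): mean variance of logged actions among the
    k nearest-neighbor states; high variance suggests a stochastic or
    explorative behavior policy. Neighbors include the query state itself."""
    variances = []
    for i in range(len(states)):
        dists = np.linalg.norm(states - states[i], axis=1)
        neighbors = np.argsort(dists)[:k]
        variances.append(np.var(actions[neighbors], axis=0).mean())
    return float(np.mean(variances))

def combined_indicator(eri, eas, alpha=0.5):
    """COI sketch (assumption): convex combination of the two signals."""
    return alpha * eri + (1 - alpha) * eas
```

Under this reading, a practitioner would compute the three numbers for each candidate dataset and pick the one scoring highest on COI; the paper's actual normalization and weighting may differ.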
Related papers
- Reward-Augmented Data Enhances Direct Preference Alignment of LLMs [63.32585910975191]
We introduce reward-conditioned Large Language Models (LLMs) that learn from the entire spectrum of response quality within the dataset.
We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset.
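As a rough illustration of what conditioning preference pairs on quality scores could look like, here is a hypothetical relabeling helper; the quality-token format and field names are assumptions, not the paper's exact scheme.

```python
# Hypothetical relabeling: prefix each response's prompt with its quality
# score so a reward-conditioned LLM can learn from the full quality
# spectrum, including the rejected responses (assumption).
def augment_preference_pair(prompt, chosen, rejected, chosen_score, rejected_score):
    return [
        {"prompt": f"<quality={chosen_score}> {prompt}", "response": chosen},
        {"prompt": f"<quality={rejected_score}> {prompt}", "response": rejected},
    ]
```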
arXiv Detail & Related papers (2024-10-10T16:01:51Z)
- Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning [3.623224034411137]
Offline multi-agent reinforcement learning (MARL) is an exciting direction of research that uses static datasets to find optimal control policies for multi-agent systems.
Though the field is by definition data-driven, efforts have thus far neglected data in their drive to achieve state-of-the-art results.
We show how the majority of works generate their own datasets without consistent methodology and provide sparse information about the characteristics of these datasets.
arXiv Detail & Related papers (2024-09-18T14:13:24Z)
- Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison [9.324894567200582]
We systematically study preference datasets through three perspectives: scale, label noise, and information content.
Our work is a first step towards a data-centric approach to alignment by providing perspectives that aid in training efficiency and iterative data collection for RLHF.
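A toy sketch of diagnostics in the spirit of the three perspectives above; the margin-based noise proxy and the reference reward model it implies are assumptions, not the paper's metrics.

```python
import numpy as np

def dataset_report(margins, n_pairs):
    """Toy scale / label-noise / information-content proxies (assumptions):
    margins are reward(chosen) - reward(rejected) under a reference reward
    model; non-positive margins look mislabeled, and near-zero margins
    carry little preference signal."""
    margins = np.asarray(margins, dtype=float)
    return {
        "scale": n_pairs,
        "est_label_noise": float(np.mean(margins <= 0)),
        "mean_abs_margin": float(np.mean(np.abs(margins))),
    }
```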
arXiv Detail & Related papers (2024-09-15T03:55:03Z)
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that embodies the necessary reasoning skills for the intended downstream application.
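A rough sketch of the gradient-feature idea as summarized above: project per-example gradients to a low-rank space, then rank training examples by similarity to gradients from the target task. The random projection and cosine scoring shown here are assumptions about the details, not LESS's exact procedure.

```python
import numpy as np

def low_rank_features(grads, proj):
    """Project per-example gradients (N, P) to a low-rank space and
    L2-normalize so dot products become cosine similarities."""
    feats = grads @ proj
    return feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)

def select_influential(train_grads, target_grads, proj, fraction=0.05):
    """Keep the `fraction` of training examples whose projected gradient
    best aligns with the mean projected target-task gradient (sketch)."""
    train_f = low_rank_features(train_grads, proj)
    target_f = low_rank_features(target_grads, proj).mean(axis=0)
    scores = train_f @ target_f
    k = max(1, int(fraction * len(scores)))
    return np.argsort(-scores)[:k]

# proj could be a random Gaussian matrix (assumption), e.g.:
# proj = np.random.randn(n_params, 128) / np.sqrt(128)
```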
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
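A hedged sketch of the "one-shot learning as prospector" idea: score each candidate by how much it helps a set of anchor tasks when used as the single in-context demonstration. The function names and the plain averaging are illustrative assumptions, not the paper's scoring rule.

```python
def one_shot_gain(candidate, anchors, zero_shot_score, one_shot_score):
    """Score a candidate instruction example (assumption): average
    improvement across anchor tasks when `candidate` serves as the one-shot
    demonstration, relative to the model's zero-shot performance.
    `zero_shot_score` and `one_shot_score` are caller-supplied evaluators."""
    gains = [one_shot_score(candidate, a) - zero_shot_score(a) for a in anchors]
    return sum(gains) / len(gains)
```

Examples would then be ranked by this gain and the top percentile kept for instruction tuning.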
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- Exploring Data Redundancy in Real-world Image Classification through Data Selection [20.389636181891515]
Deep learning models often require large amounts of data for training, leading to increased costs.
We present two data valuation metrics based on Synaptic Intelligence and gradient norms, respectively, to study redundancy in real-world image data.
Online and offline data selection algorithms are then proposed via clustering and grouping based on the examined data values.
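A minimal sketch of gradient-norm valuation followed by cluster-based selection, as described above; the per-cluster quota and ranking rule are assumptions (the Synaptic Intelligence variant would instead accumulate parameter-importance path integrals during training).

```python
import numpy as np

def gradient_norm_values(per_example_grads):
    """Value each example by the norm of its gradient (N, P) -> (N,)."""
    return np.linalg.norm(per_example_grads, axis=1)

def select_by_cluster(values, cluster_ids, keep_per_cluster):
    """Offline selection sketch: keep the highest-valued examples within
    each cluster, so redundant groups contribute fewer samples."""
    keep = []
    for c in np.unique(cluster_ids):
        members = np.where(cluster_ids == c)[0]
        ranked = members[np.argsort(-values[members])]
        keep.extend(ranked[:keep_per_cluster].tolist())
    return np.array(sorted(keep))
```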
arXiv Detail & Related papers (2023-06-25T03:31:05Z)
- Online Coreset Selection for Rehearsal-based Continual Learning [65.85595842458882]
In continual learning, we store a subset of training examples (coreset) to be replayed later to alleviate catastrophic forgetting.
We propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration.
Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting.
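A sketch of an OCS-flavored scoring rule under stated assumptions: candidates are favored when their gradients align with the current batch (representativeness) and with past-task gradients (affinity). The weight tau and the omission of OCS's diversity term are simplifications for illustration.

```python
import numpy as np

def ocs_scores(cand_grads, past_grads, tau=1.0):
    """Score coreset candidates (N, P): cosine alignment with the current
    batch mean gradient plus tau times alignment with the mean past-task
    gradient; top-scoring samples would be kept for the coreset (sketch)."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b) + 1e-8)
    batch_mean = cand_grads.mean(axis=0)
    past_mean = past_grads.mean(axis=0)
    return cos(cand_grads, batch_mean) + tau * cos(cand_grads, past_mean)
```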
arXiv Detail & Related papers (2021-06-02T11:39:25Z)
- S^3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization [104.87483578308526]
We propose the model S^3-Rec, which stands for Self-Supervised learning for Sequential Recommendation.
For our task, we devise four auxiliary self-supervised objectives to learn the correlations among attribute, item, subsequence, and sequence.
Extensive experiments conducted on six real-world datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods.
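The four auxiliary objectives are described as mutual information maximization; an InfoNCE-style lower bound is a standard way to instantiate that, sketched below for generic (anchor, positive) embedding pairs such as item-attribute. The pairing scheme and temperature are assumptions, not S^3-Rec's exact losses.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss on matched embedding pairs (N, d): row i of
    `anchors` should score highest against row i of `positives`, with the
    other rows in the batch acting as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))          # matched pairs on diagonal
```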
arXiv Detail & Related papers (2020-08-18T11:44:10Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
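A minimal sketch of instance weighting under these assumptions: each example's loss is scaled by a weight derived from an auxiliary supervision signal, here imagined as the confidence of a complementary last-utterance selector.

```python
def weighted_loss(losses, weights):
    """Instance-weighting sketch: average per-example losses, with each
    example scaled by an auxiliary-signal weight (assumption); examples
    the auxiliary selector distrusts contribute less to training."""
    total = sum(l * w for l, w in zip(losses, weights))
    return total / (sum(weights) or 1.0)
```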
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.