Dataset Distillation for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2407.20299v3
- Date: Mon, 03 Nov 2025 01:38:40 GMT
- Title: Dataset Distillation for Offline Reinforcement Learning
- Authors: Jonathan Light, Yuanzhe Liu, Ziniu Hu,
- Abstract summary: Offline reinforcement learning often requires a quality dataset that we can train a policy on. We propose using data distillation to train and distill a better dataset which can then be used for training a better policy model. Our method is able to synthesize a dataset where a model trained on it achieves similar performance to a model trained on the full dataset.
- Score: 18.286824220933934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline reinforcement learning often requires a quality dataset that we can train a policy on. However, in many situations, it is not possible to get such a dataset, nor is it easy to train a policy to perform well in the actual environment given the offline data. We propose using data distillation to train and distill a better dataset which can then be used for training a better policy model. We show that our method is able to synthesize a dataset where a model trained on it achieves similar performance to a model trained on the full dataset or a model trained using percentile behavioral cloning. Our project site is available at https://datasetdistillation4rl.github.io . We also provide our implementation at https://github.com/ggflow123/DDRL .
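The abstract names percentile behavioral cloning as a comparison point. As a rough illustration of that general idea only (not the authors' implementation, which is in the linked repository), the sketch below keeps the highest-return fraction of an offline dataset and flattens it into supervised (state, action) pairs. The `Trajectory` container, the function names, and the `keep_fraction` parameter are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    # Hypothetical container: (state, action) pairs plus the episode's total return.
    steps: List[Tuple[tuple, int]]
    episode_return: float


def top_percentile(trajectories: List[Trajectory], keep_fraction: float = 0.1) -> List[Trajectory]:
    """Keep the highest-return fraction of trajectories (at least one if any exist)."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not trajectories:
        return []
    # Rank episodes from best to worst by return.
    ranked = sorted(trajectories, key=lambda t: t.episode_return, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def behavioral_cloning_pairs(trajectories: List[Trajectory]) -> List[Tuple[tuple, int]]:
    """Flatten trajectories into supervised (state, action) training pairs."""
    return [pair for t in trajectories for pair in t.steps]


# Toy offline dataset: ten one-step episodes with returns 0..9.
dataset = [Trajectory(steps=[((float(i),), i % 2)], episode_return=float(i)) for i in range(10)]
best = top_percentile(dataset, keep_fraction=0.2)
pairs = behavioral_cloning_pairs(best)
```

A policy would then be fit to `pairs` with any supervised learner. Dataset distillation differs in that it synthesizes a small dataset rather than selecting a subset of the original one.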
Related papers
- Extracting alignment data in open models [50.81383232591576]
We show that it is possible to extract significant amounts of alignment training data from a post-trained model. This data is useful to steer the model to improve certain capabilities such as long-context reasoning, safety, instruction following, and maths. We find that models readily regurgitate training data that was used in post-training phases such as SFT or RL.
arXiv Detail & Related papers (2025-10-21T12:06:00Z) - Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps [47.57615889991631]
Offline reinforcement learning (RL) aims to learn an optimal policy from a static dataset. We propose an approach that utilizes the Wasserstein distance, which is robust to out-of-distribution data. Our approach demonstrates comparable or superior performance to widely used methods on the D4RL benchmark dataset.
arXiv Detail & Related papers (2025-07-14T22:28:36Z) - Info-Coevolution: An Efficient Framework for Data Model Coevolution [11.754869657967207]
We propose a novel framework that enables models and data to coevolve through online selective annotation with no bias. For real-world datasets like ImageNet-1K, Info-Coevolution reduces annotation and training costs by 32% without performance loss.
arXiv Detail & Related papers (2025-06-09T17:04:11Z) - DataRater: Meta-Learned Dataset Curation [40.90328309013541]
We propose DataRater, which estimates the value of training on any particular data point. It is done by meta-learning using "meta-gradients", with the objective of improving training efficiency on held-out data. In extensive experiments across a range of model scales and datasets, we find that using our DataRater to filter data is highly effective.
arXiv Detail & Related papers (2025-05-23T13:43:14Z) - Transferable text data distillation by trajectory matching [27.826518926355295]
The data distillation method aims to synthesize a small number of data samples to achieve the training effect of the full data set. In this work, we propose a method that involves learning pseudo prompt data based on trajectory matching. Evaluations on two benchmarks, including ARC-Easy and MMLU instruction tuning datasets, established the superiority of our distillation approach over the SOTA data selection method LESS.
arXiv Detail & Related papers (2025-04-14T02:39:26Z) - What Makes a Good Dataset for Knowledge Distillation? [8.594140167290098]
Knowledge distillation (KD) has been a popular and effective method for model compression. One important assumption of KD is that the teacher's original dataset will also be available when training the student. In situations such as continual learning and distilling large models trained on company-withheld datasets, having access to the original data may not always be possible.
arXiv Detail & Related papers (2024-11-19T19:10:12Z) - Real-World Offline Reinforcement Learning from Vision Language Model Feedback [19.494335952082466]
Offline reinforcement learning can enable policy learning from pre-collected, sub-optimal datasets without online interactions.
Most existing offline RL works assume the dataset is already labeled with the task rewards.
We propose a novel system that automatically generates reward labels for offline datasets.
arXiv Detail & Related papers (2024-11-08T02:12:34Z) - Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z) - Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation [59.899714450049494]
Offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
arXiv Detail & Related papers (2023-12-15T14:49:41Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence to demonstrate that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z) - Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We find the most contributing samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z) - Cross-Modal Adapter for Text-Video Retrieval [91.9575196703281]
We present a novel Cross-Modal Adapter for parameter-efficient fine-tuning.
Inspired by adapter-based methods, we adjust the pre-trained model with a few parameterization layers.
It achieves superior or comparable performance compared to fully fine-tuned methods on MSR-VTT, MSVD, VATEX, ActivityNet, and DiDeMo datasets.
arXiv Detail & Related papers (2022-11-17T16:15:30Z) - Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes our distilled data to guide networks to a similar state as those trained on real data.
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data.
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
arXiv Detail & Related papers (2022-03-22T17:58:59Z) - Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience picking strategy for imitating from adaptive neighboring policies with a higher return.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids just learning a mediocre behavior on mixed datasets but is also even competitive with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2021-11-03T08:02:48Z) - Offline RL With Resource Constrained Online Deployment [13.61540280864938]
Offline reinforcement learning is used to train policies in scenarios where real-time access to the environment is expensive or impossible.
This work introduces and formalizes a novel resource-constrained problem setting.
We highlight the performance gap between policies trained using the full offline dataset and policies trained using limited features.
arXiv Detail & Related papers (2021-10-07T03:43:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.