Dataset Clustering for Improved Offline Policy Learning
- URL: http://arxiv.org/abs/2402.09550v1
- Date: Wed, 14 Feb 2024 20:01:41 GMT
- Title: Dataset Clustering for Improved Offline Policy Learning
- Authors: Qiang Wang, Yixin Deng, Francisco Roldan Sanchez, Keru Wang, Kevin
McGuinness, Noel O'Connor, and Stephen J. Redmond
- Abstract summary: Offline policy learning aims to discover decision-making policies from previously-collected datasets without additional online interactions with the environment.
This paper studies a dataset characteristic that we refer to as multi-behavior, indicating that the dataset is collected using multiple policies that exhibit distinct behaviors.
We propose a behavior-aware deep clustering approach that partitions multi-behavior datasets into several uni-behavior subsets.
- Score: 7.873623003095065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline policy learning aims to discover decision-making policies from
previously-collected datasets without additional online interactions with the
environment. As the training dataset is fixed, its quality becomes a crucial
determining factor in the performance of the learned policy. This paper studies
a dataset characteristic that we refer to as multi-behavior, indicating that
the dataset is collected using multiple policies that exhibit distinct
behaviors. In contrast, a uni-behavior dataset would be collected solely using
one policy. We observed that policies learned from a uni-behavior dataset
typically outperform those learned from multi-behavior datasets, despite the
uni-behavior dataset having fewer examples and less diversity. Therefore, we
propose a behavior-aware deep clustering approach that partitions
multi-behavior datasets into several uni-behavior subsets, thereby benefiting
downstream policy learning. Our approach is flexible and effective; it can
adaptively estimate the number of clusters while demonstrating high clustering
accuracy, achieving an average Adjusted Rand Index of 0.987 across various
continuous control task datasets. Finally, we present improved policy learning
examples using dataset clustering and discuss several potential scenarios where
our approach might benefit the offline policy learning community.
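To make the clustering-quality claim concrete: the Adjusted Rand Index (ARI) measures agreement between a predicted partition and ground-truth labels, where 1.0 indicates a perfect match and values near 0 indicate chance-level agreement. The sketch below is a minimal illustration, not the authors' implementation: it clusters hypothetical per-trajectory behavior embeddings with k-means (the paper adaptively estimates the number of clusters, whereas here it is fixed for simplicity) and scores the partition with scikit-learn's adjusted_rand_score.

```python
# Minimal sketch (not the authors' method): cluster toy per-trajectory
# embeddings and evaluate the partition against ground-truth behavior
# labels with the Adjusted Rand Index. The embeddings here are synthetic
# stand-ins for whatever a behavior-aware encoder would produce.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def cluster_and_score(embeddings, true_behavior_ids, n_clusters):
    """Partition trajectory embeddings into behavior clusters and
    compare the result against ground-truth behavior labels."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
    return labels, adjusted_rand_score(true_behavior_ids, labels)

# Toy usage: 100 trajectories embedded in R^16, collected by 3 policies.
# Shifting each embedding by its policy id makes the clusters separable.
rng = np.random.default_rng(0)
true_ids = rng.integers(0, 3, size=100)
embeddings = rng.normal(size=(100, 16)) + 5.0 * true_ids[:, None]

labels, ari = cluster_and_score(embeddings, true_ids, n_clusters=3)
print(f"ARI = {ari:.3f}")  # 1.0 means perfect agreement with true labels
```

In the paper's setting, the embeddings would come from the proposed behavior-aware deep clustering model, and the ground-truth policy labels would be used only for evaluation, not for training.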
Related papers
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning [116.87367592920171]
Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset.
In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks.
We propose an uncertainty-based Multi-Task Data Sharing (MTDS) approach that shares the entire dataset without data selection.
arXiv Detail & Related papers (2024-04-30T08:16:52Z)
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks using the D4RL benchmark.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
On heterogeneous datasets, our method improves upon the next best-performing offline reinforcement learning methods by 49% in average performance.
arXiv Detail & Related papers (2022-03-16T21:17:03Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning [4.819336169151637]
Offline Reinforcement Learning can learn policies from a given dataset without interacting with the environment.
We show how dataset characteristics influence the performance of Offline RL algorithms for discrete action environments.
For datasets with high trajectory quality (TQ), Behavior Cloning outperforms or performs similarly to the best Offline RL algorithms.
arXiv Detail & Related papers (2021-11-08T18:48:43Z)
- Efficient Self-Supervised Data Collection for Offline Robot Learning [17.461103383630853]
A practical approach to robot reinforcement learning is to first collect a large batch of real or simulated robot interaction data.
We develop a simple-yet-effective goal-conditioned reinforcement-learning method that actively focuses data collection on novel observations.
arXiv Detail & Related papers (2021-05-10T18:42:58Z)
- Policy Learning with Adaptively Collected Data [22.839095992238537]
We address the challenge of learning the optimal policy with adaptively collected data.
We propose an algorithm based on generalized augmented inverse propensity weighted estimators.
We demonstrate our algorithm's effectiveness using both synthetic data and public benchmark datasets.
arXiv Detail & Related papers (2021-05-05T22:03:10Z)