Provably Efficient Offline Reinforcement Learning with Perturbed Data
Sources
- URL: http://arxiv.org/abs/2306.08364v1
- Date: Wed, 14 Jun 2023 08:53:20 GMT
- Title: Provably Efficient Offline Reinforcement Learning with Perturbed Data
- Authors: Chengshuai Shi, Wei Xiong, Cong Shen, Jing Yang
- Abstract summary: Existing theoretical studies on offline reinforcement learning (RL) mostly consider a dataset sampled directly from the target task.
In practice, however, data often come from several heterogeneous but related sources.
This work aims at rigorously understanding offline RL with multiple datasets that are collected from randomly perturbed versions of the target task instead of from itself.
- Score: 23.000116974718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing theoretical studies on offline reinforcement learning (RL) mostly
consider a dataset sampled directly from the target task. In practice, however,
data often come from several heterogeneous but related sources. Motivated by
this gap, this work aims at rigorously understanding offline RL with multiple
datasets that are collected from randomly perturbed versions of the target task
instead of from itself. An information-theoretic lower bound is derived, which
reveals a necessary requirement on the number of involved sources in addition
to that on the number of data samples. Then, a novel HetPEVI algorithm is
proposed, which simultaneously considers the sample uncertainties from a finite
number of data samples per data source and the source uncertainties due to a
finite number of available data sources. Theoretical analyses demonstrate that
HetPEVI can solve the target task as long as the data sources collectively
provide a good data coverage. Moreover, HetPEVI is demonstrated to be optimal
up to a polynomial factor of the horizon length. Finally, the study is extended
to offline Markov games and offline robust RL, which demonstrates the
generality of the proposed designs and theoretical analyses.
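For intuition, below is a minimal, illustrative sketch of pessimistic value iteration over pooled multi-source data in a tabular, finite-horizon MDP, in the spirit of the HetPEVI design described above. The function name het_pevi_sketch, the penalty constants c_sample and c_source, and the specific 1/sqrt penalty form are assumptions made for illustration only; they are not the paper's exact bonus terms or analysis.

```python
import numpy as np

def het_pevi_sketch(datasets, S, A, H, c_sample=1.0, c_source=1.0):
    """Illustrative pessimistic value iteration over pooled multi-source data.

    `datasets` is a list (one per source) of transition tuples
    (h, s, a, r, s_next) with 0-indexed step h, states in [0, S), actions
    in [0, A). The penalty form and constants are assumptions, not the
    paper's exact bonuses.
    """
    n = np.zeros((H, S, A))            # total samples observed at (h, s, a)
    k = np.zeros((H, S, A))            # number of sources covering (h, s, a)
    r_sum = np.zeros((H, S, A))        # summed rewards
    p_cnt = np.zeros((H, S, A, S))     # next-state counts

    for source in datasets:
        covered = np.zeros((H, S, A), dtype=bool)
        for (h, s, a, r, s_next) in source:
            n[h, s, a] += 1
            r_sum[h, s, a] += r
            p_cnt[h, s, a, s_next] += 1
            covered[h, s, a] = True
        k += covered

    V = np.zeros((H + 1, S))
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        n_safe = np.maximum(n[h], 1)
        k_safe = np.maximum(k[h], 1)
        r_hat = r_sum[h] / n_safe
        p_hat = p_cnt[h] / n_safe[..., None]
        # Pessimism: subtract a penalty that combines sample uncertainty
        # (few transitions at this pair) and source uncertainty (few sources).
        penalty = c_sample / np.sqrt(n_safe) + c_source / np.sqrt(k_safe)
        q = r_hat + p_hat @ V[h + 1] - penalty
        q[n[h] == 0] = 0.0             # uncovered pairs receive no value
        q = np.clip(q, 0.0, H - h)
        pi[h] = q.argmax(axis=1)
        V[h] = q.max(axis=1)
    return pi, V
```

The design choice mirrored here is that a state-action pair is trusted only when it is backed by both enough samples overall and enough distinct sources, echoing the lower bound's requirement on the number of sources in addition to the number of samples.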
Related papers
- Domain Adaptation for Offline Reinforcement Learning with Limited Samples [2.3674123304219816]
Offline reinforcement learning learns effective policies from a static target dataset.
Despite state-of-the-art (SOTA) offline RL algorithms being promising, they rely heavily on the quality of the target dataset.
This paper proposes the first framework that theoretically and experimentally explores how the weight assigned to each dataset affects the performance of offline RL.
arXiv Detail & Related papers (2024-08-22T05:38:48Z)
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Sparse outlier-robust PCA for multi-source data [2.3226893628361687]
We introduce a novel PCA methodology that simultaneously selects important features as well as local source-specific patterns.
We develop a regularization problem with a penalty that accommodates global-local structured sparsity patterns.
We provide an efficient implementation of our proposal via the Alternating Direction Method of Multipliers.
arXiv Detail & Related papers (2024-07-23T08:55:03Z)
- Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning [116.87367592920171]
Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset.
In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks.
We propose an uncertainty-based Multi-Task Data Sharing (MTDS) approach that shares the entire dataset without data selection.
arXiv Detail & Related papers (2024-04-30T08:16:52Z)
- Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z)
- Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP [43.7219097444333]
We introduce a testbed of six publicly available data sources to investigate how pre-training distributions induce robustness in CLIP.
We find that the performance of the pre-training data varies substantially across distribution shifts.
We find that combining multiple sources does not necessarily yield better models, but rather dilutes the robustness of the best individual data source.
arXiv Detail & Related papers (2022-08-10T18:24:23Z)
- Source data selection for out-of-domain generalization [0.76146285961466]
Poor selection of a source dataset can lead to poor performance on the target.
We propose two source selection methods that are based on the multi-bandit theory and random search.
Our proposals can be viewed as diagnostics for the existence of reweighted source subsamples that perform better than a random selection of the available samples.
arXiv Detail & Related papers (2022-02-04T14:37:31Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.