Efficient Reinforcement Learning Through Trajectory Generation
- URL: http://arxiv.org/abs/2211.17249v2
- Date: Thu, 1 Dec 2022 18:55:45 GMT
- Title: Efficient Reinforcement Learning Through Trajectory Generation
- Authors: Wenqi Cui, Linbin Huang, Weiwei Yang, Baosen Zhang
- Abstract summary: A key barrier to using reinforcement learning in real-world applications is the requirement of a large number of system interactions to learn a good control policy.
Off-policy and Offline RL methods have been proposed to reduce the number of interactions with the physical environment by learning control policies from historical data.
We propose a trajectory generation algorithm, which adaptively generates new trajectories as if the system is being operated and explored under the updated control policies.
- Score: 5.766441610380447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key barrier to using reinforcement learning (RL) in many real-world
applications is the requirement of a large number of system interactions to
learn a good control policy. Off-policy and Offline RL methods have been
proposed to reduce the number of interactions with the physical environment by
learning control policies from historical data. However, their performance
suffers from a lack of exploration and from distributional shifts in
trajectories once controllers are updated. Moreover, most RL methods require
that all states are directly observed, which is difficult to attain in many
settings.
To overcome these challenges, we propose a trajectory generation algorithm,
which adaptively generates new trajectories as if the system is being operated
and explored under the updated control policies. Motivated by the fundamental
lemma for linear systems, assuming sufficient excitation, we generate
trajectories from linear combinations of historical trajectories. For linear
feedback control, we prove that the algorithm generates trajectories with the
same distribution as if they were sampled from the real system under the
updated control policy. In particular, the algorithm extends to systems where
the states are not directly observed. Experiments show that the proposed method
significantly reduces the amount of sampled data needed for RL algorithms.
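The trajectory generation mechanism can be illustrated with a short sketch of the fundamental-lemma idea: historical input/output data are stacked into block-Hankel matrices, and a new trajectory is formed as a linear combination of their columns that matches a given initial segment and a new input sequence produced by the updated policy. This is a minimal sketch under simplifying assumptions (noise-free linear time-invariant data, a plain least-squares solve); the function names and the past/future split are illustrative, not the paper's exact algorithm.

```python
# Minimal sketch of fundamental-lemma-style trajectory generation.
# Assumptions (not the authors' code): noise-free LTI data, persistently
# exciting historical inputs, and a least-squares fit for the coefficients.
import numpy as np

def hankel(data, depth):
    """Stack a (T, d) signal into a block-Hankel matrix of the given depth.

    Returns a (depth*d, T - depth + 1) matrix whose columns are length-`depth`
    windows of the signal, flattened in time-major order."""
    T, d = data.shape
    cols = T - depth + 1
    return np.column_stack([data[j:j + depth].reshape(-1) for j in range(cols)])

def generate_trajectory(u_hist, y_hist, u_init, y_init, u_new):
    """Generate the output response to `u_new` after the initial segment
    (u_init, y_init), using only historical data (u_hist, y_hist).

    All arguments are (time, dimension) arrays."""
    T_ini, L = len(u_init), len(u_init) + len(u_new)
    U, Y = hankel(u_hist, L), hankel(y_hist, L)
    m, p = u_hist.shape[1], y_hist.shape[1]
    # Split the Hankel matrices into "past" (initial condition) and "future" blocks.
    U_p, U_f = U[:T_ini * m], U[T_ini * m:]
    Y_p, Y_f = Y[:T_ini * p], Y[T_ini * p:]
    # Solve for combination coefficients g matching the past data and the new inputs.
    A = np.vstack([U_p, Y_p, U_f])
    b = np.concatenate([u_init.reshape(-1), y_init.reshape(-1), u_new.reshape(-1)])
    g, *_ = np.linalg.lstsq(A, b, rcond=None)
    # The generated outputs are the corresponding combination of historical outputs.
    return (Y_f @ g).reshape(len(u_new), p)
```

Under sufficient excitation of the historical inputs, the linear combination reproduces a trajectory the true system could have generated for the same initial condition and input sequence, which is what allows the RL algorithm to evaluate an updated policy without new system interactions.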
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z) - Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage [32.578787778183546]
Offline reinforcement learning (RL) algorithms learn optimal policies using historical (offline) data.
One of the main challenges in offline RL is distribution shift.
We propose two offline RL algorithms using the distributionally robust learning (DRL) framework.
arXiv Detail & Related papers (2023-10-27T19:19:30Z) - Model-based adaptation for sample efficient transfer in reinforcement learning control of parameter-varying systems [1.8799681615947088]
We leverage ideas from model-based control to address the sample efficiency problem of reinforcement learning algorithms.
We demonstrate that our approach is more sample-efficient than fine-tuning with reinforcement learning alone.
arXiv Detail & Related papers (2023-05-20T10:11:09Z) - In-Distribution Barrier Functions: Self-Supervised Policy Filters that Avoid Out-of-Distribution States [84.24300005271185]
We propose a control filter that wraps any reference policy and effectively encourages the system to stay in-distribution with respect to offline-collected safe demonstrations.
Our method is effective for two different visuomotor control tasks in simulation environments, including both top-down and egocentric view settings.
arXiv Detail & Related papers (2023-01-27T22:28:19Z) - Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data [28.846826115837825]
Offline reinforcement learning can be used to improve future performance by leveraging historical data.
We introduce a task- and method-agnostic pipeline for automatically training, comparing, selecting, and deploying the best policy.
We show it can have substantial impacts when the dataset is small.
arXiv Detail & Related papers (2022-10-16T21:24:53Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z) - Non-Episodic Learning for Online LQR of Unknown Linear Gaussian System [0.0]
We propose an online non-episodic algorithm that gains knowledge about the system from a single trajectory.
We characterize the almost sure convergence rates of identification and control, and reveal an optimal trade-off between exploration and exploitation.
arXiv Detail & Related papers (2021-03-24T15:51:28Z) - Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z) - MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by penalizing the reward with an estimate of the uncertainty of the learned dynamics (a minimal sketch of this idea appears below the list).
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
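As a concrete illustration of the reward-penalization idea in the last entry above, a short sketch follows. It is not MOPO's implementation: the ensemble-disagreement uncertainty measure, the function name `penalized_reward`, and the penalty weight `lam` are assumptions chosen for exposition.

```python
# Sketch of an uncertainty-penalized reward for model-based offline RL.
# The uncertainty proxy (max deviation from the ensemble mean) and the
# penalty weight `lam` are illustrative assumptions.
import numpy as np

def penalized_reward(reward, next_state_preds, lam=1.0):
    """Penalize a model-predicted reward by an uncertainty estimate.

    reward: scalar reward predicted by the learned model.
    next_state_preds: (n_models, state_dim) next-state predictions from a
        dynamics ensemble; their disagreement serves as an uncertainty proxy.
    lam: weight trading off return against model uncertainty.
    """
    # Largest deviation from the ensemble mean acts as the uncertainty estimate.
    deviations = np.linalg.norm(
        next_state_preds - next_state_preds.mean(axis=0), axis=1)
    return reward - lam * deviations.max()
```

Policy optimization then proceeds on the learned model with these penalized rewards, which discourages the policy from exploiting regions where the model is unreliable.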
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all content) and is not responsible for any consequences of its use.