Efficient Reinforcement Learning Through Trajectory Generation
- URL: http://arxiv.org/abs/2211.17249v2
- Date: Thu, 1 Dec 2022 18:55:45 GMT
- Title: Efficient Reinforcement Learning Through Trajectory Generation
- Authors: Wenqi Cui, Linbin Huang, Weiwei Yang, Baosen Zhang
- Abstract summary: A key barrier to using reinforcement learning in real-world applications is the requirement of a large number of system interactions to learn a good control policy.
Off-policy and Offline RL methods have been proposed to reduce the number of interactions with the physical environment by learning control policies from historical data.
We propose a trajectory generation algorithm, which adaptively generates new trajectories as if the system is being operated and explored under the updated control policies.
- Score: 5.766441610380447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key barrier to using reinforcement learning (RL) in many real-world
applications is the requirement of a large number of system interactions to
learn a good control policy. Off-policy and Offline RL methods have been
proposed to reduce the number of interactions with the physical environment by
learning control policies from historical data. However, their performance
suffers from a lack of exploration and from distributional shifts in
trajectories once controllers are updated. Moreover, most RL methods require
that all states are directly observed, which is difficult to attain in many
settings.
To overcome these challenges, we propose a trajectory generation algorithm,
which adaptively generates new trajectories as if the system is being operated
and explored under the updated control policies. Motivated by the fundamental
lemma for linear systems, assuming sufficient excitation, we generate
trajectories from linear combinations of historical trajectories. For linear
feedback control, we prove that the algorithm generates trajectories with the
same distribution as if they were sampled from the real system under the
updated control policy. In particular, the algorithm extends to systems where
the states are not directly observed. Experiments show that the proposed method
significantly reduces the amount of sampled data needed for RL algorithms.
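The trajectory generation mechanism can be illustrated with a short sketch of the fundamental-lemma idea: historical input/output data are stacked into block-Hankel matrices, and a new trajectory is formed as a linear combination of their columns that matches a given initial segment and a new input sequence produced by the updated policy. This is a minimal sketch under simplifying assumptions (noise-free linear time-invariant data, a plain least-squares solve); the function names and the past/future split are illustrative, not the paper's exact algorithm.

```python
# Minimal sketch of fundamental-lemma-style trajectory generation.
# Assumptions (not the authors' code): noise-free LTI data, persistently
# exciting historical inputs, and a least-squares fit for the coefficients.
import numpy as np

def hankel(data, depth):
    """Stack a (T, d) signal into a block-Hankel matrix of the given depth.

    Returns a (depth*d, T - depth + 1) matrix whose columns are length-`depth`
    windows of the signal, flattened in time-major order."""
    T, d = data.shape
    cols = T - depth + 1
    return np.column_stack([data[j:j + depth].reshape(-1) for j in range(cols)])

def generate_trajectory(u_hist, y_hist, u_init, y_init, u_new):
    """Generate the output response to `u_new` after the initial segment
    (u_init, y_init), using only historical data (u_hist, y_hist).

    All arguments are (time, dimension) arrays."""
    T_ini, L = len(u_init), len(u_init) + len(u_new)
    U, Y = hankel(u_hist, L), hankel(y_hist, L)
    m, p = u_hist.shape[1], y_hist.shape[1]
    # Split the Hankel matrices into "past" (initial condition) and "future" blocks.
    U_p, U_f = U[:T_ini * m], U[T_ini * m:]
    Y_p, Y_f = Y[:T_ini * p], Y[T_ini * p:]
    # Solve for combination coefficients g matching the past data and the new inputs.
    A = np.vstack([U_p, Y_p, U_f])
    b = np.concatenate([u_init.reshape(-1), y_init.reshape(-1), u_new.reshape(-1)])
    g, *_ = np.linalg.lstsq(A, b, rcond=None)
    # The generated outputs are the corresponding combination of historical outputs.
    return (Y_f @ g).reshape(len(u_new), p)
```

Under sufficient excitation of the historical inputs, the linear combination reproduces a trajectory the true system could have generated for the same initial condition and input sequence, which is what allows the RL algorithm to evaluate an updated policy without new system interactions.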
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z) - Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage [32.578787778183546]
Offline reinforcement learning (RL) algorithms learn optimal policies using historical (offline) data.
One of the main challenges in offline RL is distribution shift.
We propose two offline RL algorithms using the distributionally robust learning (DRL) framework.
arXiv Detail & Related papers (2023-10-27T19:19:30Z) - Model-based adaptation for sample efficient transfer in reinforcement learning control of parameter-varying systems [1.8799681615947088]
We leverage ideas from model-based control to address the sample efficiency problem of reinforcement learning algorithms.
We demonstrate that our approach is more sample-efficient than fine-tuning with reinforcement learning alone.
arXiv Detail & Related papers (2023-05-20T10:11:09Z) - In-Distribution Barrier Functions: Self-Supervised Policy Filters that Avoid Out-of-Distribution States [84.24300005271185]
We propose a control filter that wraps any reference policy and effectively encourages the system to stay in-distribution with respect to offline-collected safe demonstrations.
Our method is effective for two different visuomotor control tasks in simulation environments, including both top-down and egocentric view settings.
arXiv Detail & Related papers (2023-01-27T22:28:19Z) - Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data [28.846826115837825]
Offline reinforcement learning can be used to improve future performance by leveraging historical data.
We introduce a task- and method-agnostic pipeline for automatically training, comparing, selecting, and deploying the best policy.
We show it can have substantial impacts when the dataset is small.
arXiv Detail & Related papers (2022-10-16T21:24:53Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z) - Non-Episodic Learning for Online LQR of Unknown Linear Gaussian System [0.0]
We propose an online non-episodic algorithm that gains knowledge about the system from a single trajectory.
We characterize the almost sure convergence rates of identification and control, and reveal an optimal trade-off between exploration and exploitation.
arXiv Detail & Related papers (2021-03-24T15:51:28Z) - Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z) - MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by penalizing the reward with an estimate of the uncertainty of the learned dynamics (a minimal sketch of this idea appears below the list).
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
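As a concrete illustration of the reward-penalization idea in the last entry above, a short sketch follows. It is not MOPO's implementation: the ensemble-disagreement uncertainty measure, the function name `penalized_reward`, and the penalty weight `lam` are assumptions chosen for exposition.

```python
# Sketch of an uncertainty-penalized reward for model-based offline RL.
# The uncertainty proxy (max deviation from the ensemble mean) and the
# penalty weight `lam` are illustrative assumptions.
import numpy as np

def penalized_reward(reward, next_state_preds, lam=1.0):
    """Penalize a model-predicted reward by an uncertainty estimate.

    reward: scalar reward predicted by the learned model.
    next_state_preds: (n_models, state_dim) next-state predictions from a
        dynamics ensemble; their disagreement serves as an uncertainty proxy.
    lam: weight trading off return against model uncertainty.
    """
    # Largest deviation from the ensemble mean acts as the uncertainty estimate.
    deviations = np.linalg.norm(
        next_state_preds - next_state_preds.mean(axis=0), axis=1)
    return reward - lam * deviations.max()
```

Policy optimization then proceeds on the learned model with these penalized rewards, which discourages the policy from exploiting regions where the model is unreliable.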
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all content) and is not responsible for any consequences of its use.