Related papers: Model-based Reinforcement Learning for Parameterized Action Spaces

URL: http://arxiv.org/abs/2404.03037v3
Date: Fri, 24 May 2024 02:15:42 GMT
Title: Model-based Reinforcement Learning for Parameterized Action Spaces
Authors: Renhao Zhang, Haotian Fu, Yilin Miao, George Konidaris
Abstract summary: We propose a novel model-based reinforcement learning algorithm for PAMDPs. The agent learns a parameterized-action-conditioned dynamics model and plans with a modified Model Predictive Path Integral control. Our empirical results on several standard benchmarks show that our algorithm achieves better sample efficiency and performance than state-of-the-art PAMDP methods.
Score: 11.94388805327713
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a novel model-based reinforcement learning algorithm -- Dynamics Learning and predictive control with Parameterized Actions (DLPA) -- for Parameterized Action Markov Decision Processes (PAMDPs). The agent learns a parameterized-action-conditioned dynamics model and plans with a modified Model Predictive Path Integral control. We theoretically quantify the difference between the generated trajectory and the optimal trajectory during planning, in terms of the value they achieve, through the lens of Lipschitz continuity. Our empirical results on several standard benchmarks show that our algorithm achieves better sample efficiency and asymptotic performance than state-of-the-art PAMDP methods.
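
Illustrative sketch (not from the paper): the abstract describes planning over parameterized actions with a path-integral style controller. The toy Python below shows one way such a planner could look. It is not the authors' DLPA implementation. The names toy_model, plan, and TARGET, the 1-D task, the fixed sampling noise, and all hyperparameters are assumptions made for illustration, and the dynamics are hand-written rather than learned.

```python
import math
import random

TARGET = 3.0  # hypothetical goal position for the toy task


def toy_model(state, kind, param):
    """Hand-written stand-in for a learned dynamics model (assumption).

    kind 0 moves left and kind 1 moves right, each by `param` in [0, 1].
    Returns the next state and a reward equal to minus the distance to TARGET.
    """
    step = param if kind == 1 else -param
    next_state = state + step
    return next_state, -abs(next_state - TARGET)


def plan(state, horizon=5, samples=200, iters=5, temperature=0.5, seed=0):
    """Path-integral style planning over (discrete kind, continuous param)."""
    rng = random.Random(seed)
    n_kinds = 2
    # Per time step: a categorical over kinds and a Gaussian mean per kind.
    probs = [[1.0 / n_kinds] * n_kinds for _ in range(horizon)]
    means = [[0.5] * n_kinds for _ in range(horizon)]
    std = 0.3  # fixed sampling noise for the continuous parameter

    for _ in range(iters):
        rollouts = []
        for _ in range(samples):
            s, total, actions = state, 0.0, []
            for t in range(horizon):
                kind = rng.choices(range(n_kinds), weights=probs[t])[0]
                param = min(1.0, max(0.0, rng.gauss(means[t][kind], std)))
                s, reward = toy_model(s, kind, param)
                total += reward
                actions.append((kind, param))
            rollouts.append((total, actions))

        # Exponentiated-return weights; subtracting the best return keeps
        # the exponent non-positive and gives the best rollout weight 1.
        best = max(total for total, _ in rollouts)
        weights = [math.exp((total - best) / temperature)
                   for total, _ in rollouts]

        for t in range(horizon):
            kind_w = [0.0] * n_kinds
            param_w = [0.0] * n_kinds
            for w, (_, actions) in zip(weights, rollouts):
                kind, param = actions[t]
                kind_w[kind] += w
                param_w[kind] += w * param
            total_w = sum(kind_w)
            for k in range(n_kinds):
                if kind_w[k] > 0.0:
                    means[t][k] = param_w[k] / kind_w[k]
            # A small floor keeps every kind samplable in later iterations.
            eps = 1e-6
            probs[t] = [(kw + eps) / (total_w + n_kinds * eps)
                        for kw in kind_w]

    # Execute only the first planned action, as in receding-horizon control.
    first_kind = max(range(n_kinds), key=lambda k: probs[0][k])
    return first_kind, means[0][first_kind]
```

Calling plan(0.0) returns a (kind, param) pair for the first step. In this toy task the target lies to the right of the start, so the planner should select kind 1. A real PAMDP planner would replace toy_model with a learned model conditioned on both the discrete action and its continuous parameters.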

Related papers

Unifying Model Predictive Path Integral Control, Reinforcement Learning, and Diffusion Models for Optimal Control and Planning [6.871390204787483]
We establish a unified perspective that connects MPPI, RL, and Diffusion Models through gradient-based optimization on the Gibbs measure. We first show that MPPI can be interpreted as performing gradient ascent on a smoothed energy function. We then demonstrate that Policy Gradient methods reduce to MPPI by applying an exponential transformation to the objective function.
arXiv Detail & Related papers (2025-02-27T19:26:36Z)
Learn A Flexible Exploration Model for Parameterized Action Markov Decision Processes [8.588866536242145]
We propose a model-based reinforcement learning (MBRL) algorithm, FLEXplore, to enhance the learning efficiency and performance of the agent. We show that FLEXplore has outstanding learning efficiency and performance compared to other baselines.
arXiv Detail & Related papers (2025-01-06T05:33:09Z)
Fuzzy Model Identification and Self Learning with Smooth Compositions [1.9573380763700716]
This paper develops a smooth model identification and self-learning strategy for dynamic systems. We have tried to solve the problem such that the model follows the changes and variations in the system on a continuous and smooth surface.
arXiv Detail & Related papers (2024-12-31T20:19:02Z)
Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm combined with score matching. We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy. Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z)
DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning [19.84386060857712]
This paper introduces DiffTORI, which utilizes Differentiable Trajectory optimization as the policy representation to generate actions for deep Reinforcement and Imitation learning. Across 15 model-based RL tasks and 35 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTORI outperforms prior state-of-the-art methods in both domains.
arXiv Detail & Related papers (2024-02-08T05:26:40Z)
Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear, both theoretically and empirically, how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning. We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle. In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL). Our follow-up derived bounds reveal the relationship between model shifts and performance improvement. A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
Planning with Diffusion for Flexible Behavior Synthesis [125.24438991142573]
We consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem. The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories.
arXiv Detail & Related papers (2022-05-20T07:02:03Z)
Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference. We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator. We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
Automatic Differentiation and Continuous Sensitivity Analysis of Rigid Body Dynamics [15.565726546970678]
We introduce a differentiable physics simulator for rigid body dynamics. In the context of trajectory optimization, we introduce a closed-loop model-predictive control algorithm.
arXiv Detail & Related papers (2020-01-22T03:54:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.