Model-Predictive Control via Cross-Entropy and Gradient-Based
Optimization
- URL: http://arxiv.org/abs/2004.08763v1
- Date: Sun, 19 Apr 2020 03:54:50 GMT
- Title: Model-Predictive Control via Cross-Entropy and Gradient-Based
Optimization
- Authors: Homanga Bharadhwaj, Kevin Xie, Florian Shkurti
- Abstract summary: Cross-Entropy Method (CEM) is a population-based optimization method for planning a sequence of actions.
We propose a method to solve this problem by interleaving CEM and gradient descent steps in optimizing the action sequence.
Our experiments show faster convergence of the proposed hybrid approach, even for high-dimensional action spaces.
- Score: 26.497575737219794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works in high-dimensional model-predictive control and model-based
reinforcement learning with learned dynamics and reward models have resorted to
population-based optimization methods, such as the Cross-Entropy Method (CEM),
for planning a sequence of actions. To decide on an action to take, CEM
conducts a search for the action sequence with the highest return according to
the dynamics model and reward. Action sequences are typically randomly sampled
from an unconditional Gaussian distribution and evaluated on the environment.
This distribution is iteratively updated towards action sequences with higher
returns. However, this planning method can be very inefficient, especially for
high-dimensional action spaces. An alternative line of approaches optimizes
action sequences directly via gradient descent, but is prone to local optima.
We propose a method to solve this planning problem by interleaving CEM and
gradient descent steps in optimizing the action sequence. Our experiments show
faster convergence of the proposed hybrid approach, even for high-dimensional
action spaces, avoidance of local minima, and better or equal performance to
CEM. Code accompanying the paper is available at
https://github.com/homangab/gradcem.
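To make the interleaving concrete, below is a minimal sketch of such a hybrid planner in PyTorch. The point-mass dynamics, reward, and hyperparameters are illustrative assumptions, not the authors' implementation; see the repository above for the real code.

```python
# Sketch: interleaving CEM with gradient steps on sampled action sequences.
# `dynamics`, `reward`, and all hyperparameters are toy assumptions.
import torch

HORIZON, ACT_DIM, POP, ELITES, ITERS, LR = 12, 2, 64, 8, 10, 0.1

def dynamics(state, action):           # toy differentiable point-mass model
    return state + 0.1 * action

def reward(state):                     # negative distance to the origin
    return -state.pow(2).sum(-1)

def returns(state0, actions):          # actions: (POP, HORIZON, ACT_DIM)
    state, total = state0.expand(actions.shape[0], -1), 0.0
    for t in range(HORIZON):
        state = dynamics(state, actions[:, t])
        total = total + reward(state)
    return total

def plan(state0):
    mean = torch.zeros(HORIZON, ACT_DIM)
    std = torch.ones(HORIZON, ACT_DIM)
    for _ in range(ITERS):
        # CEM step: sample a population of action sequences from the Gaussian.
        actions = (mean + std * torch.randn(POP, HORIZON, ACT_DIM)).requires_grad_(True)
        # Gradient step: nudge every sampled sequence uphill on the return
        # before refitting the distribution (the interleaving idea).
        returns(state0, actions).sum().backward()
        with torch.no_grad():
            improved = actions + LR * actions.grad
            elite_idx = returns(state0, improved).topk(ELITES).indices
            elites = improved[elite_idx]
            mean, std = elites.mean(0), elites.std(0) + 1e-6
    return mean[0]                     # execute the first action (MPC style)

print(plan(torch.tensor([1.0, -1.0])))
```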
Related papers
- Covariance-Adaptive Sequential Black-box Optimization for Diffusion Targeted Generation [60.41803046775034]
We show how to perform user-preferred targeted generation via diffusion models with only black-box target scores of users.
Experiments on both numerical test problems and target-guided 3D-molecule generation tasks show the superior performance of our method in achieving better target scores.
arXiv Detail & Related papers (2024-06-02T17:26:27Z)
- Improving sample efficiency of high dimensional Bayesian optimization with MCMC [7.241485121318798]
We propose a new method based on Markov Chain Monte Carlo to efficiently sample from an approximated posterior.
We show experimentally that both the Metropolis-Hastings and the Langevin Dynamics versions of our algorithm outperform state-of-the-art methods in high-dimensional sequential optimization and reinforcement learning benchmarks.
arXiv Detail & Related papers (2024-01-05T05:56:42Z)
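For the MCMC entry above, a generic random-walk Metropolis-Hastings sketch is shown below; it only illustrates sampling from an unnormalized (approximate) posterior log-density and is not the paper's algorithm.

```python
# Generic random-walk Metropolis-Hastings; `log_prob` stands in for an
# approximated posterior log-density (an assumption, not the paper's method).
import numpy as np

def metropolis_hastings(log_prob, x0, steps=1000, scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_prob(x)
    samples = []
    for _ in range(steps):
        proposal = x + scale * rng.standard_normal(x.shape)
        lp_new = log_prob(proposal)
        if np.log(rng.random()) < lp_new - lp:  # accept with prob min(1, ratio)
            x, lp = proposal, lp_new
        samples.append(x.copy())
    return np.array(samples)

# Example: draws from a 10-dimensional standard Gaussian "posterior".
draws = metropolis_hastings(lambda z: -0.5 * np.sum(z ** 2), x0=np.zeros(10))
print(draws.mean(axis=0))
```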
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
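The dual-descent idea above can be sketched as stochastic gradient steps on the GP regression dual objective L(a) = 0.5 a^T (K + s^2 I) a - a^T y, whose minimizer yields the usual posterior mean K a*. The kernel, batch size, and step size below are assumptions, not the paper's tuned recipe.

```python
# Stochastic (block-coordinate) gradient descent on the GP regression dual.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

K, noise = rbf(X, X), 0.1**2
alpha = np.zeros(len(X))
for step in range(2000):
    idx = rng.choice(len(X), size=32, replace=False)  # random rows = stochasticity
    grad_rows = K[idx] @ alpha + noise * alpha[idx] - y[idx]
    alpha[idx] -= 0.01 * grad_rows                    # update sampled coordinates
pred = rbf(X[:5], X) @ alpha                          # posterior mean at 5 points
print(pred, y[:5])
```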
- Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
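The spectral normalization fix named above can be sketched with PyTorch's built-in parametrization: constraining each layer of a learned dynamics model to (approximately) unit spectral norm keeps gradients bounded over long unrolls. The two-layer model below is an assumed stand-in, not the paper's architecture.

```python
# Spectral-norm-constrained dynamics model unrolled for many steps.
import torch
from torch import nn
from torch.nn.utils.parametrizations import spectral_norm

dynamics = nn.Sequential(
    spectral_norm(nn.Linear(4, 64)), nn.Tanh(),
    spectral_norm(nn.Linear(64, 4)),   # each map is (approx.) 1-Lipschitz
)

state = torch.randn(1, 4, requires_grad=True)
s = state
for _ in range(100):                   # a long model unroll
    s = dynamics(s)
s.sum().backward()
print(state.grad.norm())               # stays bounded thanks to the constraint
```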
- Planning with Sequence Models through Iterative Energy Minimization [22.594413287842574]
We suggest an approach towards integrating planning with sequence models based on the idea of iterative energy minimization.
We train a masked language model to capture an implicit energy function over trajectories of actions, and formulate planning as finding a trajectory of actions with minimum energy.
We illustrate how this procedure enables improved performance over recent approaches across BabyAI and Atari environments.
arXiv Detail & Related papers (2023-03-28T17:53:22Z)
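A schematic sketch of the energy-minimization planning loop described above: repeatedly re-propose one "masked" action and keep the change when the trajectory energy drops. The quadratic energy stands in for the paper's learned masked-language-model energy; all names here are assumptions.

```python
# Iterative energy minimization over an action trajectory (schematic only).
import torch

HORIZON, ACT_DIM = 10, 2
goal = torch.tensor([2.0, 1.0])

def energy(actions):                     # low energy <=> trajectory reaches goal
    return (actions.sum(0) - goal).pow(2).sum() + 0.1 * actions.pow(2).sum()

plan = torch.randn(HORIZON, ACT_DIM)
for step in range(500):
    t = torch.randint(HORIZON, ())       # "mask" one position, as in a masked LM
    candidate = plan.clone()
    candidate[t] = torch.randn(ACT_DIM)  # re-propose the masked action
    if energy(candidate) < energy(plan): # greedy accept on lower energy
        plan = candidate
print(energy(plan))
```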
- A Particle-based Sparse Gaussian Process Optimizer [5.672919245950197]
We present a new swarm-based framework utilizing the underlying dynamical process of gradient descent.
The biggest advantage of this approach is greater exploration around the current state before deciding on a descent direction.
arXiv Detail & Related papers (2022-11-26T09:06:15Z)
- Inferring Smooth Control: Monte Carlo Posterior Policy Iteration with Gaussian Processes [39.411957858548355]
We show how to achieve smoother model predictive control using online sequential inference.
We evaluate this approach on several robot control tasks, matching the performance of prior sampling-based methods while also ensuring smoothness.
arXiv Detail & Related papers (2022-10-07T12:56:31Z)
- CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning [41.233656743112185]
We propose a novel planner that combines first-order methods with the Cross-Entropy Method (CEM).
We show that as the dimensionality of the planning problem increases, CEM-GD maintains desirable performance with a constant small number of samples.
arXiv Detail & Related papers (2021-12-14T21:11:27Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work is on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
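The ZO idea above can be sketched with a two-point finite-difference gradient estimator whose coordinates are queried according to an (assumed) importance weighting; this illustrates ZO estimation only, not the paper's hybrid framework.

```python
# Two-point zeroth-order gradient estimate with coordinate importance sampling.
import numpy as np

def zo_gradient(f, x, weights, n_queries=16, eps=1e-4, rng=np.random.default_rng(0)):
    p = weights / weights.sum()              # coordinate sampling distribution
    grad = np.zeros_like(x)
    for i in rng.choice(len(x), size=n_queries, p=p):
        e = np.zeros_like(x)
        e[i] = 1.0
        # finite-difference estimate of the i-th partial derivative
        grad[i] = (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    return grad

f = lambda z: np.sum((z - 1.0) ** 2)         # illustrative black-box objective
x = np.zeros(8)
for _ in range(200):
    x -= 0.1 * zo_gradient(f, x, weights=np.ones(8))
print(x)                                     # approaches the optimum at all-ones
```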
- Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step toward constructing self-tuning quadratics.
arXiv Detail & Related papers (2020-11-09T22:07:30Z)
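The exact Hessian-vector products mentioned above can be computed by double backpropagation without ever materializing the Hessian; a minimal sketch with an assumed toy loss:

```python
# Exact Hessian-vector product H v via double backprop (toy loss assumed).
import torch

w = torch.randn(5, requires_grad=True)
loss = (w ** 2).sum() + w.prod()           # stand-in for a real objective
grad = torch.autograd.grad(loss, w, create_graph=True)[0]
v = torch.randn(5)
hvp = torch.autograd.grad(grad @ v, w)[0]  # H v without forming H explicitly
print(hvp)
```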
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.