CEM-GD: Cross-Entropy Method with Gradient Descent Planner for
Model-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2112.07746v1
- Date: Tue, 14 Dec 2021 21:11:27 GMT
- Title: CEM-GD: Cross-Entropy Method with Gradient Descent Planner for
Model-Based Reinforcement Learning
- Authors: Kevin Huang, Sahin Lale, Ugo Rosolia, Yuanyuan Shi, Anima Anandkumar
- Abstract summary: We propose a novel planner that combines first-order methods with the Cross-Entropy Method (CEM).
We show that as the dimensionality of the planning problem increases, CEM-GD maintains desirable performance with a constant small number of samples.
- Score: 41.233656743112185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current state-of-the-art model-based reinforcement learning algorithms use
trajectory sampling methods, such as the Cross-Entropy Method (CEM), for
planning in continuous control settings. These zeroth-order optimizers require
sampling a large number of trajectory rollouts to select an optimal action,
which scales poorly for large prediction horizons or high dimensional action
spaces. First-order methods that use the gradients of the rewards with respect
to the actions as an update can mitigate this issue, but suffer from local
optima due to the non-convex optimization landscape. To overcome these issues
and achieve the best of both worlds, we propose a novel planner, Cross-Entropy
Method with Gradient Descent (CEM-GD), that combines first-order methods with
CEM. At the beginning of execution, CEM-GD uses CEM to sample a large number
of trajectory rollouts to explore the optimization landscape and avoid
poor local minima. It then uses the top trajectories as initialization for
gradient descent and applies gradient updates to each of these trajectories to
find the optimal action sequence. At each subsequent time step, however, CEM-GD
samples far fewer trajectories from CEM before applying gradient updates. We
show that as the dimensionality of the planning problem increases, CEM-GD
maintains desirable performance with a constant small number of samples by
using the gradient information, while avoiding local optima using initially
well-sampled trajectories. Furthermore, CEM-GD achieves better performance than
CEM on a variety of continuous control benchmarks in MuJoCo with 100x fewer
samples per time step, resulting in around 25% less computation time and 10%
less memory usage. The implementation of CEM-GD is available at
https://github.com/KevinHuang8/CEM-GD.
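Purely as an illustration of the two-phase procedure described in the abstract, here is a minimal PyTorch-style sketch of a CEM-GD-like planner. The class name `CEMGDPlanner`, the assumed `rollout_return` interface, all hyperparameter values, and the choice of Adam for the refinement step are assumptions made for this sketch, not details taken from the paper; the authors' actual implementation is in the repository linked above.

```python
import torch


class CEMGDPlanner:
    """Illustrative sketch of a CEM-then-gradient-descent planner (not the official code).

    `rollout_return(actions)` is assumed to be a differentiable function mapping an
    action sequence of shape (horizon, action_dim) to a scalar predicted return,
    e.g. by unrolling a learned dynamics model and summing predicted rewards.
    """

    def __init__(self, rollout_return, horizon, action_dim,
                 n_init_samples=1000, n_replan_samples=10, n_elites=10,
                 cem_iters=3, gd_steps=10, lr=0.01):
        self.rollout_return = rollout_return
        self.horizon, self.action_dim = horizon, action_dim
        self.n_init_samples = n_init_samples      # large budget at the first time step
        self.n_replan_samples = n_replan_samples  # much smaller budget afterwards
        self.n_elites, self.cem_iters = n_elites, cem_iters
        self.gd_steps, self.lr = gd_steps, lr
        self.mean = torch.zeros(horizon, action_dim)
        self.std = torch.ones(horizon, action_dim)  # kept fixed here for simplicity

    @torch.no_grad()
    def _cem(self, n_samples):
        """Run a few CEM iterations and return the elite action sequences."""
        mean, std = self.mean.clone(), self.std.clone()
        for _ in range(self.cem_iters):
            samples = mean + std * torch.randn(n_samples, self.horizon, self.action_dim)
            returns = torch.stack([self.rollout_return(a) for a in samples])
            elites = samples[returns.topk(self.n_elites).indices]
            mean, std = elites.mean(0), elites.std(0) + 1e-6
        return elites

    def _refine(self, elites):
        """Apply gradient ascent on predicted return to every elite trajectory."""
        actions = elites.detach().clone().requires_grad_(True)
        opt = torch.optim.Adam([actions], lr=self.lr)
        for _ in range(self.gd_steps):
            opt.zero_grad()
            loss = -torch.stack([self.rollout_return(a) for a in actions]).sum()
            loss.backward()
            opt.step()
        with torch.no_grad():
            returns = torch.stack([self.rollout_return(a) for a in actions])
        return actions[returns.argmax()].detach()

    def plan(self, first_step):
        """Return the next action to execute (MPC style: replan every step)."""
        n = self.n_init_samples if first_step else self.n_replan_samples
        best = self._refine(self._cem(n))
        # Warm-start the next call by shifting the chosen plan one step forward.
        self.mean = torch.cat([best[1:], torch.zeros(1, self.action_dim)])
        return best[0]
```

In this sketch the only difference between the first call and later calls is the CEM sample budget, which mirrors the abstract's claim that a constant small number of samples suffices once gradient information and a warm-started plan are available.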
Related papers
- Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [66.27334633749734]
As language models grow in size, memory demands for backpropagation increase.
Zeroth-order (ZO) optimization methods offer a memory-efficient alternative.
We show that SubZero enhances fine-tuning and converges faster than standard ZO approaches.
arXiv Detail & Related papers (2024-10-11T17:01:43Z) - Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for
Deep Learning [8.173034693197351]
We propose a new per-layer adaptive step-size procedure for first-order optimization methods in deep learning.
The proposed approach exploits the layer-wise curvature information contained in the diagonal blocks of the Hessian in deep neural networks (DNNs) to compute adaptive step-sizes (i.e., LRs) for each layer.
Numerical experiments show that SGD with momentum and AdamW combined with the proposed per-layer step-sizes are able to choose effective LR schedules.
arXiv Detail & Related papers (2023-05-23T04:12:55Z) - Post-Processing Temporal Action Detection [134.26292288193298]
Temporal Action Detection (TAD) methods typically apply a pre-processing step that converts an input video of varying length into a fixed-length sequence of snippet representations.
This pre-processing temporally downsamples the video, reducing the inference resolution and hampering detection performance at the original temporal resolution.
We introduce a novel model-agnostic post-processing method without model redesign and retraining.
arXiv Detail & Related papers (2022-11-27T19:50:37Z) - A Particle-based Sparse Gaussian Process Optimizer [5.672919245950197]
We present a new swarm-based framework utilizing the underlying dynamical process of descent.
The biggest advantage of this approach is greater exploration around the current state before deciding on a descent direction.
arXiv Detail & Related papers (2022-11-26T09:06:15Z) - Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z) - Self-Tuning Stochastic Optimization with Curvature-Aware Gradient
Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step toward constructing such self-tuning quadratics.
arXiv Detail & Related papers (2020-11-09T22:07:30Z) - AdaDGS: An adaptive black-box optimization method with a nonlocal
directional Gaussian smoothing gradient [3.1546318469750196]
A directional Gaussian smoothing (DGS) approach was recently proposed in (Zhang et al., 2020) and used to define a truly nonlocal gradient, referred to as the DGS gradient, for high-dimensional black-box optimization.
We present a simple, yet ingenious and efficient adaptive approach for optimization with the DGS gradient, which removes the need for hyperparameter fine-tuning.
arXiv Detail & Related papers (2020-11-03T21:20:25Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Model-Predictive Control via Cross-Entropy and Gradient-Based
Optimization [26.497575737219794]
Cross-Entropy Method (CEM) is a population-based optimization method for planning a sequence of actions.
We propose a hybrid method that interleaves CEM and gradient descent steps in optimizing the action sequence.
Our experiments show faster convergence of the proposed hybrid approach, even for high-dimensional action spaces.
arXiv Detail & Related papers (2020-04-19T03:54:50Z)
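The last entry above differs from CEM-GD mainly in when gradient information is used: rather than a large one-off CEM phase followed by gradient refinement, the CEM and gradient updates alternate within each planning iteration. Below is a rough sketch of one such interleaved iteration, reusing the assumed `rollout_return` interface from the sketch above; again, the function name and hyperparameters are illustrative assumptions, not that paper's implementation.

```python
import torch


def interleaved_cem_gd_step(rollout_return, mean, std,
                            n_samples=100, n_elites=10, lr=0.01):
    """One interleaved planning iteration: a CEM resampling step followed by a
    single gradient-ascent step on the elite trajectories (illustrative only)."""
    # CEM step: sample action sequences around the current mean and keep the elites.
    with torch.no_grad():
        samples = mean + std * torch.randn(n_samples, *mean.shape)
        returns = torch.stack([rollout_return(a) for a in samples])
        elites = samples[returns.topk(n_elites).indices]

    # Gradient step: push each elite uphill on the predicted return.
    elites = elites.clone().requires_grad_(True)
    torch.stack([rollout_return(a) for a in elites]).sum().backward()
    with torch.no_grad():
        elites += lr * elites.grad

    # Refit the sampling distribution for the next iteration.
    return elites.mean(0).detach(), elites.std(0).detach() + 1e-6
```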
This list is automatically generated from the titles and abstracts of the papers in this site.