Critic Sequential Monte Carlo
- URL: http://arxiv.org/abs/2205.15460v1
- Date: Mon, 30 May 2022 23:14:24 GMT
- Title: Critic Sequential Monte Carlo
- Authors: Vasileios Lioutas, Jonathan Wilder Lavington, Justice Sefas, Matthew Niedoba, Yunpeng Liu, Berend Zwartsenberg, Setareh Dabiri, Frank Wood, Adam Scibior
- Abstract summary: CriticSMC is a new algorithm for planning as inference built from a novel composition of sequential Monte Carlo with soft-Q function factors.
Our experiments on self-driving car collision avoidance in simulation demonstrate improvements over baselines in infraction minimization relative to computational effort.
- Score: 15.596665321375298
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce CriticSMC, a new algorithm for planning as inference built from
a novel composition of sequential Monte Carlo with learned soft-Q function
heuristic factors. The algorithm is structured to allow large numbers of
putative particles, leading to efficient use of computational resources and
effective discovery of high-reward trajectories, even in environments with
difficult reward surfaces such as those arising from hard constraints.
Relative to prior art, our approach remains compatible with model-free
reinforcement learning, in the sense that the implicit policy we produce can
be used at test time in the absence of a world model. Our
experiments on self-driving car collision avoidance in simulation demonstrate
improvements against baselines in terms of infraction minimization relative to
computational effort while maintaining diversity and realism of found
trajectories.
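The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading: propose many putative actions per particle, score them with a critic before any state transition is simulated, resample, and advance only the survivors. Everything here is an assumption made for illustration, not the authors' implementation: the 1-D toy dynamics, the hand-written `soft_q` stand-in for a learned critic, the Gaussian prior policy, and the function names. The sketch also omits the importance-weight bookkeeping (dividing heuristic factors back out) that a full sequential Monte Carlo scheme would need.

```python
import math
import random


def soft_q(state, action):
    """Hypothetical stand-in for a learned soft-Q critic.

    Penalizes crossing a wall at x = 1 (a hard-constraint-like cost)
    and mildly penalizes large actions.
    """
    next_x = state + action
    return -10.0 * max(0.0, next_x - 1.0) - 0.1 * action * action


def resample(items, log_weights, n, rng):
    """Multinomial resampling proportional to exp(log_weights)."""
    m = max(log_weights)  # subtract the max for numerical stability
    weights = [math.exp(lw - m) for lw in log_weights]
    return rng.choices(items, weights=weights, k=n)


def critic_smc_sketch(n_particles=8, n_putative=16, horizon=5, seed=0):
    """Critic-guided resampling over putative actions (toy version)."""
    rng = random.Random(seed)
    states = [0.0] * n_particles
    for _ in range(horizon):
        candidates, log_w = [], []
        for s in states:
            # Propose many cheap putative actions from an assumed prior policy.
            for _ in range(n_putative):
                a = rng.gauss(0.3, 0.5)
                candidates.append((s, a))
                # Score with the critic; no dynamics step is needed yet.
                log_w.append(soft_q(s, a))
        # Keep n_particles survivors, then simulate only those.
        survivors = resample(candidates, log_w, n_particles, rng)
        states = [s + a for s, a in survivors]
    return states


if __name__ == "__main__":
    print(critic_smc_sketch())
```

In this toy setting the expensive step (the state transition) runs only `n_particles` times per time step, while the critic is evaluated `n_particles * n_putative` times, which is the kind of trade-off the abstract alludes to when it mentions large numbers of putative particles.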
Related papers
- Optimal Transportation by Orthogonal Coupling Dynamics [0.0]
We propose a novel framework to address the Monge-Kantorovich problem based on a projection type gradient descent scheme.
The micro-dynamics is built on the notion of the conditional expectation, where the connection with the opinion dynamics is explored.
We demonstrate that the devised dynamics recovers random maps with favourable computational performance.
arXiv Detail & Related papers (2024-10-10T15:53:48Z)
- SPO: Sequential Monte Carlo Policy Optimisation [41.52684912140086]
We introduce SPO: Sequential Monte Carlo Policy Optimisation.
We show that SPO provides robust policy improvement and efficient scaling properties.
We demonstrate statistically significant improvements in performance relative to model-free and model-based baselines.
arXiv Detail & Related papers (2024-02-12T10:32:47Z)
- FLEX: an Adaptive Exploration Algorithm for Nonlinear Systems [6.612035830987298]
We introduce FLEX, an exploration algorithm for nonlinear dynamics based on optimal experimental design.
Our policy maximizes the information of the next step and results in an adaptive exploration algorithm.
The performance achieved by FLEX is competitive and its computational cost is low.
arXiv Detail & Related papers (2023-04-26T10:20:55Z)
- Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
- Congestion-aware Multi-agent Trajectory Prediction for Collision Avoidance [110.63037190641414]
We propose to learn congestion patterns explicitly and devise a novel "Sense--Learn--Reason--Predict" framework.
By decomposing the learning phases into two stages, a "student" can learn contextual cues from a "teacher" while generating collision-free trajectories.
In experiments, we demonstrate that the proposed model is able to generate collision-free trajectory predictions in a synthetic dataset.
arXiv Detail & Related papers (2021-03-26T02:42:33Z)
- Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
- A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples.
Both quantitative and qualitative analyses on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
- Stochastic Gradient Langevin Dynamics Algorithms with Adaptive Drifts [8.36840154574354]
We propose a class of adaptive stochastic gradient Markov chain Monte Carlo (SGMCMC) algorithms, where the drift function is biased to enhance escape from saddle points and the bias is adaptively adjusted according to the gradient of past samples.
We demonstrate via numerical examples that the proposed algorithms can significantly outperform the existing SGMCMC algorithms.
arXiv Detail & Related papers (2020-09-20T22:03:39Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.