Bingham Policy Parameterization for 3D Rotations in Reinforcement Learning
- URL: http://arxiv.org/abs/2202.03957v1
- Date: Tue, 8 Feb 2022 16:09:02 GMT
- Title: Bingham Policy Parameterization for 3D Rotations in Reinforcement Learning
- Authors: Stephen James, Pieter Abbeel
- Abstract summary: We propose a new policy parameterization for representing 3D rotations during reinforcement learning.
Our proposed Bingham Policy Parameterization (BPP) models the Bingham distribution and allows for better rotation prediction.
We evaluate BPP on the rotation Wahba problem task, as well as a set of vision-based next-best pose robot manipulation tasks from RLBench.
- Score: 95.00518278458908
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new policy parameterization for representing 3D rotations during
reinforcement learning. Today in the continuous control reinforcement learning
literature, many stochastic policy parameterizations are Gaussian. We argue
that universally applying a Gaussian policy parameterization is not always
desirable for all environments. One such case in particular where this is true
is tasks that involve predicting a 3D rotation output, either in isolation, or
coupled with translation as part of a full 6D pose output. Our proposed Bingham
Policy Parameterization (BPP) models the Bingham distribution and allows for
better rotation (quaternion) prediction over a Gaussian policy parameterization
in a range of reinforcement learning tasks. We evaluate BPP on the rotation
Wahba problem task, as well as a set of vision-based next-best pose robot
manipulation tasks from RLBench. We hope that this paper encourages more
research into developing other policy parameterizations that are more suited for
particular environments, rather than always assuming Gaussian.
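To make the above concrete, here is a minimal sketch (not the authors' BPP implementation) of the Bingham distribution over unit quaternions that such a policy head would parameterize: the density is proportional to exp(q^T M Z M^T q) for an orthogonal 4x4 matrix M and nonpositive concentrations Z, and because that exponent is at most zero, samples can be drawn by rejection against uniform random quaternions. The particular M and Z below are hand-picked purely for illustration; in BPP they would be predicted by the policy network.

```python
import numpy as np

def bingham_log_unnormalized(q, M, Z):
    """Unnormalized log-density of a Bingham distribution at unit quaternion q."""
    A = M @ Z @ M.T                               # 4x4 negative semidefinite matrix
    return float(q @ A @ q)

def sample_bingham(M, Z, rng, max_tries=100000):
    """Rejection sampling: accept a uniform quaternion with probability exp(q^T A q) <= 1."""
    A = M @ Z @ M.T
    for _ in range(max_tries):
        q = rng.normal(size=4)
        q /= np.linalg.norm(q)                    # uniform proposal on the unit 3-sphere
        if np.log(rng.uniform()) < q @ A @ q:
            return q
    raise RuntimeError("rejection sampling did not converge")

# Hypothetical, hand-picked parameters; a BPP policy head would output these instead.
rng = np.random.default_rng(0)
M, _ = np.linalg.qr(rng.normal(size=(4, 4)))      # any orthogonal 4x4 matrix
Z = np.diag([-10.0, -10.0, -10.0, 0.0])           # nonpositive concentrations, largest fixed at 0
q = sample_bingham(M, Z, rng)
print("sampled quaternion:", q)
print("unnormalized log-density:", bingham_log_unnormalized(q, M, Z))
```

With the largest concentration fixed at zero, the mode of the distribution is the column of M paired with that zero entry, up to sign, since the quaternions q and -q encode the same rotation.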
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
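For readers unfamiliar with value-distribution learning, the sketch below shows the standard quantile-regression ("pinball") loss that methods in this family typically minimize; it is not EQR itself, and the quantile levels and targets are illustrative.

```python
import numpy as np

def pinball_loss(predicted_quantiles, target_returns, taus):
    """Mean quantile-regression loss.

    predicted_quantiles: shape (n_quantiles,), current estimates of the return quantiles
    target_returns:      shape (n_targets,), sampled target returns
    taus:                shape (n_quantiles,), quantile levels in (0, 1)
    """
    u = target_returns[:, None] - predicted_quantiles[None, :]   # pairwise errors
    loss = np.where(u >= 0, taus * u, (taus - 1.0) * u)          # tau*u if u>=0 else (tau-1)*u
    return loss.mean()

taus = (np.arange(5) + 0.5) / 5.0                                # 5 evenly spaced quantile levels
preds = np.zeros(5)                                              # initial quantile estimates
targets = np.random.default_rng(0).normal(1.0, 2.0, size=256)    # stand-in for sampled returns
print(pinball_loss(preds, targets, taus))
```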
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Subequivariant Graph Reinforcement Learning in 3D Environments [34.875774768800966]
We propose a novel setup for morphology-agnostic RL, dubbed Subequivariant Graph RL in 3D environments.
Specifically, we first introduce a new set of more practical yet challenging benchmarks in 3D space.
To optimize the policy over the enlarged state-action space, we propose to inject geometric symmetry.
arXiv Detail & Related papers (2023-05-30T11:34:57Z)
- Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization [10.789649934346004]
We propose a sample-efficient method named cyclic policy distillation (CPD).
CPD divides the range of randomized parameters into several small sub-domains and assigns a local policy to each one.
All of the learned local policies are distilled into a global policy for sim-to-real transfers.
arXiv Detail & Related papers (2022-07-29T09:22:53Z)
- Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games [95.10091348976779]
We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents.
We propose a new algorithm, Decentralized Optimistic Hyperpolicy Mirror Descent (DORIS).
DORIS achieves $\sqrt{K}$-regret in the context of general function approximation, where $K$ is the number of episodes.
arXiv Detail & Related papers (2022-06-03T14:18:05Z)
- On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces [23.186300629667134]
We study the convergence of policy gradient algorithms under heavy-tailed parameterizations.
Our main theoretical contribution is establishing that this scheme converges with constant step and batch sizes.
arXiv Detail & Related papers (2022-01-28T18:54:30Z)
- Proximal Policy Optimization with Continuous Bounded Action Space via the Beta Distribution [0.0]
In this work, we investigate how this Beta policy performs when it is trained by the Proximal Policy Optimization algorithm on two continuous control tasks from OpenAI gym.
For both tasks, the Beta policy is superior to the Gaussian policy in terms of agent's final expected reward, also showing more stability and faster convergence of the training process.
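As a reminder of how such a Beta policy is typically constructed (a hedged sketch, not that paper's code), the snippet below maps raw network outputs to shape parameters alpha, beta > 1, samples an action on [0, 1], and rescales it to the bounded action range, adjusting the log-density for the affine rescaling.

```python
import numpy as np
from scipy.special import betaln

def softplus(x):
    return np.log1p(np.exp(x))

def beta_policy_sample(raw_alpha, raw_beta, low, high, rng):
    alpha = 1.0 + softplus(raw_alpha)      # alpha, beta > 1 keeps the density unimodal
    beta = 1.0 + softplus(raw_beta)
    u = rng.beta(alpha, beta)              # sample on [0, 1]
    action = low + (high - low) * u        # rescale to the bounded action space
    # log-density of the Beta sample, with the change of variables for the rescaling
    log_pdf_u = (alpha - 1) * np.log(u) + (beta - 1) * np.log(1 - u) - betaln(alpha, beta)
    log_prob = log_pdf_u - np.log(high - low)
    return action, log_prob

rng = np.random.default_rng(0)
action, log_prob = beta_policy_sample(0.5, -0.2, low=-2.0, high=2.0, rng=rng)
print(action, log_prob)
```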
arXiv Detail & Related papers (2021-11-03T13:13:00Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
The proposed implicit distributional actor-critic (IDAC) is built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
- Gaussian Process Policy Optimization [0.0]
We propose a novel actor-critic, model-free reinforcement learning algorithm.
It employs a Bayesian method of parameter space exploration to solve environments.
It is shown to be comparable to and at times empirically outperform current algorithms on environments that simulate robotic locomotion.
arXiv Detail & Related papers (2020-03-02T18:06:27Z)
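A hedged sketch of what Bayesian parameter-space exploration of this kind can look like (not that paper's algorithm): a Gaussian process is fit to observed (policy parameters, episodic return) pairs, and the next parameters to evaluate maximize an upper-confidence-bound acquisition. The return function below is a synthetic stand-in for a real environment rollout.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def episode_return(theta):
    # hypothetical stand-in for "run the policy with parameters theta in the environment"
    return -np.sum((theta - 0.3) ** 2)

rng = np.random.default_rng(0)
thetas = list(rng.uniform(-1, 1, size=(5, 2)))            # initial random policy parameters
returns = [episode_return(t) for t in thetas]

for _ in range(20):
    gp = GaussianProcessRegressor(normalize_y=True).fit(np.array(thetas), np.array(returns))
    candidates = rng.uniform(-1, 1, size=(256, 2))
    mean, std = gp.predict(candidates, return_std=True)
    ucb = mean + 2.0 * std                                # optimistic acquisition
    best = candidates[np.argmax(ucb)]
    thetas.append(best)
    returns.append(episode_return(best))

print("best parameters:", thetas[int(np.argmax(returns))], "return:", max(returns))
```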