Related papers: Wasserstein Gradient Flows for Optimizing Gaussian Mixture Policies

Wasserstein Gradient Flows for Optimizing Gaussian Mixture Policies

URL: http://arxiv.org/abs/2305.10411v1
Date: Wed, 17 May 2023 17:48:24 GMT
Title: Wasserstein Gradient Flows for Optimizing Gaussian Mixture Policies
Authors: Hanna Ziesche and Leonel Rozo
Abstract summary: Policy optimization is the emphde facto paradigm to adapt robot policies as a function of task-specific objectives. We propose to leverage the structure of probabilistic policies by casting the policy optimization as an optimal transport problem. We evaluate our approach on common robotic settings: reaching motions, collision-avoidance behaviors, and multi-goal tasks.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Robots often rely on a repertoire of previously-learned motion policies for performing tasks of diverse complexities. When facing unseen task conditions or when new task requirements arise, robots must adapt their motion policies accordingly. In this context, policy optimization is the \emph{de facto} paradigm to adapt robot policies as a function of task-specific objectives. Most commonly-used motion policies carry particular structures that are often overlooked in policy optimization algorithms. We instead propose to leverage the structure of probabilistic policies by casting the policy optimization as an optimal transport problem. Specifically, we focus on robot motion policies that build on Gaussian mixture models (GMMs) and formulate the policy optimization as a Wassertein gradient flow over the GMMs space. This naturally allows us to constrain the policy updates via the $L^2$-Wasserstein distance between GMMs to enhance the stability of the policy optimization process. Furthermore, we leverage the geometry of the Bures-Wasserstein manifold to optimize the Gaussian distributions of the GMM policy via Riemannian optimization. We evaluate our approach on common robotic settings: Reaching motions, collision-avoidance behaviors, and multi-goal tasks. Our results show that our method outperforms common policy optimization baselines in terms of task success rate and low-variance solutions.

Related papers

Convergence of Policy Mirror Descent Beyond Compatible Function Approximation [66.4260157478436]
We develop theoretical PMD general policy classes where we strictly assume a weaker variational dominance and obtain convergence to the best-in-class policy. Our main notion leverages a novel notion induced by the local norm induced by the occupancy- gradient measure.
arXiv Detail & Related papers (2025-02-16T08:05:46Z)
Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data. We show that DPO derived based on the optimal solution of problem leads to a compromised mean-seeking approximation of the optimal solution in practice. We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z)
Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods [0.40964539027092917]
Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In practice all parameters are often trained simultaneously, ignoring the inherent structure suggested by dynamic programming. This paper introduces a combination of dynamic programming and policy gradient called dynamic policy gradient, where the parameters are trained backwards in time.
arXiv Detail & Related papers (2023-10-04T09:21:01Z)
Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces. We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories. We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates. We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change. We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
Policy Gradient Algorithms Implicitly Optimize by Continuation [7.351769270728942]
We argue that exploration in policy-gradient algorithms consists in a continuation of the return of the policy at hand, and that policies should be history-dependent rather than to maximize the return.
arXiv Detail & Related papers (2023-05-11T14:50:20Z)
Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO. We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
Offline Reinforcement Learning with Closed-Form Policy Improvement Operators [88.54210578912554]
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning. In this paper, we propose our closed-form policy improvement operators. We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z)
Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known. Recent work established the equivalence between texttts rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs. In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z)
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs [113.8752163061151]
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs) We propose the underlineperiodically underlinerestarted underlineoptimistic underlinepolicy underlineoptimization algorithm (PROPO) PROPO features two mechanisms: sliding-window-based policy evaluation and periodic-restart-based policy improvement.
arXiv Detail & Related papers (2021-10-18T02:33:20Z)
Near Optimal Policy Optimization via REPS [33.992374484681704]
emphrelative entropy policy search (REPS) has demonstrated successful policy learning on a number of simulated and real-world robotic domains. There exist no guarantees on REPS's performance when using gradient-based solvers. We introduce a technique that uses emphgenerative access to the underlying decision process to compute parameter updates that maintain favorable convergence to the optimal regularized policy.
arXiv Detail & Related papers (2021-03-17T16:22:59Z)
PFPN: Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network [0.9137554315375919]
We propose a framework that considers a particle-based action policy as a substitute for Gaussian policies. We demonstrate the applicability of our approach on various motion capture imitation tasks.
arXiv Detail & Related papers (2020-03-16T00:35:36Z)
Robust Policy Search for Robot Navigation with Stochastic Meta-Policies [5.7871177330714145]
In this work, we exploit the main ingredients of Bayesian optimization to provide robustness to different issues for policy search algorithms. We combine several methods and show how their interaction works better than the sum of the parts. We compare the proposed algorithm with previous results in several optimization benchmarks and robot tasks, such as pushing objects with a robot arm, or path finding with a rover.
arXiv Detail & Related papers (2020-03-02T16:30:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.