Policy Gradient Methods in the Presence of Symmetries and State
Abstractions
- URL: http://arxiv.org/abs/2305.05666v2
- Date: Thu, 7 Mar 2024 17:26:06 GMT
- Title: Policy Gradient Methods in the Presence of Symmetries and State
Abstractions
- Authors: Prakash Panangaden, Sahand Rezaei-Shoshtari, Rosie Zhao, David Meger,
Doina Precup
- Abstract summary: Reinforcement learning (RL) on high-dimensional and complex problems relies on abstraction for improved efficiency and generalization.
We study abstraction in the continuous-control setting, and extend the definition of Markov decision process (MDP) homomorphisms to the setting of continuous state and action spaces.
We propose a family of actor-critic algorithms that are able to learn the policy and the MDP homomorphism map simultaneously.
- Score: 46.66541516203923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) on high-dimensional and complex problems relies
on abstraction for improved efficiency and generalization. In this paper, we
study abstraction in the continuous-control setting, and extend the definition
of Markov decision process (MDP) homomorphisms to the setting of continuous
state and action spaces. We derive a policy gradient theorem on the abstract
MDP for both stochastic and deterministic policies. Our policy gradient results
allow for leveraging approximate symmetries of the environment for policy
optimization. Based on these theorems, we propose a family of actor-critic
algorithms that are able to learn the policy and the MDP homomorphism map
simultaneously, using the lax bisimulation metric. Finally, we introduce a
series of environments with continuous symmetries to further demonstrate the
ability of our algorithm for action abstraction in the presence of such
symmetries. We demonstrate the effectiveness of our method on our environments,
as well as on challenging visual control tasks from the DeepMind Control Suite.
Our method's ability to utilize MDP homomorphisms for representation learning
leads to improved performance, and the visualizations of the latent space
clearly demonstrate the structure of the learned abstraction.
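To make the approach concrete, the following is a minimal sketch (in PyTorch) of an abstract-MDP actor-critic of the kind the abstract describes: a state map f and a state-dependent action map g project the observed MDP onto a learned abstract MDP, where a deterministic policy and critic are trained, and differentiating through the abstract critic yields an abstract (homomorphic) policy gradient. All module sizes, architectures, hyperparameters, and names here are illustrative assumptions rather than the authors' implementation, and the lax-bisimulation loss used to train the homomorphism map is omitted for brevity.

```python
# Minimal sketch of an abstract-MDP actor-critic with a learned homomorphism
# map (state map f, state-dependent action map g). Dimensions, architectures,
# and hyperparameters are assumptions made for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM = 24, 6          # assumed environment sizes
LATENT_STATE, LATENT_ACTION = 8, 2     # assumed abstract-MDP sizes
GAMMA = 0.99

f = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                  nn.Linear(64, LATENT_STATE))                        # state map f(s)
g = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                  nn.Linear(64, LATENT_ACTION))                       # action map g_s(a)
actor = nn.Sequential(nn.Linear(LATENT_STATE, 64), nn.ReLU(),
                      nn.Linear(64, LATENT_ACTION), nn.Tanh())        # abstract policy
critic = nn.Sequential(nn.Linear(LATENT_STATE + LATENT_ACTION, 64), nn.ReLU(),
                       nn.Linear(64, 1))                              # abstract Q-function

critic_opt = torch.optim.Adam(
    list(f.parameters()) + list(g.parameters()) + list(critic.parameters()), lr=3e-4)
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

def update(s, a, r, s_next, done):
    """One gradient step from a replay batch; r and done have shape (batch, 1)."""
    zs, zs_next = f(s), f(s_next)
    za = g(torch.cat([s, a], dim=-1))

    # Critic: one-step TD target computed entirely in the abstract MDP.
    with torch.no_grad():
        q_next = critic(torch.cat([zs_next, actor(zs_next)], dim=-1))
        target = r + GAMMA * (1.0 - done) * q_next
    critic_loss = F.mse_loss(critic(torch.cat([zs, za], dim=-1)), target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: deterministic-policy-gradient-style objective on the abstract MDP;
    # backpropagating through the abstract critic realises the abstract policy gradient.
    zs_detached = f(s).detach()
    actor_loss = -critic(torch.cat([zs_detached, actor(zs_detached)], dim=-1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

In the full method, f and g would additionally be shaped by a lax-bisimulation-metric objective so that the learned abstraction reflects the environment's (approximate) symmetries.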
Related papers
- Spatio-temporal Value Semantics-based Abstraction for Dense Deep Reinforcement Learning [1.4542411354617986]
Intelligent Cyber-Physical Systems (ICPS) represent a specialized form of Cyber-Physical System (CPS).
CNNs and Deep Reinforcement Learning (DRL) undertake multifaceted tasks encompassing perception, decision-making, and control.
DRL confronts challenges in efficiency, generalization, and data scarcity during the decision-making process.
We propose an innovative abstract modeling approach grounded in spatial-temporal value semantics.
arXiv Detail & Related papers (2024-05-24T02:21:10Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned during training, but only their deterministic version is deployed.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Continuous MDP Homomorphisms and Homomorphic Policy Gradient [51.25171126424949]
We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces.
We propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously.
arXiv Detail & Related papers (2022-09-15T15:26:49Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for topological MDPs (TMDPs), obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in the relevant parameters of the system.
arXiv Detail & Related papers (2022-07-12T17:57:17Z)
- Verified Probabilistic Policies for Deep Reinforcement Learning [6.85316573653194]
We tackle the problem of verifying probabilistic policies for deep reinforcement learning.
We propose an abstraction approach, based on interval Markov decision processes, that yields guarantees on a policy's execution.
We present techniques to build and solve these models using abstract interpretation, mixed-integer linear programming, entropy-based refinement and probabilistic model checking.
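As a rough illustration of how an interval-MDP abstraction can yield guarantees on a policy's execution (the cited paper's actual pipeline combines abstract interpretation, MILP, entropy-based refinement, and probabilistic model checking), the sketch below lower-bounds the value of a fixed policy over every MDP whose transition probabilities lie within the given intervals; all function names, shapes, and parameters are assumptions for illustration.

```python
# Pessimistic (worst-case) policy evaluation on an interval MDP: transition
# probabilities are only known to lie in [P_lo, P_hi], and nature resolves
# them adversarially at every step. Purely illustrative, not the cited method.
import numpy as np

def worst_case_distribution(lo, hi, values):
    """Probabilities within [lo, hi] (summing to 1) that minimise E[values].
    Assumes lo.sum() <= 1 <= hi.sum()."""
    p = lo.copy()
    remaining = 1.0 - p.sum()
    for s in np.argsort(values):          # pour leftover mass onto low-value states first
        add = min(hi[s] - lo[s], remaining)
        p[s] += add
        remaining -= add
    return p

def robust_policy_evaluation(policy, R, P_lo, P_hi, gamma=0.95, iters=200):
    """Lower-bound the value of a fixed deterministic policy.
    policy[s]: chosen action, R[s, a]: reward, P_lo/P_hi[s, a, s']: probability bounds."""
    n_states = R.shape[0]
    V = np.zeros(n_states)
    for _ in range(iters):
        V_new = np.empty(n_states)
        for s in range(n_states):
            a = policy[s]
            p = worst_case_distribution(P_lo[s, a], P_hi[s, a], V)
            V_new[s] = R[s, a] + gamma * p @ V
        V = V_new
    return V
```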
arXiv Detail & Related papers (2022-01-10T23:55:04Z)
- Towards Robust Bisimulation Metric Learning [3.42658286826597]
Bisimulation metrics offer one solution to the representation learning problem.
We generalize value function approximation bounds for on-policy bisimulation metrics to non-optimal policies, and identify embedding pathologies that can arise in practice.
We find that these issues stem from an underconstrained dynamics model and an unstable dependence of the embedding norm on the reward signal.
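A common way to operationalise such metrics for representation learning is to train an encoder whose embedding distances track the one-step bisimulation recursion d(s_i, s_j) ≈ |r_i - r_j| + gamma * d(s_i', s_j'); the sketch below shows that idea, with the specific loss, encoder, and sizes being assumptions for illustration rather than the cited paper's formulation.

```python
# Sketch of a bisimulation-metric-style embedding loss on pairs of transitions
# (s_i, r_i, s_i') and (s_j, r_j, s_j'). Sizes and the exact loss are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, LATENT = 24, 8               # assumed sizes
GAMMA = 0.99
phi = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT))

def bisim_loss(s_i, r_i, s_next_i, s_j, r_j, s_next_j):
    """r_i, r_j have shape (batch,); states have shape (batch, STATE_DIM)."""
    dist = torch.norm(phi(s_i) - phi(s_j), p=1, dim=-1)
    with torch.no_grad():                # bootstrapped target, as in a TD update
        next_dist = torch.norm(phi(s_next_i) - phi(s_next_j), p=1, dim=-1)
        target = (r_i - r_j).abs() + GAMMA * next_dist
    return F.mse_loss(dist, target)
```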
arXiv Detail & Related papers (2021-10-27T00:32:07Z)
- Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)