Probabilistic Actor-Critic: Learning to Explore with PAC-Bayes Uncertainty
- URL: http://arxiv.org/abs/2402.03055v1
- Date: Mon, 5 Feb 2024 14:42:45 GMT
- Title: Probabilistic Actor-Critic: Learning to Explore with PAC-Bayes Uncertainty
- Authors: Bahareh Tasdighi, Nicklas Werge, Yi-Shan Wu, Melih Kandemir
- Abstract summary: We introduce Probabilistic Actor-Critic (PAC), a novel reinforcement learning algorithm with improved continuous control performance.
PAC achieves this by integrating stochastic policies and critics, creating a dynamic synergy between the estimation of critic uncertainty and actor training.
- Score: 14.348879224354125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Probabilistic Actor-Critic (PAC), a novel reinforcement learning
algorithm with improved continuous control performance thanks to its ability to
mitigate the exploration-exploitation trade-off. PAC achieves this by
seamlessly integrating stochastic policies and critics, creating a dynamic
synergy between the estimation of critic uncertainty and actor training. The
key contribution of our PAC algorithm is that it explicitly models and infers
epistemic uncertainty in the critic through Probably Approximately
Correct-Bayesian (PAC-Bayes) analysis. This incorporation of critic uncertainty
enables PAC to adapt its exploration strategy as it learns, guiding the actor's
decision-making process. PAC compares favorably against fixed or pre-scheduled
exploration schemes of the prior art. The synergy between stochastic policies
and critics, guided by PAC-Bayes analysis, represents a fundamental step
towards a more adaptive and effective exploration strategy in deep
reinforcement learning. We report empirical evaluations demonstrating PAC's
enhanced stability and improved performance over the state of the art in
diverse continuous control problems.
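The abstract does not spell out the algorithmic details; as a rough, hypothetical illustration of uncertainty-guided exploration (not the paper's actual PAC-Bayes machinery), a small critic ensemble can stand in for the probabilistic critic, with its disagreement scaling the policy's exploration noise:

```python
# Hypothetical sketch: an ensemble of toy linear critics stands in for PAC's
# probabilistic critic; its disagreement (a crude proxy for epistemic
# uncertainty) scales the stochastic policy's exploration noise.
import numpy as np

rng = np.random.default_rng(0)

class CriticEnsemble:
    """K independent linear critics Q_k(s, a) = w_k . [s, a]."""
    def __init__(self, k, state_dim, action_dim):
        self.w = rng.normal(size=(k, state_dim + action_dim))

    def q_values(self, state, action):
        return self.w @ np.concatenate([state, action])   # shape (K,)

    def epistemic_uncertainty(self, state, action):
        return self.q_values(state, action).std()

def explore(actor_mean, critic, state, base_std=0.1, uncertainty_scale=0.5):
    """Sample an action whose noise grows with critic disagreement."""
    mean = actor_mean(state)
    sigma = base_std + uncertainty_scale * critic.epistemic_uncertainty(state, mean)
    return mean + sigma * rng.normal(size=mean.shape)

# Usage with made-up dimensions: 4-d state, 2-d action.
critic = CriticEnsemble(k=5, state_dim=4, action_dim=2)
actor_mean = lambda s: np.tanh(s[:2])                      # toy actor mean
print(explore(actor_mean, critic, rng.normal(size=4)))
```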
Related papers
- Recursive Deep Inverse Reinforcement Learning [16.05411507856928]
Inferring an adversary's goals from exhibited behavior is crucial for counterplanning and non-cooperative multi-agent systems.
We propose an online Recursive Deep Inverse Reinforcement Learning (RDIRL) approach to recover the cost function governing the adversary's actions and goals.
arXiv Detail & Related papers (2025-04-17T17:39:35Z)
- Deterministic Exploration via Stationary Bellman Error Maximization [6.474106100512158]
Exploration is a crucial and distinctive aspect of reinforcement learning (RL).
In this paper, we introduce three modifications that stabilize Bellman error maximization and yield a deterministic exploration policy.
Our experimental results show that our approach can outperform $\varepsilon$-greedy in dense and sparse reward settings.
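The summary names the criterion but not the procedure; below is a loose, hypothetical illustration of Bellman-error-driven exploration for a discrete action set (it does not reflect the paper's three modifications): the agent deterministically picks the candidate action with the largest estimated absolute TD error.

```python
import numpy as np

rng = np.random.default_rng(1)
GAMMA = 0.99

def td_error(q, reward, state, action, next_state, next_action):
    # One-step temporal-difference (Bellman) error under critic q.
    return reward + GAMMA * q(next_state, next_action) - q(state, action)

def exploration_action(q, model, state, candidate_actions):
    """Deterministically pick the candidate action with the largest |TD error|."""
    def score(action):
        next_state, reward = model(state, action)            # one-step lookahead
        greedy_next = max(candidate_actions, key=lambda b: q(next_state, b))
        return abs(td_error(q, reward, state, action, next_state, greedy_next))
    return max(candidate_actions, key=score)

# Toy usage with a hypothetical bilinear critic and a linear dynamics model.
w = rng.normal(size=3)
q = lambda s, a: float(w @ np.array([s, a, s * a]))
model = lambda s, a: (s + 0.1 * a, -abs(s))                  # (next_state, reward)
print(exploration_action(q, model, state=0.5, candidate_actions=[-1.0, 0.0, 1.0]))
```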
arXiv Detail & Related papers (2024-10-31T11:46:48Z)
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
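As a minimal sketch of the idea described above (hypothetical network shapes and noise parameters, not the authors' implementation), an AR(1) perturbation can be added to the policy's hidden layer so that exploration noise is correlated across time steps:

```python
# Hypothetical sketch: temporally correlated (AR(1)) noise injected into the
# latent layer of a toy policy network, so consecutive actions share noise.
import numpy as np

rng = np.random.default_rng(2)

class LatentNoisePolicy:
    def __init__(self, state_dim, latent_dim, action_dim, rho=0.9, sigma=0.1):
        self.w1 = rng.normal(scale=0.3, size=(latent_dim, state_dim))
        self.w2 = rng.normal(scale=0.3, size=(action_dim, latent_dim))
        self.rho, self.sigma = rho, sigma          # AR(1) correlation / scale
        self.noise = np.zeros(latent_dim)

    def act(self, state):
        # Update the temporally correlated latent noise, then perturb the latent state.
        self.noise = self.rho * self.noise + self.sigma * rng.normal(size=self.noise.shape)
        latent = np.tanh(self.w1 @ state) + self.noise
        return np.tanh(self.w2 @ latent)

policy = LatentNoisePolicy(state_dim=4, latent_dim=8, action_dim=2)
state = rng.normal(size=4)
print([policy.act(state) for _ in range(3)])       # consecutive actions share correlated noise
```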
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
- Asymptotic Convergence of Deep Multi-Agent Actor-Critic Algorithms [0.6961253535504979]
We present sufficient conditions that ensure convergence of the multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm.
DDPG is an instance of the actor-critic paradigm, one of the most popular approaches in Deep Reinforcement Learning (DeepRL) for tackling continuous action spaces.
arXiv Detail & Related papers (2022-01-03T10:33:52Z)
- Exploring More When It Needs in Deep Reinforcement Learning [3.442899929543427]
We propose a policy mechanism for Deep Reinforcement Learning, called Add Noise to Noise (AN2N), that explores more when the agent needs to.
We use cumulative rewards to evaluate in which past states the agent has not performed well, and use cosine distance to measure whether the current state needs to be explored more.
We apply it to continuous control tasks such as HalfCheetah, Hopper, and Swimmer, achieving considerable improvements in performance and convergence speed.
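A minimal sketch of that mechanism, with hypothetical thresholds and buffer handling (not the authors' implementation): remember states from poorly rewarded episodes and inject extra action noise whenever the current state is cosine-close to one of them.

```python
import numpy as np

rng = np.random.default_rng(3)

def cosine_distance(a, b):
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

class AN2NExplorer:
    def __init__(self, return_threshold=0.0, distance_threshold=0.2, extra_std=0.3):
        self.bad_states = []                        # states from poorly rewarded episodes
        self.return_threshold = return_threshold
        self.distance_threshold = distance_threshold
        self.extra_std = extra_std

    def record_episode(self, states, episode_return):
        if episode_return < self.return_threshold:
            self.bad_states.extend(states)

    def act(self, actor_action, state):
        # Explore more when the current state resembles a previously poorly handled one.
        needs_exploration = any(
            cosine_distance(state, s) < self.distance_threshold for s in self.bad_states
        )
        noise_std = self.extra_std if needs_exploration else 0.05
        return actor_action + noise_std * rng.normal(size=actor_action.shape)

explorer = AN2NExplorer()
explorer.record_episode([rng.normal(size=4)], episode_return=-1.0)
print(explorer.act(actor_action=np.zeros(2), state=rng.normal(size=4)))
```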
arXiv Detail & Related papers (2021-09-28T04:29:38Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
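A minimal sketch of rollback-style exploration in a persistent environment (the save/restore interface, the random action policy, and the toy chain below are hypothetical stand-ins for the paper's state clustering and similarity machinery):

```python
import random

random.seed(0)

class ToyPersistentEnv:
    """Toy 1-D random-walk chain whose full state can be saved and restored."""
    def __init__(self):
        self.position = 0

    def save_state(self):
        return self.position

    def restore_state(self, snapshot):
        self.position = snapshot

    def step(self):
        self.position += random.choice([-1, 1])
        return self.save_state(), abs(self.position) >= 20   # (snapshot, done)

def rollback_explore(env, n_iterations=50, horizon=10):
    visited = [env.save_state()]
    for _ in range(n_iterations):
        env.restore_state(random.choice(visited))   # roll back to a previously visited state
        for _ in range(horizon):
            snapshot, done = env.step()
            visited.append(snapshot)
            if done:
                break
    return visited

print(max(rollback_explore(ToyPersistentEnv())))     # farthest chain position reached
```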
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Cooperative Exploration for Multi-Agent Deep Reinforcement Learning [127.4746863307944]
We propose cooperative multi-agent exploration (CMAE) for deep reinforcement learning.
The goal is selected from multiple projected state spaces via a normalized entropy-based technique.
We demonstrate that CMAE consistently outperforms baselines on various tasks.
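A rough, hypothetical illustration of entropy-based goal selection in projected state spaces (the discretization and the single-coordinate projections are our own simplifications, not the paper's exact procedure):

```python
import numpy as np

rng = np.random.default_rng(4)

def normalized_entropy(counts):
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum() / np.log(len(counts))

def select_goal(visited_states, n_bins=10):
    """Pick a rarely visited cell from the projection with the lowest normalized entropy."""
    visited_states = np.asarray(visited_states)
    best = None
    for dim in range(visited_states.shape[1]):           # one projected space per coordinate
        counts, edges = np.histogram(visited_states[:, dim], bins=n_bins, range=(0.0, 1.0))
        h = normalized_entropy(counts + 1)                # +1 avoids empty-cell log issues
        if best is None or h < best[0]:
            rare_bin = int(np.argmin(counts))
            goal_value = 0.5 * (edges[rare_bin] + edges[rare_bin + 1])
            best = (h, dim, goal_value)
    return best[1], best[2]                               # (projection index, goal coordinate)

states = rng.random(size=(200, 3)) ** 3                   # toy data: coordinates unevenly explored
print(select_goal(states))
```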
arXiv Detail & Related papers (2021-07-23T20:06:32Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Adversarially Guided Actor-Critic [42.76141646708985]
In addition to the actor and the critic, this paper introduces a third protagonist: the adversary.
While the adversary mimics the actor by minimizing the KL-divergence between their respective action distributions, the actor, in addition to learning to solve the task, tries to differentiate itself from the adversary predictions.
Our experimental analysis shows that the resulting Adversarially Guided Actor-Critic (AGAC) algorithm leads to more exhaustive exploration.
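A minimal numeric sketch of the two objectives described above, for a single state with three discrete actions (toy probabilities; the full AGAC losses also include the usual policy-gradient and value terms):

```python
import numpy as np

def kl(p, q, eps=1e-8):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

actor_probs     = np.array([0.7, 0.2, 0.1])   # pi(.|s)
adversary_probs = np.array([0.4, 0.4, 0.2])   # pi_adv(.|s)

# The adversary is trained to imitate the actor: it minimizes this divergence.
adversary_loss = kl(actor_probs, adversary_probs)

# The actor is rewarded for being hard to imitate: its advantage is augmented
# with the same divergence (coefficient c is a hypothetical hyperparameter).
c = 0.1
task_advantage = 1.5
augmented_advantage = task_advantage + c * kl(actor_probs, adversary_probs)

print(adversary_loss, augmented_advantage)
```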
arXiv Detail & Related papers (2021-02-08T17:31:13Z)
- BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR).
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
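The criterion itself is simple enough to sketch directly; below is a minimal, hypothetical version with an exact count table over hashable toy states (large state spaces would need approximate counts):

```python
# Regulated difference of inverse visitation counts as an intrinsic reward
# for a transition s -> s'.
from collections import defaultdict

visit_counts = defaultdict(int)

def intrinsic_reward(state, next_state):
    visit_counts[state] += 1
    visit_counts[next_state] += 1
    delta = 1.0 / visit_counts[next_state] - 1.0 / visit_counts[state]
    return max(delta, 0.0)   # "regulated": only moving into less-visited territory is rewarded

# Toy usage on a grid.
print(intrinsic_reward((0, 0), (0, 1)))   # 0.0: both states are equally new
print(intrinsic_reward((0, 1), (0, 2)))   # 0.5: next state is less visited than the current one
print(intrinsic_reward((0, 2), (0, 1)))   # 0.0: moving back toward familiar states is not rewarded
```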
arXiv Detail & Related papers (2020-12-15T21:26:54Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state space, guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, which aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- GRAC: Self-Guided and Self-Regularized Actor-Critic [24.268453994605512]
We propose a self-regularized TD-learning method to address divergence without requiring a target network.
We also propose a self-guided policy improvement method by combining policy-gradient with zero-order optimization.
This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network.
We evaluate GRAC on the suite of OpenAI Gym tasks, matching or outperforming the state of the art in every environment tested.
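A rough illustration of the zero-order component mentioned above (a random search for a higher-Q action around the actor's proposal; the sampling scheme and constants are hypothetical, and the paper combines this with an ordinary policy gradient and its self-regularized TD loss):

```python
import numpy as np

rng = np.random.default_rng(7)

def zero_order_improve(q, state, actor_action, n_samples=32, noise_std=0.2):
    """Return the sampled action near the actor's proposal with the highest Q."""
    candidates = actor_action + noise_std * rng.normal(size=(n_samples, actor_action.shape[0]))
    candidates = np.clip(np.vstack([actor_action, candidates]), -1.0, 1.0)
    q_vals = np.array([q(state, a) for a in candidates])
    return candidates[int(np.argmax(q_vals))]       # guides the actor update toward higher Q

# Toy usage with a hypothetical quadratic critic.
q = lambda s, a: -np.sum((a - 0.3) ** 2) + float(s @ s)
state, actor_action = rng.normal(size=4), np.zeros(2)
print(zero_order_improve(q, state, actor_action))
```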
arXiv Detail & Related papers (2020-09-18T17:58:29Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
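A minimal sketch of an episodic, k-nearest-neighbor novelty bonus of this kind (the kernel and constants are hypothetical simplifications; the actual agent also mixes in a long-term novelty signal and learns the embeddings with the inverse dynamics model):

```python
import numpy as np

class EpisodicNovelty:
    def __init__(self, k=5, eps=1e-3):
        self.memory, self.k, self.eps = [], k, eps

    def bonus(self, embedding):
        reward = 1.0
        if self.memory:
            dists = np.array([np.sum((embedding - m) ** 2) for m in self.memory])
            nearest = np.sort(dists)[: self.k]
            similarity = np.sum(self.eps / (nearest + self.eps))   # inverse kernel over the k-NN
            reward = 1.0 / np.sqrt(similarity + 1.0)
        self.memory.append(embedding)                              # cleared at episode end
        return reward

novelty = EpisodicNovelty()
# Repeatedly visiting the same embedded state: the bonus decays 1.0, 0.707, 0.577, 0.5.
print([round(novelty.bonus(np.zeros(8)), 3) for _ in range(4)])
```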
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.