A Temporally Correlated Latent Exploration for Reinforcement Learning
- URL: http://arxiv.org/abs/2412.04775v1
- Date: Fri, 06 Dec 2024 04:38:43 GMT
- Title: A Temporally Correlated Latent Exploration for Reinforcement Learning
- Authors: SuMin Oh, WanSoo Kim, HyunJin Kim
- Abstract summary: Temporally Correlated Latent Exploration (TeCLE) is a novel intrinsic reward formulation that employs an action-conditioned latent space and temporal correlation.
We find that the injected temporal correlation determines the exploratory behaviors of agents.
We show that the proposed TeCLE is robust to the Noisy TV problem and to stochasticity in benchmark environments.
- Score: 4.1101087490516575
- License:
- Abstract: Efficient exploration remains one of the longstanding problems of deep reinforcement learning. Instead of depending solely on extrinsic rewards from the environment, existing methods use intrinsic rewards to enhance exploration. However, we demonstrate that these methods are vulnerable to the Noisy TV problem and to stochasticity. To tackle this problem, we propose Temporally Correlated Latent Exploration (TeCLE), a novel intrinsic reward formulation that employs an action-conditioned latent space and temporal correlation. The action-conditioned latent space estimates the probability distribution of states, thereby avoiding the assignment of excessive intrinsic rewards to unpredictable states and effectively addressing both problems. Whereas previous works inject temporal correlation for action selection, the proposed method injects it into the intrinsic reward computation. We find that the injected temporal correlation determines the exploratory behaviors of agents. Various experiments show that which environments an agent performs well in depends on the amount of temporal correlation. To the best of our knowledge, TeCLE is the first approach to consider an action-conditioned latent space and temporal correlation for curiosity-driven exploration. We show that TeCLE is robust to the Noisy TV problem and to stochasticity in benchmark environments, including Minigrid and Stochastic Atari.
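The abstract describes the ingredients of TeCLE but no implementation details, so the following is only a minimal sketch of the general shape of such a method, assuming a CVAE-like action-conditioned encoder/decoder and an Ornstein-Uhlenbeck process as the source of temporally correlated noise. It is not the authors' architecture, and it does not reproduce the distributional state estimation that gives TeCLE its robustness to the Noisy TV problem.

```python
# Minimal, illustrative sketch of a TeCLE-style intrinsic reward (NOT the authors' code).
# Assumptions: an action-conditioned encoder/decoder over states, and Ornstein-Uhlenbeck
# (OU) noise injected into the latent so the reward signal is temporally correlated.
import torch
import torch.nn as nn


class OUNoise:
    """Temporally correlated noise: x_{t+1} = x_t + theta * (0 - x_t) + sigma * N(0, I)."""
    def __init__(self, dim, theta=0.15, sigma=0.2):
        self.theta, self.sigma = theta, sigma
        self.state = torch.zeros(dim)

    def sample(self):
        self.state = self.state + self.theta * (-self.state) + self.sigma * torch.randn_like(self.state)
        return self.state


class ActionConditionedLatentModel(nn.Module):
    """Encodes a state conditioned on the taken action and reconstructs it from the latent."""
    def __init__(self, obs_dim, act_dim, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim + act_dim, 128), nn.ReLU(),
                                     nn.Linear(128, obs_dim))

    def forward(self, obs, act_onehot, noise):
        z = self.encoder(torch.cat([obs, act_onehot], dim=-1))
        z = z + noise  # temporal correlation enters the reward computation, not action selection
        return self.decoder(torch.cat([z, act_onehot], dim=-1))


def intrinsic_reward(model, ou, obs, act_onehot):
    """Intrinsic reward = reconstruction error of the state under correlated latent noise."""
    with torch.no_grad():
        recon = model(obs, act_onehot, ou.sample())
        return ((recon - obs) ** 2).mean(dim=-1)
```

In this sketch the model would be trained on the same reconstruction loss over visited states, so frequently seen states stop earning large bonuses; the OU parameters theta and sigma play the role of the "amount of temporal correlation" discussed in the abstract.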
Related papers
- Episodic Novelty Through Temporal Distance [39.66260812278513]
Episodic Novelty Through Temporal Distance (ETD) is a novel approach that introduces temporal distance as a robust metric for state similarity and intrinsic reward.
By employing contrastive learning, ETD accurately estimates temporal distances and derives intrinsic rewards based on the novelty of states within the current episode.
arXiv Detail & Related papers (2025-01-26T06:43:45Z)
- Variable-Agnostic Causal Exploration for Reinforcement Learning [56.52768265734155]
We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL).
Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms.
It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
arXiv Detail & Related papers (2024-07-17T09:45:27Z)
- On the Identification of Temporally Causal Representation with Instantaneous Dependence [50.14432597910128]
Temporally causal representation learning aims to identify the latent causal process from time series observations.
Most methods require the assumption that the latent causal processes do not have instantaneous relations.
We propose an IDentification framework for instantaneOus Latent dynamics (IDOL).
arXiv Detail & Related papers (2024-05-24T08:08:05Z)
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
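Lattice, summarized just above, is the complementary design point: the temporally correlated noise is injected into the latent state of the policy network itself, so it shapes action selection rather than the intrinsic reward. A rough sketch, again with an Ornstein-Uhlenbeck perturbation and an assumed two-layer policy (not the Lattice implementation):

```python
# Illustrative sketch of Lattice-style exploration (NOT the authors' code): temporally
# correlated noise perturbs the policy's latent features, so consecutive actions drift together.
import torch
import torch.nn as nn


class NoisyLatentPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64, theta=0.15, sigma=0.3):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.head = nn.Linear(hidden, act_dim)
        self.theta, self.sigma = theta, sigma
        self.noise = torch.zeros(hidden)  # OU state, persists across time steps

    def act(self, obs):
        # Ornstein-Uhlenbeck update keeps the perturbation correlated over time.
        self.noise = self.noise + self.theta * (-self.noise) + self.sigma * torch.randn_like(self.noise)
        latent = self.body(obs) + self.noise  # exploration happens in latent space
        return self.head(latent)              # e.g. the mean of a Gaussian action distribution
```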
- DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards [2.09711130126031]
Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness is a deciding factor in the performance of RL algorithms.
Recent studies have shown the effectiveness of encouraging exploration with intrinsic rewards estimated from novelties in observations.
We propose DEIR, a novel method in which we theoretically derive an intrinsic reward with a conditional mutual information term.
arXiv Detail & Related papers (2023-04-21T06:39:38Z)
- Self-supervised network distillation: an effective approach to exploration in sparse reward environments [0.0]
Reinforcement learning can train an agent to behave in an environment according to a predesigned reward function, but such rewards are often sparse.
Equipping the agent with an intrinsic motivation that provides informed exploration may solve this problem.
We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator.
arXiv Detail & Related papers (2023-02-22T18:58:09Z)
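SND, summarized just above, scores novelty by how badly a trained network has distilled a target network on the states visited so far. The sketch below is the plain random-network-distillation variant of that idea; SND itself additionally trains the target with self-supervised objectives, which is omitted here, and all sizes are illustrative assumptions.

```python
# Illustrative distillation-error intrinsic reward (an RND-style simplification, NOT the SND code).
import torch
import torch.nn as nn

obs_dim, feat_dim = 64, 32
target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
for p in target.parameters():
    p.requires_grad_(False)  # the target is kept fixed in this simplified sketch
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)


def novelty(obs):
    """Intrinsic reward: how badly the predictor matches the target on this observation."""
    with torch.no_grad():
        return ((predictor(obs) - target(obs)) ** 2).mean(dim=-1)


def distill(obs_batch):
    """Train the predictor on visited states so familiar states stop looking novel."""
    loss = ((predictor(obs_batch) - target(obs_batch)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```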
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
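REVD, summarized just above, rewards the agent when the states of the current episode look different from those of a reference episode. Its Rényi-divergence estimator is more involved than what fits here; the snippet below is only a rough k-nearest-neighbor stand-in for the same visitation-discrepancy idea, and the constant k, the log shaping, and the use of raw embeddings are illustrative assumptions.

```python
# Simplified k-NN visitation-discrepancy bonus (NOT the exact REVD estimator).
import torch


def episodic_discrepancy_bonus(current_embs, reference_embs, k=3, eps=1e-6):
    """For each state embedding of the current episode, compare its distance to the reference
    episode with its distance to its own episode: states that are far from the reference but
    close to their neighbors in the current episode receive a larger bonus."""
    d_ref = torch.cdist(current_embs, reference_embs)   # (N_cur, N_ref) pairwise distances
    d_self = torch.cdist(current_embs, current_embs)    # (N_cur, N_cur)
    knn_ref = d_ref.topk(k, dim=1, largest=False).values.mean(dim=1)
    # take k+1 self-neighbors and drop the first (zero distance to itself)
    knn_self = d_self.topk(k + 1, dim=1, largest=False).values[:, 1:].mean(dim=1)
    return torch.log1p(knn_ref / (knn_self + eps))      # grows when the episode covers new regions
```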
- Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards [0.0]
We propose a new exploration method based on two intuitions.
The choice of the next exploratory action should depend not only on the (Markovian) state of the environment, but also on the agent's trajectory.
We discuss the theoretical properties of locally self-avoiding walks, and their ability to provide a kind of short-term memory.
arXiv Detail & Related papers (2020-12-26T01:30:26Z)
- Latent World Models For Intrinsically Motivated Exploration [140.21871701134626]
We present a self-supervised representation learning method for image-based observations.
We consider episodic and life-long uncertainties to guide the exploration of partially observable environments.
arXiv Detail & Related papers (2020-10-05T19:47:04Z)
- Temporal Difference Uncertainties as a Signal for Exploration [76.6341354269013]
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy.
In this paper, we highlight that value estimates are easily biased and temporally inconsistent.
We propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors.
arXiv Detail & Related papers (2020-10-05T18:11:22Z)
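The last entry above turns disagreement about temporal-difference errors into an exploration signal. As a purely illustrative stand-in for the paper's method, one can induce a distribution over TD errors with an ensemble of value heads and use their spread as the signal; the ensemble size and network shape below are assumptions.

```python
# Illustrative ensemble-based uncertainty over TD errors (a stand-in, NOT the paper's exact method).
import torch
import torch.nn as nn


class ValueEnsemble(nn.Module):
    def __init__(self, obs_dim, n_heads=5, hidden=64):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_heads)
        ])

    def forward(self, obs):
        return torch.cat([h(obs) for h in self.heads], dim=-1)  # (batch, n_heads)


def td_uncertainty(ensemble, obs, next_obs, reward, gamma=0.99):
    """Spread of TD errors across heads: high disagreement marks states worth exploring."""
    with torch.no_grad():
        v, v_next = ensemble(obs), ensemble(next_obs)
        td_errors = reward.unsqueeze(-1) + gamma * v_next - v  # one TD error per head
        return td_errors.std(dim=-1)                            # per-state uncertainty signal
```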
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.