Episodic Novelty Through Temporal Distance
- URL: http://arxiv.org/abs/2501.15418v1
- Date: Sun, 26 Jan 2025 06:43:45 GMT
- Title: Episodic Novelty Through Temporal Distance
- Authors: Yuhua Jiang, Qihan Liu, Yiqin Yang, Xiaoteng Ma, Dianyu Zhong, Hao Hu, Jun Yang, Bin Liang, Bo Xu, Chongjie Zhang, Qianchuan Zhao,
- Abstract summary: Episodic Novelty Through Temporal Distance (ETD) is a novel approach that introduces temporal distance as a robust metric for state similarity and intrinsic reward.
By employing contrastive learning, ETD accurately estimates temporal distances and derives intrinsic rewards based on the novelty of states within the current episode.
- Score: 39.66260812278513
- License:
- Abstract: Exploration in sparse reward environments remains a significant challenge in reinforcement learning, particularly in Contextual Markov Decision Processes (CMDPs), where environments differ across episodes. Existing episodic intrinsic motivation methods for CMDPs primarily rely on count-based approaches, which are ineffective in large state spaces, or on similarity-based methods that lack appropriate metrics for state comparison. To address these shortcomings, we propose Episodic Novelty Through Temporal Distance (ETD), a novel approach that introduces temporal distance as a robust metric for state similarity and intrinsic reward computation. By employing contrastive learning, ETD accurately estimates temporal distances and derives intrinsic rewards based on the novelty of states within the current episode. Extensive experiments on various benchmark tasks demonstrate that ETD significantly outperforms state-of-the-art methods, highlighting its effectiveness in enhancing exploration in sparse reward CMDPs.
Related papers
- A Temporally Correlated Latent Exploration for Reinforcement Learning [4.1101087490516575]
Temporally Correlated Latent Exploration (TeCLE) is a novel intrinsic reward formulation that employs an action-conditioned latent space and temporal correlation.
We find that the injected temporal correlation determines the exploratory behaviors of agents.
We prove that the proposed TeCLE can be robust to the Noisy TV andity in benchmark environments.
arXiv Detail & Related papers (2024-12-06T04:38:43Z) - Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning [50.84938730450622]
We propose a trajectory-based method TV score, which uses trajectory volatility for OOD detection in mathematical reasoning.
Our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios.
Our method can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T22:22:25Z) - Episodic Reinforcement Learning with Expanded State-reward Space [1.479675621064679]
We introduce an efficient EC-based DRL framework with expanded state-reward space, where the expanded states used as the input and the expanded rewards used in the training both contain historical and current information.
Our method is able to simultaneously achieve the full utilization of retrieval information and the better evaluation of state values by a Temporal Difference (TD) loss.
arXiv Detail & Related papers (2024-01-19T06:14:36Z) - Rewarding Episodic Visitation Discrepancy for Exploration in
Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the R'enyi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
arXiv Detail & Related papers (2022-09-19T08:42:46Z) - Hierarchical Compositional Representations for Few-shot Action
Recognition [51.288829293306335]
We propose a novel hierarchical compositional representations (HCR) learning approach for few-shot action recognition.
We divide a complicated action into several sub-actions by carefully designed hierarchical clustering.
We also adopt the Earth Mover's Distance in the transportation problem to measure the similarity between video samples in terms of sub-action representations.
arXiv Detail & Related papers (2022-08-19T16:16:59Z) - Spatio-temporal Gait Feature with Adaptive Distance Alignment [90.5842782685509]
We try to increase the difference of gait features of different subjects from two aspects: the optimization of network structure and the refinement of extracted gait features.
Our method is proposed, it consists of Spatio-temporal Feature Extraction (SFE) and Adaptive Distance Alignment (ADA)
ADA uses a large number of unlabeled gait data in real life as a benchmark to refine the extracted-temporal features to make them have low inter-class similarity and high intra-class similarity.
arXiv Detail & Related papers (2022-03-07T13:34:00Z) - MICo: Learning improved representations via sampling-based state
similarity for Markov decision processes [18.829939056796313]
We present a new behavioural distance over the state space of a Markov decision process.
We demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents.
arXiv Detail & Related papers (2021-06-03T14:24:12Z) - Cross-domain Imitation from Observations [50.669343548588294]
Imitation learning seeks to circumvent the difficulty in designing proper reward functions for training agents by utilizing expert behavior.
In this paper, we study the problem of how to imitate tasks when there exist discrepancies between the expert and agent MDP.
We present a novel framework to learn correspondences across such domains.
arXiv Detail & Related papers (2021-05-20T21:08:25Z) - Enforcing Almost-Sure Reachability in POMDPs [10.883864654718103]
Partially-Observable Markov Decision Processes (POMDPs) are a well-known model for sequential decision making under limited information.
We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state.
We present two algorithms: A novel SAT-based iterative approach and a decision-diagram based alternative.
arXiv Detail & Related papers (2020-06-30T19:59:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.