Goal Discovery with Causal Capacity for Efficient Reinforcement Learning
- URL: http://arxiv.org/abs/2508.09624v1
- Date: Wed, 13 Aug 2025 08:54:56 GMT
- Title: Goal Discovery with Causal Capacity for Efficient Reinforcement Learning
- Authors: Yan Yu, Yaodong Yang, Zhengbo Lu, Chengdong Ma, Wengang Zhou, Houqiang Li
- Abstract summary: Causal inference is crucial for humans to explore the world. We propose a novel Goal Discovery with Causal Capacity framework for efficient environment exploration.
- Score: 85.28685202281918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Causal inference is crucial for humans to explore the world, and it can be modeled to enable an agent to explore its environment efficiently in reinforcement learning. Existing research indicates that establishing the causality between action and state transition enables an agent to reason about how a policy affects its future trajectory, thereby promoting directed exploration. However, measuring this causality is challenging because it is intractable in the vast state-action space of complex scenarios. In this paper, we propose a novel Goal Discovery with Causal Capacity (GDCC) framework for efficient environment exploration. Specifically, we first derive a measurement of causality in state space, i.e., causal capacity, which represents the highest influence of an agent's behavior on future trajectories. We then present a Monte Carlo based method to identify critical points in discrete state space and further optimize this method for continuous high-dimensional environments. These critical points reveal where the agent makes important decisions in the environment and are then treated as subgoals that guide the agent to explore more purposefully and efficiently. Empirical results from multi-objective tasks demonstrate that states with high causal capacity align with our expected subgoals, and our GDCC achieves significant success rate improvements compared to baselines.
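The abstract's pipeline (estimate causal capacity per state via Monte Carlo, then pick high-capacity states as subgoals) can be sketched for a tabular environment. This is a minimal sketch under assumptions of our own: the paper does not give its exact formula here, so we stand in for causal capacity with the largest total-variation distance between action-conditioned next-state distributions, and `env_step`, `estimate_causal_capacity`, and `pick_subgoals` are hypothetical names.

```python
from collections import Counter

def estimate_causal_capacity(env_step, states, actions, n_samples=200):
    """Monte Carlo proxy for causal capacity: for each state, how strongly
    the choice of action shapes the next-state distribution."""
    capacity = {}
    for s in states:
        # Empirical next-state distribution per action, from sampled transitions.
        dists = []
        for a in actions:
            counts = Counter(env_step(s, a) for _ in range(n_samples))
            dists.append({ns: c / n_samples for ns, c in counts.items()})
        # Highest pairwise total-variation distance across actions.
        best = 0.0
        for i in range(len(dists)):
            for j in range(i + 1, len(dists)):
                support = set(dists[i]) | set(dists[j])
                tv = 0.5 * sum(abs(dists[i].get(ns, 0.0) - dists[j].get(ns, 0.0))
                               for ns in support)
                best = max(best, tv)
        capacity[s] = best
    return capacity

def pick_subgoals(capacity, threshold=0.5):
    """States whose behavior has high influence on the future become subgoals."""
    return [s for s, c in capacity.items() if c >= threshold]
```

At a decision fork (different actions lead to different successors) the proxy is high; in a corridor (all actions lead to the same successor) it is zero, matching the intuition that subgoals sit where the agent's choice actually matters.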
Related papers
- IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation [56.43007596544299]
IndustryNav is the first dynamic industrial navigation benchmark for active spatial reasoning. A study of nine state-of-the-art Visual Large Language Models reveals that closed-source models maintain a consistent advantage.
arXiv Detail & Related papers (2025-11-21T16:48:49Z) - Parameter Estimation using Reinforcement Learning Causal Curiosity: Limits and Challenges [6.872096639211664]
Causal Curiosity aims to estimate parameters as accurately and efficiently as possible without directly measuring them. We present for the first time a measurement-accuracy analysis of the future potential and current limitations of this technique. As a result of our work, we put forward proposals for an improved and efficient design of Causal Curiosity methods.
arXiv Detail & Related papers (2025-05-13T11:30:51Z) - Can Large Language Models Help Experimental Design for Causal Discovery? [94.66802142727883]
Large Language Model Guided Intervention Targeting (LeGIT) is a robust framework that effectively incorporates LLMs to augment existing numerical approaches for intervention targeting in causal discovery. LeGIT demonstrates significant improvements and robustness over existing methods and even surpasses humans.
arXiv Detail & Related papers (2025-03-03T03:43:05Z) - Causal Information Prioritization for Efficient Reinforcement Learning [21.74375718642216]
Current Reinforcement Learning (RL) methods often suffer from sample inefficiency. Recent causal approaches aim to address this problem, but they lack grounded modeling of reward-guided causal understanding of states and actions. We propose a novel method named Causal Information Prioritization (CIP) that improves sample efficiency by leveraging factored MDPs.
arXiv Detail & Related papers (2025-02-14T11:44:17Z) - Variable-Agnostic Causal Exploration for Reinforcement Learning [56.52768265734155]
We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL).
Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms.
It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
arXiv Detail & Related papers (2024-07-17T09:45:27Z) - Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning [0.0]
Incomplete knowledge of the environment leads an agent to make decisions under uncertainty.
This is one of the major dilemmas in Reinforcement Learning (RL), where an autonomous agent has to balance two contrasting needs when making its decisions.
We show that adaptive methods better approximate the trade-off between exploration and exploitation.
arXiv Detail & Related papers (2023-10-12T13:45:33Z) - Landmark Guided Active Exploration with State-specific Balance Coefficient [4.539657469634845]
We design a measure of prospect for sub-goals by planning in the goal space based on the goal-conditioned value function.
We propose a landmark-guided exploration strategy by integrating the measures of prospect and novelty.
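The two sentences above describe scoring candidate sub-goals by prospect and novelty and combining them with a state-specific balance coefficient. A minimal sketch of that combination, under our own assumptions (the paper's exact scoring rule is not given here, and `select_landmark`, `prospect`, `novelty`, and `coeff` are hypothetical names):

```python
def select_landmark(candidates, prospect, novelty, coeff):
    """Pick the sub-goal maximizing prospect plus coefficient-weighted novelty.

    prospect/novelty map each candidate sub-goal to a score; coeff balances
    exploitation of promising goals against exploration of novel ones.
    """
    return max(candidates, key=lambda g: prospect[g] + coeff * novelty[g])
```

A larger coefficient tilts selection toward novel but less promising goals; a state-specific coefficient, as in the title, would make this trade-off vary across the state space.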
arXiv Detail & Related papers (2023-06-30T08:54:47Z) - Information is Power: Intrinsic Control via Information Capture [110.3143711650806]
We argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model.
This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states.
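The objective above is the entropy of the agent's state-visitation distribution: an agent that gains control over its environment concentrates its visitation and thus lowers this entropy. A minimal sketch of the quantity being minimized, assuming an empirical visitation distribution rather than the paper's latent state-space model (`visitation_entropy` is a hypothetical name):

```python
import math
from collections import Counter

def visitation_entropy(states):
    """Shannon entropy (in nats) of the empirical state-visitation distribution."""
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Visiting four states uniformly gives maximal entropy log(4);
# concentrating visitation on one state drives the entropy down.
spread = visitation_entropy(["a", "b", "c", "d"])
focused = visitation_entropy(["a", "a", "a", "b"])
```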
arXiv Detail & Related papers (2021-12-07T18:50:42Z) - Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation [11.868792440783055]
We show that learning to estimate metrics quantifying the spatial relationships between an agent at a given location and a goal to reach has a high positive impact in Multi-Object Navigation settings.
A learning-based agent from the literature trained with the proposed auxiliary losses was the winning entry to the Multi-Object Navigation Challenge.
arXiv Detail & Related papers (2021-07-13T12:01:05Z) - Understanding the origin of information-seeking exploration in probabilistic objectives for control [62.997667081978825]
An exploration-exploitation trade-off is central to the description of adaptive behaviour.
One approach to solving this trade-off has been to equip or propose that agents possess an intrinsic 'exploratory drive'
We show that this combination of utility-maximizing and information-seeking behaviour arises from the minimization of an entirely different class of objectives.
arXiv Detail & Related papers (2021-03-11T18:42:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.