Related papers: Variable-Agnostic Causal Exploration for Reinforcement Learning

URL: http://arxiv.org/abs/2407.12437v1
Date: Wed, 17 Jul 2024 09:45:27 GMT
Title: Variable-Agnostic Causal Exploration for Reinforcement Learning
Authors: Minh Hoang Nguyen, Hung Le, Svetha Venkatesh
Abstract summary: We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL). Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms. It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
Score: 56.52768265734155
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern reinforcement learning (RL) struggles to capture real-world cause-and-effect dynamics, leading to inefficient exploration due to extensive trial-and-error actions. While recent efforts to improve agent exploration have leveraged causal discovery, they often make unrealistic assumptions of causal variables in the environments. In this paper, we introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL), incorporating causal relationships to drive exploration in RL without specifying environmental causal variables. Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms. Subsequently, it constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion. This can be leveraged to generate intrinsic rewards or establish a hierarchy of subgoals to enhance exploration efficiency. Experimental results showcase a significant improvement in agent performance in grid-world, 2D games and robotic domains, particularly in scenarios with sparse rewards and noisy actions, such as the notorious Noisy-TV environments.
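
Illustrative sketch (not the authors' implementation): the abstract says a causal graph over observation-action steps can be used to generate intrinsic rewards. The Python below shows one hypothetical way to do that, assuming the graph is already given as a dictionary of cause-to-effect edges and that the bonus decays with graph distance to a goal node. The names `causal_bonus`, `shaped_reward`, the `scale` parameter, and the 1 / (1 + distance) rule are all assumptions made for illustration; the paper may define its rewards differently.

```python
from collections import deque


def causal_bonus(graph, goal, scale=1.0):
    """Map each node that can causally reach `goal` to a bonus.

    graph: dict from a node (here an observation-action pair) to the
           list of nodes it directly influences. Hypothetical format.
    The bonus is scale / (1 + distance to goal), an assumed rule.
    """
    # Reverse the edges so we can walk from the goal back to its causes.
    parents = {}
    for cause, effects in graph.items():
        for effect in effects:
            parents.setdefault(effect, []).append(cause)

    # Breadth-first search gives the shortest causal distance to the goal.
    distance = {goal: 0}
    queue = deque([goal])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in distance:
                distance[parent] = distance[node] + 1
                queue.append(parent)

    return {node: scale / (1 + d) for node, d in distance.items()}


def shaped_reward(extrinsic, obs_action, bonus):
    """Add the intrinsic bonus (0.0 for pairs outside the graph)."""
    return extrinsic + bonus.get(obs_action, 0.0)


# Toy grid-world chain: pick up key -> open door -> reach goal.
toy_graph = {
    ("key", "pickup"): [("door", "open")],
    ("door", "open"): [("goal", "reach")],
}
toy_bonus = causal_bonus(toy_graph, goal=("goal", "reach"))
```

In this toy chain, pairs outside the graph (for example bumping into a wall) receive no bonus, so random noise that has no causal path to the goal is not rewarded. This is one plausible reading of why a causal signal could help in Noisy-TV style settings, not a claim about the paper's results.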

Related papers

Consistency Is Not Always Correct: Towards Understanding the Role of Exploration in Post-Training Reasoning [75.79451512757844]
Foundation models exhibit broad knowledge but limited task-specific reasoning. This limitation motivates post-training strategies such as RLVR and inference scaling. We show that RLVR induces a squeezing effect, reducing reasoning entropy and forgetting some correct paths.
arXiv Detail & Related papers (2025-11-10T18:25:26Z)
Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation [60.04281435591454]
CRDA (Curriculum Reinforcement-Learning Data Augmentation) is a novel framework guiding detectors to progressively master multi-domain forgery features. Central to our approach is integrating reinforcement learning and causal inference. Our method significantly improves detector generalizability, outperforming SOTA methods across multiple cross-domain datasets.
arXiv Detail & Related papers (2025-11-10T12:45:52Z)
DODO: Causal Structure Learning with Budgeted Interventions [1.0323063834827415]
We introduce DODO, an algorithm defining how an Agent can autonomously learn the causal structure of its environment. Results show better performance for DODO, compared to observational approaches, in all but the most limited resource conditions.
arXiv Detail & Related papers (2025-10-09T13:32:33Z)
RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection [55.125987985864896]
We present a systematic analysis that quantifies how environmental complexity induces fragile search behaviors. We propose a simple yet effective approach to instantiate a search agent, RE-Searcher. This combination of goal-oriented planning and self-reflection enables RE-Searcher to resist spurious cues in complex search environments.
arXiv Detail & Related papers (2025-09-30T10:25:27Z)
Goal Discovery with Causal Capacity for Efficient Reinforcement Learning [85.28685202281918]
Causal inference is crucial for humans to explore the world. We propose a novel Goal Discovery with Causal Capacity framework for efficient environment exploration.
arXiv Detail & Related papers (2025-08-13T08:54:56Z)
Better Decisions through the Right Causal World Model [17.623937562865617]
Causal Object-centric Model Extraction Tool (COMET) is a novel algorithm designed to learn the exact interpretable causal world models (CWMs). Our results, validated in Atari environments such as Pong and Freeway, demonstrate the accuracy and robustness of COMET.
arXiv Detail & Related papers (2025-04-09T20:29:13Z)
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training [62.536191233049614]
Reinforcement learning with verifiable outcome rewards (RLVR) has effectively scaled up chain-of-thought (CoT) reasoning in large language models (LLMs). This work investigates this problem through extensive experiments on complex card games, such as 24 points, and embodied tasks from ALFWorld. We find that when rewards are based solely on action outcomes, RL fails to incentivize CoT reasoning in VLMs, instead leading to a phenomenon we termed thought collapse.
arXiv Detail & Related papers (2025-03-11T15:17:02Z)
Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process. Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and 'chunking' them into a single action that is added to the action space.
arXiv Detail & Related papers (2024-10-19T19:22:50Z)
Causal Reinforcement Learning for Optimisation of Robot Dynamics in Unknown Environments [4.494898338391223]
This work introduces a novel Causal Reinforcement Learning approach to enhancing robotics operations. Our proposed machine learning architecture enables robots to learn the causal relationships between the visual characteristics of the objects.
arXiv Detail & Related papers (2024-09-20T11:40:51Z)
The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective [18.389232051345825]
In policy optimization, excessive reliance on exploration reduces learning efficiency, while over-dependence on exploitation might trap agents in local optima. This paper revisits the exploration-exploitation dilemma from the perspective of entropy. We establish an end-to-end adaptive framework called AdaZero, which automatically determines whether to explore or to exploit.
arXiv Detail & Related papers (2024-08-19T13:21:46Z)
Curiosity & Entropy Driven Unsupervised RL in Multiple Environments [0.0]
We propose and experiment with five new modifications to the original work. In high-dimensional environments, curiosity-driven exploration enhances learning by encouraging the agent to seek diverse experiences and explore the unknown more. However, its benefits are limited in low-dimensional and simpler environments where exploration possibilities are constrained and there is little that is truly unknown to the agent.
arXiv Detail & Related papers (2024-01-08T19:25:40Z)
Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment. We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
Deep Intrinsically Motivated Exploration in Continuous Control [0.0]
In continuous systems, exploration is often performed through undirected strategies in which parameters of the networks or selected actions are perturbed by random noise. We adapt existing theories on animal motivational systems into the reinforcement learning paradigm and introduce a novel directed exploration strategy. Our framework extends to larger and more diverse state spaces, dramatically improves the baselines, and outperforms the undirected strategies significantly.
arXiv Detail & Related papers (2022-10-01T14:52:16Z)
Information is Power: Intrinsic Control via Information Capture [110.3143711650806]
We argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model. This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states.
arXiv Detail & Related papers (2021-12-07T18:50:42Z)
Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning [76.00395335702572]
A central goal for AI and causality is the joint discovery of abstract representations and causal structure. Existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs. In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them.
arXiv Detail & Related papers (2021-07-02T05:44:56Z)
Disentangling causal effects for hierarchical reinforcement learning [0.0]
This study aims to expedite the learning of task-specific behavior by leveraging a hierarchy of causal effects. We propose CEHRL, a hierarchical method that models the distribution of controllable effects using a Variational Autoencoder. In comparison to exploring with random actions, experimental results show that random effect exploration is a more efficient mechanism.
arXiv Detail & Related papers (2020-10-03T13:19:16Z)
Noisy Agents: Self-supervised Exploration by Predicting Auditory Events [127.82594819117753]
We propose a novel type of intrinsic motivation for Reinforcement Learning (RL) that encourages the agent to understand the causal effect of its actions. We train a neural network to predict the auditory events and use the prediction errors as intrinsic rewards to guide RL exploration. Experimental results on Atari games show that our new intrinsic motivation significantly outperforms several state-of-the-art baselines.
arXiv Detail & Related papers (2020-07-27T17:59:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.