Disentangling causal effects for hierarchical reinforcement learning
- URL: http://arxiv.org/abs/2010.01351v2
- Date: Mon, 21 Feb 2022 19:18:09 GMT
- Title: Disentangling causal effects for hierarchical reinforcement learning
- Authors: Oriol Corcoll and Raul Vicente
- Abstract summary: This study aims to expedite the learning of task-specific behavior by leveraging a hierarchy of causal effects.
We propose CEHRL, a hierarchical method that models the distribution of controllable effects using a Variational Autoencoder.
Experimental results show that random effect exploration is a more efficient mechanism than exploring with random actions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exploration and credit assignment under sparse rewards are still challenging
problems. We argue that these challenges arise in part due to the intrinsic
rigidity of operating at the level of actions. Actions can precisely define how
to perform an activity but are ill-suited to describe what activity to perform.
Instead, causal effects are inherently composable and temporally abstract,
making them ideal for descriptive tasks. By leveraging a hierarchy of causal
effects, this study aims to expedite the learning of task-specific behavior and
aid exploration. Borrowing counterfactual and normality measures from causal
literature, we disentangle controllable effects from effects caused by other
dynamics of the environment. We propose CEHRL, a hierarchical method that
models the distribution of controllable effects using a Variational
Autoencoder. This distribution is used by a high-level policy to 1) explore the
environment via random effect exploration so that novel effects are
continuously discovered and learned, and to 2) learn task-specific behavior by
prioritizing the effects that maximize a given reward function. Experimental
results show that random effect exploration is a more efficient mechanism than
exploring with random actions, and that by assigning credit to a few effects
rather than many actions, CEHRL learns tasks more rapidly.
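To make the pipeline described above concrete, here is a minimal sketch of the two roles the effect distribution plays: a small VAE models controllable effects, random effect exploration samples from its prior, and task learning picks the sampled effect that scores best under a reward function. The state-difference definition of an effect, the network sizes, and the candidate-scoring loop are illustrative assumptions, not the paper's implementation.
```python
# Sketch of a CEHRL-style effect model. Assumptions (not from the paper):
# effects are state-difference vectors, the VAE is a plain MLP VAE, and the
# "high-level policy" is reduced to sampling / scoring latent effects.
import torch
import torch.nn as nn

class EffectVAE(nn.Module):
    def __init__(self, effect_dim=8, latent_dim=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(effect_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, effect_dim))

    def forward(self, effect):
        h = self.enc(effect)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(z), mu, logvar

    def loss(self, effect):
        recon, mu, logvar = self(effect)
        rec = ((recon - effect) ** 2).sum(-1).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return rec + kl

    def sample_effects(self, n):
        # Random effect exploration: draw effects from the learned prior.
        with torch.no_grad():
            return self.dec(torch.randn(n, self.mu.out_features))

def pick_task_effect(vae, reward_fn, n_candidates=128):
    # Task learning, greatly simplified: score sampled effects with the task
    # reward and return the best one as a goal for a low-level policy.
    candidates = vae.sample_effects(n_candidates)
    scores = torch.tensor([reward_fn(e) for e in candidates])
    return candidates[scores.argmax()]

if __name__ == "__main__":
    vae = EffectVAE()
    opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
    effects = torch.randn(256, 8)  # placeholder controllable effects (next_state - state)
    for _ in range(200):
        opt.zero_grad()
        vae.loss(effects).backward()
        opt.step()
    goal = pick_task_effect(vae, reward_fn=lambda e: e[0].item())
    print("chosen effect goal:", goal)
```
In the full hierarchical method a low-level policy would then be trained to realize the chosen effect in the environment; the sketch stops at effect selection.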
Related papers
- Variable-Agnostic Causal Exploration for Reinforcement Learning [56.52768265734155]
We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL).
Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms.
It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
arXiv Detail & Related papers (2024-07-17T09:45:27Z)
- Fast Proxy Experiment Design for Causal Effect Identification [27.885243535456237]
Two approaches to estimate causal effects are observational and experimental (randomized) studies.
Direct experiments on the target variable may be too costly or even infeasible to conduct.
A proxy experiment is conducted on variables that are cheaper to intervene on than the main target.
arXiv Detail & Related papers (2024-07-07T11:09:38Z)
- ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
- Understanding reinforcement learned crowds [9.358303424584902]
Reinforcement Learning methods are used to animate virtual agents.
It is not obvious what their real impact is, or how they affect the results.
We analyze some of these arbitrary choices in terms of their impact on learning performance.
arXiv Detail & Related papers (2022-09-19T20:47:49Z)
- Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards can mislead the learner, and that with low exploration probability values, the learned policy is more robust to corrupted rewards.
arXiv Detail & Related papers (2021-02-12T15:53:48Z)
- Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning [24.163616087447874]
We introduce causal curiosity, a novel intrinsic reward.
We show that it allows our agents to learn optimal sequences of actions.
We also show that the knowledge of causal factor representations aids zero-shot learning for more complex tasks.
arXiv Detail & Related papers (2020-10-07T02:07:51Z)
- RODE: Learning Roles to Decompose Multi-Agent Tasks [69.56458960841165]
Role-based learning holds the promise of achieving scalable multi-agent learning by decomposing complex tasks using roles.
We propose to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents.
By virtue of these advances, our method outperforms the current state-of-the-art MARL algorithms on 10 of the 14 scenarios that comprise the challenging StarCraft II micromanagement benchmark.
arXiv Detail & Related papers (2020-10-04T09:20:59Z)
- Learning "What-if" Explanations for Sequential Decision-Making [92.8311073739295]
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior is essential.
We propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to "what if" outcomes.
We highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.
arXiv Detail & Related papers (2020-07-02T14:24:17Z)
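As a side note on the latent-exploration idea summarized in the Lattice entry above, the toy sketch below perturbs a policy network's hidden layer with temporally correlated (Ornstein-Uhlenbeck) noise. The OU process, the two-layer policy, and all dimensions are assumptions made for illustration; Lattice's actual noise construction differs.
```python
# Toy sketch of temporally correlated latent-space exploration, loosely in the
# spirit of Lattice. Not the paper's implementation.
import torch
import torch.nn as nn

class LatentNoisePolicy(nn.Module):
    def __init__(self, obs_dim=4, hidden_dim=32, act_dim=2, theta=0.15, sigma=0.2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.Tanh())
        self.head = nn.Linear(hidden_dim, act_dim)
        self.theta, self.sigma = theta, sigma
        self.noise = torch.zeros(hidden_dim)  # persistent noise state across steps

    def reset_noise(self):
        self.noise.zero_()

    def act(self, obs):
        # Ornstein-Uhlenbeck update: the perturbation is correlated across time
        # steps, so latent-space exploration persists over several actions.
        self.noise += -self.theta * self.noise + self.sigma * torch.randn_like(self.noise)
        latent = self.body(obs) + self.noise  # perturb the latent state
        return self.head(latent)

if __name__ == "__main__":
    policy = LatentNoisePolicy()
    policy.reset_noise()
    obs = torch.zeros(4)
    for t in range(5):
        print(t, policy.act(obs).detach().numpy())
```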