Meta-learning how to Share Credit among Macro-Actions
- URL: http://arxiv.org/abs/2506.13690v1
- Date: Mon, 16 Jun 2025 16:52:49 GMT
- Title: Meta-learning how to Share Credit among Macro-Actions
- Authors: Ionel-Alexandru Hosu, Traian Rebedea, Razvan Pascanu
- Abstract summary: We argue that the difficulty stems from the trade-offs between reducing the average number of decisions per episode versus increasing the size of the action space. We propose a novel regularization term that exploits the relationship between actions and macro-actions to improve the credit assignment mechanism. Our results show significant improvements over the Rainbow-DQN baseline in all environments.
- Score: 15.3064603135039
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: One proposed mechanism to improve exploration in reinforcement learning is through the use of macro-actions. Paradoxically though, in many scenarios the naive addition of macro-actions does not lead to better exploration, but rather the opposite. It has been argued that this was caused by adding non-useful macros, and multiple works have focused on mechanisms to discover effectively environment-specific useful macros. In this work, we take a slightly different perspective. We argue that the difficulty stems from the trade-offs between reducing the average number of decisions per episode versus increasing the size of the action space. Namely, one typically treats each potential macro-action as independent and atomic, hence strictly increasing the search space and making typical exploration strategies inefficient. To address this problem, we propose a novel regularization term that exploits the relationship between actions and macro-actions to improve the credit assignment mechanism by reducing the effective dimension of the action space and, therefore, improving exploration. The term relies on a similarity matrix that is meta-learned jointly with learning the desired policy. We empirically validate our strategy by looking at macro-actions in Atari games and the StreetFighter II environment. Our results show significant improvements over the Rainbow-DQN baseline in all environments. Additionally, we show that the macro-action similarity is transferable to related environments. We believe this work is a small but important step towards understanding how the similarity-imposed geometry on the action space can be exploited to improve credit assignment and exploration, therefore making learning more effective.
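The abstract describes the mechanism only at a high level, so the following is a minimal, hypothetical Python sketch of the general idea rather than the paper's actual objective: a learnable similarity matrix over the joint set of primitive and macro-actions pulls the Q-values of related actions toward each other, and the matrix itself would be meta-learned jointly with the policy (the meta-update is only indicated in comments). All names, shapes, and the 0.1 weight are assumptions.

```python
# Hypothetical sketch (not the paper's implementation) of a credit-sharing
# regularizer for a DQN-style agent whose action set mixes primitive actions
# and macro-actions.
import torch
import torch.nn.functional as F

n_actions = 18  # primitive + macro-actions (placeholder size)

# Logits of the similarity matrix; treated as meta-parameters, e.g. updated
# by a separate optimizer on held-out transitions (meta-learning step not shown).
sim_logits = torch.nn.Parameter(torch.zeros(n_actions, n_actions))

def similarity_regularizer(q_values: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between Q-values of actions deemed similar.

    q_values: (batch, n_actions) Q-estimates from the online network.
    """
    S = torch.softmax(sim_logits, dim=-1)   # (A, A), each row sums to 1
    q_shared = q_values @ S.t()             # similarity-weighted Q-values
    # detach(): S only receives gradients from the (not shown) outer objective.
    return F.mse_loss(q_values, q_shared.detach())

def total_loss(td_loss: torch.Tensor, q_values: torch.Tensor,
               reg_weight: float = 0.1) -> torch.Tensor:
    """Usual (Rainbow-)DQN TD loss plus the credit-sharing term."""
    return td_loss + reg_weight * similarity_regularizer(q_values)
```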
Related papers
- Hierarchical Meta-Reinforcement Learning via Automated Macro-Action Discovery [4.0847743592744905]
It is still challenging to learn performant policies across multiple complex and high-dimensional tasks. We propose a novel architecture with three hierarchical levels for 1) learning task representations, 2) discovering task-agnostic macro-actions in an automated manner, and 3) learning primitive actions.
arXiv Detail & Related papers (2024-12-16T16:15:36Z) - Variable-Agnostic Causal Exploration for Reinforcement Learning [56.52768265734155]
We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL).
Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms.
It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
arXiv Detail & Related papers (2024-07-17T09:45:27Z) - No Prior Mask: Eliminate Redundant Action for Deep Reinforcement Learning [13.341525656639583]
A large action space is one fundamental obstacle to deploying Reinforcement Learning methods in the real world.
We propose a novel redundant action filtering mechanism named No Prior Mask (NPM).
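The summary only names the mechanism; purely as an illustration of redundant-action filtering in a value-based agent (not the paper's actual NPM construction), one might mask out flagged actions before taking the greedy argmax:

```python
import torch

def masked_greedy_action(q_values: torch.Tensor,
                         redundant_mask: torch.Tensor) -> torch.Tensor:
    """Greedy action selection that ignores actions flagged as redundant.

    q_values:       (batch, n_actions) Q-estimates.
    redundant_mask: (batch, n_actions) bool, True where an action is judged
                    redundant in the current state (in NPM this judgement
                    comes from a learned model; here it is simply an input).
    """
    masked_q = q_values.masked_fill(redundant_mask, float("-inf"))
    return masked_q.argmax(dim=-1)
```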
arXiv Detail & Related papers (2023-12-11T09:56:02Z) - AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z) - Endogenous Macrodynamics in Algorithmic Recourse [52.87956177581998]
Existing work on Counterfactual Explanations (CE) and Algorithmic Recourse (AR) has largely focused on single individuals in a static environment.
We show that many of the existing methodologies can be collectively described by a generalized framework.
We then argue that the existing framework does not account for a hidden external cost of recourse, that only reveals itself when studying the endogenous dynamics of recourse at the group level.
arXiv Detail & Related papers (2023-08-16T07:36:58Z) - Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation [69.1524391595912]
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks.
This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration.
arXiv Detail & Related papers (2022-06-19T14:44:40Z) - Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning [131.1852444489217]
This paper presents Object-aware REgularizatiOn (OREO), a technique that regularizes an imitation policy in an object-aware manner.
Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions.
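As summarized, the key idea is to prevent the policy from latching onto a few nuisance features by spreading its attention over all semantic objects. A minimal, hypothetical sketch of such a uniformity penalty (the paper's actual OREO objective is built differently, on discrete object representations) could be:

```python
import torch

def uniform_attention_regularizer(attn_weights: torch.Tensor) -> torch.Tensor:
    """Encourage the policy to attend uniformly over detected objects.

    attn_weights: (batch, n_objects), non-negative rows summing to 1.
    Returns the mean KL(attn || uniform); it is zero exactly when attention
    is spread evenly over all objects.
    """
    n_objects = attn_weights.shape[-1]
    uniform = torch.full_like(attn_weights, 1.0 / n_objects)
    kl = (attn_weights * (attn_weights.clamp_min(1e-8) / uniform).log()).sum(dim=-1)
    return kl.mean()
```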
arXiv Detail & Related papers (2021-10-27T01:56:23Z) - Cooperative Exploration for Multi-Agent Deep Reinforcement Learning [127.4746863307944]
We propose cooperative multi-agent exploration (CMAE) for deep reinforcement learning.
The goal is selected from multiple projected state spaces via a normalized entropy-based technique.
We demonstrate that CMAE consistently outperforms baselines on various tasks.
arXiv Detail & Related papers (2021-07-23T20:06:32Z) - RODE: Learning Roles to Decompose Multi-Agent Tasks [69.56458960841165]
Role-based learning holds the promise of achieving scalable multi-agent learning by decomposing complex tasks using roles.
We propose to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents.
By virtue of these advances, our method outperforms the current state-of-the-art MARL algorithms on 10 of the 14 scenarios that comprise the challenging StarCraft II micromanagement benchmark.
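As a rough illustration of the first step described above, clustering actions by their effects into restricted role action spaces, one could embed each action by a summary of its observed effects and cluster those embeddings; the sketch below uses plain k-means and is not the paper's learned action-representation procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_actions_by_effect(effect_embeddings: np.ndarray, n_roles: int = 3) -> dict:
    """Group actions into role action spaces by similarity of their effects.

    effect_embeddings: (n_actions, d) array in which row i summarizes the
                       average effect of action i on the environment and on
                       other agents (assumed to be computed elsewhere).
    Returns a dict mapping role id -> list of action indices.
    """
    labels = KMeans(n_clusters=n_roles, n_init=10).fit_predict(effect_embeddings)
    roles: dict = {}
    for action, role in enumerate(labels):
        roles.setdefault(int(role), []).append(action)
    return roles
```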
arXiv Detail & Related papers (2020-10-04T09:20:59Z) - Efficient Black-Box Planning Using Macro-Actions with Focused Effects [35.688161278362735]
Heuristics can make search more efficient, but goal-aware heuristics are not readily available for black-box planning.
We show how to overcome this limitation by discovering macro-actions that make the goal-count heuristic more accurate.
arXiv Detail & Related papers (2020-04-28T02:13:12Z) - Macro-Action-Based Deep Multi-Agent Reinforcement Learning [17.73081797556005]
This paper proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions.
Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive-actions.
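For context on what a macro-action value function is trained against, a standard SMDP-style target (not specific to the methods in this paper) accumulates the discounted reward collected while the macro-action runs and bootstraps after its duration:

```python
import torch

def macro_action_td_target(rewards, next_q_max: float, gamma: float = 0.99) -> torch.Tensor:
    """SMDP-style TD target for a completed macro-action.

    rewards:    per-step rewards collected while the macro-action executed
                (sequence of length tau).
    next_q_max: max_a Q_target(s', a) at the state where the macro terminated.
    """
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    tau = rewards.shape[0]
    discounts = gamma ** torch.arange(tau, dtype=torch.float32)
    return (discounts * rewards).sum() + (gamma ** tau) * next_q_max
```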
arXiv Detail & Related papers (2020-04-18T15:46:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.