The Paradox of Choice: Using Attention in Hierarchical Reinforcement
Learning
- URL: http://arxiv.org/abs/2201.09653v1
- Date: Mon, 24 Jan 2022 13:18:02 GMT
- Title: The Paradox of Choice: Using Attention in Hierarchical Reinforcement
Learning
- Authors: Andrei Nica, Khimya Khetarpal, Doina Precup
- Abstract summary: We present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options.
We investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices.
- Score: 59.777127897688594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision-making AI agents are often faced with two important challenges: the
depth of the planning horizon, and the branching factor due to having many
choices. Hierarchical reinforcement learning methods aim to solve the first
problem, by providing shortcuts that skip over multiple time steps. To cope
with the breadth, it is desirable to restrict the agent's attention at each
step to a reasonable number of possible choices. The concept of affordances
(Gibson, 1977) suggests that only certain actions are feasible in certain
states. In this work, we model "affordances" through an attention mechanism
that limits the available choices of temporally extended options. We present an
online, model-free algorithm to learn affordances that can be used to further
learn subgoal options. We investigate the role of hard versus soft attention in
training data collection, abstract value learning in long-horizon tasks, and
handling a growing number of choices. We identify and empirically illustrate
the settings in which the paradox of choice arises, i.e. when having fewer but
more meaningful choices improves the learning speed and performance of a
reinforcement learning agent.
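A minimal, hypothetical sketch of the mechanism the abstract describes: an affordance network scores the agent's temporally extended options in the current state, and either a hard top-k mask or a soft weighting restricts which options are considered. This is not the authors' implementation; the network shape, the option count, and the greedy selection step are illustrative assumptions.

```python
# Illustrative sketch only: an attention mechanism over options,
# standing in for learned "affordances". Sizes and names are assumptions.
import torch
import torch.nn as nn

class OptionAffordance(nn.Module):
    """Scores how strongly each option is afforded in a given state."""
    def __init__(self, state_dim: int, num_options: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_options)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.scorer(state)  # unnormalized attention logits per option

def hard_attention(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Hard attention: keep only the k highest-scoring options."""
    top_k = logits.topk(k, dim=-1).indices
    return torch.zeros_like(logits).scatter_(-1, top_k, 1.0)

def soft_attention(logits: torch.Tensor) -> torch.Tensor:
    """Soft attention: every option stays available, reweighted by softmax."""
    return torch.softmax(logits, dim=-1)

# Usage: restrict greedy option selection over option values q.
state = torch.randn(1, 8)                  # hypothetical state features
q = torch.randn(1, 6)                      # hypothetical value per option
affordances = OptionAffordance(state_dim=8, num_options=6)
logits = affordances(state)

mask = hard_attention(logits, k=3)         # hard: prune to 3 choices
masked_q = q.masked_fill(mask == 0, float("-inf"))
chosen_option = masked_q.argmax(dim=-1)    # greedy among afforded options

weighted_q = q * soft_attention(logits)    # soft: down-weight, never prune
```

Hard attention removes branches outright, which matches the paper's "fewer but more meaningful choices"; soft attention keeps the full branching factor but biases the agent toward afforded options.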
Related papers
- Learning to Cover: Online Learning and Optimization with Irreversible Decisions [50.5775508521174]
We find that regret grows sub-linearly at a rate $\Theta\left(m^{\frac{1}{2}\cdot\frac{1}{1-2^{-T}}}\right)$, thus converging exponentially fast to $\Theta(\sqrt{m})$.
These findings underscore the benefits of limited online learning and optimization, in that even a few rounds can provide significant benefits as compared to a no-learning baseline.
arXiv Detail & Related papers (2024-06-20T23:00:25Z)
- A Best-of-Both-Worlds Algorithm for Constrained MDPs with Long-Term Constraints
We study online learning in episodic constrained Markov decision processes (CMDPs)
We provide the first best-of-both-worlds algorithm for CMDPs with long-term constraints.
arXiv Detail & Related papers (2023-04-27T16:58:29Z)
- Reusable Options through Gradient-based Meta Learning [24.59017394648942]
Several deep learning approaches have been proposed to learn temporal abstractions in the form of options in an end-to-end manner.
We frame the problem of learning options as a gradient-based meta-learning problem.
We show that our method is able to learn transferable components which accelerate learning, and that it performs better than existing methods.
arXiv Detail & Related papers (2022-12-22T14:19:35Z)
- Exploring with Sticky Mittens: Reinforcement Learning with Expert Interventions via Option Templates [31.836234758355243]
We propose a framework for leveraging expert intervention to solve long-horizon reinforcement learning tasks.
We consider option templates, which are specifications encoding a potential option that can be trained using reinforcement learning.
We evaluate our approach on three challenging reinforcement learning problems, showing that it outperforms state-of-the-art approaches by an order of magnitude.
arXiv Detail & Related papers (2022-02-25T20:55:34Z)
- GrASP: Gradient-Based Affordance Selection for Planning [25.548880832898757]
Planning with a learned model is arguably a key component of intelligence.
We present a method for selecting affordances useful for planning.
We show that it is feasible to learn to select both primitive-action and option affordances.
arXiv Detail & Related papers (2022-02-08T03:24:36Z)
- Attention Option-Critic [56.50123642237106]
We propose an attention-based extension to the option-critic framework.
We show that this leads to behaviorally diverse options which are also capable of state abstraction.
We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic.
arXiv Detail & Related papers (2022-01-07T18:44:28Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method for learning skills over long horizons.
The key idea of Option-GAIL is to model the task hierarchy with options and train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z)
- Diversity-Enriched Option-Critic [47.82697599507171]
We show that our proposed method is capable of learning options end-to-end on several discrete and continuous control tasks.
Our approach generates robust, reusable, reliable and interpretable options, in contrast to option-critic.
arXiv Detail & Related papers (2020-11-04T22:12:54Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.