Hierarchical Reinforcement Learning as a Model of Human Task Interleaving
- URL: http://arxiv.org/abs/2001.02122v1
- Date: Sat, 4 Jan 2020 17:53:28 GMT
- Title: Hierarchical Reinforcement Learning as a Model of Human Task Interleaving
- Authors: Christoph Gebhardt, Antti Oulasvirta, Otmar Hilliges
- Abstract summary: We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
- Score: 60.95424607008241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How do people decide how long to continue in a task, when to switch, and to
which other task? Understanding the mechanisms that underpin task interleaving
is a long-standing goal in the cognitive sciences. Prior work suggests greedy
heuristics and a policy maximizing the marginal rate of return. However, it is
unclear how such a strategy would allow for adaptation to everyday environments
that offer multiple tasks with complex switch costs and delayed rewards. Here
we develop a hierarchical model of supervisory control driven by reinforcement
learning (RL). The supervisory level learns to switch using task-specific
approximate utility estimates, which are computed on the lower level. A
hierarchically optimal value function decomposition can be learned from
experience, even in conditions with multiple tasks and arbitrary and uncertain
reward and cost structures. The model reproduces known empirical effects of
task interleaving. It yields better predictions of individual-level data than a
myopic baseline in a six-task problem (N=211). The results support hierarchical
RL as a plausible model of task interleaving.
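To make the described architecture concrete, the following is a minimal sketch, not the authors' implementation: a supervisory tabular Q-learner decides whether to continue the current task or switch, drawing on per-task utility estimates maintained at the lower level. The task set, reward handling, and the way the two levels are combined are assumptions made for illustration.
```python
import random
from collections import defaultdict

# Minimal illustrative sketch of a two-level supervisory controller.
# Task names, learning rates, and the tabular updates are assumptions,
# not the paper's code.

TASKS = ["email", "writing", "browsing"]  # hypothetical task set
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

# Lower level: per-task approximate utility estimates, updated from
# observed task rewards (a running average as a stand-in).
task_utility = defaultdict(float)

def update_task_utility(task, reward):
    """Lower level: running estimate of a task's continuation utility."""
    task_utility[task] += ALPHA * (reward - task_utility[task])

# Supervisory level: Q-values over (current task, next task) pairs, so
# the policy can learn switch costs as well as task payoffs.
Q = defaultdict(float)

def choose_next_task(current):
    """Epsilon-greedy switch/continue decision using supervisory Q-values
    plus the lower level's utility estimates (an illustrative combination)."""
    if random.random() < EPS:
        return random.choice(TASKS)
    return max(TASKS, key=lambda t: Q[(current, t)] + task_utility[t])

def supervisory_update(current, chosen, reward, switch_cost):
    """Q-learning update; switching incurs a cost, continuing does not."""
    r = reward - (switch_cost if chosen != current else 0.0)
    best_next = max(Q[(chosen, t)] for t in TASKS)
    Q[(current, chosen)] += ALPHA * (r + GAMMA * best_next - Q[(current, chosen)])
```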
Related papers
- Identifying Selections for Unsupervised Subtask Discovery [12.22188797558089]
We provide a theory to identify, and experiments to verify, the existence of selection variables in data.
These selections serve as subgoals that indicate subtasks and guide the policy.
In light of this idea, we develop a sequential non-negative matrix factorization (seq-NMF) method to learn these subgoals and extract meaningful behavior patterns as subtasks; a loose illustration of the factorization step follows.
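The sketch below uses plain NMF from scikit-learn rather than the paper's seq-NMF variant; the data layout is an assumption:
```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical data: rows are time steps, columns are behavior features
# (e.g., discretized actions); values are non-negative counts/intensities.
X = np.random.rand(200, 30)

# Factorize X ~= W @ H: each of the k components is a candidate behavior
# pattern (a row of H) with per-timestep activations in W.
model = NMF(n_components=5, init="nndsvd", random_state=0)
W = model.fit_transform(X)   # (200, 5) activations over time
H = model.components_        # (5, 30) candidate subtask patterns
```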
arXiv Detail & Related papers (2024-10-28T23:47:43Z)
- Spatial Reasoning and Planning for Deep Embodied Agents [2.7195102129095003]
This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks.
It focuses on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
arXiv Detail & Related papers (2024-09-28T23:05:56Z)
- POMRL: No-Regret Learning-to-Plan with Increasing Horizons [43.693739167594295]
We study the problem of planning under model uncertainty in an online meta-reinforcement learning setting.
We propose an algorithm to meta-learn the underlying structure across tasks, utilize it to plan in each task, and upper-bound the regret of the planning loss.
arXiv Detail & Related papers (2022-12-30T03:09:45Z)
- New Tight Relaxations of Rank Minimization for Multi-Task Learning [161.23314844751556]
We propose two novel multi-task learning formulations based on two regularization terms.
We show that our methods can correctly recover the low-rank structure shared across tasks and outperform related multi-task learning methods; the standard nuclear-norm baseline is sketched below for context.
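For context, the classic convex relaxation of rank minimization is the nuclear norm, whose proximal step soft-thresholds singular values. The sketch shows that standard baseline, not the paper's tighter relaxations:
```python
import numpy as np

def nuclear_prox(W, tau):
    """Proximal operator of tau * ||W||_*: soft-threshold singular values.

    W is a (features x tasks) weight matrix; shrinking its singular values
    pushes the task-specific weight vectors toward a shared low-rank subspace.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

# Usage inside a proximal-gradient loop (step size lr, regularizer weight lam):
#   W = nuclear_prox(W - lr * grad_of_task_losses(W), lr * lam)
```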
arXiv Detail & Related papers (2021-12-09T07:29:57Z)
- Conflict-Averse Gradient Descent for Multi-task Learning [56.379937772617]
A major challenge in optimizing a multi-task model is the presence of conflicting gradients.
We introduce Conflict-Averse Gradient Descent (CAGrad), which minimizes the average loss function.
CAGrad balances the objectives automatically and still provably converges to a minimum of the average loss; a minimal two-task sketch follows.
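A minimal NumPy sketch of the two-task case, based on the dual form reported in the paper, with a grid search over the single dual weight in place of the authors' solver; c and the gradients are placeholders:
```python
import numpy as np

def cagrad_two_tasks(g1, g2, c=0.5):
    """Two-task CAGrad-style update direction via its dual (grid search over w).

    Approximately solves max_d min_i <g_i, d> subject to ||d - g0|| <= c*||g0||,
    where g0 is the average gradient.
    """
    g0 = 0.5 * (g1 + g2)
    g0_norm = np.linalg.norm(g0)
    best_w, best_val = 0.0, np.inf
    for w in np.linspace(0.0, 1.0, 101):
        gw = w * g1 + (1.0 - w) * g2
        # Dual objective: g_w . g_0 + c * ||g_0|| * ||g_w||
        val = gw @ g0 + c * g0_norm * np.linalg.norm(gw)
        if val < best_val:
            best_val, best_w = val, w
    gw = best_w * g1 + (1.0 - best_w) * g2
    gw_norm = np.linalg.norm(gw)
    if gw_norm < 1e-12:
        return g0
    return g0 + c * g0_norm * gw / gw_norm
```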
arXiv Detail & Related papers (2021-10-26T22:03:51Z)
- Instance-Level Task Parameters: A Robust Multi-task Weighting Framework [17.639472693362926]
Recent works have shown that deep neural networks benefit from multi-task learning by learning a shared representation across several related tasks.
We let the training process dictate the optimal weighting of tasks for every instance in the dataset.
We conduct extensive experiments on the SURREAL and Cityscapes datasets, covering human shape and pose estimation, depth estimation, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-11T02:35:42Z)
- Learning Task Decomposition with Ordered Memory Policy Network [73.3813423684999]
We propose the Ordered Memory Policy Network (OMPN) to discover subtask hierarchies by learning from demonstrations.
OMPN can be applied to partially observable environments and still achieve higher task decomposition performance.
Our visualization confirms that the subtask hierarchy can emerge in our model.
arXiv Detail & Related papers (2021-03-19T18:13:35Z)
- Hierarchical Reinforcement Learning By Discovering Intrinsic Options [18.041140234312934]
HIDIO can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks.
In experiments on sparse-reward robotic manipulation and navigation tasks, HIDIO achieves higher success rates with greater sample efficiency.
arXiv Detail & Related papers (2021-01-16T20:54:31Z)
- Generalized Hindsight for Reinforcement Learning [154.0545226284078]
We argue that low-reward data collected while trying to solve one task provides little to no signal for solving that particular task.
We present Generalized Hindsight, an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks; a simplified relabeling sketch follows.
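A simplified version of the relabeling idea, using greedy return-based selection rather than the paper's approximate-IRL criterion; the task set and reward functions are placeholders:
```python
def relabel_trajectory(trajectory, tasks, reward_fns):
    """Assign a trajectory to the candidate task whose reward it serves best.

    trajectory: list of (state, action) pairs
    reward_fns: dict mapping task -> reward_fn(state, action)
    A cheap stand-in for Generalized Hindsight's approximate-IRL relabeling:
    pick the task under which the trajectory's return is highest.
    """
    def task_return(task):
        return sum(reward_fns[task](s, a) for s, a in trajectory)

    return max(tasks, key=task_return)
```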
arXiv Detail & Related papers (2020-02-26T18:57:05Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.