POMRL: No-Regret Learning-to-Plan with Increasing Horizons
- URL: http://arxiv.org/abs/2212.14530v1
- Date: Fri, 30 Dec 2022 03:09:45 GMT
- Title: POMRL: No-Regret Learning-to-Plan with Increasing Horizons
- Authors: Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh,
Tom Zahavy
- Abstract summary: We study the problem of planning under model uncertainty in an online meta-reinforcement learning setting.
We propose an algorithm to meta-learn the underlying structure across tasks, utilize it to plan in each task, and upper-bound the regret of the planning loss.
- Score: 43.693739167594295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of planning under model uncertainty in an online
meta-reinforcement learning (RL) setting where an agent is presented with a
sequence of related tasks with limited interactions per task. The agent can use
its experience in each task and across tasks to estimate both the transition
model and the distribution over tasks. We propose an algorithm to meta-learn
the underlying structure across tasks, utilize it to plan in each task, and
upper-bound the regret of the planning loss. Our bound suggests that the
average regret over tasks decreases as the number of tasks increases and as the
tasks are more similar. In the classical single-task setting, it is known that
the planning horizon should depend on the estimated model's accuracy, that is,
on the number of samples within task. We generalize this finding to meta-RL and
study this dependence of planning horizons on the number of tasks. Based on our
theoretical findings, we derive heuristics for selecting slowly increasing
discount factors, and we validate their significance empirically.
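As a rough illustration of what a "slowly increasing discount factor" could look like, here is a minimal Python sketch. The schedule itself, the function names `discount_schedule` and `effective_horizon`, and all parameter values are assumptions made for illustration only; the abstract does not state the paper's actual heuristics, and this is not a reproduction of them.

```python
def discount_schedule(task_index, gamma_min=0.5, gamma_max=0.99, rate=0.5):
    """Hypothetical schedule: the discount factor grows with the number of tasks seen.

    task_index is 1-based. The gap to gamma_max shrinks like task_index ** (-rate),
    so the discount starts at gamma_min and approaches gamma_max from below.
    """
    if task_index < 1:
        raise ValueError("task_index must be >= 1")
    gap = (gamma_max - gamma_min) / (task_index ** rate)
    return gamma_max - gap


def effective_horizon(gamma):
    """Standard effective planning horizon 1 / (1 - gamma) for a discount in [0, 1)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    # Show how the assumed schedule lengthens the planning horizon across tasks.
    for t in (1, 4, 16, 64, 256):
        gamma = discount_schedule(t)
        print(f"task {t:4d}: gamma = {gamma:.4f}, horizon = {effective_horizon(gamma):.1f}")
```

The intent of the sketch is only to show the qualitative behaviour described in the abstract: with few tasks observed, the agent plans with a short horizon, and the horizon lengthens as more tasks (and hence better model estimates) become available.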
Related papers
- Generalization of Compositional Tasks with Logical Specification via Implicit Planning [14.46490764849977]
We introduce a new hierarchical RL framework that enhances the efficiency and optimality of task generalization.
At the high level, we present an implicit planner specifically designed for generalizing compositional tasks.
It learns a latent transition model and performs planning in the latent space by using a graph neural network (GNN).
arXiv Detail & Related papers (2024-10-13T00:57:10Z)
- Proximal Curriculum with Task Correlations for Deep Reinforcement Learning [25.10619062353793]
We consider curriculum design in contextual multi-task settings where the agent's final performance is measured w.r.t. a target distribution over complex tasks.
We propose a novel curriculum, ProCuRL-Target, that effectively balances the need for selecting tasks that are not too difficult for the agent while progressing the agent's learning toward the target distribution via leveraging task correlations.
arXiv Detail & Related papers (2024-05-03T21:07:54Z)
- Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions.
How to select new tasks to improve the performance and generalizability of IT models remains an open question.
We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
arXiv Detail & Related papers (2023-11-01T04:40:05Z)
- Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provides the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
arXiv Detail & Related papers (2022-12-30T12:32:43Z)
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- Meta-learning with an Adaptive Task Scheduler [93.63502984214918]
Existing meta-learning algorithms randomly sample meta-training tasks with a uniform probability.
Given a limited number of meta-training tasks, some tasks are likely to be detrimental because of noise or imbalance.
We propose an adaptive task scheduler (ATS) for the meta-training process.
arXiv Detail & Related papers (2021-10-26T22:16:35Z)
- Task Scoping: Generating Task-Specific Abstractions for Planning [19.411900372400183]
Planning to solve any specific task using an open-scope world model is computationally intractable.
We propose task scoping: a method that exploits knowledge of the initial condition, goal condition, and transition-dynamics structure of a task.
We prove that task scoping never deletes relevant factors or actions, characterize its computational complexity, and characterize the planning problems for which it is especially useful.
arXiv Detail & Related papers (2020-10-17T21:19:25Z)
- Deeper Task-Specificity Improves Joint Entity and Relation Extraction [0.0]
Multi-task learning (MTL) is an effective method for learning related tasks, but designing MTL models requires deciding which and how many parameters should be task-specific.
We propose a novel neural architecture that allows for deeper task-specificity than does prior work.
arXiv Detail & Related papers (2020-02-15T18:34:52Z)
- Hierarchical Reinforcement Learning as a Model of Human Task Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)