SplAgger: Split Aggregation for Meta-Reinforcement Learning
- URL: http://arxiv.org/abs/2403.03020v3
- Date: Sat, 1 Jun 2024 22:35:29 GMT
- Title: SplAgger: Split Aggregation for Meta-Reinforcement Learning
- Authors: Jacob Beck, Matthew Jackson, Risto Vuorio, Zheng Xiong, Shimon Whiteson
- Abstract summary: Black box meta-RL methods train off-the-shelf sequence models end-to-end.
Task inference methods explicitly infer a posterior distribution over the unknown task.
Recent work has shown that task inference methods are not necessary for strong performance.
We present evidence that task inference sequence models are indeed still beneficial.
- Score: 32.25672143072966
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A core ambition of reinforcement learning (RL) is the creation of agents capable of rapid learning in novel tasks. Meta-RL aims to achieve this by directly learning such agents. Black box methods do so by training off-the-shelf sequence models end-to-end. By contrast, task inference methods explicitly infer a posterior distribution over the unknown task, typically using distinct objectives and sequence models designed to enable task inference. Recent work has shown that task inference methods are not necessary for strong performance. However, it remains unclear whether task inference sequence models are beneficial even when task inference objectives are not. In this paper, we present evidence that task inference sequence models are indeed still beneficial. In particular, we investigate sequence models with permutation invariant aggregation, which exploit the fact that, due to the Markov property, the task posterior does not depend on the order of data. We empirically confirm the advantage of permutation invariant sequence models without the use of task inference objectives. However, we also find, surprisingly, that there are multiple conditions under which permutation variance remains useful. Therefore, we propose SplAgger, which uses both permutation variant and invariant components to achieve the best of both worlds, outperforming all baselines evaluated on continuous control and memory environments. Code is provided at https://github.com/jacooba/hyper.
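The abstract's central contrast, permutation invariant versus permutation variant aggregation, can be illustrated with a toy example. If transitions are conditionally independent given the task, the task posterior is proportional to the prior times a product of per-transition likelihoods, and a product is unchanged by reordering its factors. The sketch below is not the authors' implementation: the function names, the tanh recurrence, the element-wise max pooling, and the concatenation of the two summaries are all assumptions chosen only to show one way of combining an order-dependent summary with an order-independent one.

```python
import math

def recurrent_encode(transitions, decay=0.5):
    """Permutation variant summary: a toy recurrence (hypothetical stand-in for an RNN)."""
    h = [0.0] * len(transitions[0])
    for x in transitions:
        # Each step mixes the previous state with the new input, so order matters.
        h = [math.tanh(decay * hi + xi) for hi, xi in zip(h, x)]
    return h

def max_pool_encode(transitions):
    """Permutation invariant summary: element-wise max over the set of inputs."""
    return [max(column) for column in zip(*transitions)]

def split_aggregate(transitions):
    """Assumed combination: concatenate the variant and invariant summaries."""
    return recurrent_encode(transitions) + max_pool_encode(transitions)

# Three toy "transition" feature vectors and a reordering of the same set.
data = [[0.1, -0.3], [0.7, 0.2], [-0.5, 0.9]]
shuffled = [data[2], data[0], data[1]]

invariant_same = max_pool_encode(data) == max_pool_encode(shuffled)
variant_same = recurrent_encode(data) == recurrent_encode(shuffled)

print("max pooling unchanged by shuffling:", invariant_same)
print("recurrence unchanged by shuffling:", variant_same)
```

Under these assumptions, the pooled half of the combined vector is identical for both orderings while the recurrent half differs, so a downstream policy would have access to both kinds of information.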
Related papers
- Task-recency bias strikes back: Adapting covariances in Exemplar-Free Class Incremental Learning [0.3281128493853064]
We tackle the problem of training a model on a sequence of tasks without access to past data.
Existing methods represent classes as Gaussian distributions in the feature extractor's latent space.
We propose AdaGauss -- a novel method that adapts covariance matrices from task to task.
arXiv Detail & Related papers (2024-09-26T20:18:14Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix, which is deterministic and easier for the model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z)
- MetaKernel: Learning Variational Random Features with Limited Labels [120.90737681252594]
Few-shot learning deals with the fundamental and challenging problem of learning from a few annotated samples, while being able to generalize well on new tasks.
We propose meta-learning kernels with random Fourier features for few-shot learning, which we call MetaKernel.
arXiv Detail & Related papers (2021-05-08T21:24:09Z)
- Meta-Regularization by Enforcing Mutual-Exclusiveness [0.8057006406834467]
We propose a regularization technique for meta-learning models that gives the model designer more control over the information flow during meta-training.
Our proposed regularization function shows an accuracy boost of $\sim 36\%$ on the Omniglot dataset.
arXiv Detail & Related papers (2021-01-24T22:57:19Z)
- Lifelong Learning Without a Task Oracle [13.331659934508764]
Supervised deep neural networks are known to undergo a sharp decline in the accuracy of older tasks when new tasks are learned.
We propose and compare several candidate task-assigning mappers which require very little memory overhead.
Best-performing variants only impose an average cost of 1.7% parameter memory increase.
arXiv Detail & Related papers (2020-11-09T21:30:31Z)
- A Markov Decision Process Approach to Active Meta Learning [24.50189361694407]
In supervised learning, we fit a single statistical model to a given data set, assuming that the data is associated with a singular task.
In meta-learning, the data is associated with numerous tasks, and we seek a model that may perform well on all tasks simultaneously.
arXiv Detail & Related papers (2020-09-10T15:45:34Z)
- Task-similarity Aware Meta-learning through Nonparametric Kernel Regression [8.801367758434335]
This paper investigates the use of nonparametric kernel regression to obtain a task-similarity aware meta-learning algorithm.
Our hypothesis is that the use of task-similarity helps meta-learning when the available tasks are limited and may contain outlier/dissimilar tasks.
arXiv Detail & Related papers (2020-06-12T14:15:11Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)