A Combinatorial Perspective on Transfer Learning
- URL: http://arxiv.org/abs/2010.12268v1
- Date: Fri, 23 Oct 2020 09:53:31 GMT
- Title: A Combinatorial Perspective on Transfer Learning
- Authors: Jianan Wang, Eren Sezener, David Budden, Marcus Hutter, Joel Veness
- Abstract summary: We study how the learning of modular solutions can allow for effective generalization to both unseen and potentially differently distributed data.
Our main postulate is that the combination of task segmentation, modular learning and memory-based ensembling can give rise to generalization on an exponentially growing number of unseen tasks.
- Score: 27.7848044115664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human intelligence is characterized not only by the capacity to learn complex
skills, but also by the ability to rapidly adapt and acquire new skills within an
ever-changing environment. In this work we study how the learning of modular
solutions can allow for effective generalization to both unseen and potentially
differently distributed data. Our main postulate is that the combination of
task segmentation, modular learning and memory-based ensembling can give rise
to generalization on an exponentially growing number of unseen tasks. We
provide a concrete instantiation of this idea using a combination of: (1) the
Forget-Me-Not Process, for task segmentation and memory-based ensembling; and
(2) Gated Linear Networks, which in contrast to contemporary deep learning
techniques use a modular and local learning mechanism. We demonstrate that this
system exhibits a number of desirable continual learning properties: robustness
to catastrophic forgetting, no negative transfer and increasing levels of
positive transfer as more tasks are seen. We show competitive performance
against both offline and online methods on standard continual learning
benchmarks.
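The abstract describes a combination of task segmentation, modular learning and memory-based ensembling. The following is a minimal, illustrative sketch of that combination only; it is not the authors' Forget-Me-Not / Gated Linear Network implementation, and the module class, the mixture weighting and all hyperparameters are assumptions made for clarity.
```python
# Minimal illustrative sketch (not the paper's implementation): keep a memory of
# previously learned modules and ensemble them with freshly initialised ones,
# weighting each module by its predictive performance so far.
import numpy as np

class LinearModule:
    """A tiny logistic-regression 'module' trained with local online updates."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def predict(self, x):
        return 1.0 / (1.0 + np.exp(-self.w @ x))        # probability of class 1

    def update(self, x, y):
        self.w += self.lr * (y - self.predict(x)) * x   # online logistic update

class MemoryEnsemble:
    """Memory-based ensembling: a Bayesian-style mixture over stored modules."""
    def __init__(self, dim):
        self.dim = dim
        self.modules = [LinearModule(dim)]
        self.log_w = np.zeros(1)                        # log mixture weights

    def predict(self, x):
        w = np.exp(self.log_w - self.log_w.max())
        w /= w.sum()
        return float(sum(wi * m.predict(x) for wi, m in zip(w, self.modules)))

    def observe(self, x, y):
        # Credit each module by the log-likelihood it assigned to the label.
        for i, m in enumerate(self.modules):
            p = np.clip(m.predict(x), 1e-6, 1 - 1e-6)
            self.log_w[i] += np.log(p if y == 1 else 1 - p)
        self.modules[np.argmax(self.log_w)].update(x, y)  # train the best module

    def new_task(self):
        # Task segmentation signal: old modules are kept untouched, so previous
        # knowledge is preserved and can still contribute through the mixture.
        self.modules.append(LinearModule(self.dim))
        self.log_w = np.append(self.log_w, self.log_w.max())
```
Because old modules are never overwritten and only contribute through the mixture, predictions on previously seen tasks are preserved (no catastrophic forgetting) and poorly matching modules are simply down-weighted (no negative transfer). The paper instantiates the modules as Gated Linear Networks and the segmentation and ensembling machinery as the Forget-Me-Not Process.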
Related papers
- A Unified Framework for Continual Learning and Machine Unlearning [9.538733681436836]
Continual learning and machine unlearning are crucial challenges in machine learning, typically addressed separately.
We introduce a novel framework that jointly tackles both tasks by leveraging controlled knowledge distillation.
Our approach enables efficient learning with minimal forgetting and effective targeted unlearning.
arXiv Detail & Related papers (2024-08-21T06:49:59Z)
- Interactive Continual Learning: Fast and Slow Thinking [19.253164551254734]
This paper presents a novel Interactive Continual Learning framework, enabled by collaborative interactions among models of various sizes.
To improve memory retrieval in System1, we introduce the CL-vMF mechanism, based on the von Mises-Fisher (vMF) distribution.
Comprehensive evaluation of our proposed ICL demonstrates significant resistance to forgetting and superior performance relative to existing methods.
arXiv Detail & Related papers (2024-03-05T03:37:28Z) - SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models [71.78800549517298]
Continual learning (CL) ability is vital for deploying large language models (LLMs) in the dynamic world.
Existing methods devise a learning module to acquire task-specific knowledge with parameter-efficient tuning (PET) blocks and a selection module to pick the corresponding block for each test input.
We propose a novel Shared Attention Framework (SAPT) to align the PET learning and selection via the Shared Attentive Learning & Selection module.
arXiv Detail & Related papers (2024-01-16T11:45:03Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
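A rough illustration of the projection idea summarised above, as a hypothetical sketch rather than the paper's architecture: each modality gets its own encoder mapping features of arbitrary dimensionality into a shared space, so modality combinations never seen together during training can still be fused there. All names and sizes below are assumptions.
```python
# Hypothetical sketch of projecting per-modality features into a shared space.
# The encoder sizes, averaging-based fusion and all names are assumptions,
# not the architecture from the paper.
import torch
import torch.nn as nn

class SharedSpaceFusion(nn.Module):
    def __init__(self, modality_dims: dict, shared_dim: int = 256):
        super().__init__()
        # One projection head per modality, each mapping to the same shared_dim.
        self.projectors = nn.ModuleDict({
            name: nn.Linear(dim, shared_dim) for name, dim in modality_dims.items()
        })

    def forward(self, features: dict) -> torch.Tensor:
        # Project whichever modalities are present and fuse them by averaging,
        # so combinations unseen during training can still be handled at test time.
        projected = [self.projectors[name](feat) for name, feat in features.items()]
        return torch.stack(projected, dim=0).mean(dim=0)

# Example: trained with (video, audio), evaluated with (video, depth).
model = SharedSpaceFusion({"video": 512, "audio": 128, "depth": 64})
out = model({"video": torch.randn(4, 512), "depth": torch.randn(4, 64)})  # (4, 256)
```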
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - Class-Incremental Learning via Knowledge Amalgamation [14.513858688486701]
Catastrophic forgetting has been a significant problem hindering the deployment of deep learning algorithms in the continual learning setting.
We put forward an alternative strategy to handle catastrophic forgetting via knowledge amalgamation (CFA).
CFA learns a student network from multiple heterogeneous teacher models specializing in previous tasks and can be applied to current offline methods.
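The amalgamation step can be pictured as distilling several task-specific teachers into one student. The sketch below is a generic multi-teacher distillation loop with assumed names and losses, not the CFA method itself.
```python
# Generic multi-teacher knowledge-distillation sketch (assumed losses and names;
# not the CFA algorithm from the paper). Each teacher specialises in a subset of
# classes; the student is trained to match each teacher on its own class range.
import torch
import torch.nn.functional as F

def distill_step(student, teachers, x, class_slices, optimizer, T=2.0):
    """One update of the student against all teachers on an unlabeled batch x.

    teachers      -- list of frozen models, teachers[i] covers class_slices[i]
    class_slices  -- list of slice objects into the student's logit vector
    """
    student_logits = student(x)
    loss = 0.0
    for teacher, sl in zip(teachers, class_slices):
        with torch.no_grad():
            t_logits = teacher(x)
        # KL between softened teacher and student distributions on the teacher's classes.
        loss = loss + F.kl_div(
            F.log_softmax(student_logits[:, sl] / T, dim=1),
            F.softmax(t_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```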
arXiv Detail & Related papers (2022-09-05T19:49:01Z) - Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
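The skill-inventory idea can be sketched as a small bank of parameter modules plus a per-task binary allocation over that bank. The code below is an illustrative toy version with assumed sizes and a fixed random allocation matrix; the paper learns the task-to-skill allocation, which is omitted here.
```python
# Toy sketch of a skill inventory shared across tasks (assumed sizes; the paper
# learns the task-to-skill allocation jointly, which is not reproduced here).
import torch
import torch.nn as nn

class SkillInventoryNet(nn.Module):
    def __init__(self, in_dim, out_dim, n_skills=8, n_tasks=4):
        super().__init__()
        # The inventory: n_skills small parameter modules.
        self.skills = nn.ModuleList(
            [nn.Linear(in_dim, out_dim) for _ in range(n_skills)]
        )
        # Binary task-to-skill allocation (fixed at random for illustration only).
        self.register_buffer(
            "allocation", (torch.rand(n_tasks, n_skills) > 0.5).float()
        )

    def forward(self, x, task_id: int):
        alloc = self.allocation[task_id]              # which skills this task uses
        outs = torch.stack([skill(x) for skill in self.skills], dim=0)
        # Average the outputs of the selected skills only.
        weights = alloc / alloc.sum().clamp(min=1.0)
        return torch.einsum("s,sbd->bd", weights, outs)

net = SkillInventoryNet(in_dim=32, out_dim=10)
y = net(torch.randn(16, 32), task_id=2)               # (16, 10)
```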
arXiv Detail & Related papers (2022-02-28T16:07:19Z) - Essentials for Class Incremental Learning [43.306374557919646]
Class-incremental learning results on CIFAR-100 and ImageNet improve over the state-of-the-art by a large margin, while keeping the approach simple.
arXiv Detail & Related papers (2021-02-18T18:01:06Z) - Bilevel Continual Learning [76.50127663309604]
We present a novel continual learning framework named "Bilevel Continual Learning" (BCL).
Our experiments on continual learning benchmarks demonstrate the efficacy of the proposed BCL compared to many state-of-the-art methods.
arXiv Detail & Related papers (2020-07-30T16:00:23Z) - Uniform Priors for Data-Efficient Transfer [65.086680950871]
We show that features that are most transferable have high uniformity in the embedding space.
We evaluate the regularization on its ability to facilitate adaptation to unseen tasks and data.
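One common way to encourage uniformity on the embedding hypersphere is to penalise the average pairwise Gaussian potential of L2-normalised features. The snippet below is that generic regulariser as a hedged illustration; it is not necessarily the exact prior or regularisation used in the paper.
```python
# Generic hyperspherical-uniformity regulariser (a common formulation; the exact
# prior used in the paper may differ). Lower values mean features are spread
# more uniformly over the unit sphere.
import torch
import torch.nn.functional as F

def uniformity_loss(features: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """log of the mean pairwise Gaussian potential of L2-normalised features."""
    z = F.normalize(features, dim=1)                  # project onto the unit sphere
    sq_dists = torch.pdist(z, p=2).pow(2)             # pairwise squared distances
    return sq_dists.mul(-t).exp().mean().log()

# Used as an auxiliary term: total_loss = task_loss + lam * uniformity_loss(embeddings)
```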
arXiv Detail & Related papers (2020-06-30T04:39:36Z)
- Provable Meta-Learning of Linear Representations [114.656572506859]
We provide fast, sample-efficient algorithms to address the dual challenges of learning a common set of features from multiple, related tasks, and transferring this knowledge to new, unseen tasks.
We also provide information-theoretic lower bounds on the sample complexity of learning these linear features.
arXiv Detail & Related papers (2020-02-26T18:21:34Z)
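The two-stage recipe summarised in the entry above can be illustrated with a simple sketch: estimate a shared low-dimensional linear feature map from several source tasks, then fit only a small head on top of it for a new task. The SVD-based estimator and all dimensions below are assumptions for illustration, not the paper's exact algorithms or guarantees.
```python
# Illustrative sketch of transfer through a shared linear representation
# (assumed estimator: SVD of stacked per-task least-squares solutions;
# the paper's actual algorithms and sample-complexity bounds are not reproduced).
import numpy as np

def learn_shared_features(tasks, k):
    """tasks: list of (X, y) pairs; returns B with orthonormal columns, shape (d, k)."""
    # Solve each task by ridge-regularised least squares and stack the solutions.
    W = np.column_stack([
        np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ y)
        for X, y in tasks
    ])                                               # (d, num_tasks)
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k]                                  # top-k shared directions

def fit_new_task(B, X_new, y_new):
    """Transfer: only a k-dimensional head is fit for the unseen task."""
    Z = X_new @ B                                    # project into the learned features
    head, *_ = np.linalg.lstsq(Z, y_new, rcond=None)
    return lambda X: X @ B @ head

# Toy usage with d=20 features, k=3 shared directions and ten source tasks.
rng = np.random.default_rng(0)
B_true = rng.normal(size=(20, 3))
tasks = []
for _ in range(10):
    X = rng.normal(size=(100, 20))
    tasks.append((X, X @ B_true @ rng.normal(size=3) + 0.1 * rng.normal(size=100)))
B_hat = learn_shared_features(tasks, k=3)
X_new = rng.normal(size=(30, 20))
y_new = X_new @ B_true @ rng.normal(size=3)
predict = fit_new_task(B_hat, X_new, y_new)
preds = predict(X_new)   # predictions for the unseen task using only the shared features
```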
This list is automatically generated from the titles and abstracts of the papers on this site.