Factorizing Knowledge in Neural Networks
- URL: http://arxiv.org/abs/2207.03337v1
- Date: Mon, 4 Jul 2022 09:56:49 GMT
- Title: Factorizing Knowledge in Neural Networks
- Authors: Xingyi Yang, Jingwen Ye, Xinchao Wang
- Abstract summary: We propose a novel knowledge-transfer task, Knowledge Factorization (KF).
Given a pretrained network, KF aims to decompose it into several factor networks, each of which handles only a dedicated task and maintains task-specific knowledge factorized from the source network.
We introduce an information-theoretic objective, InfoMax-Bottleneck (IMB), to carry out KF by optimizing the mutual information between the learned representations and the input.
- Score: 65.57381498391202
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we explore a novel and ambitious knowledge-transfer task,
termed Knowledge Factorization~(KF). The core idea of KF lies in the
modularization and assemblability of knowledge: given a pretrained network
model as input, KF aims to decompose it into several factor networks, each of
which handles only a dedicated task and maintains task-specific knowledge
factorized from the source network. Such factor networks are task-wise
disentangled and can be directly assembled, without any fine-tuning, to produce
more competent combined-task networks. In other words, the factor networks
serve as Lego-brick-like building blocks, allowing us to construct customized
networks in a plug-and-play manner. Specifically, each factor network comprises
two modules: a common-knowledge module that is task-agnostic and shared by all
factor networks, and a task-specific module dedicated to the factor
network itself. We introduce an information-theoretic objective,
InfoMax-Bottleneck~(IMB), to carry out KF by optimizing the mutual information
between the learned representations and the input. Experiments across various
benchmarks demonstrate that the derived factor networks yield gratifying
performance not only on the dedicated tasks but also on disentanglement, while
enjoying much better interpretability and modularity. Moreover, the learned
common-knowledge representations give rise to impressive results on transfer
learning.
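To make the modular structure described above concrete, below is a minimal sketch, assuming a PyTorch-style implementation, of how a factor network could pair a shared, task-agnostic common-knowledge module with a task-specific module, and how such factor networks could be assembled into a combined-task network without fine-tuning. All class, function, and parameter names (`FactorNetwork`, `assemble`, `common`, `task_specific`, `head`) are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch (assumptions: PyTorch-style modules; names are illustrative
# and not the paper's released implementation).
import torch.nn as nn


class FactorNetwork(nn.Module):
    """One factor network: a shared common-knowledge module plus a task-specific module."""

    def __init__(self, common: nn.Module, task_specific: nn.Module, head: nn.Module):
        super().__init__()
        self.common = common                 # task-agnostic, shared by all factor networks
        self.task_specific = task_specific   # knowledge factorized for this task only
        self.head = head                     # prediction head for the dedicated task

    def forward(self, x):
        z_common = self.common(x)              # common-knowledge representation
        z_task = self.task_specific(z_common)  # task-specific representation
        return self.head(z_task)


def assemble(common: nn.Module, factors: dict) -> nn.ModuleDict:
    """Plug factor networks together into a combined-task network, with no fine-tuning."""
    return nn.ModuleDict({
        task: FactorNetwork(common, parts["task_specific"], parts["head"])
        for task, parts in factors.items()
    })
```

In this reading, assembling a combined-task network amounts to a dictionary of factor networks that share a single common-knowledge module; the paper's IMB objective would govern how the common and task-specific representations are trained, which this sketch does not attempt to implement.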
Related papers
- A Unified Causal View of Instruction Tuning [76.1000380429553]
We develop a meta Structural Causal Model (meta-SCM) to integrate different NLP tasks under a single causal structure of the data.
The key idea is to learn task-required causal factors and use only those to make predictions for a given task.
arXiv Detail & Related papers (2024-02-09T07:12:56Z) - Investigating the Impact of Weight Sharing Decisions on Knowledge Transfer in Continual Learning [7.25130576615102]
Continual Learning (CL) has attracted attention as a method for avoiding Catastrophic Forgetting (CF) in the sequential training of neural networks.
This paper investigates how different sharing decisions affect the Forward Knowledge Transfer (FKT) between tasks.
arXiv Detail & Related papers (2023-11-16T02:06:23Z) - Knowledge Transfer in Deep Reinforcement Learning via an RL-Specific GAN-Based Correspondence Function [0.0]
This article introduces a novel approach that modifies Cycle Generative Adversarial Networks specifically for reinforcement learning.
Our method achieves 100% knowledge transfer in identical tasks and either 100% knowledge transfer or a 30% reduction in training time for a rotated task.
arXiv Detail & Related papers (2022-09-14T12:42:59Z) - Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z) - Distributed Learning for Time-varying Networks: A Scalable Design [13.657740129012804]
We propose a distributed learning framework based on a scalable deep neural network (DNN) design.
By exploiting the permutation equivalence and invariance properties of the learning tasks, the DNNs with different scales for different clients can be built up.
Model aggregation can also be conducted based on these two sub-matrices to improve the learning convergence and performance.
arXiv Detail & Related papers (2021-07-31T12:44:28Z) - Efficient Transfer Learning via Joint Adaptation of Network Architecture and Weight [66.8543732597723]
Recent works in neural architecture search (NAS) can aid transfer learning by establishing a sufficient network search space.
We propose a novel framework consisting of two modules, a neural architecture search module for architecture transfer and a neural weight search module for weight transfer.
These two modules conduct search on the target task based on reduced super-networks, so we only need to train once on the source task.
arXiv Detail & Related papers (2021-05-19T08:58:04Z) - Entangled q-Convolutional Neural Nets [0.0]
We introduce a machine learning model, the q-CNN model, sharing key features with convolutional neural networks and admitting a tensor network description.
As examples, we apply q-CNN to the MNIST and Fashion MNIST classification tasks.
We explain how the network associates a quantum state to each classification label, and study the entanglement structure of these network states.
arXiv Detail & Related papers (2021-03-06T02:35:52Z) - Towards a Universal Continuous Knowledge Base [49.95342223987143]
We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks.
We import the knowledge from multiple models to the knowledge base, from which the fused knowledge is exported back to a single model.
Experiments on text classification show promising results.
arXiv Detail & Related papers (2020-12-25T12:27:44Z) - Learning credit assignment [2.0711789781518752]
It is unknown how learning coordinates a huge number of parameters to achieve decision making.
We propose a mean-field learning model by assuming that an ensemble of sub-networks is trained for a classification task.
Our model learns the credit assignment leading to the decision, and predicts an ensemble of sub-networks that can accomplish the same task.
arXiv Detail & Related papers (2020-01-10T09:06:46Z) - Automated Relational Meta-learning [95.02216511235191]
We propose an automated relational meta-learning framework that automatically extracts the cross-task relations and constructs the meta-knowledge graph.
We conduct extensive experiments on 2D toy regression and few-shot image classification and the results demonstrate the superiority of ARML over state-of-the-art baselines.
arXiv Detail & Related papers (2020-01-03T07:02:25Z)