Instilling Inductive Biases with Subnetworks
- URL: http://arxiv.org/abs/2310.10899v2
- Date: Thu, 1 Feb 2024 00:05:51 GMT
- Authors: Enyan Zhang, Michael A. Lepori, Ellie Pavlick
- Abstract summary: Subtask Induction instills inductive biases towards solutions utilizing a subtask.
We show that Subtask Induction significantly reduces the amount of training data required for a model to adopt a specific, generalizable solution.
- Score: 19.444844580405594
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the recent success of artificial neural networks on a variety of
tasks, we have little knowledge or control over the exact solutions these
models implement. Instilling inductive biases -- preferences for some solutions
over others -- into these models is one promising path toward understanding and
controlling their behavior. Much work has been done to study the inherent
inductive biases of models and instill different inductive biases through
hand-designed architectures or carefully curated training regimens. In this
work, we explore a more mechanistic approach: Subtask Induction. Our method
discovers a functional subnetwork that implements a particular subtask within a
trained model and uses it to instill inductive biases towards solutions
utilizing that subtask. Subtask Induction is flexible and efficient, and we
demonstrate its effectiveness with two experiments. First, we show that Subtask
Induction significantly reduces the amount of training data required for a
model to adopt a specific, generalizable solution to a modular arithmetic task.
Second, we demonstrate that Subtask Induction successfully induces a human-like
shape bias while increasing data efficiency for convolutional and
transformer-based image classification models.
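The core move described in the abstract is to keep the weights of a discovered subtask subnetwork while re-initializing the rest of the model before training on the target task. A minimal NumPy sketch of that transfer step, assuming a single weight matrix and a fixed binary mask (in the actual method the mask is discovered, e.g. by optimizing a differentiable mask against subtask behavior; here it is random purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights of a trained "source" model (one layer, for illustration only).
W_trained = rng.normal(size=(8, 8))

# Binary mask marking the subnetwork hypothesized to implement the subtask.
# In the paper this mask is discovered, not fixed; random here for illustration.
mask = rng.random((8, 8)) < 0.3

# Subtask-Induction-style initialization of a new model:
# retain the subnetwork's trained weights, re-initialize everything else.
W_fresh = rng.normal(size=(8, 8))
W_init = np.where(mask, W_trained, W_fresh)

# The retained subnetwork biases subsequent training toward solutions
# that reuse the subtask computation.
assert np.array_equal(W_init[mask], W_trained[mask])
assert np.array_equal(W_init[~mask], W_fresh[~mask])
```

Training then proceeds as usual from `W_init`; only the initialization differs from standard training.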
Related papers
- Towards Exact Computation of Inductive Bias [8.988109761916379]
We propose a novel method for efficiently computing the inductive bias required for generalization on a task.
We show that higher dimensional tasks require greater inductive bias.
Our proposed inductive bias metric provides an information-theoretic interpretation of the benefits of specific model architectures.
arXiv Detail & Related papers (2024-06-22T21:14:24Z)
- Dreamguider: Improved Training free Diffusion-based Conditional Generation [31.68823843900196]
Dreamguider is a method that enables inference-time guidance without compute-heavy backpropagation through the diffusion network.
We present experiments using Dreamguider on multiple tasks across multiple datasets and models to show the effectiveness of the proposed modules.
arXiv Detail & Related papers (2024-06-04T17:59:32Z)
- Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning [52.70210390424605]
In this work, we consider endowing a neural network autoencoder with three select inductive biases from the literature.
In practice, however, naively combining existing techniques instantiating these inductive biases fails to yield significant benefits.
We propose adaptations to the three techniques that simplify the learning problem, equip key regularization terms with stabilizing invariances, and quash degenerate incentives.
The resulting model, Tripod, achieves state-of-the-art results on a suite of four image disentanglement benchmarks.
arXiv Detail & Related papers (2024-04-16T04:52:41Z)
- Distilling Symbolic Priors for Concept Learning into Neural Networks [9.915299875869046]
We show that inductive biases can be instantiated in artificial neural networks by distilling a prior distribution from a symbolic Bayesian model via meta-learning.
We use this approach to create a neural network with an inductive bias towards concepts expressed as short logical formulas.
arXiv Detail & Related papers (2024-02-10T20:06:26Z)
- Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks [55.431048995662714]
We create a small model for a new task from the pruned models of similar tasks.
We show that a few fine-tuning steps on this model suffice to produce a promising pruned-model for the new task.
We develop a simple but effective ''Meta-Vote Pruning (MVP)'' method that significantly reduces the pruning iterations for a new task.
arXiv Detail & Related papers (2023-01-27T06:49:47Z)
- Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning [30.610670366488943]
We replace architecture engineering by encoding inductive bias in datasets.
Inspired by Peirce's view that deduction, induction, and abduction form an irreducible set of reasoning primitives, we design three synthetic tasks that are intended to require the model to have these three abilities.
Models trained with LIME significantly outperform vanilla transformers on three very different large mathematical reasoning benchmarks.
arXiv Detail & Related papers (2021-01-15T17:15:24Z)
- Transferring Inductive Biases through Knowledge Distillation [21.219305008067735]
We explore the power of knowledge distillation for transferring the effect of inductive biases from one model to another.
We study the effect of inductive biases on the solutions the models converge to and investigate how and to what extent the effect of inductive biases is transferred through knowledge distillation.
arXiv Detail & Related papers (2020-05-31T16:34:08Z)
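The last entry transfers inductive biases via knowledge distillation rather than subnetwork transfer. A minimal sketch of the standard temperature-scaled distillation loss (Hinton-style KL between softened teacher and student distributions; the logit values are illustrative, not from the paper):

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

# Teacher logits come from a model with the desired inductive bias;
# student logits come from the model being trained. Values are made up.
teacher_logits = np.array([2.0, 1.0, 0.1])
student_logits = np.array([1.5, 0.5, 0.5])
T = 2.0  # temperature > 1 softens the distributions

p_teacher = softmax(teacher_logits, T)
p_student = softmax(student_logits, T)

# Distillation term: KL(teacher || student), scaled by T^2 so gradient
# magnitudes stay comparable across temperatures.
kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
distill_loss = (T ** 2) * kl
```

In practice this term is combined with the ordinary cross-entropy on ground-truth labels; distilling from the biased teacher is what carries the inductive bias over.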
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.