Instilling Inductive Biases with Subnetworks
- URL: http://arxiv.org/abs/2310.10899v2
- Date: Thu, 1 Feb 2024 00:05:51 GMT
- Title: Instilling Inductive Biases with Subnetworks
- Authors: Enyan Zhang, Michael A. Lepori, Ellie Pavlick
- Abstract summary: Subtask Induction instills inductive biases towards solutions that utilize a given subtask.
We show that Subtask Induction significantly reduces the amount of training data required for a model to adopt a specific, generalizable solution.
- Score: 19.444844580405594
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the recent success of artificial neural networks on a variety of
tasks, we have little knowledge or control over the exact solutions these
models implement. Instilling inductive biases -- preferences for some solutions
over others -- into these models is one promising path toward understanding and
controlling their behavior. Much work has been done to study the inherent
inductive biases of models and instill different inductive biases through
hand-designed architectures or carefully curated training regimens. In this
work, we explore a more mechanistic approach: Subtask Induction. Our method
discovers a functional subnetwork that implements a particular subtask within a
trained model and uses it to instill inductive biases towards solutions
utilizing that subtask. Subtask Induction is flexible and efficient, and we
demonstrate its effectiveness with two experiments. First, we show that Subtask
Induction significantly reduces the amount of training data required for a
model to adopt a specific, generalizable solution to a modular arithmetic task.
Second, we demonstrate that Subtask Induction successfully induces a human-like
shape bias while increasing data efficiency for convolutional and
transformer-based image classification models.
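The abstract outlines a two-stage recipe: first discover a subnetwork inside a trained model that implements a chosen subtask, then use that subnetwork to bias a new model toward solutions that rely on it. As a rough illustration only, here is a minimal, hypothetical PyTorch sketch under assumptions the abstract does not spell out (mask discovery via a sigmoid relaxation with a sparsity penalty, and transplanting thresholded weights into a freshly initialized model); `MaskedLinear`, `discover_subnetwork`, and `transplant` are illustrative names, not the paper's code.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer with frozen trained weights gated by a learnable mask."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        # Freeze the trained weights; only the mask logits are optimized.
        self.weight = nn.Parameter(linear.weight.detach().clone(),
                                   requires_grad=False)
        self.bias = nn.Parameter(linear.bias.detach().clone(),
                                 requires_grad=False)
        self.mask_logits = nn.Parameter(torch.zeros_like(self.weight))

    def forward(self, x):
        mask = torch.sigmoid(self.mask_logits)  # relax {0, 1} to (0, 1)
        return nn.functional.linear(x, self.weight * mask, self.bias)

def discover_subnetwork(masked_model, subtask_loader, epochs=5, l0_weight=1e-4):
    """Optimize only the mask so the gated subnetwork performs the subtask."""
    mask_params = [p for n, p in masked_model.named_parameters()
                   if "mask_logits" in n]
    opt = torch.optim.Adam(mask_params, lr=1e-2)
    for _ in range(epochs):
        for x, y in subtask_loader:
            sparsity = sum(torch.sigmoid(p).sum() for p in mask_params)
            loss = nn.functional.cross_entropy(masked_model(x), y)
            loss = loss + l0_weight * sparsity  # encourage a small subnetwork
            opt.zero_grad()
            loss.backward()
            opt.step()

def transplant(masked_layer: MaskedLinear, fresh_layer: nn.Linear, thresh=0.5):
    """Copy the discovered subnetwork into a freshly initialized layer;
    all other weights keep their random init and train on the full task."""
    keep = torch.sigmoid(masked_layer.mask_logits) > thresh
    with torch.no_grad():
        fresh_layer.weight[keep] = masked_layer.weight[keep]
```

In a second stage, the fresh model carrying the transplanted subnetwork would be trained on the full task; whether the transplanted weights stay frozen or are fine-tuned is a design choice the abstract leaves open.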
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Adaptive teachers for amortized samplers [76.88721198565861]
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable.
Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration.
We propose an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions.
arXiv Detail & Related papers (2024-10-02T11:33:13Z)
- On the Inductive Bias of Stacking Towards Improving Reasoning [50.225873619537765]
We propose a variant of gradual stacking called MIDAS that can speed up language model training by up to 40%.
MIDAS is not only training-efficient but surprisingly also has an inductive bias towards improving downstream tasks.
We conjecture the underlying reason for this inductive bias by exploring the connection of stacking to looped models.
arXiv Detail & Related papers (2024-09-27T17:58:21Z)
- Towards Exact Computation of Inductive Bias [8.988109761916379]
We propose a novel method for efficiently computing the inductive bias required for generalization on a task.
We show that higher dimensional tasks require greater inductive bias.
Our proposed inductive bias metric provides an information-theoretic interpretation of the benefits of specific model architectures.
arXiv Detail & Related papers (2024-06-22T21:14:24Z)
- Dreamguider: Improved Training free Diffusion-based Conditional Generation [31.68823843900196]
Dreamguider is a method that enables inference-time guidance without compute-heavy backpropagation through the diffusion network.
We present experiments using Dreamguider on multiple tasks across multiple datasets and models to show the effectiveness of the proposed modules.
arXiv Detail & Related papers (2024-06-04T17:59:32Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning [30.610670366488943]
We replace architecture engineering by encoding inductive bias in datasets.
Inspired by Peirce's view that deduction, induction, and abduction form an irreducible set of reasoning primitives, we design three synthetic tasks that are intended to require the model to have these three abilities.
Models trained with LIME significantly outperform vanilla transformers on three very different large mathematical reasoning benchmarks.
arXiv Detail & Related papers (2021-01-15T17:15:24Z)
- Transferring Inductive Biases through Knowledge Distillation [21.219305008067735]
We explore the power of knowledge distillation for transferring the effect of inductive biases from one model to another.
We study the effect of inductive biases on the solutions the models converge to and investigate how and to what extent the effect of inductive biases is transferred through knowledge distillation.
arXiv Detail & Related papers (2020-05-31T16:34:08Z)
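To make the distillation-based transfer in the last entry above concrete, here is a minimal, generic sketch (illustrative PyTorch code, not the paper's): a student whose architecture lacks the desired inductive bias is trained against the softened predictions of a teacher whose architecture encodes it. The temperature `T` and mixing weight `alpha` are assumed hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend the usual supervised loss with a KL term toward the teacher."""
    soft_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, log_target=True,
                  reduction="batchmean") * (T * T)  # standard T^2 rescaling
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

def train_step(student, teacher, opt, x, y):
    """One student update; the teacher supplies the biased solution."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(x)
    loss = distillation_loss(student(x), t_logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```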
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.