Instilling Inductive Biases with Subnetworks
- URL: http://arxiv.org/abs/2310.10899v2
- Date: Thu, 1 Feb 2024 00:05:51 GMT
- Title: Instilling Inductive Biases with Subnetworks
- Authors: Enyan Zhang, Michael A. Lepori, Ellie Pavlick
- Abstract summary: Subtask Induction instills inductive biases towards solutions that utilize a given subtask.
We show that Subtask Induction significantly reduces the amount of training data required for a model to adopt a specific, generalizable solution.
- Score: 19.444844580405594
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the recent success of artificial neural networks on a variety of
tasks, we have little knowledge or control over the exact solutions these
models implement. Instilling inductive biases -- preferences for some solutions
over others -- into these models is one promising path toward understanding and
controlling their behavior. Much work has been done to study the inherent
inductive biases of models and instill different inductive biases through
hand-designed architectures or carefully curated training regimens. In this
work, we explore a more mechanistic approach: Subtask Induction. Our method
discovers a functional subnetwork that implements a particular subtask within a
trained model and uses it to instill inductive biases towards solutions
utilizing that subtask. Subtask Induction is flexible and efficient, and we
demonstrate its effectiveness with two experiments. First, we show that Subtask
Induction significantly reduces the amount of training data required for a
model to adopt a specific, generalizable solution to a modular arithmetic task.
Second, we demonstrate that Subtask Induction successfully induces a human-like
shape bias while increasing data efficiency for convolutional and
transformer-based image classification models.
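The abstract outlines a two-stage recipe: first discover a subnetwork inside a trained model that implements a chosen subtask, then use that subnetwork to bias a new model toward solutions that rely on it. As a rough illustration only, here is a minimal, hypothetical PyTorch sketch under assumptions the abstract does not spell out (mask discovery via a sigmoid relaxation with a sparsity penalty, and transplanting thresholded weights into a freshly initialized model); `MaskedLinear`, `discover_subnetwork`, and `transplant` are illustrative names, not the paper's code.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer with frozen trained weights gated by a learnable mask."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        # Freeze the trained weights; only the mask logits are optimized.
        self.weight = nn.Parameter(linear.weight.detach().clone(),
                                   requires_grad=False)
        self.bias = nn.Parameter(linear.bias.detach().clone(),
                                 requires_grad=False)
        self.mask_logits = nn.Parameter(torch.zeros_like(self.weight))

    def forward(self, x):
        mask = torch.sigmoid(self.mask_logits)  # relax {0, 1} to (0, 1)
        return nn.functional.linear(x, self.weight * mask, self.bias)

def discover_subnetwork(masked_model, subtask_loader, epochs=5, l0_weight=1e-4):
    """Optimize only the mask so the gated subnetwork performs the subtask."""
    mask_params = [p for n, p in masked_model.named_parameters()
                   if "mask_logits" in n]
    opt = torch.optim.Adam(mask_params, lr=1e-2)
    for _ in range(epochs):
        for x, y in subtask_loader:
            sparsity = sum(torch.sigmoid(p).sum() for p in mask_params)
            loss = nn.functional.cross_entropy(masked_model(x), y)
            loss = loss + l0_weight * sparsity  # encourage a small subnetwork
            opt.zero_grad()
            loss.backward()
            opt.step()

def transplant(masked_layer: MaskedLinear, fresh_layer: nn.Linear, thresh=0.5):
    """Copy the discovered subnetwork into a freshly initialized layer;
    all other weights keep their random init and train on the full task."""
    keep = torch.sigmoid(masked_layer.mask_logits) > thresh
    with torch.no_grad():
        fresh_layer.weight[keep] = masked_layer.weight[keep]
```

In a second stage, the fresh model carrying the transplanted subnetwork would be trained on the full task; whether the transplanted weights stay frozen or are fine-tuned is a design choice the abstract leaves open.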
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Adaptive teachers for amortized samplers [76.88721198565861]
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable.
Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration.
We propose an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions.
arXiv Detail & Related papers (2024-10-02T11:33:13Z)
- On the Inductive Bias of Stacking Towards Improving Reasoning [50.225873619537765]
We propose a variant of gradual stacking called MIDAS that can speed up language model training by up to 40%.
MIDAS is not only training-efficient but surprisingly also has an inductive bias towards improving downstream tasks.
We conjecture the underlying reason for this inductive bias by exploring the connection of stacking to looped models.
arXiv Detail & Related papers (2024-09-27T17:58:21Z)
- Towards Exact Computation of Inductive Bias [8.988109761916379]
We propose a novel method for efficiently computing the inductive bias required for generalization on a task.
We show that higher dimensional tasks require greater inductive bias.
Our proposed inductive bias metric provides an information-theoretic interpretation of the benefits of specific model architectures.
arXiv Detail & Related papers (2024-06-22T21:14:24Z)
- Dreamguider: Improved Training free Diffusion-based Conditional Generation [31.68823843900196]
Dreamguider is a method that enables inference-time guidance without compute-heavy backpropagation through the diffusion network.
We present experiments using Dreamguider on multiple tasks across multiple datasets and models to show the effectiveness of the proposed modules.
arXiv Detail & Related papers (2024-06-04T17:59:32Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning [30.610670366488943]
We replace architecture engineering by encoding inductive bias in datasets.
Inspired by Peirce's view that deduction, induction, and abduction form an irreducible set of reasoning primitives, we design three synthetic tasks that are intended to require the model to have these three abilities.
Models trained with LIME significantly outperform vanilla transformers on three very different large mathematical reasoning benchmarks.
arXiv Detail & Related papers (2021-01-15T17:15:24Z)
- Transferring Inductive Biases through Knowledge Distillation [21.219305008067735]
We explore the power of knowledge distillation for transferring the effect of inductive biases from one model to another.
We study the effect of inductive biases on the solutions the models converge to and investigate how and to what extent the effect of inductive biases is transferred through knowledge distillation.
arXiv Detail & Related papers (2020-05-31T16:34:08Z)
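To make the distillation-based transfer in the last entry above concrete, here is a minimal, generic sketch (illustrative PyTorch code, not the paper's): a student whose architecture lacks the desired inductive bias is trained against the softened predictions of a teacher whose architecture encodes it. The temperature `T` and mixing weight `alpha` are assumed hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend the usual supervised loss with a KL term toward the teacher."""
    soft_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, log_target=True,
                  reduction="batchmean") * (T * T)  # standard T^2 rescaling
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

def train_step(student, teacher, opt, x, y):
    """One student update; the teacher supplies the biased solution."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(x)
    loss = distillation_loss(student(x), t_logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```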
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.