Mechanism of feature learning in deep fully connected networks and
kernel machines that recursively learn features
- URL: http://arxiv.org/abs/2212.13881v3
- Date: Tue, 9 May 2023 14:29:35 GMT
- Title: Mechanism of feature learning in deep fully connected networks and
kernel machines that recursively learn features
- Authors: Adityanarayanan Radhakrishnan, Daniel Beaglehole, Parthe Pandit,
Mikhail Belkin
- Abstract summary: We identify and characterize the mechanism through which deep fully connected neural networks learn features.
Our ansatz sheds light on various deep learning phenomena, including the emergence of spurious features and simplicity biases.
To demonstrate the effectiveness of this feature learning mechanism, we use it to enable feature learning in classical, non-feature learning models.
- Score: 15.29093374895364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years neural networks have achieved impressive results on many
technological and scientific tasks. Yet, the mechanism through which these
models automatically select features, or patterns in data, for prediction
remains unclear. Identifying such a mechanism is key to advancing performance
and interpretability of neural networks and promoting reliable adoption of
these models in scientific applications. In this paper, we identify and
characterize the mechanism through which deep fully connected neural networks
learn features. We posit the Deep Neural Feature Ansatz, which states that
neural feature learning occurs by implementing the average gradient outer
product to up-weight features strongly related to model output. Our ansatz
sheds light on various deep learning phenomena, including the emergence of spurious
features, simplicity biases, and how pruning networks can increase
performance (the "lottery ticket hypothesis"). Moreover, the mechanism
identified in our work leads to a backpropagation-free method for feature
learning with any machine learning model. To demonstrate the effectiveness of
this feature learning mechanism, we use it to enable feature learning in
classical, non-feature learning models known as kernel machines and show that
the resulting models, which we refer to as Recursive Feature Machines, achieve
state-of-the-art performance on tabular data.
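The ansatz admits a compact statement. A hedged rendering in LaTeX follows, with notation assumed here rather than taken verbatim from the paper: write f for the network output, W_l for the weights of fully connected layer l, and h_{l-1}(x) for that layer's input. The ansatz then reads:

    % Deep Neural Feature Ansatz (notation assumed for illustration)
    W_l^\top W_l \;\propto\; \frac{1}{n} \sum_{i=1}^{n}
        \nabla_{h_{l-1}(x_i)} f(x_i)
        \left( \nabla_{h_{l-1}(x_i)} f(x_i) \right)^{\top}

The right-hand side is the average gradient outer product (AGOP): input directions along which the trained predictor's gradient is consistently large are up-weighted, matching the abstract's description of the feature learning mechanism.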
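The abstract also describes Recursive Feature Machines: kernel machines that alternate between fitting a predictor and replacing a feature matrix with the AGOP of that predictor. Below is a minimal sketch of such a loop, assuming a Laplace kernel, kernel ridge regression, and illustrative hyperparameters; the names (laplace_kernel, fit_krr, rfm) and all default values are hypothetical, not the authors' code.

    # Minimal RFM-style sketch (hypothetical, not the authors' implementation).
    import jax
    import jax.numpy as jnp

    def laplace_kernel(x, z, M, bandwidth=10.0):
        # Mahalanobis-type Laplace kernel: exp(-||x - z||_M / bandwidth),
        # where ||v||_M = sqrt(v^T M v) and M is a PSD feature matrix.
        diff = x - z
        dist = jnp.sqrt(jnp.maximum(diff @ (M @ diff), 1e-12))
        return jnp.exp(-dist / bandwidth)

    def fit_krr(X, y, M, reg=1e-3):
        # Kernel ridge regression with the M-dependent kernel.
        K = jax.vmap(lambda x: jax.vmap(lambda z: laplace_kernel(x, z, M))(X))(X)
        alpha = jnp.linalg.solve(K + reg * jnp.eye(X.shape[0]), y)
        return lambda x: jax.vmap(lambda z: laplace_kernel(x, z, M))(X) @ alpha

    def rfm(X, y, n_iters=5):
        # Alternate between fitting a predictor and setting M to the
        # average gradient outer product (AGOP) of that predictor.
        M = jnp.eye(X.shape[1])
        for _ in range(n_iters):
            f = fit_krr(X, y, M)
            grads = jax.vmap(jax.grad(f))(X)  # predictor gradient at each x_i
            M = jnp.einsum('ni,nj->ij', grads, grads) / X.shape[0]  # AGOP
        return fit_krr(X, y, M), M

Each pass up-weights input directions along which the current predictor varies most, the same mechanism the ansatz attributes to fully connected networks, and no backpropagation through a deep network is involved.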
Related papers
- High-dimensional learning of narrow neural networks [1.7094064195431147]
This manuscript reviews the tools and ideas underlying recent progress in machine learning.
We introduce a generic model -- the sequence multi-index model -- which encompasses numerous previously studied models as special instances.
We explicate in full detail the analysis of the learning of sequence multi-index models, using statistical physics techniques such as the replica method and approximate message-passing algorithms.
arXiv Detail & Related papers (2024-09-20T21:20:04Z)
- Harnessing Neural Unit Dynamics for Effective and Scalable Class-Incremental Learning [38.09011520275557]
Class-incremental learning (CIL) aims to train a model to learn new classes from non-stationary data streams without forgetting old ones.
We propose a new kind of connectionist model by tailoring neural unit dynamics that adapt the behavior of neural networks for CIL.
arXiv Detail & Related papers (2024-06-04T15:47:03Z)
- Enhancing Generative Class Incremental Learning Performance with Model Forgetting Approach [50.36650300087987]
This study presents a novel approach to Generative Class Incremental Learning (GCIL) by introducing the forgetting mechanism.
We have found that integrating the forgetting mechanism significantly enhances the models' performance in acquiring new knowledge.
arXiv Detail & Related papers (2024-03-27T05:10:38Z)
- Mechanistic Neural Networks for Scientific Machine Learning [58.99592521721158]
We present Mechanistic Neural Networks, a neural network design for machine learning applications in the sciences.
It incorporates a new Mechanistic Block in standard architectures to explicitly learn governing differential equations as representations.
Central to our approach is a novel Relaxed Linear Programming solver (NeuRLP) inspired by a technique that reduces solving linear ODEs to solving linear programs.
arXiv Detail & Related papers (2024-02-20T15:23:24Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Unraveling Feature Extraction Mechanisms in Neural Networks [10.13842157577026]
We propose a theoretical approach based on Neural Tangent Kernels (NTKs) to investigate such mechanisms.
We reveal how these models leverage statistical features during gradient descent and how they are integrated into final decisions.
We find that while self-attention and CNN models may exhibit limitations in learning n-grams, multiplication-based models seem to excel in this area (a minimal empirical-NTK sketch appears after this list).
arXiv Detail & Related papers (2023-10-25T04:22:40Z)
- Feature Chirality in Deep Learning Models [7.402957682300806]
We study feature chirality, which describes how the statistics of a deep learning model's feature data change during training.
Our work shows that feature chirality has implications for model evaluation, model interpretability, and model parameter optimization.
arXiv Detail & Related papers (2023-05-06T07:57:38Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce a new class of physics-informed neural networks, EINNs, crafted for epidemic forecasting.
We investigate how to leverage both the theoretical flexibility provided by mechanistic models and the data-driven expressivity afforded by AI models.
arXiv Detail & Related papers (2022-02-21T18:59:03Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
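For the "Unraveling Feature Extraction Mechanisms" entry above, here is a minimal sketch of the empirical Neural Tangent Kernel, the standard object behind NTK-based analyses; the toy model f and its shapes are placeholders, not that paper's setup.

    # Empirical NTK: K(x1, x2) = <df/dtheta at x1, df/dtheta at x2> (toy example).
    import jax
    import jax.numpy as jnp
    from jax.flatten_util import ravel_pytree

    def empirical_ntk(f, params, x1, x2):
        g1, _ = ravel_pytree(jax.grad(f)(params, x1))  # flatten parameter gradients
        g2, _ = ravel_pytree(jax.grad(f)(params, x2))
        return jnp.dot(g1, g2)

    def f(params, x):  # toy two-layer network with scalar output
        W1, W2 = params
        return jnp.tanh(x @ W1) @ W2

    k1, k2 = jax.random.split(jax.random.PRNGKey(0))
    params = (jax.random.normal(k1, (4, 8)), jax.random.normal(k2, (8,)))
    print(empirical_ntk(f, params, jnp.ones(4), jnp.arange(4.0)))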
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.