Unraveling Feature Extraction Mechanisms in Neural Networks
- URL: http://arxiv.org/abs/2310.16350v2
- Date: Thu, 26 Oct 2023 03:26:30 GMT
- Title: Unraveling Feature Extraction Mechanisms in Neural Networks
- Authors: Xiaobing Sun, Jiaxi Li, Wei Lu
- Abstract summary: We propose a theoretical approach based on Neural Tangent Kernels (NTKs) to investigate such mechanisms.
We reveal how these models leverage statistical features during gradient descent and how they are integrated into final decisions.
We find that while self-attention and CNN models may exhibit limitations in learning n-grams, multiplication-based models seem to excel in this area.
- Score: 10.13842157577026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The underlying mechanism of neural networks in capturing precise knowledge
has been the subject of consistent research efforts. In this work, we propose a
theoretical approach based on Neural Tangent Kernels (NTKs) to investigate such
mechanisms. Specifically, considering the infinite network width, we
hypothesize the learning dynamics of target models may intuitively unravel the
features they acquire from training data, deepening our insights into their
internal mechanisms. We apply our approach to several fundamental models and
reveal how these models leverage statistical features during gradient descent
and how they are integrated into final decisions. We also discovered that the
choice of activation function can affect feature extraction. For instance, the
use of the \textit{ReLU} activation function could potentially introduce a bias
in features, providing a plausible explanation for its replacement with
alternative functions in recent pre-trained language models. Additionally, we
find that while self-attention and CNN models may exhibit limitations in
learning n-grams, multiplication-based models seem to excel in this area. We
verify these theoretical findings through experiments and find that they can be
applied to analyze language modeling tasks, which can be regarded as a special
variant of classification. Our contributions offer insights into the roles and
capacities of fundamental components within large language models, thereby
aiding the broader understanding of these complex systems.
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - Demolition and Reinforcement of Memories in Spin-Glass-like Neural
Networks [0.0]
The aim of this thesis is to understand the effectiveness of Unlearning in both associative memory models and generative models.
The selection of structured data enables an associative memory model to retrieve concepts as attractors of a neural dynamics with considerable basins of attraction.
A novel regularization technique for Boltzmann Machines is presented, proving to outperform previously developed methods in learning hidden probability distributions from data-sets.
arXiv Detail & Related papers (2024-03-04T23:12:42Z) - Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z) - Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
Main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z) - Mechanism of feature learning in deep fully connected networks and
kernel machines that recursively learn features [15.29093374895364]
We identify and characterize the mechanism through which deep fully connected neural networks learn gradient features.
Our ansatz sheds light on various deep learning phenomena including emergence of spurious features and simplicity biases.
To demonstrate the effectiveness of this feature learning mechanism, we use it to enable feature learning in classical, non-feature learning models.
arXiv Detail & Related papers (2022-12-28T15:50:58Z) - Robust Graph Representation Learning via Predictive Coding [46.22695915912123]
Predictive coding is a message-passing framework initially developed to model information processing in the brain.
In this work, we build models that rely on the message-passing rule of predictive coding.
We show that the proposed models are comparable to standard ones in terms of performance in both inductive and transductive tasks.
arXiv Detail & Related papers (2022-12-09T03:58:22Z) - What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness? [0.0]
We study adversarial examples of trained neural networks through analytical tools afforded by recent theory advances connecting neural networks and kernel methods.
We show how NTKs allow to generate adversarial examples in a training-free'' fashion, and demonstrate that they transfer to fool their finite-width neural net counterparts in the lazy'' regime.
arXiv Detail & Related papers (2022-10-11T16:11:48Z) - EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce a new class of physics-informed neural networks-EINN-crafted for epidemic forecasting.
We investigate how to leverage both the theoretical flexibility provided by mechanistic models as well as the data-driven expressability afforded by AI models.
arXiv Detail & Related papers (2022-02-21T18:59:03Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Convolutional Motif Kernel Networks [1.104960878651584]
We show that our model is able to robustly learn on small datasets and reaches state-of-the-art performance on relevant healthcare prediction tasks.
Our proposed method can be utilized on DNA and protein sequences.
arXiv Detail & Related papers (2021-11-03T15:06:09Z) - The Causal Neural Connection: Expressiveness, Learnability, and
Inference [125.57815987218756]
An object called structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.