Feature Chirality in Deep Learning Models
- URL: http://arxiv.org/abs/2305.03966v1
- Date: Sat, 6 May 2023 07:57:38 GMT
- Title: Feature Chirality in Deep Learning Models
- Authors: Shipeng Ji, Yang Li, Ruizhi Fu, Jiabao Wang, Zhuang Miao
- Abstract summary: We study feature chirality innovatively, which shows how the statistics of deep learning models' feature data are changed by training.
Our work shows that feature chirality implies model evaluation, interpretability of the model, and model parameters optimization.
- Score: 7.402957682300806
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: As deep learning applications extensively increase by leaps and bounds, their
interpretability has become increasingly prominent. As a universal property,
chirality exists widely in nature, and applying it to the explanatory research
of deep learning may be helpful to some extent. Inspired by a recent study that
used CNN (convolutional neural network), which applied visual chirality, to
distinguish whether an image is flipped or not. In this paper, we study feature
chirality innovatively, which shows how the statistics of deep learning models'
feature data are changed by training. We rethink the feature-level chirality
property, propose the feature chirality, and give the measure. Our analysis of
feature chirality on AlexNet, VGG, and ResNet reveals similar but surprising
results, including the prevalence of feature chirality in these models, the
initialization methods of the models do not affect feature chirality. Our work
shows that feature chirality implies model evaluation, interpretability of the
model, and model parameters optimization.
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective: to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural Network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z) - Unraveling Feature Extraction Mechanisms in Neural Networks [10.13842157577026]
We propose a theoretical approach based on Neural Tangent Kernels (NTKs) to investigate such mechanisms.
We reveal how these models leverage statistical features during gradient descent and how they are integrated into final decisions.
We find that while self-attention and CNN models may exhibit limitations in learning n-grams, multiplication-based models seem to excel in this area.
arXiv Detail & Related papers (2023-10-25T04:22:40Z) - On the Foundations of Shortcut Learning [20.53986437152018]
We study how predictivity and availability interact to shape models' feature use.
We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias.
arXiv Detail & Related papers (2023-10-24T22:54:05Z) - Mechanism of feature learning in deep fully connected networks and
kernel machines that recursively learn features [15.29093374895364]
We identify and characterize the mechanism through which deep fully connected neural networks learn gradient features.
Our ansatz sheds light on various deep learning phenomena including emergence of spurious features and simplicity biases.
To demonstrate the effectiveness of this feature learning mechanism, we use it to enable feature learning in classical, non-feature learning models.
arXiv Detail & Related papers (2022-12-28T15:50:58Z) - Towards Open-World Feature Extrapolation: An Inductive Graph Learning
Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z) - Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight- parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z) - Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% less parameters compared to the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Provable Benefits of Overparameterization in Model Compression: From
Double Descent to Pruning Neural Networks [38.153825455980645]
Recent empirical evidence indicates that the practice of overization not only benefits training large models, but also assists - perhaps counterintuitively - building lightweight models.
This paper sheds light on these empirical findings by theoretically characterizing the high-dimensional toolsets of model pruning.
We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning.
arXiv Detail & Related papers (2020-12-16T05:13:30Z) - Gradients as Features for Deep Representation Learning [26.996104074384263]
We address the problem of deep representation learning--the efficient adaption of a pre-trained deep network to different tasks.
Our key innovation is the design of a linear model that incorporates both gradient and activation of the pre-trained network.
We present an efficient algorithm for the training and inference of our model without computing the actual gradient.
arXiv Detail & Related papers (2020-04-12T02:57:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.