Randomness of Low-Layer Parameters Determines Confusing Samples in Terms of Interaction Representations of a DNN
- URL: http://arxiv.org/abs/2502.08625v1
- Date: Wed, 12 Feb 2025 18:25:13 GMT
- Title: Randomness of Low-Layer Parameters Determines Confusing Samples in Terms of Interaction Representations of a DNN
- Authors: Junpeng Zhang, Lei Cheng, Qing Li, Liang Lin, Quanshi Zhang
- Abstract summary: We find that the complexity of interactions encoded by a deep neural network (DNN) can explain its generalization power.
We also discover that the confusing samples of a DNN, which are represented by non-generalizable interactions, are determined by its low-layer parameters.
- Abstract: In this paper, we find that the complexity of interactions encoded by a deep neural network (DNN) can explain its generalization power. We also discover that the confusing samples of a DNN, which are represented by non-generalizable interactions, are determined by its low-layer parameters. In comparison, other factors, such as high-layer parameters and network architecture, have much less impact on the composition of confusing samples. Two DNNs with different low-layer parameters usually have entirely different sets of confusing samples, even though they achieve similar performance. This finding extends the understanding of the lottery ticket hypothesis and helps explain the distinctive representation power of different DNNs.
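The interactions referenced here follow the Harsanyi-dividend formulation used throughout this line of work: an interaction effect I(S) aggregates the model's outputs on masked inputs over all subsets of a variable set S, and the order |S| measures the interaction's complexity. A minimal sketch of the computation, assuming a scalar model output and a zero baseline for masked variables (the toy model and the number of variables are illustrative, not the paper's setup):

```python
from itertools import chain, combinations

import numpy as np


def all_subsets(indices):
    """Enumerate every subset of a set of variable indices."""
    return chain.from_iterable(combinations(indices, r) for r in range(len(indices) + 1))


def masked_output(model, x, baseline, subset):
    """Keep variables in `subset`, replace the rest with a baseline value."""
    x_masked = baseline.copy()
    idx = list(subset)
    x_masked[idx] = x[idx]
    return model(x_masked)


def harsanyi_interaction(model, x, baseline, S):
    """I(S) = sum_{T subseteq S} (-1)^{|S|-|T|} v(x_T)."""
    return sum(
        (-1) ** (len(S) - len(T)) * masked_output(model, x, baseline, T)
        for T in all_subsets(S)
    )


# Toy example: a hand-written scalar "network" over 4 input variables.
if __name__ == "__main__":
    model = lambda z: float(z[0] * z[1] + 0.5 * z[2])  # v(x): scalar output
    x = np.array([1.0, 1.0, 1.0, 1.0])
    baseline = np.zeros(4)
    for S in [(0,), (0, 1), (0, 1, 2)]:
        print(S, harsanyi_interaction(model, x, baseline, S))
```

For this toy model, the pairwise term I({0, 1}) recovers the multiplicative effect between the first two variables, while the third-order term vanishes; the paper's "confusing samples" are those whose output is dominated by high-order, non-generalizable terms of this kind.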
Related papers
- Learning local discrete features in explainable-by-design convolutional neural networks [0.0]
We introduce an explainable-by-design convolutional neural network (CNN) based on the lateral inhibition mechanism.
The model consists of a predictor, a high-accuracy CNN with residual or dense skip connections.
By collecting observations and directly calculating probabilities, we can explain causal relationships between motifs of adjacent levels.
arXiv Detail & Related papers (2024-10-31T18:39:41Z)
- How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers [37.54951110709193]
We show that a 'flat' prior over the NN parameterization induces a rich prior over the NN functions.
This creates a bias towards simpler functions, which require fewer relevant parameters to represent, as the sketch after this entry illustrates.
arXiv Detail & Related papers (2024-02-09T11:03:52Z)
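One way to observe this bias empirically is to sample many small networks under an isotropic Gaussian prior on the weights (one concrete stand-in for 'flat') and tally the boolean functions they compute: a handful of simple functions absorb most of the probability mass. The architecture and sample count below are illustrative assumptions, not the paper's construction:

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_samples = 3, 8, 20000

# All 2^3 boolean inputs, in {-1, +1} encoding.
X = np.array([[(i >> b & 1) * 2 - 1 for b in range(n_in)] for i in range(2 ** n_in)])

counts = Counter()
for _ in range(n_samples):
    # "Flat" (here: isotropic Gaussian) prior over the parameters.
    W1 = rng.normal(size=(n_in, n_hidden))
    w2 = rng.normal(size=n_hidden)
    logits = np.maximum(X @ W1, 0.0) @ w2  # one-hidden-layer ReLU net
    counts[tuple(bool(t) for t in logits > 0)] += 1

# A flat prior over weights is far from flat over functions: a few
# (typically simple) boolean functions dominate the sampled mass.
for fn, c in counts.most_common(5):
    print(f"{c / n_samples:.3f}  {fn}")
print(f"{len(counts)} distinct functions out of {2 ** (2 ** n_in)} possible")
```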
- Interpretable Neural Networks with Random Constructive Algorithm [3.1200894334384954]
This paper introduces an Interpretable Neural Network (INN) incorporating spatial information to tackle the opaque parameterization process of random weighted neural networks.
It devises a geometric relationship strategy, using a pool of candidate nodes and the established relationships among them, to select node parameters conducive to network convergence.
arXiv Detail & Related papers (2023-07-01T01:07:20Z)
- Where We Have Arrived in Proving the Emergence of Sparse Symbolic Concepts in AI Models [21.02716340199201]
This study aims to prove the emergence of symbolic concepts (or, more precisely, sparse primitive inference patterns) in well-trained deep neural networks (DNNs).
We show that the inference scores of the DNN on an exponentially large number of randomly masked samples can always be well mimicked by the numerical effects of just a few interactions, as the sketch after this entry illustrates.
arXiv Detail & Related papers (2023-05-03T07:32:28Z)
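A self-contained sketch of this mimicking property: compute all Harsanyi interactions of a small toy model, keep only the k largest in magnitude, and check that they still reproduce the model's output on every one of the 2^n masked samples. The model, input, and cutoff k are illustrative assumptions:

```python
from itertools import chain, combinations

import numpy as np


def subsets(idx):
    return chain.from_iterable(combinations(idx, r) for r in range(len(idx) + 1))


def v(model, x, baseline, T):
    """Model output on the input masked outside T."""
    z = baseline.copy()
    z[list(T)] = x[list(T)]
    return model(z)


n = 4
model = lambda z: float(np.tanh(z[0] + z[1]) + z[2] * z[3])
x, baseline = np.ones(n), np.zeros(n)

# Harsanyi dividends over all 2^n subsets.
I = {S: sum((-1) ** (len(S) - len(T)) * v(model, x, baseline, T) for T in subsets(S))
     for S in subsets(range(n))}

# Keep only the k interactions with the largest magnitude ...
k = 6
top = dict(sorted(I.items(), key=lambda kv: -abs(kv[1]))[:k])

# ... and check that they still reproduce v on every masked sample:
# exactly, v(x_T) = sum_{S subseteq T} I(S); here only `top` is summed.
err = max(abs(v(model, x, baseline, T) - sum(val for S, val in top.items() if set(S) <= set(T)))
          for T in subsets(range(n)))
print(f"max reconstruction error with {k}/{2 ** n} interactions: {err:.4f}")
```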
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by simply filtering out "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Discovering and Explaining the Representation Bottleneck of DNNs [21.121270460158712]
This paper explores the bottleneck of feature representations of deep neural networks (DNNs).
We focus on the multi-order interaction between input variables, where the order represents the complexity of an interaction.
We discover that a DNN is more likely to encode both overly simple and overly complex interactions, but usually fails to learn interactions of intermediate complexity; the sketch after this entry shows how per-order interaction strength can be measured.
arXiv Detail & Related papers (2021-11-11T14:35:20Z)
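One way to probe this bottleneck numerically is to group interaction strengths |I(S)| by order |S| and compare the per-order averages; a minimal sketch, with a random shallow network standing in (purely for illustration) for a trained DNN:

```python
from collections import defaultdict
from itertools import chain, combinations

import numpy as np


def subsets(idx):
    return chain.from_iterable(combinations(idx, r) for r in range(len(idx) + 1))


def v(model, x, baseline, T):
    z = baseline.copy()
    z[list(T)] = x[list(T)]
    return model(z)


n = 5
rng = np.random.default_rng(0)
W = rng.normal(size=(n, n))
model = lambda z: float(np.tanh(z @ W).sum())  # stand-in for a trained DNN's scalar output

x, baseline = np.ones(n), np.zeros(n)
strength = defaultdict(list)
for S in subsets(range(n)):
    I_S = sum((-1) ** (len(S) - len(T)) * v(model, x, baseline, T) for T in subsets(S))
    strength[len(S)].append(abs(I_S))

# Average interaction strength per order; the bottleneck claim is that trained
# DNNs concentrate strength at very low and very high orders.
for order in sorted(strength):
    print(f"order {order}: mean |I(S)| = {np.mean(strength[order]):.4f}")
```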
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information and differ from each other only by statistically independent noise, as the sketch after this entry illustrates.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
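A minimal sketch of how such redundancy can be detected: correlate the activations of last-hidden-layer neurons and group near-duplicates. Synthetic activations with a planted group structure stand in for a real wide CNN's features:

```python
import numpy as np

# Toy stand-in for last-hidden-layer activations: a (n_samples, width) matrix.
# In the paper's setting these would come from a wide, trained CNN.
rng = np.random.default_rng(1)
n_samples, n_signals, copies = 1000, 4, 8
signal = rng.normal(size=(n_samples, n_signals))
acts = np.repeat(signal, copies, axis=1) + 0.1 * rng.normal(size=(n_samples, n_signals * copies))

# Pairwise correlations between neurons; near-1 entries mark redundant groups.
corr = np.corrcoef(acts.T)

# Greedy grouping: neurons correlated above a threshold carry (almost) the
# same information and differ only by independent noise.
threshold, groups, unassigned = 0.9, [], set(range(acts.shape[1]))
while unassigned:
    i = unassigned.pop()
    group = {i} | {j for j in unassigned if corr[i, j] > threshold}
    unassigned -= group
    groups.append(sorted(group))

print(f"{acts.shape[1]} neurons -> {len(groups)} redundant groups: {groups}")
```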
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox, where "scale" metrics perform well overall but poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations; the sketch after this entry illustrates the atom-coefficient decomposition.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
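The core weight-sharing idea, convolution kernels composed from a small shared dictionary of spatial atoms, can be sketched as below; the layer sizes and atom count are illustrative assumptions, and ACDC's actual decomposition has additional details:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecomposedConv2d(nn.Module):
    """Conv layer whose kernels are linear combinations of shared spatial atoms.

    Only the per-kernel coefficients and the shared atoms are learned, which
    is where the parameter reduction comes from.
    """

    def __init__(self, in_ch, out_ch, kernel_size=3, n_atoms=6):
        super().__init__()
        # Shared dictionary: n_atoms spatial atoms of shape (k, k).
        self.atoms = nn.Parameter(torch.randn(n_atoms, kernel_size, kernel_size))
        # One coefficient vector per (out_ch, in_ch) kernel.
        self.coeffs = nn.Parameter(torch.randn(out_ch, in_ch, n_atoms))

    def forward(self, x):
        # Compose full kernels on the fly: (out_ch, in_ch, k, k).
        weight = torch.einsum("oin,nkl->oikl", self.coeffs, self.atoms)
        return F.conv2d(x, weight, padding=weight.shape[-1] // 2)


layer = DecomposedConv2d(16, 32)
full = 32 * 16 * 3 * 3
shared = 6 * 3 * 3 + 32 * 16 * 6
print(layer(torch.randn(1, 16, 8, 8)).shape, f"params: {shared} vs {full} for a full conv")
```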
- GraN: An Efficient Gradient-Norm Based Detector for Adversarial and Misclassified Examples [77.99182201815763]
Deep neural networks (DNNs) are vulnerable to adversarial examples and other data perturbations.
GraN is a time- and parameter-efficient method that is easily adaptable to any DNN.
GraN achieves state-of-the-art performance on numerous problem set-ups.
arXiv Detail & Related papers (2020-04-20T10:09:27Z)
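The full GraN pipeline feeds layer-wise gradient norms into a lightweight classifier; the sketch below shows only the core signal, scoring an input by the norm of the loss gradient taken at the model's own prediction. The stand-in classifier and the idea of calibrating a threshold on held-out clean data are assumptions:

```python
import torch
import torch.nn as nn


def gradient_norm_score(model, x, loss_fn=nn.CrossEntropyLoss()):
    """Score an input by the loss-gradient norm at the model's own prediction.

    High scores tend to flag adversarial or misclassified inputs; in practice
    the decision threshold would be calibrated on held-out clean data.
    """
    model.zero_grad()
    logits = model(x)
    pred = logits.argmax(dim=1)  # use the model's prediction as the label
    loss = loss_fn(logits, pred)
    loss.backward()
    sq = sum((p.grad ** 2).sum() for p in model.parameters() if p.grad is not None)
    return sq.sqrt().item()


# Toy usage with a stand-in classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.randn(1, 1, 28, 28)
print("gradient-norm score:", gradient_norm_score(model, x))
```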
This list is automatically generated from the titles and abstracts of the papers on this site.