Initial Guessing Bias: How Untrained Networks Favor Some Classes
- URL: http://arxiv.org/abs/2306.00809v6
- Date: Fri, 08 Nov 2024 14:27:36 GMT
- Title: Initial Guessing Bias: How Untrained Networks Favor Some Classes
- Authors: Emanuele Francazi, Aurelien Lucchi, Marco Baity-Jesi
- Abstract summary: We show that the structure of a deep neural network (DNN) can condition the model to assign all predictions to the same class, even before the beginning of training.
We prove that, besides dataset properties, the presence of this phenomenon is influenced by model choices including dataset preprocessing methods.
We highlight theoretical consequences, such as the breakdown of node-permutation symmetry and the violation of self-averaging.
- Score: 0.09103230894909536
- Abstract: Understanding and controlling biasing effects in neural networks is crucial for ensuring accurate and fair model performance. In the context of classification problems, we provide a theoretical analysis demonstrating that the structure of a deep neural network (DNN) can condition the model to assign all predictions to the same class, even before the beginning of training, and in the absence of explicit biases. We prove that, besides dataset properties, the presence of this phenomenon, which we call Initial Guessing Bias (IGB), is influenced by model choices, including dataset preprocessing methods and architectural decisions such as activation functions, max-pooling layers, and network depth. Our analysis of IGB provides information for architecture selection and model initialization. We also highlight theoretical consequences, such as the breakdown of node-permutation symmetry, the violation of self-averaging, and the non-trivial effects that depth has on the phenomenon.
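A minimal sketch of the phenomenon (illustrative only, not the authors' code): an untrained ReLU MLP is evaluated on zero-mean Gaussian inputs, which stand in for a standardized dataset, and the fraction of inputs it assigns to one class is recorded for several random initializations. The architecture, width, depth, and seed are arbitrary assumptions for illustration; in the deep-ReLU setting the abstract points to, these fractions typically sit far from the balanced value of 0.5.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class UntrainedMLP(nn.Module):
    """Plain ReLU MLP, used only to probe class predictions at initialization."""
    def __init__(self, in_dim=100, width=512, depth=10, num_classes=2):
        super().__init__()
        dims = [in_dim] + [width] * depth
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.ReLU()]
        layers.append(nn.Linear(dims[-1], num_classes))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# Zero-mean Gaussian inputs stand in for a preprocessed (standardized) dataset.
x = torch.randn(10_000, 100)

# For each fresh initialization, measure how often class 0 is the argmax.
# Under Initial Guessing Bias, an individual untrained model concentrates its
# guesses on one class, so these fractions deviate strongly from 0.5.
for run in range(5):
    model = UntrainedMLP()
    with torch.no_grad():
        preds = model(x).argmax(dim=1)
    frac = (preds == 0).float().mean().item()
    print(f"init {run}: fraction of inputs predicted as class 0 = {frac:.3f}")
```

According to the abstract, choices such as the activation function, max-pooling, network depth, and dataset preprocessing modulate how strong this effect is, so varying those settings in the sketch is the natural experiment to run.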
Related papers
- Generative Flow Networks: Theory and Applications to Structure Learning [7.6872614776094]
This thesis studies the problem of structure learning from a Bayesian perspective.
It introduces Generative Flow Networks (GFlowNets).
GFlowNets treat generation as a sequential decision making problem.
arXiv Detail & Related papers (2025-01-09T17:47:17Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Analysis of Convolutions, Non-linearity and Depth in Graph Neural Networks using Neural Tangent Kernel [8.824340350342512]
Graph Neural Networks (GNNs) are designed to exploit the structural information of the data by aggregating the neighboring nodes.
We theoretically analyze the influence of different aspects of the GNN architecture using the Graph Neural Tangent Kernel in a semi-supervised node classification setting.
We prove that: (i) linear networks capture the class information as well as ReLU networks; (ii) row normalization preserves the underlying class structure better than other convolutions; (iii) performance degrades with network depth due to over-smoothing; (iv) skip connections retain the class information even at infinite depth, thereby eliminating over-smoothing.
arXiv Detail & Related papers (2022-10-18T12:28:37Z) - With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z) - Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications [6.579523168465526]
We introduce partial Jacobians of a network, defined as derivatives of preactivations in layer $l$ with respect to preactivations in layer $l_0 \leq l$.
We derive recurrence relations for the norms of partial Jacobians and utilize these relations to analyze criticality of deep fully connected neural networks with LayerNorm and/or residual connections (a minimal numerical sketch of these norms appears after this list).
arXiv Detail & Related papers (2021-11-23T20:31:42Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via the Polyak-Lojasiewicz condition, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining Neural Networks that is different from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z) - An analytic theory of shallow networks dynamics for hinge loss classification [14.323962459195771]
We study the training dynamics of a simple type of neural network: a single hidden layer trained to perform a classification task.
We specialize our theory to the prototypical case of a linearly separable dataset and a linear hinge loss.
This allows us to address in a simple setting several phenomena appearing in modern networks, such as slowing down of training dynamics, crossover between rich and lazy learning, and overfitting.
arXiv Detail & Related papers (2020-06-19T16:25:29Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
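The Revisiting Initialization of Neural Networks entry above describes tracking a Hessian norm of the weights as a diagnostic. Below is a minimal sketch of one generic way to do that: estimating the largest Hessian eigenvalue of a toy loss at initialization by power iteration on Hessian-vector products. The estimator, model, data, and iteration count are assumptions for illustration and are not taken from that paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def hessian_spectral_norm(loss, params, iters=20):
    """Estimate the largest-magnitude Hessian eigenvalue of `loss` w.r.t. `params`
    by power iteration on Hessian-vector products (a generic diagnostic)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = torch.tensor(0.0)
    for _ in range(iters):
        norm = torch.sqrt(sum((u * u).sum() for u in v))
        v = [u / norm for u in v]
        # Hessian-vector product: differentiate the gradients a second time.
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        eig = sum((h * u).sum() for h, u in zip(hv, v))  # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig.abs().item()

# Toy model and batch, just to show the diagnostic at initialization.
model = nn.Sequential(nn.Linear(20, 64), nn.Tanh(), nn.Linear(64, 2))
x, y = torch.randn(128, 20), torch.randint(0, 2, (128,))
loss = nn.CrossEntropyLoss()(model(x), y)
print("estimated Hessian spectral norm at init:",
      hessian_spectral_norm(loss, list(model.parameters())))
```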
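The Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians entry defines partial Jacobians as derivatives of preactivations in layer $l$ with respect to preactivations in layer $l_0 \leq l$. As referenced there, here is a minimal numerical sketch of those norms for a plain Tanh MLP; the architecture, activation, and sizes are illustrative assumptions, and the LayerNorm and residual connections treated in that paper's full analysis are omitted.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

width, depth = 128, 6
# h_0 is the affine image of the input; deeper preactivations apply the
# activation and then the next affine map.
linears = nn.ModuleList([nn.Linear(width, width) for _ in range(depth)])
act = nn.Tanh()

def preactivations(x):
    """Return the list of layer preactivations [h_0, ..., h_{depth-1}]."""
    hs, h = [], x
    for i, lin in enumerate(linears):
        h = lin(h if i == 0 else act(h))
        hs.append(h)
    return hs

def partial_jacobian_norm(x, l0, l):
    """Frobenius norm of d h_l / d h_{l0} for a single input x (l0 <= l)."""
    def h_l_given_h_l0(h):
        for lin in linears[l0 + 1 : l + 1]:
            h = lin(act(h))
        return h
    h_l0 = preactivations(x)[l0]
    J = torch.autograd.functional.jacobian(h_l_given_h_l0, h_l0)
    return J.norm().item()

x = torch.randn(width)
# How this norm grows or shrinks with depth is the kind of quantity the
# recurrence relations described in that entry track.
for l in range(depth):
    print(f"||d h_{l} / d h_0||_F = {partial_jacobian_norm(x, 0, l):.3f}")
```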