Linking convolutional kernel size to generalization bias in face
analysis CNNs
- URL: http://arxiv.org/abs/2302.03750v2
- Date: Sun, 3 Dec 2023 13:23:39 GMT
- Title: Linking convolutional kernel size to generalization bias in face
analysis CNNs
- Authors: Hao Liang, Josue Ortega Caro, Vikram Maheshri, Ankit B. Patel, Guha
Balakrishnan
- Abstract summary: We present a causal framework for linking an architectural hyperparameter to out-of-distribution algorithmic bias.
In our experiments, we focused on measuring the causal relationship between convolutional kernel size and face analysis classification bias.
We show that modifying kernel size, even in one layer of a CNN, changes the frequency content of learned features significantly across data subgroups.
- Score: 9.030335233143603
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training dataset biases are by far the most scrutinized factors when
explaining algorithmic biases of neural networks. In contrast, hyperparameters
related to the neural network architecture have largely been ignored even
though different network parameterizations are known to induce different
implicit biases over learned features. For example, convolutional kernel size
is known to affect the frequency content of features learned in CNNs. In this
work, we present a causal framework for linking an architectural hyperparameter
to out-of-distribution algorithmic bias. Our framework is experimental, in that
we train several versions of a network with an intervention to a specific
hyperparameter, and measure the resulting causal effect of this choice on
performance bias when a particular out-of-distribution image perturbation is
applied. In our experiments, we focused on measuring the causal relationship
between convolutional kernel size and face analysis classification bias across
different subpopulations (race/gender), with respect to high-frequency image
details. We show that modifying kernel size, even in one layer of a CNN,
changes the frequency content of learned features significantly across data
subgroups, leading to biased generalization performance even in the presence
of a balanced training dataset.
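The experimental framework above lends itself to a short, hedged sketch (PyTorch; this is not the authors' released code). Toy classifiers that differ only in the first convolutional layer's kernel size are evaluated on per-subgroup test sets after a Fourier low-pass filter removes high-frequency image detail, and the spread in subgroup accuracy serves as a simple bias measure. The architecture, the filter radius, and the random placeholder data are all illustrative assumptions; in the actual experiments each model would first be trained on a balanced face dataset.

```python
import torch
import torch.nn as nn


class SmallFaceCNN(nn.Module):
    """Toy binary face-attribute classifier; only the first conv kernel size varies."""

    def __init__(self, first_kernel: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, first_kernel, padding=first_kernel // 2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))


def remove_high_frequencies(images: torch.Tensor, keep_radius: int = 8) -> torch.Tensor:
    """Test-time perturbation: keep only low-frequency content via a circular Fourier mask."""
    f = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    h, w = images.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    mask = (((yy - h // 2) ** 2 + (xx - w // 2) ** 2) <= keep_radius ** 2).float()
    return torch.fft.ifft2(torch.fft.ifftshift(f * mask, dim=(-2, -1))).real


@torch.no_grad()
def accuracy(model: nn.Module, images: torch.Tensor, labels: torch.Tensor) -> float:
    return (model(images).argmax(dim=1) == labels).float().mean().item()


def bias_gap(model: nn.Module, subgroups: dict) -> float:
    """Spread of post-perturbation accuracy across subgroups (a simple bias measure)."""
    accs = [accuracy(model, remove_high_frequencies(x), y) for x, y in subgroups.values()]
    return max(accs) - min(accs)


# Placeholder data: random tensors stand in for balanced per-subgroup test sets.
subgroups = {
    name: (torch.randn(64, 3, 64, 64), torch.randint(0, 2, (64,)))
    for name in ["group_a", "group_b", "group_c"]
}

# The intervention: sweep the first-layer kernel size. In the actual experiments,
# each model would first be trained identically on a balanced face dataset.
for k in (3, 7, 11):
    model = SmallFaceCNN(first_kernel=k).eval()
    print(f"kernel={k}  bias gap under low-pass perturbation: {bias_gap(model, subgroups):.3f}")
```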
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Do deep neural networks have an inbuilt Occam's razor? [1.1470070927586016]
We show that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions strong enough to counteract the exponential growth of functions with complexity, is key to the success of DNNs.
arXiv Detail & Related papers (2023-04-13T16:58:21Z) - Increasing biases can be more efficient than increasing weights [33.05856234084821]
The proposed unit emphasizes the importance of preserving uncorrupted information as it is passed from one unit to the next.
We show that focusing on increasing biases rather than weights can significantly improve a neural network model's performance.
arXiv Detail & Related papers (2023-01-03T01:36:31Z) - Interpreting Bias in the Neural Networks: A Peek Into Representational
Similarity [0.0]
We investigate the performance and internal representational structure of convolution-based neural networks trained on biased data.
We specifically study similarities in representations, using Centered Kernel Alignment (CKA), for different objective functions (a minimal linear CKA sketch appears after this list).
We observe that without progressively similar representations across the layers of a neural network, performance is less likely to be robust.
arXiv Detail & Related papers (2022-11-14T22:17:14Z) - What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - Post-mortem on a deep learning contest: a Simpson's paradox and the
complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox, where "scale" metrics perform well overall but poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that, with this regularization, CNNs maintain performance with a dramatic reduction in parameters and computation.
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Learning from Failure: Training Debiased Classifier from Biased
Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z) - Spectral Bias and Task-Model Alignment Explain Generalization in Kernel
Regression and Infinitely Wide Neural Networks [17.188280334580195]
Generalization beyond a training dataset is a central goal of machine learning.
Recent observations in deep neural networks contradict conventional wisdom from classical statistics.
We show that more data may impair generalization when noisy or not expressible by the kernel.
arXiv Detail & Related papers (2020-06-23T17:53:11Z)
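Centered Kernel Alignment, mentioned in the "Interpreting Bias in the Neural Networks" entry above, has a compact linear form that is easy to sketch. The snippet below is an illustrative implementation of linear CKA rather than that paper's exact procedure (which may use a kernel or minibatch variant); the activation matrices are hypothetical stand-ins for layer features collected on a shared set of inputs.

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two (num_examples, num_features) activation matrices."""
    # Center each feature over examples; CKA is then invariant to isotropic
    # scaling and orthogonal transformations of either representation.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return float(numerator / denominator)


# Toy usage on hypothetical activations from two layers evaluated on the same inputs.
rng = np.random.default_rng(0)
layer_a = rng.standard_normal((512, 256))
Q, _ = np.linalg.qr(rng.standard_normal((256, 256)))  # random orthogonal matrix
layer_b = layer_a @ Q                                  # same representation up to rotation
layer_c = rng.standard_normal((512, 256))              # unrelated representation
print(linear_cka(layer_a, layer_b))  # 1.0: CKA is invariant to orthogonal transforms
print(linear_cka(layer_a, layer_c))  # near 0 for unrelated activations
```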