The Mean Dimension of Neural Networks -- What causes the interaction
effects?
- URL: http://arxiv.org/abs/2207.04890v1
- Date: Mon, 11 Jul 2022 14:00:06 GMT
- Title: The Mean Dimension of Neural Networks -- What causes the interaction
effects?
- Authors: Roman Hahn, Christoph Feinauer, Emanuele Borgonovo
- Abstract summary: Owen and Hoyt recently showed that the effective dimension offers key structural information about the input-output mapping underlying an artificial neural network.
This work proposes an estimation procedure that allows the calculation of the mean dimension from a given dataset.
- Score: 0.9208007322096533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Owen and Hoyt recently showed that the effective dimension offers key
structural information about the input-output mapping underlying an artificial
neural network. Along this line of research, this work proposes an estimation
procedure that allows the calculation of the mean dimension from a given
dataset, without resampling from external distributions. The design yields
total indices when features are independent and a variant of total indices when
features are correlated. We show that this variant possesses the zero
independence property. With synthetic datasets, we analyse how the mean
dimension evolves layer by layer and how the activation function impacts the
magnitude of interactions. We then use the mean dimension to study some of the
most widely employed convolutional architectures for image recognition (LeNet,
ResNet, DenseNet). To account for pixel correlations, we propose calculating
the mean dimension after the addition of an inverse PCA layer that allows one
to work on uncorrelated PCA-transformed features, without the need to retrain
the neural network. We use the generalized total indices to produce heatmaps
for post-hoc explanations, and we employ the mean dimension on the
PCA-transformed features for cross-comparisons of the artificial neural
network structures. Results provide several insights into the differences in
the magnitude of interactions across the architectures, as well as indications of
how the mean dimension evolves during training.
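The abstract describes the method only at a high level, so the following is a minimal, hedged sketch of the underlying idea in Python, not the authors' exact estimation procedure: the mean dimension is computed as the sum of normalized total Sobol indices, each total index is approximated with the classical Jansen pick-and-freeze formula using within-dataset column permutations as the resampling device (a standard approximation for independent features), and a small inverse-PCA wrapper illustrates how a model can be analysed on uncorrelated PCA scores without retraining. The helper names `mean_dimension` and `with_inverse_pca` and the toy models in the example are illustrative assumptions.

```python
# Hedged sketch: estimating the mean dimension of a model from a dataset.
# The mean dimension equals the sum of normalized total Sobol indices,
#   nu_bar = sum_i tau_i^2 / sigma^2,
# and each unnormalized total index is estimated with the classical Jansen
# pick-and-freeze formula,
#   tau_i^2 = 0.5 * E[(f(X) - f(X with column i resampled))^2],
# where the resample is obtained by permuting column i within the dataset.
# This is NOT necessarily the exact estimator proposed in the paper.
import numpy as np


def mean_dimension(predict, X, rng=None):
    """Estimate the mean dimension of `predict` over the empirical
    distribution of the rows of X (features assumed independent)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    f_base = predict(X)                 # model evaluated on the data
    sigma2 = f_base.var()
    if sigma2 == 0.0:
        raise ValueError("constant output: mean dimension is undefined")

    per_feature = np.empty(d)
    for i in range(d):
        X_perm = X.copy()
        X_perm[:, i] = rng.permutation(X[:, i])      # resample feature i from the data
        f_perm = predict(X_perm)
        tau2_i = 0.5 * np.mean((f_base - f_perm) ** 2)   # Jansen estimator
        per_feature[i] = tau2_i / sigma2                 # normalized total index
    return per_feature.sum(), per_feature


def with_inverse_pca(predict, X):
    """Sketch of the 'inverse PCA layer' idea: fit a plain SVD-based PCA on the
    inputs and analyse the composite map z -> predict(inverse_pca(z)) on the
    (uncorrelated) PCA scores, without retraining the model."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    Z = (X - mu) @ Vt.T                                  # PCA scores of the dataset
    composite = lambda Z_: predict(Z_ @ Vt + mu)         # inverse PCA, then the model
    return composite, Z


if __name__ == "__main__":
    # Toy check: an additive function should have mean dimension close to 1,
    # a function with a pairwise interaction noticeably above 1.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20_000, 3))
    md_add, _ = mean_dimension(lambda x: x.sum(axis=1), X, rng=1)
    md_mult, _ = mean_dimension(lambda x: x[:, 0] * x[:, 1] + x[:, 2], X, rng=1)
    print(f"additive: {md_add:.2f}, with interaction: {md_mult:.2f}")
```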
Related papers
- Measuring Feature Dependency of Neural Networks by Collapsing Feature Dimensions in the Data Manifold [18.64569268049846]
We introduce a new technique to measure the feature dependency of neural network models.
The motivation is to better understand a model by querying whether it is using information from human-understandable features.
We test our method on deep neural network models trained on synthetic image data with known ground truth.
arXiv Detail & Related papers (2024-04-18T17:10:18Z)
- Task structure and nonlinearity jointly determine learned representational geometry [0.0]
We show that Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs.
Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.
arXiv Detail & Related papers (2024-01-24T16:14:38Z)
- Multilayer Multiset Neuronal Networks -- MMNNs [55.2480439325792]
The present work describes multilayer multiset neuronal networks incorporating two or more layers of coincidence similarity neurons.
The work also explores the utilization of counter-prototype points, which are assigned to the image regions to be avoided.
arXiv Detail & Related papers (2023-08-28T12:55:13Z)
- ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z)
- Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z)
- Neural Eigenfunctions Are Structured Representation Learners [93.53445940137618]
This paper introduces a structured, adaptive-length deep representation called Neural Eigenmap.
We show that, when the eigenfunction is derived from positive relations in a data augmentation setup, applying NeuralEF results in an objective function.
We demonstrate using such representations as adaptive-length codes in image retrieval systems.
arXiv Detail & Related papers (2022-10-23T07:17:55Z)
- Decomposing neural networks as mappings of correlation functions [57.52754806616669]
We study the mapping between probability distributions implemented by a deep feed-forward network.
We identify essential statistics in the data, as well as different information representations that can be used by neural networks.
arXiv Detail & Related papers (2022-02-10T09:30:31Z)
- The Role of Linear Layers in Nonlinear Interpolating Networks [13.25706838589123]
Our framework considers a family of networks of varying depth that all have the same capacity but different implicitly defined representation costs.
The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights needed for the network to represent the function; this definition is written out as a formula after this list.
Our results show that adding linear layers to a ReLU network yields a representation cost that reflects a complex interplay between the alignment and sparsity of ReLU units.
arXiv Detail & Related papers (2022-02-02T02:33:24Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Data Augmentation Through Monte Carlo Arithmetic Leads to More Generalizable Classification in Connectomics [0.0]
We use Monte Carlo Arithmetic to perturb a structural connectome estimation pipeline.
The perturbed networks were captured in an augmented dataset, which was then used for an age classification task.
We find that this generalization benefit does not hinge on a large number of perturbations, suggesting that even minimally perturbing a dataset adds meaningful variance that can be captured in the subsequently designed models.
arXiv Detail & Related papers (2021-09-20T16:06:05Z)
- Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation [90.28365183660438]
This paper proposes an augmented parallel-pyramid net with attention partial module and differentiable auto-data augmentation.
We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component.
Notably, our method achieves the top-1 accuracy on the challenging COCO keypoint benchmark and state-of-the-art results on the MPII dataset.
arXiv Detail & Related papers (2020-03-17T03:52:17Z)
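For reference, the representation cost used in "ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models" and defined verbally in "The Role of Linear Layers in Nonlinear Interpolating Networks" above can be written compactly as below; the symbols R(f), theta, and f_theta are our own illustrative notation rather than the papers' exact formulation.

```latex
% Illustrative transcription of the verbal definition quoted in the list above:
% the representation cost of a function f under a given architecture is the
% smallest sum of squared weights theta with which the architecture can realize f.
R(f) \;=\; \min_{\theta \,:\, f_\theta = f} \;\lVert \theta \rVert_2^2
```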
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.