The Mean Dimension of Neural Networks -- What causes the interaction
effects?
- URL: http://arxiv.org/abs/2207.04890v1
- Date: Mon, 11 Jul 2022 14:00:06 GMT
- Title: The Mean Dimension of Neural Networks -- What causes the interaction
effects?
- Authors: Roman Hahn, Christoph Feinauer, Emanuele Borgonovo
- Abstract summary: Owen and Hoyt recently showed that the effective dimension offers key structural information about the input-output mapping underlying an artificial neural network.
This work proposes an estimation procedure that allows the calculation of the mean dimension from a given dataset.
- Score: 0.9208007322096533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Owen and Hoyt recently showed that the effective dimension offers key
structural information about the input-output mapping underlying an artificial
neural network. Along this line of research, this work proposes an estimation
procedure that allows the calculation of the mean dimension from a given
dataset, without resampling from external distributions. The design yields
total indices when features are independent and a variant of total indices when
features are correlated. We show that this variant possesses the zero
independence property. With synthetic datasets, we analyse how the mean
dimension evolves layer by layer and how the activation function impacts the
magnitude of interactions. We then use the mean dimension to study some of the
most widely employed convolutional architectures for image recognition (LeNet,
ResNet, DenseNet). To account for pixel correlations, we propose calculating
the mean dimension after the addition of an inverse PCA layer that allows one
to work on uncorrelated PCA-transformed features, without the need to retrain
the neural network. We use the generalized total indices to produce heatmaps
for post-hoc explanations, and we employ the mean dimension on the
PCA-transformed features for cross-comparisons of the artificial neural
network structures. Results provide several insights into the differences in
the magnitude of interactions across the architectures, as well as indications of
how the mean dimension evolves during training.
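The abstract describes the method only at a high level, so the following is a minimal, hedged sketch of the underlying idea in Python, not the authors' exact estimation procedure: the mean dimension is computed as the sum of normalized total Sobol indices, each total index is approximated with the classical Jansen pick-and-freeze formula using within-dataset column permutations as the resampling device (a standard approximation for independent features), and a small inverse-PCA wrapper illustrates how a model can be analysed on uncorrelated PCA scores without retraining. The helper names `mean_dimension` and `with_inverse_pca` and the toy models in the example are illustrative assumptions.

```python
# Hedged sketch: estimating the mean dimension of a model from a dataset.
# The mean dimension equals the sum of normalized total Sobol indices,
#   nu_bar = sum_i tau_i^2 / sigma^2,
# and each unnormalized total index is estimated with the classical Jansen
# pick-and-freeze formula,
#   tau_i^2 = 0.5 * E[(f(X) - f(X with column i resampled))^2],
# where the resample is obtained by permuting column i within the dataset.
# This is NOT necessarily the exact estimator proposed in the paper.
import numpy as np


def mean_dimension(predict, X, rng=None):
    """Estimate the mean dimension of `predict` over the empirical
    distribution of the rows of X (features assumed independent)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    f_base = predict(X)                 # model evaluated on the data
    sigma2 = f_base.var()
    if sigma2 == 0.0:
        raise ValueError("constant output: mean dimension is undefined")

    per_feature = np.empty(d)
    for i in range(d):
        X_perm = X.copy()
        X_perm[:, i] = rng.permutation(X[:, i])      # resample feature i from the data
        f_perm = predict(X_perm)
        tau2_i = 0.5 * np.mean((f_base - f_perm) ** 2)   # Jansen estimator
        per_feature[i] = tau2_i / sigma2                 # normalized total index
    return per_feature.sum(), per_feature


def with_inverse_pca(predict, X):
    """Sketch of the 'inverse PCA layer' idea: fit a plain SVD-based PCA on the
    inputs and analyse the composite map z -> predict(inverse_pca(z)) on the
    (uncorrelated) PCA scores, without retraining the model."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    Z = (X - mu) @ Vt.T                                  # PCA scores of the dataset
    composite = lambda Z_: predict(Z_ @ Vt + mu)         # inverse PCA, then the model
    return composite, Z


if __name__ == "__main__":
    # Toy check: an additive function should have mean dimension close to 1,
    # a function with a pairwise interaction noticeably above 1.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20_000, 3))
    md_add, _ = mean_dimension(lambda x: x.sum(axis=1), X, rng=1)
    md_mult, _ = mean_dimension(lambda x: x[:, 0] * x[:, 1] + x[:, 2], X, rng=1)
    print(f"additive: {md_add:.2f}, with interaction: {md_mult:.2f}")
```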
Related papers
- Measuring Feature Dependency of Neural Networks by Collapsing Feature Dimensions in the Data Manifold [18.64569268049846]
We introduce a new technique to measure the feature dependency of neural network models.
The motivation is to better understand a model by querying whether it is using information from human-understandable features.
We test our method on deep neural network models trained on synthetic image data with known ground truth.
arXiv Detail & Related papers (2024-04-18T17:10:18Z)
- Task structure and nonlinearity jointly determine learned representational geometry [0.0]
We show that Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs.
Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.
arXiv Detail & Related papers (2024-01-24T16:14:38Z)
- Multilayer Multiset Neuronal Networks -- MMNNs [55.2480439325792]
The present work describes multilayer multiset neuronal networks incorporating two or more layers of coincidence similarity neurons.
The work also explores the utilization of counter-prototype points, which are assigned to the image regions to be avoided.
arXiv Detail & Related papers (2023-08-28T12:55:13Z)
- ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z)
- Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z)
- Neural Eigenfunctions Are Structured Representation Learners [93.53445940137618]
This paper introduces a structured, adaptive-length deep representation called Neural Eigenmap.
We show that, when the eigenfunction is derived from positive relations in a data augmentation setup, applying NeuralEF results in an objective function.
We demonstrate using such representations as adaptive-length codes in image retrieval systems.
arXiv Detail & Related papers (2022-10-23T07:17:55Z)
- Decomposing neural networks as mappings of correlation functions [57.52754806616669]
We study the mapping between probability distributions implemented by a deep feed-forward network.
We identify essential statistics in the data, as well as different information representations that can be used by neural networks.
arXiv Detail & Related papers (2022-02-10T09:30:31Z)
- The Role of Linear Layers in Nonlinear Interpolating Networks [13.25706838589123]
Our framework considers a family of networks of varying depth that all have the same capacity but different implicitly defined representation costs.
The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights needed for the network to represent the function; this definition is written out as a formula after this list.
Our results show that adding linear layers to a ReLU network yields a representation cost that reflects a complex interplay between the alignment and sparsity of ReLU units.
arXiv Detail & Related papers (2022-02-02T02:33:24Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Data Augmentation Through Monte Carlo Arithmetic Leads to More Generalizable Classification in Connectomics [0.0]
We use Monte Carlo Arithmetic to perturb a structural connectome estimation pipeline.
The perturbed networks were captured in an augmented dataset, which was then used for an age classification task.
We find that this generalization benefit does not hinge on a large number of perturbations, suggesting that even minimally perturbing a dataset adds meaningful variance that can be captured in the subsequently designed models.
arXiv Detail & Related papers (2021-09-20T16:06:05Z)
- Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation [90.28365183660438]
This paper proposes an augmented parallel-pyramid net with attention partial module and differentiable auto-data augmentation.
We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component.
Notably, our method achieves the top-1 accuracy on the challenging COCO keypoint benchmark and state-of-the-art results on the MPII dataset.
arXiv Detail & Related papers (2020-03-17T03:52:17Z)
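For reference, the representation cost used in "ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models" and defined verbally in "The Role of Linear Layers in Nonlinear Interpolating Networks" above can be written compactly as below; the symbols R(f), theta, and f_theta are our own illustrative notation rather than the papers' exact formulation.

```latex
% Illustrative transcription of the verbal definition quoted in the list above:
% the representation cost of a function f under a given architecture is the
% smallest sum of squared weights theta with which the architecture can realize f.
R(f) \;=\; \min_{\theta \,:\, f_\theta = f} \;\lVert \theta \rVert_2^2
```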
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.