ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models
- URL: http://arxiv.org/abs/2305.15598v3
- Date: Wed, 26 Jun 2024 16:29:56 GMT
- Title: ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models
- Authors: Suzanna Parkinson, Greg Ongie, Rebecca Willett,
- Abstract summary: This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
- Score: 9.96121040675476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks often operate in the overparameterized regime, in which there are far more parameters than training samples, allowing the training data to be fit perfectly. That is, training the network effectively learns an interpolating function, and properties of the interpolant affect predictions the network will make on new samples. This manuscript explores how properties of such functions learned by neural networks of depth greater than two layers. Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs. The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights needed for the network to represent the function; it reflects the function space bias associated with the architecture. Our results show that adding additional linear layers to the input side of a shallow ReLU network yields a representation cost favoring functions with low mixed variation - that is, it has limited variation in directions orthogonal to a low-dimensional subspace and can be well approximated by a single- or multi-index model. Such functions may be represented by the composition of a function with low two-layer representation cost and a low-rank linear operator. Our experiments confirm this behavior in standard network training regimes. They additionally show that linear layers can improve generalization and the learned network is well-aligned with the true latent low-dimensional linear subspace when data is generated using a multi-index model.
Related papers
- Half-Space Feature Learning in Neural Networks [2.3249139042158853]
There currently exist two extreme viewpoints for neural network feature learning.
We argue neither interpretation is likely to be correct based on a novel viewpoint.
We use this alternate interpretation to motivate a model, called the Deep Linearly Gated Network (DLGN)
arXiv Detail & Related papers (2024-04-05T12:03:19Z) - Hidden Classification Layers: Enhancing linear separability between
classes in neural networks layers [0.0]
We investigate the impact on deep network performances of a training approach.
We propose a neural network architecture which induces an error function involving the outputs of all the network layers.
arXiv Detail & Related papers (2023-06-09T10:52:49Z) - Exploring the Approximation Capabilities of Multiplicative Neural
Networks for Smooth Functions [9.936974568429173]
We consider two classes of target functions: generalized bandlimited functions and Sobolev-Type balls.
Our results demonstrate that multiplicative neural networks can approximate these functions with significantly fewer layers and neurons.
These findings suggest that multiplicative gates can outperform standard feed-forward layers and have potential for improving neural network design.
arXiv Detail & Related papers (2023-01-11T17:57:33Z) - The Role of Linear Layers in Nonlinear Interpolating Networks [13.25706838589123]
Our framework considers a family of networks of varying depth that all have the same capacity but different implicitly defined representation costs.
The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights needed for the network to represent the function.
Our results show that adding linear layers to a ReLU network yields a representation cost that reflects a complex interplay between the alignment and sparsity of ReLU units.
arXiv Detail & Related papers (2022-02-02T02:33:24Z) - Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
inputs to the model are routed through a sequence of functions in a way that is end-to-end learned.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Slope and generalization properties of neural networks [0.0]
We show that the distribution of the slope of a well-trained neural network classifier is generally independent of the width of the layers in a fully connected network.
The slope is of similar size throughout the relevant volume, and varies smoothly. It also behaves as predicted in rescaling examples.
We discuss possible applications of the slope concept, such as using it as a part of the loss function or stopping criterion during network training, or ranking data sets in terms of their complexity.
arXiv Detail & Related papers (2021-07-03T17:54:27Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To ex-tract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over- parameterized deep neural networks (DNNs)
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network
Architectures [179.66117325866585]
We investigate a design space that is usually overlooked, i.e. adjusting the channel configurations of predefined networks.
We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance.
Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration.
arXiv Detail & Related papers (2020-06-29T17:59:26Z) - Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.