Revealing the Structure of Deep Neural Networks via Convex Duality
- URL: http://arxiv.org/abs/2002.09773v4
- Date: Fri, 11 Jun 2021 17:21:01 GMT
- Title: Revealing the Structure of Deep Neural Networks via Convex Duality
- Authors: Tolga Ergen, Mert Pilanci
- Abstract summary: We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of hidden layers.
We show that a set of optimal hidden layer weights for a norm regularized training problem can be explicitly found as the extreme points of a convex set.
We apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds.
- Score: 70.15611146583068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study regularized deep neural networks (DNNs) and introduce a convex
analytic framework to characterize the structure of the hidden layers. We show
that a set of optimal hidden layer weights for a norm regularized DNN training
problem can be explicitly found as the extreme points of a convex set. For the
special case of deep linear networks, we prove that each optimal weight matrix
aligns with the previous layers via duality. More importantly, we apply the
same characterization to deep ReLU networks with whitened data and prove the
same weight alignment holds. As a corollary, we also prove that norm
regularized deep ReLU networks yield spline interpolation for one-dimensional
datasets which was previously known only for two-layer networks. Furthermore,
we provide closed-form solutions for the optimal layer weights when data is
rank-one or whitened. The same analysis also applies to architectures with
batch normalization even for arbitrary data. Therefore, we obtain a complete
explanation for a recent empirical observation termed Neural Collapse where
class means collapse to the vertices of a simplex equiangular tight frame.
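The simplex equiangular tight frame named in the abstract has a standard explicit form: for K classes, the vertices are the rows of sqrt(K/(K-1)) * (I - (1/K) * 11^T). The sketch below builds that matrix and computes its Gram matrix so the unit norms and the equal pairwise inner products of -1/(K-1) can be checked. This is the textbook construction, not code from the paper; the function names `simplex_etf` and `gram` are hypothetical and chosen only for illustration.

```python
import math


def simplex_etf(num_classes):
    """Return the K x K matrix sqrt(K/(K-1)) * (I - (1/K) * ones).

    Row i is the i-th vertex of a simplex equiangular tight frame.
    (Illustrative helper; the name is not taken from the paper.)
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def gram(vectors):
    """Return the matrix of pairwise inner products of the given vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


# Example with K = 4 classes.
vertices = simplex_etf(4)
g = gram(vertices)

# Diagonal entries are the squared norms; off-diagonal entries are the
# pairwise inner products, which should all equal -1/(K-1).
for row in g:
    print([round(x, 6) for x in row])
```

The vertices also sum to the zero vector, which matches the picture of class means arranged symmetrically around their global mean.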
Related papers
- Information-Theoretic Generalization Bounds for Deep Neural Networks [22.87479366196215]
Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications.
This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds.
arXiv Detail & Related papers (2024-04-04T03:20:35Z)
- From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford's Geometric Algebra and Convexity [54.01594785269913]
We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss.
The training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset.
arXiv Detail & Related papers (2023-09-28T15:19:30Z)
- Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Deep Networks Provably Classify Data on Curves [12.309532551321334]
We study a model problem that uses a deep fully-connected neural network to classify data drawn from two disjoint smooth curves on the unit sphere.
We prove that when (i) the network depth is large relative to certain properties that set the difficulty of the problem and (ii) the network width and number of samples are polynomial in the depth, randomly-initialized gradient descent quickly learns to correctly classify all points on the two curves with high probability.
arXiv Detail & Related papers (2021-07-29T20:40:04Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
- Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.