Blessing of Nonconvexity in Deep Linear Models: Depth Flattens the
Optimization Landscape Around the True Solution
- URL: http://arxiv.org/abs/2207.07612v1
- Date: Fri, 15 Jul 2022 17:11:26 GMT
- Title: Blessing of Nonconvexity in Deep Linear Models: Depth Flattens the
Optimization Landscape Around the True Solution
- Authors: Jianhao Ma, Salar Fattahi
- Abstract summary: This work characterizes the effect of depth on the optimization landscape of linear regression.
We show that, despite their nonconvexity, deeper models have a more desirable optimization landscape.
- Score: 4.7464518249313805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work characterizes the effect of depth on the optimization landscape of
linear regression, showing that, despite their nonconvexity, deeper models have
a more desirable optimization landscape. We consider a robust and
over-parameterized setting, where a subset of measurements are grossly
corrupted with noise and the true linear model is captured via an $N$-layer
linear neural network. On the negative side, we show that this problem
\textit{does not} have a benign landscape: given any $N\geq 1$, with constant
probability, there exists a solution corresponding to the ground truth that is
neither local nor global minimum. However, on the positive side, we prove that,
for any $N$-layer model with $N\geq 2$, a simple sub-gradient method becomes
oblivious to such ``problematic'' solutions; instead, it converges to a
balanced solution that is not only close to the ground truth but also enjoys a
flat local landscape, thereby eschewing the need for "early stopping". Lastly,
we empirically verify that the desirable optimization landscape of deeper
models extends to other robust learning tasks, including deep matrix recovery
and deep ReLU networks with $\ell_1$-loss.
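To make the setting concrete, below is a minimal, hypothetical Python sketch (not the authors' implementation) of the setup described in the abstract: robust regression where a subset of measurements is grossly corrupted, an over-parameterized $N$-layer linear model, and a plain sub-gradient method on the $\ell_1$-loss. All dimensions, the corruption level, the near-identity initialization, and the step-size schedule are illustrative assumptions.

```python
# Hypothetical sketch, not the authors' code: an N-layer linear model
# w_eff = W_{N-1} ... W_1 w_1 fit to partially corrupted measurements with a
# sub-gradient method on the l1-loss. Sizes, initialization, and step sizes
# are assumptions made for illustration only.
import numpy as np

rng = np.random.default_rng(0)

d, m, N = 20, 200, 3                         # dimension, measurements, depth
theta_true = rng.normal(size=d)              # ground-truth linear model
X = rng.normal(size=(m, d))
y = X @ theta_true
outliers = rng.random(m) < 0.2               # grossly corrupted subset
y[outliers] += 10.0 * rng.normal(size=int(outliers.sum()))

# Over-parameterized N-layer linear factorization of a single predictor
# (near-identity factors and a small first layer: a simplifying assumption).
Ws = [np.eye(d) + 0.01 * rng.normal(size=(d, d)) for _ in range(N - 1)]
w1 = 0.01 * rng.normal(size=d)

def forward(Ws, w1):
    """Partial products A_k = W_{k-1} ... W_1 w_1; the last one is w_eff."""
    acts = [w1]
    for W in Ws:
        acts.append(W @ acts[-1])
    return acts

# Sub-gradient method on f = (1/m) * ||X w_eff - y||_1.
for t in range(5000):
    acts = forward(Ws, w1)
    g = X.T @ np.sign(X @ acts[-1] - y) / m  # sub-gradient w.r.t. w_eff
    step = 0.1 / np.sqrt(t + 1)              # diminishing step size (assumption)

    grads_W, upstream = [], g                # backpropagate through the factors
    for k in range(len(Ws) - 1, -1, -1):
        grads_W.append(np.outer(upstream, acts[k]))  # dL/dW_k = upstream A_k^T
        upstream = Ws[k].T @ upstream
    grads_W.reverse()

    for k, gW in enumerate(grads_W):
        Ws[k] -= step * gW
    w1 -= step * upstream                    # upstream is now dL/dw_1

print("relative error:",
      np.linalg.norm(forward(Ws, w1)[-1] - theta_true) / np.linalg.norm(theta_true))
```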
Related papers
- Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport [26.47265060394168]
We show convergence of the gradient flow for the training of infinitely deep and arbitrarily wide ResNets.
This is done by relying on the theory of gradient flows in metric spaces.
arXiv Detail & Related papers (2024-03-19T16:34:31Z) - Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds [75.51968172401394]
We study the sample complexity of the neural policy mirror descent (NPMD) algorithm with deep convolutional neural networks (CNN)
In each iteration of NPMD, both the value function and the policy can be well approximated by CNNs.
We show that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality.
arXiv Detail & Related papers (2023-09-25T07:31:22Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - Deep Linear Networks can Benignly Overfit when Shallow Ones Do [16.1176305285103]
We show that randomly initialized deep linear networks can closely approximate or even match known bounds for the minimum $\ell_2$-norm interpolant.
Our analysis also reveals that interpolating deep linear models have exactly the same conditional variance as the minimum $\ell_2$-norm solution.
arXiv Detail & Related papers (2022-09-19T19:23:04Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper bounds can be extended to two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Optimal learning rate schedules in high-dimensional non-convex optimization problems [14.058580956992051]
Learning rate schedules are ubiquitously used to speed up and improve optimisation.
We present a first analytical study of the role of learning rate scheduling in this setting.
arXiv Detail & Related papers (2022-02-09T15:15:39Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z) - Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)