A Combinatorial Perspective on the Optimization of Shallow ReLU Networks
- URL: http://arxiv.org/abs/2210.00176v1
- Date: Sat, 1 Oct 2022 03:09:02 GMT
- Title: A Combinatorial Perspective on the Optimization of Shallow ReLU Networks
- Authors: Michael Matena, Colin Raffel
- Abstract summary: NP-hard problem of optimizing a shallow ReLU network can be characterized as a search over each training example's activation pattern.
We show that it can be naturally modeled via a zonotope with its set is to the set of feasible activation patterns.
- Score: 35.329916555349214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The NP-hard problem of optimizing a shallow ReLU network can be characterized
as a combinatorial search over each training example's activation pattern
followed by a constrained convex problem given a fixed set of activation
patterns. We explore the implications of this combinatorial aspect of ReLU
optimization in this work. We show that it can be naturally modeled via a
geometric and combinatoric object known as a zonotope with its vertex set
isomorphic to the set of feasible activation patterns. This assists in analysis
and provides a foundation for further research. We demonstrate its usefulness
when we explore the sensitivity of the optimal loss to perturbations of the
training data. Later we discuss methods of zonotope vertex selection and its
relevance to optimization. Overparameterization assists in training by making a
randomly chosen vertex more likely to contain a good solution. We then
introduce a novel polynomial-time vertex selection procedure that provably
picks a vertex containing the global optimum using only double the minimum
number of parameters required to fit the data. We further introduce a local
greedy search heuristic over zonotope vertices and demonstrate that it
outperforms gradient descent on underparameterized problems.
Related papers
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max-and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z) - Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and
Optimal Algorithms [64.10576998630981]
We show the first tight characterization of the optimal Hessian-dependent sample complexity.
A Hessian-independent algorithm universally achieves the optimal sample complexities for all Hessian instances.
The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions.
arXiv Detail & Related papers (2023-06-21T17:03:22Z) - Sparse Quadratic Optimisation over the Stiefel Manifold with Application
to Permutation Synchronisation [71.27989298860481]
We address the non- optimisation problem of finding a matrix on the Stiefel manifold that maximises a quadratic objective function.
We propose a simple yet effective sparsity-promoting algorithm for finding the dominant eigenspace matrix.
arXiv Detail & Related papers (2021-09-30T19:17:35Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian.
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained
Optimization [25.73397307080647]
We present a new model of a general convex or non objective machine machine objectives.
We propose an algorithm that solves a constraint with gradually relaxed point levels of each subproblem.
We demonstrate the effectiveness of our new numerical scale problems.
arXiv Detail & Related papers (2020-10-23T05:24:05Z) - Evolutionary Algorithm and Multifactorial Evolutionary Algorithm on
Clustered Shortest-Path Tree problem [2.578242050187029]
Clustered Shortest-Path Tree Problem (CluSPT) is an NP-hard problem.
To enhance the performance of the search process, two approaches are proposed.
arXiv Detail & Related papers (2020-10-19T08:37:18Z) - Adaptive Sampling of Pareto Frontiers with Binary Constraints Using
Regression and Classification [0.0]
We present a novel adaptive optimization algorithm for black-box multi-objective optimization problems with binary constraints.
Our method is based on probabilistic regression and classification models, which act as a surrogate for the optimization goals.
We also present a novel ellipsoid truncation method to speed up the expected hypervolume calculation.
arXiv Detail & Related papers (2020-08-27T09:15:02Z) - The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural
Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z) - Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.