Related papers: A Combinatorial Perspective on the Optimization of Shallow ReLU Networks

A Combinatorial Perspective on the Optimization of Shallow ReLU Networks

URL: http://arxiv.org/abs/2210.00176v1
Date: Sat, 1 Oct 2022 03:09:02 GMT
Title: A Combinatorial Perspective on the Optimization of Shallow ReLU Networks
Authors: Michael Matena, Colin Raffel
Abstract summary: NP-hard problem of optimizing a shallow ReLU network can be characterized as a search over each training example's activation pattern. We show that it can be naturally modeled via a zonotope with its set is to the set of feasible activation patterns.
Score: 35.329916555349214
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The NP-hard problem of optimizing a shallow ReLU network can be characterized as a combinatorial search over each training example's activation pattern followed by a constrained convex problem given a fixed set of activation patterns. We explore the implications of this combinatorial aspect of ReLU optimization in this work. We show that it can be naturally modeled via a geometric and combinatoric object known as a zonotope with its vertex set isomorphic to the set of feasible activation patterns. This assists in analysis and provides a foundation for further research. We demonstrate its usefulness when we explore the sensitivity of the optimal loss to perturbations of the training data. Later we discuss methods of zonotope vertex selection and its relevance to optimization. Overparameterization assists in training by making a randomly chosen vertex more likely to contain a good solution. We then introduce a novel polynomial-time vertex selection procedure that provably picks a vertex containing the global optimum using only double the minimum number of parameters required to fit the data. We further introduce a local greedy search heuristic over zonotope vertices and demonstrate that it outperforms gradient descent on underparameterized problems.

Related papers

Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods. Our main contribution holds for a broad class of problems including Max-and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms [64.10576998630981]
We show the first tight characterization of the optimal Hessian-dependent sample complexity. A Hessian-independent algorithm universally achieves the optimal sample complexities for all Hessian instances. The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions.
arXiv Detail & Related papers (2023-06-21T17:03:22Z)
Sparse Quadratic Optimisation over the Stiefel Manifold with Application to Permutation Synchronisation [71.27989298860481]
We address the non- optimisation problem of finding a matrix on the Stiefel manifold that maximises a quadratic objective function. We propose a simple yet effective sparsity-promoting algorithm for finding the dominant eigenspace matrix.
arXiv Detail & Related papers (2021-09-30T19:17:35Z)
Implicit differentiation for fast hyperparameter selection in non-smooth convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth. We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian.
arXiv Detail & Related papers (2021-05-04T17:31:28Z)
A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained Optimization [25.73397307080647]
We present a new model of a general convex or non objective machine machine objectives. We propose an algorithm that solves a constraint with gradually relaxed point levels of each subproblem. We demonstrate the effectiveness of our new numerical scale problems.
arXiv Detail & Related papers (2020-10-23T05:24:05Z)
Evolutionary Algorithm and Multifactorial Evolutionary Algorithm on Clustered Shortest-Path Tree problem [2.578242050187029]
Clustered Shortest-Path Tree Problem (CluSPT) is an NP-hard problem. To enhance the performance of the search process, two approaches are proposed.
arXiv Detail & Related papers (2020-10-19T08:37:18Z)
Adaptive Sampling of Pareto Frontiers with Binary Constraints Using Regression and Classification [0.0]
We present a novel adaptive optimization algorithm for black-box multi-objective optimization problems with binary constraints. Our method is based on probabilistic regression and classification models, which act as a surrogate for the optimization goals. We also present a novel ellipsoid truncation method to speed up the expected hypervolume calculation.
arXiv Detail & Related papers (2020-08-27T09:15:02Z)
The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints. Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z)
Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks. We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set. In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.