Global Convergence of Frank Wolfe on One Hidden Layer Networks
- URL: http://arxiv.org/abs/2002.02208v1
- Date: Thu, 6 Feb 2020 11:58:43 GMT
- Title: Global Convergence of Frank Wolfe on One Hidden Layer Networks
- Authors: Alexandre d'Aspremont, Mert Pilanci
- Abstract summary: We derive global convergence bounds for the Frank Wolfe algorithm when training one hidden layer neural networks.
When using the ReLU activation function, and under tractable preconditioning assumptions on the sample data set, the linear minimization oracle used to incrementally form the solution can be solved explicitly as a second order cone program.
- Score: 121.96696298666014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We derive global convergence bounds for the Frank Wolfe algorithm when
training one hidden layer neural networks. When using the ReLU activation
function, and under tractable preconditioning assumptions on the sample data
set, the linear minimization oracle used to incrementally form the solution can
be solved explicitly as a second order cone program. The classical Frank Wolfe
algorithm then converges with rate $O(1/T)$ where $T$ is both the number of
neurons and the number of calls to the oracle.
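The mechanism is simple enough to sketch: each Frank-Wolfe iteration calls a linear minimization oracle (LMO) over a ball of single ReLU neurons, and the vertex it returns becomes a new hidden unit, so after $T$ iterations the network has at most $T$ neurons. Below is a minimal illustrative sketch under a squared loss, not the authors' implementation: the paper's exact second-order cone program oracle is replaced by a crude random-search approximation, and the names `approx_lmo`, `radius`, and `T` are our own illustrative choices.

```python
# Minimal sketch (not the authors' code) of the scheme described in the abstract:
# the iterate is a one-hidden-layer ReLU network built one neuron at a time, and
# each iteration calls a linear minimization oracle (LMO) over a ball of single
# ReLU units. The paper solves this LMO exactly as a second-order cone program
# under preconditioning assumptions; here it is only approximated by random
# search, purely for illustration.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def approx_lmo(X, residual, n_trials=2000, rng=None):
    """Approximate LMO: find a unit-norm weight vector w whose ReLU feature
    relu(X @ w) is most (anti-)correlated with the current residual."""
    rng = np.random.default_rng() if rng is None else rng
    best_w, best_val = None, -np.inf
    for _ in range(n_trials):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        val = abs(relu(X @ w) @ residual)
        if val > best_val:
            best_val, best_w = val, w
    return best_w

def frank_wolfe_one_hidden_layer(X, y, radius=10.0, T=50):
    """Run T Frank-Wolfe steps; each step adds one neuron, so the final model
    has at most T hidden units, matching the O(1/T) rate quoted above."""
    f = np.zeros(X.shape[0])     # current network output on the sample
    neurons = []                 # list of (output weight, hidden weight) pairs
    for t in range(T):
        residual = f - y         # gradient of 0.5 * ||f - y||^2 with respect to f
        w = approx_lmo(X, residual)
        phi = relu(X @ w)
        # Vertex of the scaled ball of single neurons minimizing <residual, s>.
        sign = -np.sign(phi @ residual)
        s = radius * sign * phi
        gamma = 2.0 / (t + 2.0)  # classical Frank-Wolfe step size
        f = (1.0 - gamma) * f + gamma * s
        neurons = [((1.0 - gamma) * a, w_old) for a, w_old in neurons]
        neurons.append((gamma * radius * sign, w))
    return f, neurons
```

The step size $\gamma_t = 2/(t+2)$ is the standard choice behind the $O(1/T)$ guarantee; the feasible set, the value of `radius`, and the random-search oracle are stand-ins for the paper's precise construction.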
Related papers
- On the Convergence of Federated Averaging under Partial Participation for Over-parameterized Neural Networks [13.2844023993979]
Federated learning (FL) is a widely employed distributed paradigm for collaboratively training machine learning models from multiple clients without sharing local data.
In this paper, we show that FedAvg converges to a global minimum for over-parameterized neural networks under partial participation.
arXiv Detail & Related papers (2023-10-09T07:56:56Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, adversarial labels.
These results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Finite-Sum Optimization: A New Perspective for Convergence to a Global
Solution [22.016345507132808]
Deep neural networks (DNNs) have shown great success in many machine learning tasks.
Their training is challenging since the loss surface is generally non-convex and non-smooth.
We propose an algorithmic framework for this minimization problem that allows convergence to an $\varepsilon$-(global) minimum.
arXiv Detail & Related papers (2022-02-07T21:23:16Z) - Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and
Beyond [19.544213396776268]
We introduce regularized Frank-Wolfe, a general and effective algorithm for inference in dense conditional random fields (CRFs).
We show that our new algorithms, combined with strong neural network models, yield significant improvements on benchmark datasets.
arXiv Detail & Related papers (2021-10-27T20:44:47Z) - Efficient Algorithms for Learning Depth-2 Neural Networks with General
ReLU Activations [27.244958998196623]
We present time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations.
In particular, we consider learning an unknown network of the form $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$, where $x$ is drawn from the Gaussian distribution, and $\sigma(t) := \max(t,0)$ is the ReLU activation (a minimal evaluation sketch of this model appears after this list).
arXiv Detail & Related papers (2021-07-21T17:06:03Z) - Scalable Frank-Wolfe on Generalized Self-concordant Functions via Simple Steps [66.88729048402082]
Generalized self-concordance is a key property present in the objective function of many learning problems.
We show improved convergence rates for various common cases, e.g., when the feasible region under consideration is uniformly convex or polyhedral.
arXiv Detail & Related papers (2021-05-28T15:26:36Z) - Neural Thompson Sampling [94.82847209157494]
We propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation.
At the core of our algorithm is a novel posterior distribution of the reward, where its mean is the neural network approximator, and its variance is built upon the neural tangent features of the corresponding neural network.
arXiv Detail & Related papers (2020-10-02T07:44:09Z) - A Newton Frank-Wolfe Method for Constrained Self-Concordant Minimization [60.90222082871258]
We demonstrate how to scalably solve a class of constrained self-concordant minimization problems using linear minimization oracles (LMO) over the constraint set.
We prove that the number of LMO calls of our method is nearly the same as that of the Frank-Wolfe method in the L-smooth case.
arXiv Detail & Related papers (2020-02-17T15:28:31Z)
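For concreteness, the depth-2 model referenced above, $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$, can be evaluated in a few lines. This is a minimal sketch under the Gaussian-input assumption stated in that abstract; the dimensions and random parameters below are illustrative choices of ours, not values from the paper.

```python
# Minimal sketch of the depth-2 ReLU model f(x) = a^T relu(W^T x + b) discussed
# above; sizes and parameters are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 4                      # input dimension and number of hidden units (illustrative)
W = rng.standard_normal((d, k))   # hidden-layer weights (each column is one neuron)
b = rng.standard_normal(k)        # hidden-layer biases
a = rng.standard_normal(k)        # output-layer weights

def f(x):
    """Evaluate f(x) = a^T relu(W^T x + b)."""
    return a @ np.maximum(W.T @ x + b, 0.0)

x = rng.standard_normal(d)        # x ~ N(0, I_d): the Gaussian input assumption
print(f(x))
```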
This list is automatically generated from the titles and abstracts of the papers in this site.