Global Convergence of Frank Wolfe on One Hidden Layer Networks
- URL: http://arxiv.org/abs/2002.02208v1
- Date: Thu, 6 Feb 2020 11:58:43 GMT
- Title: Global Convergence of Frank Wolfe on One Hidden Layer Networks
- Authors: Alexandre d'Aspremont, Mert Pilanci
- Abstract summary: We derive global convergence bounds for the Frank Wolfe algorithm when training one hidden layer neural networks.
When using the ReLU activation function, and under tractable preconditioning assumptions on the sample data set, the linear minimization oracle used to incrementally form the solution can be solved explicitly as a second order cone program.
- Score: 121.96696298666014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We derive global convergence bounds for the Frank Wolfe algorithm when
training one hidden layer neural networks. When using the ReLU activation
function, and under tractable preconditioning assumptions on the sample data
set, the linear minimization oracle used to incrementally form the solution can
be solved explicitly as a second order cone program. The classical Frank Wolfe
algorithm then converges with rate $O(1/T)$ where $T$ is both the number of
neurons and the number of calls to the oracle.
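The mechanism is simple enough to sketch: each Frank-Wolfe iteration calls a linear minimization oracle (LMO) over a ball of single ReLU neurons, and the vertex it returns becomes a new hidden unit, so after $T$ iterations the network has at most $T$ neurons. Below is a minimal illustrative sketch under a squared loss, not the authors' implementation: the paper's exact second-order cone program oracle is replaced by a crude random-search approximation, and the names `approx_lmo`, `radius`, and `T` are our own illustrative choices.

```python
# Minimal sketch (not the authors' code) of the scheme described in the abstract:
# the iterate is a one-hidden-layer ReLU network built one neuron at a time, and
# each iteration calls a linear minimization oracle (LMO) over a ball of single
# ReLU units. The paper solves this LMO exactly as a second-order cone program
# under preconditioning assumptions; here it is only approximated by random
# search, purely for illustration.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def approx_lmo(X, residual, n_trials=2000, rng=None):
    """Approximate LMO: find a unit-norm weight vector w whose ReLU feature
    relu(X @ w) is most (anti-)correlated with the current residual."""
    rng = np.random.default_rng() if rng is None else rng
    best_w, best_val = None, -np.inf
    for _ in range(n_trials):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        val = abs(relu(X @ w) @ residual)
        if val > best_val:
            best_val, best_w = val, w
    return best_w

def frank_wolfe_one_hidden_layer(X, y, radius=10.0, T=50):
    """Run T Frank-Wolfe steps; each step adds one neuron, so the final model
    has at most T hidden units, matching the O(1/T) rate quoted above."""
    f = np.zeros(X.shape[0])     # current network output on the sample
    neurons = []                 # list of (output weight, hidden weight) pairs
    for t in range(T):
        residual = f - y         # gradient of 0.5 * ||f - y||^2 with respect to f
        w = approx_lmo(X, residual)
        phi = relu(X @ w)
        # Vertex of the scaled ball of single neurons minimizing <residual, s>.
        sign = -np.sign(phi @ residual)
        s = radius * sign * phi
        gamma = 2.0 / (t + 2.0)  # classical Frank-Wolfe step size
        f = (1.0 - gamma) * f + gamma * s
        neurons = [((1.0 - gamma) * a, w_old) for a, w_old in neurons]
        neurons.append((gamma * radius * sign, w))
    return f, neurons
```

The step size $\gamma_t = 2/(t+2)$ is the standard choice behind the $O(1/T)$ guarantee; the feasible set, the value of `radius`, and the random-search oracle are stand-ins for the paper's precise construction.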
Related papers
- On the Convergence of Federated Averaging under Partial Participation for Over-parameterized Neural Networks [13.2844023993979]
Federated learning (FL) is a widely employed distributed paradigm for collaboratively training machine learning models from multiple clients without sharing local data.
In this paper, we show that FedAvg converges to a global minimum for over-parameterized neural networks under partial participation.
arXiv Detail & Related papers (2023-10-09T07:56:56Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, adversarial labels.
These results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Finite-Sum Optimization: A New Perspective for Convergence to a Global
Solution [22.016345507132808]
Deep neural networks (DNNs) have shown great success in many machine learning tasks.
Their training is challenging since the loss surface is generally non-convex and non-smooth.
We propose an algorithmic framework for this minimization problem that allows convergence to an $\varepsilon$-(global) minimum.
arXiv Detail & Related papers (2022-02-07T21:23:16Z) - Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and
Beyond [19.544213396776268]
We introduce regularized Frank-Wolfe, a general and effective algorithm for inference in dense conditional random fields (CRFs).
We show that our new algorithms, combined with strong neural network models, yield significant improvements on benchmark datasets.
arXiv Detail & Related papers (2021-10-27T20:44:47Z) - Efficient Algorithms for Learning Depth-2 Neural Networks with General
ReLU Activations [27.244958998196623]
We present time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations.
In particular, we consider learning an unknown network of the form $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$, where $x$ is drawn from the Gaussian distribution, and $\sigma(t) := \max(t,0)$ is the ReLU activation (a minimal evaluation sketch of this model appears after this list).
arXiv Detail & Related papers (2021-07-21T17:06:03Z) - Scalable Frank-Wolfe on Generalized Self-concordant Functions via Simple Steps [66.88729048402082]
Generalized self-concordance is a key property present in the objective function of many learning problems.
We show improved convergence rates for various common cases, e.g., when the feasible region under consideration is uniformly convex or polyhedral.
arXiv Detail & Related papers (2021-05-28T15:26:36Z) - Neural Thompson Sampling [94.82847209157494]
We propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation.
At the core of our algorithm is a novel posterior distribution of the reward, where its mean is the neural network approximator, and its variance is built upon the neural tangent features of the corresponding neural network.
arXiv Detail & Related papers (2020-10-02T07:44:09Z) - A Newton Frank-Wolfe Method for Constrained Self-Concordant Minimization [60.90222082871258]
We demonstrate how to scalably solve a class of constrained self-concordant minimization problems using linear minimization oracles (LMO) over the constraint set.
We prove that the number of LMO calls of our method is nearly the same as that of the Frank-Wolfe method in the L-smooth case.
arXiv Detail & Related papers (2020-02-17T15:28:31Z)
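For concreteness, the depth-2 model referenced above, $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$, can be evaluated in a few lines. This is a minimal sketch under the Gaussian-input assumption stated in that abstract; the dimensions and random parameters below are illustrative choices of ours, not values from the paper.

```python
# Minimal sketch of the depth-2 ReLU model f(x) = a^T relu(W^T x + b) discussed
# above; sizes and parameters are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 4                      # input dimension and number of hidden units (illustrative)
W = rng.standard_normal((d, k))   # hidden-layer weights (each column is one neuron)
b = rng.standard_normal(k)        # hidden-layer biases
a = rng.standard_normal(k)        # output-layer weights

def f(x):
    """Evaluate f(x) = a^T relu(W^T x + b)."""
    return a @ np.maximum(W.T @ x + b, 0.0)

x = rng.standard_normal(d)        # x ~ N(0, I_d): the Gaussian input assumption
print(f(x))
```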
This list is automatically generated from the titles and abstracts of the papers in this site.