Provable Training of a ReLU Gate with an Iterative Non-Gradient Algorithm
- URL: http://arxiv.org/abs/2005.04211v5
- Date: Fri, 1 Apr 2022 15:56:40 GMT
- Title: Provable Training of a ReLU Gate with an Iterative Non-Gradient Algorithm
- Authors: Sayar Karmakar and Anirbit Mukherjee
- Abstract summary: We show provable guarantees on the training of a single ReLU gate in hitherto unexplored regimes.
We show a first-of-its-kind approximate recovery of the true label generating parameters under an (online) data-poisoning attack on the true labels.
Our guarantee is shown to be nearly optimal in the worst case and its accuracy of recovering the true weight degrades gracefully with increasing probability of attack and its magnitude.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we demonstrate provable guarantees on the training of a single
ReLU gate in hitherto unexplored regimes. We give a simple iterative stochastic
algorithm that can train a ReLU gate in the realizable setting in linear time
while using significantly milder conditions on the data distribution than
previous such results.
Leveraging certain additional moment assumptions, we also show a
first-of-its-kind approximate recovery of the true label generating parameters
under an (online) data-poisoning attack on the true labels, while training a
ReLU gate by the same algorithm. Our guarantee is shown to be nearly optimal in
the worst case and its accuracy of recovering the true weight degrades
gracefully with increasing probability of attack and its magnitude.
For both the realizable and the non-realizable cases as outlined above, our
analysis allows for mini-batching and computes how the convergence time scales
with the mini-batch size. We corroborate our theorems with simulation results
which also bring to light a striking similarity in trajectories between our
algorithm and the popular SGD algorithm, for which guarantees similar to those
shown here are still unknown.
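The abstract does not spell out the update rule. The following is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes a GLM-Tron-style iteration, a well-known non-gradient rule that moves the weight along the residual times the input and omits the ReLU derivative. The sketch compares this rule with mini-batch SGD on a toy realizable problem with Gaussian inputs. It also includes a hypothetical label-poisoning model: with probability `attack_prob`, a label receives additive uniform noise of magnitude at most `attack_mag`. All names and parameters (`tron_step`, `sgd_step`, `draw`, `train`, `dist`, `eta`, `batch_size`, `attack_prob`, `attack_mag`) are illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, float]


def relu(z: float) -> float:
    return max(0.0, z)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def tron_step(w: Vector, batch: List[Sample], eta: float) -> Vector:
    """Mini-batched GLM-Tron-style update: w += eta * mean((y - relu(w.x)) * x).

    No derivative of the ReLU appears, so this is not a gradient step.
    """
    step = [0.0] * len(w)
    for x, y in batch:
        residual = y - relu(dot(w, x))
        for j, xj in enumerate(x):
            step[j] += residual * xj
    return [wj + eta * sj / len(batch) for wj, sj in zip(w, step)]


def sgd_step(w: Vector, batch: List[Sample], eta: float) -> Vector:
    """Mini-batch SGD on the squared loss 0.5 * (y - relu(w.x))**2, for comparison."""
    step = [0.0] * len(w)
    for x, y in batch:
        z = dot(w, x)
        gate = 1.0 if z > 0 else 0.0  # ReLU (sub)derivative, taken as 0 at z = 0
        residual = y - relu(z)
        for j, xj in enumerate(x):
            step[j] += residual * gate * xj
    return [wj + eta * sj / len(batch) for wj, sj in zip(w, step)]


def draw(w_star: Vector, rng: random.Random,
         attack_prob: float, attack_mag: float) -> Sample:
    """Gaussian input, ReLU label; with prob. attack_prob the label gets additive noise."""
    x = [rng.gauss(0.0, 1.0) for _ in w_star]
    y = relu(dot(w_star, x))
    if rng.random() < attack_prob:
        y += rng.uniform(-attack_mag, attack_mag)  # hypothetical attack model
    return x, y


def train(step_fn: Callable[[Vector, List[Sample], float], Vector], w_star: Vector,
          steps: int = 2000, batch_size: int = 8, eta: float = 0.1,
          attack_prob: float = 0.0, attack_mag: float = 0.0, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    w = [0.1] * len(w_star)  # small nonzero start so SGD's ReLU gate is not stuck at 0
    for _ in range(steps):
        batch = [draw(w_star, rng, attack_prob, attack_mag) for _ in range(batch_size)]
        w = step_fn(w, batch, eta)
    return w


def dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


if __name__ == "__main__":
    w_star = [1.0, -0.5, 2.0]
    for name, fn in [("tron", tron_step), ("sgd", sgd_step)]:
        clean = dist(train(fn, w_star), w_star)
        poisoned = dist(train(fn, w_star, attack_prob=0.5, attack_mag=2.0), w_star)
        print(f"{name}: clean error {clean:.2e}, poisoned error {poisoned:.2e}")
```

In this toy, the attack noise is zero-mean and independent of the input. It therefore inflates the spread of the iterates around `w_star` rather than biasing them. The adversary and moment assumptions in the paper may be different. `batch_size` sets the mini-batch size. The sketch does not attempt to reproduce the paper's analysis of how convergence time scales with it.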
Related papers
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
(2021-07-21T17:10:14Z)
- Robust Regression Revisited: Acceleration and Improved Estimation Rates
We study fast algorithms for statistical regression problems under the strong contamination model.
The goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples.
We present nearly-linear time algorithms for robust regression problems with improved runtime or estimation guarantees.
(2021-06-22T17:21:56Z)
- Sparse Bayesian Learning via Stepwise Regression
We propose a coordinate ascent algorithm for SBL termed Relevance Matching Pursuit (RMP).
As its noise variance parameter goes to zero, RMP exhibits a surprising connection to Stepwise Regression.
We derive novel guarantees for Stepwise Regression algorithms, which also shed light on RMP.
(2021-06-11T00:20:27Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
(2021-06-03T09:50:13Z)
- Stochastic Reweighted Gradient Descent
We propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient).
We pay particular attention to the time and memory overhead of our proposed method.
We present empirical results to support our findings.
(2021-03-23T04:09:43Z)
- Experimental Design for Regret Minimization in Linear Bandits
We propose a novel design-based algorithm to minimize regret in online linear and combinatorial bandits.
We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
(2020-11-01T17:59:19Z)
- Investigating the Scalability and Biological Plausibility of the Activation Relaxation Algorithm
The Activation Relaxation (AR) algorithm provides a simple and robust approach for approximating the backpropagation of error algorithm.
We show that the algorithm can be further simplified and made more biologically plausible by introducing a learnable set of backwards weights.
We also investigate whether another biologically implausible assumption of the original AR algorithm (the frozen feedforward pass) can be relaxed without damaging performance.
(2020-10-13T08:02:38Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration
We show how, under a more standard notion of low inherent Bellman error (typically employed in least-squares value iteration-style algorithms), one can obtain strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
(2020-08-18T04:34:21Z)
- Fast OSCAR and OWL Regression via Safe Screening Rules
Ordered weighted $L_1$ (OWL) regularized regression is a new regression analysis for high-dimensional sparse learning.
Proximal gradient methods are used as standard approaches to solve OWL regression.
We propose the first safe screening rule for OWL regression by exploring the order of the primal solution with the unknown order structure.
(2020-06-29T23:35:53Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms
We study the problem of least squares linear regression where the data points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$ of the chain.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate.
(2020-06-16T04:26:50Z)