Related papers: Provable Training of a ReLU Gate with an Iterative Non-Gradient Algorithm

Provable Training of a ReLU Gate with an Iterative Non-Gradient Algorithm

URL: http://arxiv.org/abs/2005.04211v5
Date: Fri, 1 Apr 2022 15:56:40 GMT
Title: Provable Training of a ReLU Gate with an Iterative Non-Gradient Algorithm
Authors: Sayar Karmakar and Anirbit Mukherjee
Abstract summary: We show provable guarantees on the training of a single ReLU gate in hitherto unexplored regimes. We show a first-of-its-kind approximate recovery of the true label generating parameters under an (online) data-poisoning attack on the true labels. Our guarantee is shown to be nearly optimal in the worst case and its accuracy of recovering the true weight degrades gracefully with increasing probability of attack and its magnitude.
Score: 0.7614628596146599
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we demonstrate provable guarantees on the training of a single ReLU gate in hitherto unexplored regimes. We give a simple iterative stochastic algorithm that can train a ReLU gate in the realizable setting in linear time while using significantly milder conditions on the data distribution than previous such results. Leveraging certain additional moment assumptions, we also show a first-of-its-kind approximate recovery of the true label generating parameters under an (online) data-poisoning attack on the true labels, while training a ReLU gate by the same algorithm. Our guarantee is shown to be nearly optimal in the worst case and its accuracy of recovering the true weight degrades gracefully with increasing probability of attack and its magnitude. For both the realizable and the non-realizable cases as outlined above, our analysis allows for mini-batching and computes how the convergence time scales with the mini-batch size. We corroborate our theorems with simulation results which also bring to light a striking similarity in trajectories between our algorithm and the popular S.G.D. algorithm - for which similar guarantees as here are still unknown.

Related papers

Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget [55.938644481736446]
We introduce a novel algorithm for best feasible arm identification that guarantees an exponential decay in the error probability.<n>We validate our algorithm through comprehensive empirical evaluations across various problem instances with different levels of complexity.
arXiv Detail & Related papers (2025-06-03T02:56:26Z)
Scaling Test-Time Compute Without Verification or RL is Suboptimal [70.28430200655919]
We show that finetuning LLMs with verifier-based (VB) methods based on RL or search is far superior to verifier-free (VF) approaches based on distilling or cloning search traces, given a fixed amount of compute/data budget. We corroborate our theory empirically on both didactic and math reasoning problems with 3/8B-sized pre-trained LLMs, where we find verification is crucial for scaling test-time compute.
arXiv Detail & Related papers (2025-02-17T18:43:24Z)
Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective. We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
Robust Regression Revisited: Acceleration and Improved Estimation Rates [25.54653340884806]
We study fast algorithms for statistical regression problems under the strong contamination model. The goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples. We present nearly-linear time algorithms for robust regression problems with improved runtime or estimation guarantees.
arXiv Detail & Related papers (2021-06-22T17:21:56Z)
Sparse Bayesian Learning via Stepwise Regression [1.2691047660244335]
We propose a coordinate ascent algorithm for SBL termed Relevance Matching Pursuit (RMP) As its noise variance parameter goes to zero, RMP exhibits a surprising connection to Stepwise Regression. We derive novel guarantees for Stepwise Regression algorithms, which also shed light on RMP.
arXiv Detail & Related papers (2021-06-11T00:20:27Z)
Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data. We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class. For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
Stochastic Reweighted Gradient Descent [4.355567556995855]
We propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient) We pay particular attention to the time and memory overhead of our proposed method. We present empirical results to support our findings.
arXiv Detail & Related papers (2021-03-23T04:09:43Z)
Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel design-based algorithm to minimize regret in online linear and bandits. We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
arXiv Detail & Related papers (2020-11-01T17:59:19Z)
Investigating the Scalability and Biological Plausibility of the Activation Relaxation Algorithm [62.997667081978825]
Activation Relaxation (AR) algorithm provides a simple and robust approach for approximating the backpropagation of error algorithm. We show that the algorithm can be further simplified and made more biologically plausible by introducing a learnable set of backwards weights. We also investigate whether another biologically implausible assumption of the original AR algorithm -- the frozen feedforward pass -- can be relaxed without damaging performance.
arXiv Detail & Related papers (2020-10-13T08:02:38Z)
Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value-style algorithms, can provide strong PAC guarantees on learning a near optimal value function. We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
Fast OSCAR and OWL Regression via Safe Screening Rules [97.28167655721766]
Ordered $L_1$ (OWL) regularized regression is a new regression analysis for high-dimensional sparse learning. Proximal gradient methods are used as standard approaches to solve OWL regression. We propose the first safe screening rule for OWL regression by exploring the order of the primal solution with the unknown order structure.
arXiv Detail & Related papers (2020-06-29T23:35:53Z)
Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain. We establish sharp information theoretic minimax lower bounds for this problem in terms of $tau_mathsfmix$. We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.