Related papers: Agnostic Learning of Arbitrary ReLU Activation under Gaussian Marginals

Agnostic Learning of Arbitrary ReLU Activation under Gaussian Marginals

URL: http://arxiv.org/abs/2411.14349v2
Date: Fri, 22 Nov 2024 05:18:23 GMT
Title: Agnostic Learning of Arbitrary ReLU Activation under Gaussian Marginals
Authors: Anxin Guo, Aravindan Vijayaraghavan,
Abstract summary: We consider the problem of learning an arbitrarily-biased ReLU activation (or neuron) over Gaussian marginals with the squared loss objective. Our main result is a time statistical query (SQ) algorithm that gives the first constant factor approximation for arbitrary bias. Our algorithm presents an interesting departure from existing algorithms, which are all based on gradient descent.
Score: 10.850101961203752
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the problem of learning an arbitrarily-biased ReLU activation (or neuron) over Gaussian marginals with the squared loss objective. Despite the ReLU neuron being the basic building block of modern neural networks, we still do not understand the basic algorithmic question of whether one arbitrary ReLU neuron is learnable in the non-realizable setting. In particular, all existing polynomial time algorithms only provide approximation guarantees for the better-behaved unbiased setting or restricted bias setting. Our main result is a polynomial time statistical query (SQ) algorithm that gives the first constant factor approximation for arbitrary bias. It outputs a ReLU activation that achieves a loss of $O(\mathrm{OPT}) + \varepsilon$ in time $\mathrm{poly}(d,1/\varepsilon)$, where $\mathrm{OPT}$ is the loss obtained by the optimal ReLU activation. Our algorithm presents an interesting departure from existing algorithms, which are all based on gradient descent and thus fall within the class of correlational statistical query (CSQ) algorithms. We complement our algorithmic result by showing that no polynomial time CSQ algorithm can achieve a constant factor approximation. Together, these results shed light on the intrinsic limitation of gradient descent, while identifying arguably the simplest setting (a single neuron) where there is a separation between SQ and CSQ algorithms.

Related papers

Learning Intersections of Two Margin Halfspaces under Factorizable Distributions [56.51474048985742]
Learning intersections of halfspaces is a central problem in Computational Learning Theory.<n>Even for just two halfspaces, it remains a major open question whether learning is possible in time.<n>We introduce a novel algorithm that provably circumvents the CSQ hardness barrier.
arXiv Detail & Related papers (2025-11-13T00:28:24Z)
A Polynomial-time Algorithm for Online Sparse Linear Regression with Improved Regret Bound under Weaker Conditions [75.69959433669244]
We study the problem of online sparse linear regression (OSLR) where the algorithms are restricted to accessing only $k$ out of $d$ per instance for prediction.<n>We introduce a new extend-time algorithm, which significantly improves previous regret bounds.
arXiv Detail & Related papers (2025-10-31T05:02:33Z)
When can you trust feature selection? -- I: A condition-based analysis of LASSO and generalised hardness of approximation [49.1574468325115]
We show how no (randomised) algorithm can determine the correct support sets (with probability $> 1/2$) of minimisers of LASSO when reading approximate input. For ill-posed inputs, the algorithm runs forever, hence, it will never produce a wrong answer. For any algorithm defined on an open set containing a point with infinite condition number, there is an input for which the algorithm will either run forever or produce a wrong answer.
arXiv Detail & Related papers (2023-12-18T18:29:01Z)
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth. We consider two kinds of zero-order projected gradient descent algorithms.
arXiv Detail & Related papers (2023-06-03T17:05:13Z)
Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes. This paper provides two algorithms that improve the regret to $tildemathcal O(K2/3)$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z)
Revisiting Recursive Least Squares for Training Deep Neural Networks [10.44340837533087]
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence. Previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions. We propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks.
arXiv Detail & Related papers (2021-09-07T17:43:51Z)
On the Cryptographic Hardness of Learning Single Periodic Neurons [42.86685497609574]
We show a simple reduction which demonstrates the cryptographic hardness of learning a single neuron over isotropic Gaussian distributions in the presence of noise. Our proposed algorithm is not a gradient-based or an adversarial SQ-time algorithm, but is rather based on the celebrated Lenstra-LenstraLov'asz (LLL) lattice basis reduction algorithm.
arXiv Detail & Related papers (2021-06-20T20:03:52Z)
Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm. Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises. We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
Statistical-Query Lower Bounds via Functional Gradients [19.5924910463796]
We show that any statistical-query algorithm with tolerance $n- (1/epsilon)b$ must use at least $2nc epsilon$ queries for some constant $b. Our results rule out general (as opposed to correlational) SQ learning algorithms, which is unusual for real-valued learning problems.
arXiv Detail & Related papers (2020-06-29T05:15:32Z)
Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU Networks [48.32532049640782]
We study the problem of learning one-hidden-layer ReLU networks with $k$ hidden units on $mathbbRd$ under Gaussian marginals. For the case of positive coefficients, we give the first-time algorithm for this learning problem for $k$ up to $tildeOOmega(sqrtlog d)$.
arXiv Detail & Related papers (2020-06-22T17:53:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.