Noise misleads rotation invariant algorithms on sparse targets
- URL: http://arxiv.org/abs/2403.02697v1
- Date: Tue, 5 Mar 2024 06:25:19 GMT
- Title: Noise misleads rotation invariant algorithms on sparse targets
- Authors: Manfred K. Warmuth (1), Wojciech Kotłowski (2), Matt Jones (3),
Ehsan Amid (1) ((1) Google Inc., (2) Institute of Computing Science, Poznan
University of Technology, Poznan, Poland, (3) University of Colorado Boulder,
Colorado, USA)
- Abstract summary: We show that the class of rotation invariant algorithms is suboptimal for learning sparse linear problems.
We prove this via a lower bound for the Bayes optimal algorithm on a rotationally symmetrized problem.
We then prove much lower upper bounds on the same problem for simple non-rotation invariant algorithms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is well known that the class of rotation invariant algorithms is
suboptimal even for learning sparse linear problems when the number of examples
is below the "dimension" of the problem. This class includes any gradient
descent trained neural net with a fully-connected input layer (initialized with
a rotationally symmetric distribution). The simplest sparse problem is learning
a single feature out of $d$ features. In that case the classification error or
regression loss grows with $1-k/d$, where $k$ is the number of examples seen.
These lower bounds become vacuous when the number of examples $k$ reaches the
dimension $d$.
We show that when noise is added to this sparse linear problem, rotation
invariant algorithms are still suboptimal after seeing $d$ or more examples. We
prove this via a lower bound for the Bayes optimal algorithm on a rotationally
symmetrized problem. We then prove much lower upper bounds on the same problem
for simple non-rotation invariant algorithms. Finally we analyze the gradient
flow trajectories of many standard optimization algorithms in some simple cases
and show how they veer toward or away from the sparse targets.
We believe that our trajectory categorization will be useful in designing
algorithms that can exploit sparse targets, and our method for proving lower
bounds will be crucial for analyzing other families of algorithms that admit
different classes of invariances.
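To make the contrast concrete, here is a minimal numpy sketch (our illustration, not the paper's construction or experiments): full-batch gradient descent on a linear model started from zero commutes with rotations of the input, while gradient descent on the Hadamard reparameterization $w = u \odot u - v \odot v$ is a standard example of a non-rotation-invariant method with an implicit bias toward sparse solutions. All hyperparameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, sigma = 50, 200, 0.5                      # dimension, examples, noise level
X = rng.standard_normal((k, d))
y = X[:, 0] + sigma * rng.standard_normal(k)    # noisy sparse target: feature 1 only

def gd_linear(X, y, lr=0.01, steps=5000):
    """Plain GD on w; with zero initialization this procedure is rotation invariant."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def gd_hadamard(X, y, lr=0.01, steps=5000, eps=1e-3):
    """GD on the "spindly" reparameterization w = u*u - v*v; this breaks
    rotation invariance and is implicitly biased toward sparse w."""
    u = np.full(X.shape[1], eps)
    v = np.full(X.shape[1], eps)
    for _ in range(steps):
        g = X.T @ (X @ (u * u - v * v) - y) / len(y)   # gradient w.r.t. w
        u, v = u - lr * 2 * u * g, v + lr * 2 * v * g  # chain rule through u and v
    return u * u - v * v

w_star = np.zeros(d)
w_star[0] = 1.0
for name, w in [("rotation-invariant GD", gd_linear(X, y)),
                ("Hadamard-reparameterized GD", gd_hadamard(X, y))]:
    print(f"{name:30s} parameter error: {np.linalg.norm(w - w_star):.3f}")
```

On typical draws the sparse-biased parameterization recovers the planted coordinate with noticeably smaller error even though $k \ge d$, the regime where the paper's noisy lower bound applies.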
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the cost function or an upper bound of the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards [40.99322897009357]
We propose two novel algorithms based on truncation and mean of medians.
Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches.
Our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when $\epsilon=1$.
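The summary names the two estimators but not their form; below is a hedged sketch of generic truncation and mean-of-medians estimators for heavy-tailed reward means (thresholds and group counts are made up for illustration), not the paper's bandit algorithms.

```python
import numpy as np

def truncated_mean(rewards, b):
    """Mean of rewards clipped to [-b, b]: clipping trades a small bias for
    much lighter tails, the basic primitive behind truncation-based bandits."""
    return float(np.mean(np.clip(np.asarray(rewards, dtype=float), -b, b)))

def mean_of_medians(rewards, num_groups=5, seed=0):
    """Shuffle, split into groups, take each group's median, then average;
    for symmetric noise this targets the mean while resisting outliers."""
    r = np.random.default_rng(seed).permutation(np.asarray(rewards, dtype=float))
    return float(np.mean([np.median(g) for g in np.array_split(r, num_groups)]))

# Heavy-tailed symmetric rewards (Student t, df=2) centered at 1.5:
samples = 1.5 + np.random.default_rng(1).standard_t(2.0, 10_000)
print(truncated_mean(samples, b=10.0), mean_of_medians(samples))  # both ~1.5
```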
arXiv Detail & Related papers (2023-10-28T13:01:10Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
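The entry's contribution is a sharper analysis under higher-order smoothness, which we do not reproduce; for orientation, here is a hedged sketch of the basic two-point zeroth-order gradient estimator and a projected descent loop (estimator variant, step sizes, and names are ours).

```python
import numpy as np

def zo_gradient(f, x, h=1e-4, m=32, rng=None):
    """Two-point zeroth-order estimate: for u uniform on the unit sphere,
    d * (f(x + h*u) - f(x - h*u)) / (2h) * u estimates grad f(x)."""
    rng = rng or np.random.default_rng(0)
    d = x.size
    g = np.zeros(d)
    for _ in range(m):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        g += d * (f(x + h * u) - f(x - h * u)) / (2 * h) * u
    return g / m

def projected_zo_descent(f, x0, project, lr=0.1, steps=200):
    """Zeroth-order projected gradient descent using only function values."""
    x, rng = x0.copy(), np.random.default_rng(0)
    for _ in range(steps):
        x = project(x - lr * zo_gradient(f, x, rng=rng))
    return x

# Example: minimize ||x - c||^2 over the unit ball (optimum at [1, 0]).
c = np.array([2.0, 0.0])
f = lambda x: float(np.sum((x - c) ** 2))
project = lambda x: x / max(1.0, float(np.linalg.norm(x)))
print(projected_zo_descent(f, np.zeros(2), project))
```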
arXiv Detail & Related papers (2023-06-03T17:05:13Z)
- Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point using an optimal number of stochastic gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
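The summary omits the update rule; as a heavily hedged sketch of the general idea the title suggests, the code below transports a stale momentum buffer to the current iterate with a Hessian-vector product (a first-order Taylor correction) before mixing in the fresh gradient. The specific update, constants, and function names are our illustration, not the paper's algorithm.

```python
import numpy as np

def hvp(grad_f, x, v, h=1e-5):
    """Finite-difference Hessian-vector product H(x) @ v, without forming H."""
    return (grad_f(x + h * v) - grad_f(x - h * v)) / (2 * h)

def corrected_momentum_gd(grad_f, x0, lr=0.05, beta=0.9, steps=300):
    """Momentum GD whose stale buffer is Taylor-corrected for the most
    recent move before averaging in the new gradient (illustrative only)."""
    x = x0.copy()
    m = grad_f(x)
    for _ in range(steps):
        x_new = x - lr * m
        m = beta * (m + hvp(grad_f, x, x_new - x)) + (1 - beta) * grad_f(x_new)
        x = x_new
    return x

# Toy quadratic: with exact second-order information the corrected buffer
# tracks the true gradient, so iterates match plain full-gradient descent.
A = np.diag([1.0, 25.0])
grad_f = lambda x: A @ x
print(corrected_momentum_gd(grad_f, np.array([1.0, 1.0])))  # close to [0, 0]
```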
arXiv Detail & Related papers (2021-03-04T19:01:20Z)
- Hybrid Trilinear and Bilinear Programming for Aligning Partially Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms that can align partially overlapping point sets and are invariant to the corresponding transformation, such as variants of the robust point matching (RPM) algorithm.
We first show that the objective is a cubic polynomial function. We then utilize the convex envelopes of trilinear and bilinear monomials to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
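For orientation: the convex envelope of a single bilinear monomial $xy$ over a box is the classical McCormick envelope, the textbook building block behind such BnB lower bounds (the trilinear case composes analogous envelopes). This is standard material, not code from the paper.

```python
def mccormick_lower(x, y, xl, xu, yl, yu):
    """Convex envelope (tightest convex underestimator) of the bilinear
    monomial x*y over the box [xl, xu] x [yl, yu]: the max of two planes.
    Branch-and-bound tightens the bound by shrinking the box."""
    return max(xl * y + yl * x - xl * yl,
               xu * y + yu * x - xu * yu)

# Exact at the box corners, below x*y in the interior:
print(mccormick_lower(1.0, 1.0, 0.0, 1.0, 0.0, 1.0))  # 1.0  == 1*1
print(mccormick_lower(0.5, 0.5, 0.0, 1.0, 0.0, 1.0))  # 0.0  <= 0.25
```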
arXiv Detail & Related papers (2021-01-19T04:24:23Z)
- Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
In this paper, we consider the nonconvex-strongly-concave setting of the gradient-free minimax optimization problem.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)
- Robust estimation via generalized quasi-gradients [28.292300073453877]
We show why many recently proposed robust estimation problems are efficiently solvable.
We identify the existence of "generalized quasi-gradients".
We show that generalized quasi-gradients exist and construct efficient algorithms.
arXiv Detail & Related papers (2020-05-28T15:14:33Z)
- Efficient Algorithms for Multidimensional Segmented Regression [42.046881924063044]
We study the fundamental problem of fixed design multidimensional segmented regression.
We provide the first sample and computationally efficient algorithm for this problem in any fixed dimension.
Our algorithm relies on a simple merging iterative approach, which is novel in the multidimensional setting.
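The multidimensional algorithm is more involved, but a one-dimensional caricature conveys the merging idea: start from singleton segments and repeatedly fuse the adjacent pair whose combined least-squares fit degrades total error the least. A hedged sketch under a piecewise-constant model; all names are ours.

```python
import numpy as np

def fit_err(seg):
    """Squared error of the best constant fit to a segment."""
    return float(np.sum((seg - seg.mean()) ** 2))

def merge_segments(y, num_segments):
    """Greedy merging: fuse the adjacent segment pair whose merge is cheapest."""
    bounds = [(i, i + 1) for i in range(len(y))]          # half-open [l, r)
    while len(bounds) > num_segments:
        costs = [fit_err(y[l:r2]) - fit_err(y[l:r]) - fit_err(y[r:r2])
                 for (l, r), (_, r2) in zip(bounds, bounds[1:])]
        i = int(np.argmin(costs))
        bounds[i:i + 2] = [(bounds[i][0], bounds[i + 1][1])]
    return bounds

rng = np.random.default_rng(0)
y = np.r_[np.zeros(20), np.ones(20)] + 0.1 * rng.standard_normal(40)
print(merge_segments(y, 2))   # typically recovers the breakpoint near index 20
```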
arXiv Detail & Related papers (2020-03-24T19:39:34Z)