Sharp Analysis of Random Fourier Features in Classification
- URL: http://arxiv.org/abs/2109.10623v1
- Date: Wed, 22 Sep 2021 09:49:27 GMT
- Title: Sharp Analysis of Random Fourier Features in Classification
- Authors: Zhu Li
- Abstract summary: We show for the first time that random Fourier features classification can achieve $O(1/\sqrt{n})$ learning rate with only $\Omega(\sqrt{n} \log n)$ features.
- Score: 9.383533125404755
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the theoretical properties of random Fourier features classification
with Lipschitz continuous loss functions such as support vector machine and
logistic regression. Utilizing the regularity condition, we show for the first
time that random Fourier features classification can achieve $O(1/\sqrt{n})$
learning rate with only $\Omega(\sqrt{n} \log n)$ features, as opposed to
$\Omega(n)$ features suggested by previous results. Our study covers the
standard feature sampling method for which we reduce the number of features
required, as well as a problem-dependent sampling method which further reduces
the number of features while still keeping the optimal generalization property.
Moreover, we prove that the random Fourier features classification can obtain a
fast $O(1/n)$ learning rate for both sampling schemes under Massart's low noise
assumption. Our results demonstrate the potential effectiveness of random
Fourier features approximation in reducing the computational complexity
(roughly from $O(n^3)$ in time and $O(n^2)$ in space to $O(n^2)$ and
$O(n\sqrt{n})$ respectively) without having to trade off statistical
prediction accuracy. In addition, the trade-off achieved by our analysis
matches the optimal results in the literature in the worst case and
significantly improves on them under benign regularity conditions.
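To make the computational picture concrete, below is a minimal numpy sketch of random Fourier features classification under the standard (data-independent) sampling scheme the abstract refers to: frequencies are drawn from the spectral density of a Gaussian kernel, $s \approx \sqrt{n}\log n$ features are formed, and a Lipschitz-loss linear classifier (logistic regression here) is trained on them. The kernel choice, bandwidth, data, and solver are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def rff_features(X, n_features, sigma=1.0, seed=None):
    """Map X (n x d) to random Fourier features approximating the
    Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Standard sampling: frequencies drawn from the kernel's spectral density.
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Toy binary classification data (illustrative only).
n, d = 2_000, 5
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=n))

# The regime analyzed in the paper: s ~ sqrt(n) log n features, not s ~ n.
s = int(np.sqrt(n) * np.log(n))
Z = rff_features(X, s, seed=1)

# Any Lipschitz-loss linear classifier applies; logistic regression here.
clf = LogisticRegression(max_iter=1000).fit(Z, y)
print("train accuracy:", clf.score(Z, y))
```

With $s = O(\sqrt{n}\log n)$ features, solving the linear problem costs roughly $O(ns^2)$ time and $O(ns)$ memory, i.e. about $O(n^2)$ and $O(n\sqrt{n})$ up to logarithmic factors, which is where the abstract's complexity figures come from.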
Related papers
- Model-adapted Fourier sampling for generative compressed sensing [7.130302992490975]
We study generative compressed sensing when the measurement matrix is randomly subsampled from a unitary matrix.
We construct a model-adapted sampling strategy with an improved sample complexity of $\textit{O}(kd\|\boldsymbol{\alpha}\|_2^2)$ measurements.
arXiv Detail & Related papers (2023-10-08T03:13:16Z)
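For the generative compressed sensing entry above, a small sketch of the measurement model it studies: rows of a unitary (here DFT) matrix are subsampled i.i.d. from a probability distribution, with importance rescaling so the measurement operator is an isometry in expectation. The uniform `probs` below is a placeholder; the paper's contribution is a model-adapted choice of this distribution, which is not reproduced here.

```python
import numpy as np

def subsampled_dft_measurements(x, m, probs, seed=None):
    """Take m noiseless measurements of signal x by sampling rows of the
    unitary DFT matrix i.i.d. from `probs` (a sampling distribution)."""
    rng = np.random.default_rng(seed)
    n = x.size
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)          # unitary DFT matrix
    rows = rng.choice(n, size=m, p=probs)
    # Rescale by 1/sqrt(m * p_i) so E[A^* A] = I (importance weighting).
    A = F[rows] / np.sqrt(m * probs[rows, None])
    return A, A @ x

n, m = 256, 32
rng = np.random.default_rng(0)
x = np.zeros(n)
x[rng.choice(n, 4, replace=False)] = rng.normal(size=4)  # sparse signal
probs = np.full(n, 1.0 / n)   # placeholder: uniform; the paper adapts this
A, y = subsampled_dft_measurements(x, m, probs, seed=1)
print(A.shape, y.shape)
```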
- Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
arXiv Detail & Related papers (2023-06-03T17:05:13Z)
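A generic two-point zeroth-order projected gradient step, as a hedged sketch of the kind of algorithm the entry above analyzes; the random-direction estimator and Euclidean-ball projection below are standard ingredients, not the paper's exact higher-order-smoothness construction.

```python
import numpy as np

def zo_projected_gd(f, x0, radius, steps=200, lr=0.1, h=1e-3, seed=None):
    """Zeroth-order projected gradient descent: the gradient is replaced by
    a two-point finite-difference estimate along a random direction, then
    the iterate is projected back onto the Euclidean ball of given radius."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    for _ in range(steps):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        g = d * (f(x + h * u) - f(x - h * u)) / (2 * h) * u  # gradient estimate
        x = x - lr * g
        nrm = np.linalg.norm(x)                              # projection step
        if nrm > radius:
            x *= radius / nrm
    return x

# Toy smooth objective with a noisy zero-order oracle.
rng = np.random.default_rng(0)
f = lambda x: np.sum((x - 0.5) ** 2) + 1e-3 * rng.normal()
print(zo_projected_gd(f, np.zeros(4), radius=2.0, seed=1))
```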
- Average case analysis of Lasso under ultra-sparse conditions [4.568911586155097]
We analyze the performance of the least absolute shrinkage and selection operator (Lasso) for the linear model when the number of regressors grows large.
The obtained bound for perfect support recovery generalizes that given in the previous literature.
arXiv Detail & Related papers (2023-02-25T14:50:32Z)
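For the Lasso entry above, a minimal illustration of the ultra-sparse support-recovery setting; the dimensions and the regularization strength `alpha` are arbitrary choices, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Ultra-sparse linear model: few nonzero coefficients among many regressors.
rng = np.random.default_rng(0)
n, p, k = 100, 400, 3                      # p >> n, k-sparse ground truth
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(lasso.coef_)
print("recovered support:", support)       # perfect recovery -> {0, 1, 2}
```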
- Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions.
We generalize the algorithm under study so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of the optimal bounds in the smooth nonconvex regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z)
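For the nonconvex SGD entry above, a skeleton showing what "works with virtually any sampling mechanism" can mean in code: the minibatch selector is a pluggable callback. This is plain minibatch SGD as a hedged sketch, not the paper's variance-reduced method (whose name is garbled in the summary above).

```python
import numpy as np

def sgd_with_sampler(grads, x0, sampler, lr=0.05, steps=500):
    """Minibatch SGD over the average of n component functions, where the
    minibatch at each step comes from an arbitrary (pluggable) sampler."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        batch = sampler()                      # any sampling mechanism
        g = np.mean([grads[i](x) for i in batch], axis=0)
        x = x - lr * g
    return x

# n smooth components; simple quadratics here purely for illustration.
rng = np.random.default_rng(0)
targets = rng.normal(size=(10, 3))
grads = [lambda x, t=t: 2 * (x - t) for t in targets]

# One possible sampling mechanism; any other callback would plug in the same way.
uniform = lambda: rng.choice(10, size=2, replace=False)
print(sgd_with_sampler(grads, np.zeros(3), uniform))
```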
- Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z)
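For the bandit-based function evaluation entry above, a hedged heuristic sketch: pull the coordinate oracle more often where the function is more sensitive. The allocation rule (gradient magnitude over $\sqrt{\text{counts}}$) is an illustrative proxy, not the paper's instance-adaptive algorithm, and `grad_f` is assumed known here only for simplicity.

```python
import numpy as np

def adaptive_estimate(f, grad_f, oracle, n_arms, budget, seed=None):
    """Estimate f(mu) by sampling noisy coordinate oracles, allocating
    more pulls to coordinates where f is (estimated to be) more sensitive."""
    rng = np.random.default_rng(seed)
    counts = np.ones(n_arms, dtype=int)
    sums = np.array([oracle(i) for i in range(n_arms)])     # one pull per arm
    for _ in range(budget - n_arms):
        mu_hat = sums / counts
        weights = np.abs(grad_f(mu_hat)) / np.sqrt(counts)  # importance proxy
        i = int(np.argmax(weights))
        sums[i] += oracle(i)
        counts[i] += 1
    return f(sums / counts)

mu = np.array([1.0, -2.0, 0.5])
rng = np.random.default_rng(0)
oracle = lambda i: mu[i] + 0.2 * rng.normal()
f = lambda m: m[0] ** 2 + 0.1 * m[1] + 0.1 * m[2]           # sensitive to m[0]
grad_f = lambda m: np.array([2 * m[0], 0.1, 0.1])
print(adaptive_estimate(f, grad_f, oracle, n_arms=3, budget=300, seed=1))
```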
- Stochastic Bias-Reduced Gradient Methods [44.35885731095432]
We develop a new primitive for optimization: a low-bias, low-cost estimator of the minimizer $x_\star$, obtained via the Moreau-Yosida envelope.
arXiv Detail & Related papers (2021-06-17T13:33:05Z)
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
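For the randomized-exploration entry above, a minimal linear sketch of the core idea the summary describes: perturb the regression targets with i.i.d. Gaussian noise before each least-squares value fit. The paper's general value function approximation is beyond this toy example.

```python
import numpy as np

def perturbed_lsvi_weights(Phi, targets, noise_std, reg=1.0, seed=None):
    """One RLSVI-style step: fit value weights by ridge regression on
    targets perturbed with i.i.d. Gaussian noise, which drives exploration."""
    rng = np.random.default_rng(seed)
    noisy = targets + noise_std * rng.normal(size=targets.shape)
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(d), Phi.T @ noisy)

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 4))              # features of visited state-actions
targets = Phi @ np.array([1.0, -0.5, 0.2, 0.0]) + 0.1 * rng.normal(size=50)
w = perturbed_lsvi_weights(Phi, targets, noise_std=0.5, seed=1)
print(w)
```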
- Wind Field Reconstruction with Adaptive Random Fourier Features [0.0]
We investigate the use of spatial methods for reconstructing the horizontal near-surface wind field given a sparse set of measurements.
We include a physically motivated divergence penalty term $\|\nabla \cdot \beta(\pmb{x})\|^2$, as well as a penalty on the Sobolev norm.
We devise an adaptive Metropolis-Hastings algorithm for sampling the frequencies of the optimal distribution.
arXiv Detail & Related papers (2021-02-04T01:42:08Z)
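For the wind-field entry above, a sketch of Metropolis-Hastings sampling over random Fourier frequencies; the Gaussian placeholder target below stands in for the optimal frequency distribution, which the paper derives adaptively from the fitted amplitudes.

```python
import numpy as np

def metropolis_frequencies(weight_fn, d, n_freq, steps=500, prop_std=0.5, seed=None):
    """Metropolis-Hastings walk over RFF frequencies: each chain moves
    toward frequencies with large (unnormalized) target weight `weight_fn`."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_freq, d))
    w = weight_fn(W)
    for _ in range(steps):
        prop = W + prop_std * rng.normal(size=W.shape)   # symmetric proposal
        w_prop = weight_fn(prop)
        accept = rng.uniform(size=n_freq) < np.minimum(1.0, w_prop / w)
        W[accept], w[accept] = prop[accept], w_prop[accept]
    return W

# Placeholder target favoring low frequencies; purely illustrative.
weight_fn = lambda W: np.exp(-0.5 * np.sum(W ** 2, axis=1))
print(metropolis_frequencies(weight_fn, d=2, n_freq=8, seed=0).shape)
```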
- Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from noiseless observations of its values at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z)
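For the SGD entry above, a sketch of the object analyzed: single-pass, fixed step-size SGD on the least-squares risk, fed noiseless samples of a real function on the unit interval. The small cosine feature map is an arbitrary illustrative choice.

```python
import numpy as np

def single_pass_sgd(points, values, features, lr, w0):
    """Single-pass, fixed step-size SGD on the least-squares risk: each
    noiseless observation (x_t, f(x_t)) is used exactly once."""
    w = np.asarray(w0, dtype=float)
    for x, y in zip(points, values):
        phi = features(x)
        w -= lr * (phi @ w - y) * phi
    return w

# Estimate a function on [0, 1] from noiseless values at random points,
# using a small cosine feature basis (an illustrative choice).
rng = np.random.default_rng(0)
features = lambda x: np.array([1.0] + [np.cos(2 * np.pi * k * x) for k in range(1, 5)])
f = lambda x: 1.0 + 0.5 * np.cos(2 * np.pi * x)
xs = rng.uniform(size=2000)
w = single_pass_sgd(xs, f(xs), features, lr=0.1, w0=np.zeros(5))
print(w.round(3))   # approaches [1.0, 0.5, 0, 0, 0]
```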
- Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
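For the asynchronous Q-learning entry above, a minimal tabular sketch: a single Markovian trajectory is followed and only the just-visited (state, action) entry of the Q-table is updated. The random MDP and uniform behavior policy below are illustrative assumptions.

```python
import numpy as np

def async_q_learning(P, R, gamma, steps, lr=0.1, seed=None):
    """Asynchronous Q-learning: follow one Markovian trajectory and update
    only the (state, action) entry that was just visited."""
    rng = np.random.default_rng(seed)
    n_s, n_a = R.shape
    Q = np.zeros((n_s, n_a))
    s = 0
    for _ in range(steps):
        a = rng.integers(n_a)                       # behavior policy: uniform
        s_next = rng.choice(n_s, p=P[s, a])
        target = R[s, a] + gamma * Q[s_next].max()
        Q[s, a] += lr * (target - Q[s, a])          # single-entry update
        s = s_next
    return Q

rng = np.random.default_rng(0)
n_s, n_a = 4, 2
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))    # random MDP transitions
R = rng.uniform(size=(n_s, n_a))
print(async_q_learning(P, R, gamma=0.9, steps=20_000, seed=1).round(2))
```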
- Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.