- arxiv url: http://arxiv.org/abs/2101.10588v1
- Date: Tue, 26 Jan 2021 06:46:41 GMT
- Title: Generalization error of random features and kernel methods:
hypercontractivity and kernel matrix concentration
- Authors: Song Mei, Theodor Misiakiewicz, Andrea Montanari
- Abstract summary: We study the use of random features methods in conjunction with ridge regression in the feature space ${\mathbb R}^N$.
- Reference score (independently computed attention score): 19.78800773518545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider the classical supervised learning problem: we are given data
$(y_i,{\boldsymbol x}_i)$, $i\le n$, with $y_i$ a response and ${\boldsymbol
x}_i\in {\mathcal X}$ a covariate vector, and try to learn a model
$f:{\mathcal X}\to{\mathbb R}$ to predict future responses. Random features
methods map the covariate vector ${\boldsymbol x}_i$ to a point ${\boldsymbol
\phi}({\boldsymbol x}_i)$ in a higher dimensional space ${\mathbb R}^N$, via a
random featurization map ${\boldsymbol \phi}$. We study the use of random
features methods in conjunction with ridge regression in the feature space
${\mathbb R}^N$. This can be viewed as a finite-dimensional approximation of
kernel ridge regression (KRR), or as a stylized model for neural networks in
the so-called lazy training regime.
We define a class of problems satisfying certain spectral conditions on the
underlying kernels, and a hypercontractivity assumption on the associated
eigenfunctions. These conditions are verified by classical high-dimensional
examples. Under these conditions, we prove a sharp characterization of the
error of random features ridge regression. In particular, we address two
fundamental questions: $(1)$~What is the generalization error of KRR? $(2)$~How
big should $N$ be for the random features approximation to achieve the same
error as KRR?
In this setting, we prove that KRR is well approximated by a projection onto
the top $\ell$ eigenfunctions of the kernel, where $\ell$ depends on the sample
size $n$. We show that the test error of random features ridge regression is
dominated by its approximation error and is larger than the error of KRR as
long as $N\le n^{1-\delta}$ for some $\delta>0$. We characterize this gap. For
$N\ge n^{1+\delta}$, random features achieve the same error as the
corresponding KRR, and further increasing $N$ does not lead to a significant
change in test error.
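The abstract describes random features ridge regression: map each ${\boldsymbol x}_i$ to ${\boldsymbol \phi}({\boldsymbol x}_i)\in{\mathbb R}^N$ and run ridge regression there, which can be read as a finite-dimensional approximation of KRR. The following is a minimal, non-authoritative sketch in dependency-free Python. Assumptions not taken from the paper: ReLU features $\phi_j({\boldsymbol x}) = \max(\langle {\boldsymbol w}_j, {\boldsymbol x}\rangle, 0)/\sqrt{N}$ with Gaussian weights ${\boldsymbol w}_j \sim N(0, I_d)$, an unnormalized penalty $\lambda\|{\boldsymbol a}\|^2$, and toy data; all function names are hypothetical. It checks that the primal solve in ${\mathbb R}^N$ and the dual solve with the $n\times n$ empirical kernel $ZZ^\top$ give the same predictor, which is the precise sense in which random features ridge regression is KRR with a random, finite-rank kernel.

```python
import math
import random


def relu(t):
    """ReLU activation (assumed nonlinearity for the random features)."""
    return t if t > 0.0 else 0.0


def random_features(X, W):
    """Map each covariate vector x to phi(x) = (relu(<w_j, x>) / sqrt(N))_{j <= N}."""
    N = len(W)
    return [[relu(sum(wi * xi for wi, xi in zip(w, x))) / math.sqrt(N) for w in W]
            for x in X]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (A square)."""
    m = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        z[r] = (M[r][m] - sum(M[r][k] * z[k] for k in range(r + 1, m))) / M[r][r]
    return z


def rf_ridge_primal(Z, y, lam):
    """Ridge regression in R^N: a = (Z^T Z + lam I_N)^{-1} Z^T y."""
    N = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) + (lam if i == j else 0.0) for j in range(N)]
         for i in range(N)]
    rhs = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(N)]
    return solve(A, rhs)


def rf_ridge_dual(Z, y, lam):
    """Same estimator via the n x n empirical kernel: a = Z^T (Z Z^T + lam I_n)^{-1} y."""
    n, N = len(Z), len(Z[0])
    K = [[sum(p * q for p, q in zip(Z[i], Z[j])) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return [sum(Z[i][k] * alpha[i] for i in range(n)) for k in range(N)]


def predict(Z, a):
    """Linear predictor f(x) = <a, phi(x)> evaluated on each feature row."""
    return [sum(p * q for p, q in zip(z, a)) for z in Z]


# Hypothetical toy data: d = 3, n = 20 samples, N = 50 random features.
rng = random.Random(0)
d, n, N, lam = 3, 20, 50, 1e-2
X = [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)] for _ in range(n)]
y = [math.sin(3.0 * x[0]) + 0.1 * rng.gauss(0.0, 1.0) for x in X]
W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]  # w_j ~ N(0, I_d)

Z = random_features(X, W)
a_primal = rf_ridge_primal(Z, y, lam)
a_dual = rf_ridge_dual(Z, y, lam)
gap = max(abs(u - v) for u, v in zip(predict(Z, a_primal), predict(Z, a_dual)))
print(f"max |primal - dual| prediction gap: {gap:.2e}")
```

Solving in whichever form is smaller ($N\times N$ versus $n\times n$) is the usual design choice; the hand-rolled elimination only keeps the example free of dependencies.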
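How large $N$ must be is tied to how well the finite-$N$ random features kernel $\langle{\boldsymbol\phi}({\boldsymbol x}),{\boldsymbol\phi}({\boldsymbol x}')\rangle$ matches its $N\to\infty$ limit. A minimal sketch, assuming hypothetical ReLU features $\phi_j({\boldsymbol x}) = \max(\langle {\boldsymbol w}_j, {\boldsymbol x}\rangle, 0)/\sqrt{N}$ with ${\boldsymbol w}_j\sim N(0,I_d)$ (not necessarily the paper's setup): the limit is then the closed-form arc-cosine kernel $\frac{\|{\boldsymbol x}\|\|{\boldsymbol x}'\|}{2\pi}\big(\sin\theta+(\pi-\theta)\cos\theta\big)$, where $\theta$ is the angle between ${\boldsymbol x}$ and ${\boldsymbol x}'$. The code reports the largest entrywise gap between the two kernel matrices on a few unit-norm inputs as $N$ grows. It only illustrates entrywise Monte Carlo concentration at a rate of roughly $N^{-1/2}$; it does not reproduce the paper's kernel matrix concentration results or its $N$ versus $n$ thresholds. Function names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    """Rescale v to unit Euclidean norm."""
    nv = math.sqrt(dot(v, v))
    return [t / nv for t in v]


def relu_kernel(x, xp):
    """N -> infinity limit E_w[relu(<w,x>) relu(<w,x'>)] for w ~ N(0, I_d):
    the order-1 arc-cosine kernel divided by 2."""
    nx, nxp = math.sqrt(dot(x, x)), math.sqrt(dot(xp, xp))
    cos_t = max(-1.0, min(1.0, dot(x, xp) / (nx * nxp)))  # clip rounding error
    theta = math.acos(cos_t)
    return nx * nxp * (math.sin(theta) + (math.pi - theta) * cos_t) / (2.0 * math.pi)


def rf_kernel(x, xp, W):
    """Finite-N random features kernel (1/N) sum_j relu(<w_j,x>) relu(<w_j,x'>)."""
    return sum(max(dot(w, x), 0.0) * max(dot(w, xp), 0.0) for w in W) / len(W)


def max_kernel_error(X, W):
    """Largest entrywise gap between the N-feature kernel matrix and its limit."""
    return max(abs(rf_kernel(x, xp, W) - relu_kernel(x, xp)) for x in X for xp in X)


rng = random.Random(1)
d, n = 3, 5
X = [unit([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n)]
errors = {}
for N in (10, 100, 1000, 10000):
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]  # fresh weights
    errors[N] = max_kernel_error(X, W)
    print(f"N = {N:>5}: max entrywise kernel error = {errors[N]:.4f}")
```

On the diagonal the limit reduces to $\|{\boldsymbol x}\|^2/2$, since $\mathbb{E}[\max(g,0)^2] = \sigma^2/2$ for $g\sim N(0,\sigma^2)$, which is a quick sanity check on the closed form.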
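The claim that KRR is well approximated by a projection onto the top $\ell$ eigenfunctions can be pictured with a standard spectral heuristic, stated here as an assumption rather than as the paper's theorem: KRR multiplies the target's coefficient on the $k$-th eigenfunction by roughly $\lambda_k/(\lambda_k+\gamma)$, where $\gamma$ is an effective regularization that decreases as $n$ grows. When the spectrum has well-separated levels, these multipliers are close to $1$ for eigenvalues above $\gamma$ and close to $0$ below it, which is an approximate projection. The level eigenvalues $d^{-k}$, the value of $\gamma$, and the function names are hypothetical, and eigenvalue multiplicities are ignored.

```python
def shrinkage_factors(eigvals, gamma):
    """Heuristic KRR multiplier lambda_k / (lambda_k + gamma) on each eigen-coefficient."""
    return [lam / (lam + gamma) for lam in eigvals]


def effective_ell(eigvals, gamma):
    """Number of eigenvalue levels kept (multiplier above 1/2, i.e. lambda_k > gamma)."""
    return sum(1 for lam in eigvals if lam > gamma)


d = 1000
eigvals = [float(d) ** (-k) for k in range(5)]  # hypothetical: level-k eigenvalue ~ d^{-k}
gamma = float(d) ** (-2.5)                      # hypothetical effective regularization

factors = shrinkage_factors(eigvals, gamma)
for k, s in enumerate(factors):
    print(f"level k = {k}: eigenvalue = {eigvals[k]:.1e}, KRR multiplier = {s:.3f}")
print("levels effectively kept:", effective_ell(eigvals, gamma))
```

Moving $\gamma$ down to $d^{-3.5}$ (a larger $n$ in this heuristic) keeps one more level, which is one way to read the statement that $\ell$ depends on the sample size.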
