Fugu-MT Paper Translation (Abstract): Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss

Paper overview: Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss

arxiv url: http://arxiv.org/abs/2402.05928v1
Date: Thu, 8 Feb 2024 18:57:42 GMT
Status: Translation complete
Last updated in system: 2024-02-09 13:26:24.093891
Title: Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss
Title (reference translation): Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss
Authors: Ingvar Ziemann, Stephen Tu, George J. Pappas, Nikolai Matni
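Abstract summary: We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, that is, whenever $\mathscr{F}$ is a weakly sub-Gaussian class, the empirical risk minimizer achieves a rate whose leading term depends only on the complexity of the class and second order statistics. The result holds whether or not the problem is realizable, and direct dependence on mixing is relegated to a higher order term.
Reference score (independently computed attention score): 36.252641692809924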
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this work, we study statistical learning with dependent ($\beta$-mixing) data and square loss in a hypothesis class $\mathscr{F}\subset L_{\Psi_p}$ where $\Psi_p$ is the norm $\|f\|_{\Psi_p} \triangleq \sup_{m\geq 1} m^{-1/p} \|f\|_{L^m}$ for some $p\in [2,\infty]$. Our inquiry is motivated by the search for a sharp noise interaction term, or variance proxy, in learning with dependent data. Absent any realizability assumption, typical non-asymptotic results exhibit variance proxies that are deflated \emph{multiplicatively} by the mixing time of the underlying covariates process. We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$ -- that is, $\mathscr{F}$ is a weakly sub-Gaussian class: $\|f\|_{\Psi_p} \lesssim \|f\|_{L^2}^\eta$ for some $\eta\in (0,1]$ -- the empirical risk minimizer achieves a rate that only depends on the complexity of the class and second order statistics in its leading term. Our result holds whether the problem is realizable or not and we refer to this as a \emph{near mixing-free rate}, since direct dependence on mixing is relegated to an additive higher order term. We arrive at our result by combining the above notion of a weakly sub-Gaussian class with mixed tail generic chaining. This combination allows us to compute sharp, instance-optimal rates for a wide range of problems. Examples that satisfy our framework include sub-Gaussian linear regression, more general smoothly parameterized function classes, finite hypothesis classes, and bounded smoothness classes.
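Abstract (reference translation): In this work, we study statistical learning with dependent ($\beta$-mixing) data and square loss in a hypothesis class $\mathscr{F}\subset L_{\Psi_p}$, where $\Psi_p$ is the norm $\|f\|_{\Psi_p} \triangleq \sup_{m\geq 1} m^{-1/p} \|f\|_{L^m}$ for some $p\in [2,\infty]$. Our inquiry is motivated by the search for a sharp noise interaction term (variance proxy) in learning with dependent data. Absent any realizability assumption, typical non-asymptotic results exhibit variance proxies that are deflated multiplicatively by the mixing time of the underlying covariate process. We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$ -- that is, $\mathscr{F}$ is a weakly sub-Gaussian class: $\|f\|_{\Psi_p} \lesssim \|f\|_{L^2}^\eta$ for some $\eta\in (0,1]$ -- the empirical risk minimizer achieves a rate whose leading term depends only on the complexity of the class and second order statistics. This result holds whether or not the problem is realizable, and we call it a near mixing-free rate, since direct dependence on mixing is relegated to an additive higher order term. We arrive at the result by combining the above notion of a weakly sub-Gaussian class with mixed tail generic chaining. This combination allows us to compute sharp, instance-optimal rates for a wide range of problems. Examples that satisfy our framework include sub-Gaussian linear regression, more general smoothly parameterized function classes, finite hypothesis classes, and bounded smoothness classes.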
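A short worked reading of the norm defined in the abstract may help. It uses only standard facts about moment norms on a probability space, and the constant $L$ below is illustrative notation that the abstract does not fix. For $p=\infty$ the weight $m^{-1/p}$ equals $1$, so

$$\|f\|_{\Psi_\infty} = \sup_{m\geq 1} \|f\|_{L^m} = \|f\|_{L^\infty},$$

which corresponds to a bounded class. For $p=2$, finiteness of

$$\|f\|_{\Psi_2} = \sup_{m\geq 1} m^{-1/2} \|f\|_{L^m}$$

is the usual moment-growth characterization of sub-Gaussianity: a standard Gaussian $g$ satisfies $\|g\|_{L^m} \lesssim \sqrt{m}$, hence $\|g\|_{\Psi_2} < \infty$. In the case $\eta = 1$, the weakly sub-Gaussian condition reads $\|f\|_{\Psi_p} \leq L \|f\|_{L^2}$ for the functions $f$ in the class under consideration.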

List of related papers