Fugu-MT Paper Translation (Summary): Self-training Converts Weak Learners to Strong Learners in Mixture Models

Paper summary: Self-training Converts Weak Learners to Strong Learners in Mixture Models

arxiv url: http://arxiv.org/abs/2106.13805v1
Date: Fri, 25 Jun 2021 17:59:16 GMT
Status: Translation complete
Last updated in system: 2021-06-28 13:11:38.957116
Title: Self-training Converts Weak Learners to Strong Learners in Mixture Models
Title (reference translation): Self-training converts weak learners into strong learners in mixture models
Authors: Spencer Frei and Difan Zou and Zixiang Chen and Quanquan Gu
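Abstract summary: We show that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ achieves classification error at most $C_{\mathrm{err}}$, iterative self-training on unlabeled examples learns the Bayes-optimal classifier up to $\varepsilon$ error. Furthermore, we show that running gradient descent on the logistic loss with only labeled examples yields a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$.
Reference score (in-house attention score): 86.7137362125503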
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider a binary classification problem when the data comes from a mixture of two isotropic distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions among others. We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples (i.e., independent of $\varepsilon$). Together our results imply that mixture models can be learned to within $\varepsilon$ of the Bayes-optimal accuracy using at most $O(d)$ labeled examples and $\tilde O(d/\varepsilon^2)$ unlabeled examples by way of a semi-supervised self-training algorithm.
Abstract (reference translation): In this work, we consider a binary classification problem in which the data come from a mixture of two isotropic distributions satisfying concentration and anti-concentration properties, such as those enjoyed by log-concave distributions. We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension. That is, self-training converts weak learners into strong learners using only unlabeled examples. Furthermore, by running gradient descent on the logistic loss, one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples (i.e., independently of $\varepsilon$). These results suggest that, with a semi-supervised self-training algorithm, mixture models can be learned to within $\varepsilon$ of the Bayes-optimal accuracy using at most $O(d)$ labeled examples and $\tilde O(d/\varepsilon^2)$ unlabeled examples.
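The two-stage procedure in the abstract (logistic-loss gradient descent on a few labeled examples, then self-training with pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ on unlabeled examples) can be illustrated with a small NumPy sketch. This is not the authors' algorithm: the Gaussian mixture (one log-concave special case), the dimension, sample sizes, step sizes, the use of the logistic loss in the self-training stage, and the renormalization of $\boldsymbol{\beta}$ after each step are all assumptions made for illustration, and the helper names (`sample_mixture`, `fit_pseudolabeler`, `self_train`) are hypothetical.

```python
import math
import numpy as np


def sample_mixture(n, mu, rng):
    """Symmetric two-Gaussian mixture: y uniform on {-1, +1}, x = y*mu + N(0, I)."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu[None, :] + rng.standard_normal((n, mu.shape[0]))
    return x, y


def logistic_grad(beta, x, y):
    """Gradient of the mean of log(1 + exp(-y <beta, x>)) with respect to beta."""
    margins = y * (x @ beta)
    # 1 / (1 + e^m), written as exp(-log(1 + e^m)) to avoid overflow.
    weights = np.exp(-np.logaddexp(0.0, margins))
    return -(weights * y) @ x / x.shape[0]


def fit_pseudolabeler(x, y, lr=0.5, steps=200):
    """Stage 1: plain gradient descent on the logistic loss, labeled data only."""
    beta = np.zeros(x.shape[1])
    for _ in range(steps):
        beta -= lr * logistic_grad(beta, x, y)
    return beta


def self_train(beta_pl, x_unlabeled, batch_size=500, lr=1.0):
    """Stage 2 (assumed variant): each round pseudolabels a fresh unlabeled
    batch with sgn(<beta_t, x>), takes one logistic-loss gradient step on
    those pseudolabels, then rescales beta to unit length."""
    beta = beta_pl / np.linalg.norm(beta_pl)
    for start in range(0, len(x_unlabeled) - batch_size + 1, batch_size):
        xb = x_unlabeled[start:start + batch_size]
        y_hat = np.where(xb @ beta >= 0, 1.0, -1.0)  # pseudolabels
        beta = beta - lr * logistic_grad(beta, xb, y_hat)
        beta /= np.linalg.norm(beta)
    return beta


def error(beta, x, y):
    """Fraction of points where sgn(<beta, x>) disagrees with the true label."""
    return float(np.mean(np.where(x @ beta >= 0, 1.0, -1.0) != y))


rng = np.random.default_rng(0)
d = 20
mu = np.zeros(d)
mu[0] = 2.0  # ||mu|| = 2, so the Bayes error is Phi(-2), about 0.023
x_lab, y_lab = sample_mixture(50, mu, rng)      # a few labeled examples
x_unl, _ = sample_mixture(20_000, mu, rng)      # labels discarded
x_test, y_test = sample_mixture(20_000, mu, rng)

beta_pl = fit_pseudolabeler(x_lab, y_lab)
beta_st = self_train(beta_pl, x_unl)
bayes = 0.5 * (1.0 + math.erf(-2.0 / math.sqrt(2.0)))  # Phi(-||mu||)
print(f"pseudolabeler error: {error(beta_pl, x_test, y_test):.4f}")
print(f"self-trained error:  {error(beta_st, x_test, y_test):.4f}")
print(f"Bayes error:         {bayes:.4f}")
```

The renormalization is there because the pseudolabels always agree with the current $\boldsymbol{\beta}_t$, so every pseudolabeled margin is nonnegative and the unnormalized logistic loss could be lowered simply by growing $\|\boldsymbol{\beta}\|$; only the direction of $\boldsymbol{\beta}$ determines the classifier. In this isotropic Gaussian example, the Bayes-optimal classifier is $\mathrm{sgn}(\langle \boldsymbol{\mu}, \mathbf{x}\rangle)$, so the cosine between $\boldsymbol{\beta}_t$ and $\boldsymbol{\mu}$ is a convenient progress measure.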

Related papers

Mean and Variance Estimation Complexity in Arbitrary Distributions via Wasserstein Minimization [0.0]
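This paper focuses on the complexity of estimating the translation parameter $\boldsymbol{\mu} \in \mathbb{R}^l$ and the shrinkage parameter $\sigma \in \mathbb{R}_{++}$. While this is NP-hard for Maximum Likelihood Estimation (MLE), $\varepsilon$-approximations for arbitrary $\varepsilon > 0$ can be obtained in $\text{poly}\left(\frac{1}{\varepsilon}\right)$ time.
Reference translation (metadata) (2025-01-17T13:07:52Z)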
Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models [37.42736399673992]
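A single-index model (SIM) is a function of the form $\sigma(\mathbf{w}^{\ast} \cdot \mathbf{x})$, where $\sigma: \mathbb{R} \to \mathbb{R}$ is a known link function and $\mathbf{w}^{\ast}$ is a hidden unit vector. A proper learner achieves $L^2$-error of $O(\mathrm{OPT})+\epsilon$.
Reference translation (metadata) (2024-11-08T17:10:38Z)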
Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model [3.309767076331365]
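We derive approximation bounds for learning single-neuron models using thresholded descent. We also study the linear regression problem, where $\sigma(\mathbf{x}) = \mathbf{x}$.
Reference translation (metadata) (2024-09-05T16:59:56Z)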
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
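We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$. A two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by the information exponent.
Reference translation (metadata) (2024-06-03T17:56:58Z)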
Provably learning a multi-head attention layer [55.2904547651831]
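The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. In this work, we initiate the study of provably learning a multi-head attention layer from random examples. We show that, in the worst case, exponential dependence on $m$ is unavoidable.
Reference translation (metadata) (2024-02-06T15:39:09Z)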
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
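We propose efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the goal is to approximate $\mathbf{A}$ as a product of low-rank factors. Our techniques generalize to other common variants of the BMF problem.
Reference translation (metadata) (2023-06-02T18:55:27Z)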
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
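We study functions of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations $\sigma$. The goal of the learner is to output, with high probability, a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$.
Reference translation (metadata) (2022-06-17T17:55:43Z)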
Coresets for Decision Trees of Signals [19.537354146654845]
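We provide the first algorithm that outputs a $(k,\varepsilon)$-coreset for such a matrix. This is achieved by forging a link between decision trees from machine learning and partition trees from computational geometry.
Reference translation (metadata) (2021-10-07T05:49:55Z)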
Provable Robustness of Adversarial Training for Learning Halfspaces with Noise [95.84614821570283]
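We analyze the properties of adversarial training for learning adversarially robust halfspaces in the presence of label noise. To the best of our knowledge, this is the first work to show that adversarial training yields robust classifiers in the presence of noise.
Reference translation (metadata) (2021-04-19T16:35:38Z)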
Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models [59.015960528781115]
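We show how to optimally combine $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$. To establish the limiting distribution of $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$, we design and analyze an Approximate Message Passing (AMP) algorithm.
Reference translation (metadata) (2020-08-07T18:20:05Z)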
Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time [14.990725929840892]
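Here, $T(N, d)$ is the time it takes to multiply a $d \times N$ matrix by its transpose. Our runtime matches that of the fastest algorithm for covariance estimation without outliers, up to polylogarithmic factors.
Reference translation (metadata) (2020-06-23T20:21:27Z)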
Efficient Statistics for Sparse Graphical Models from Truncated Samples [19.205541380535397]
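We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models. For sparse linear regression, suppose samples $(\mathbf{x}, y)$ are generated with $y = \mathbf{x}^\top \Omega^* + \mathcal{N}(0,1)$, and $(\mathbf{x}, y)$ is observed only if it belongs to a truncation set $S \subseteq \mathbb{R}^d$.
Reference translation (metadata) (2020-06-17T09:21:00Z)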
Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss. For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
Reference translation (metadata) (2020-05-29T07:20:35Z)

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

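This is the information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).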