Fugu-MT 論文翻訳(概要): Testing distributional assumptions of learning algorithms

論文の概要: Testing distributional assumptions of learning algorithms

arxiv url: http://arxiv.org/abs/2204.07196v1
Date: Thu, 14 Apr 2022 19:10:53 GMT
ステータス: 翻訳完了
システム内更新日: 2022-04-18 12:56:06.564136
Title: Testing distributional assumptions of learning algorithms
Title（参考訳）: 学習アルゴリズムの分布仮定の検証
Authors: Ronitt Rubinfeld and Arsen Vasilyan
Abstract要約: テストレーナー対 $(mathcalA,mathcalT)$ の設計について検討する。データ中の例の分布がテスタを$mathcalT$に渡せば、データ上の非依存な$mathcalA$の出力を安全に信頼できることを示す。
参考スコア（独自算出の注目度）: 5.204779946147061
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: There are many important high dimensional function classes that have fast agnostic learning algorithms when strong assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust in the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution on examples in the data passes the tester $\mathcal{T}$ then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite sample Gaussian distribution testers do not exist for the $L_1$ and EMD distance measures. A key step in the analysis is a novel characterization of concentration and anti-concentration properties of a distribution whose low-degree moments approximately match those of a Gaussian. We also use tools from polynomial approximation theory. In contrast, we show strong lower bounds on the combined run-times of tester-learner pairs for the problems of agnostically learning convex sets under the Gaussian distribution and for monotone Boolean functions under the uniform distribution over $\{0,1\}^n$. Through these lower bounds we exhibit natural problems where there is a dramatic gap between standard agnostic learning run-time and the run-time of the best tester-learner pair.
Abstract（参考訳）: 例の分布に関する強い仮定が可能である場合、例えば領域上のガウス性や一様性など、高速な非依存的学習アルゴリズムを持つ重要な高次元関数類が多数存在する。しかし、どのようにしてデータが分布的仮定を本当に満たし、無依存な学習アルゴリズムの出力品質を信頼できるという確信を持てるのだろうか? テスト者-学習者対の設計を体系的に研究するためのモデルである $(\mathcal{a},\mathcal{t})$ を提案する。データ内の例の分布がテスト者 $\mathcal{t}$ をパスした場合、データに無知な学習者 $\mathcal{a}$ の出力を安全に信頼することができる。モデルのパワーを示すために、標準ガウス分布の下で半空間を無知に学習する古典的な問題に適用し、n^{\tilde{o}(1/\epsilon^4)}$の合計実行時間を持つテスタ・リアナー対を示す。これは、このタスクでよく知られた一般的な非依存学習アルゴリズムと定性的に一致する。対照的に、有限サンプルガウス分布テスターは$L_1$とEMD距離測度には存在しない。解析における重要なステップは、ガウス系とほぼ一致する低次モーメントを持つ分布の濃度と反集中特性の新たなキャラクタリゼーションである。多項式近似理論のツールも使用します。対照的に,ガウス分布下での無知学習凸集合問題や,$\{0,1\}^n$ の均一分布下での単調ブール関数問題に対して,テスタ・リアナーペアの組合せ実行時間の強い下界を示す。これらの下位境界を通して、標準非依存の学習実行時間と最高のテスタ-ラーナーペアの実行時間の間に劇的なギャップがあるという自然な問題を示す。

関連論文リスト

Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
条件分布は機械学習の中心的な問題ですペアデータとペアデータの両方を統合する新しいパラダイムを提案する。提案手法は任意の誤差で理論上真の条件分布を復元可能であることを示す。
論文参考訳（メタデータ） (2024-10-03T16:12:59Z)
Agnostically Learning Multi-index Models with Queries [54.290489524576756]
本稿では,ガウス分布下での非依存学習の課題に対するクエリアクセスのパワーについて検討する。クエリアクセスは、MIMを不可知的に学習するためのランダムな例よりも大幅に改善されていることを示す。
論文参考訳（メタデータ） (2023-12-27T15:50:47Z)
Testable Learning with Distribution Shift [9.036777309376697]
分散シフトを伴うテスト可能学習と呼ばれる新しいモデルを定義する。テスト分布上の分類器の性能を証明可能なアルゴリズムを得る。ハーフスペースやハーフスペースの交点,決定木といった概念クラスを学ぶ上で,いくつかの肯定的な結果が得られる。
論文参考訳（メタデータ） (2023-11-25T23:57:45Z)
An Efficient Tester-Learner for Halfspaces [13.13131953359806]
本稿では、Rubinfeld と Vasilyan が最近定義したテスト可能な学習モデルにおいて、ハーフスペースを学習するための最初の効率的なアルゴリズムを提案する(2023)。このモデルでは、学習者は、トレーニングセットがテストに合格するたびに、その出力仮説の精度がほぼ最適であると認定する。
論文参考訳（メタデータ） (2023-02-28T18:51:55Z)
An Improved Analysis of Gradient Tracking for Decentralized Machine Learning [34.144764431505486]
トレーニングデータが$n$エージェントに分散されるネットワーク上での分散機械学習を検討する。エージェントの共通の目標は、すべての局所損失関数の平均を最小化するモデルを見つけることである。ノイズのない場合、$p$を$mathcalO(p-1)$から$mathcalO(p-1)$に改善します。
論文参考訳（メタデータ） (2022-02-08T12:58:14Z)
Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
ktimes k$, rank-r$, matrices to normalized $L_1$ distance requires $Omega(frackrepsilon2)$ sample。我々は、$cal O(frackrepsilon2log2fracepsilon)$ sample, a number linear in the high dimension, and almost linear in the matrices, usually low, rank proofs.というアルゴリズムを提案する。
論文参考訳（メタデータ） (2020-09-30T19:10:32Z)
Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
高確率状態に着目して離散分布を試験する問題について検討する。一定の要素でサンプル最適である近接性および独立性テストのための最初のアルゴリズムを提供する。
論文参考訳（メタデータ） (2020-09-14T16:09:17Z)
Downsampling for Testing and Learning in Product Distributions [24.48103093661132]
未知確率分布が $mathbbRd$ 上の積分布であるような分布自由なプロパティテストと学習問題について検討する。ハーフスペースの交叉、しきい値関数、凸集合、および$k$交互関数などの多くの重要な関数のクラスでは、既知のアルゴリズムは、分布のサポートサイズに依存する複雑さを持つ。本稿では,これらの問題を解消する一般手法として,ダウンログ(downlog)を提案する。
論文参考訳（メタデータ） (2020-07-15T02:46:44Z)
Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
補間分類器間のテストエラーの完全な分布を正確に計算する手法を開発した。テストエラーは、最悪の補間モデルのテストエラーから大きく逸脱する、小さな典型的な$varepsilon*$に集中する傾向にある。以上の結果から,統計的学習理論における通常の解析手法は,実際に観測された優れた一般化性能を捉えるのに十分な粒度にはならない可能性が示唆された。
論文参考訳（メタデータ） (2020-06-22T21:12:31Z)
Learning Halfspaces with Tsybakov Noise [50.659479930171585]
テュバコフ雑音の存在下でのハーフスペースの学習可能性について検討する。真半空間に関して誤分類誤差$epsilon$を達成するアルゴリズムを与える。
論文参考訳（メタデータ） (2020-06-11T14:25:02Z)
Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
本研究では,高次元ガウス混合系の対向ロバスト条件下での効率的な学習性について検討する。理論的に最適に近い誤り証明である$tildeO(epsilon)$の情報を、$epsilon$-corrupted $k$-mixtureで学習するアルゴリズムを提供する。我々の主な技術的貢献は、ガウス混合系からの新しい頑健な識別可能性証明クラスターであり、これは正方形の定度証明システムによって捉えることができる。
論文参考訳（メタデータ） (2020-05-13T16:44:12Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。