Fugu-MT Paper Translation (Summary): How Tight Can PAC-Bayes be in the Small Data Regime?

Paper summary: How Tight Can PAC-Bayes be in the Small Data Regime?

arxiv url: http://arxiv.org/abs/2106.03542v1
Date: Mon, 7 Jun 2021 12:11:32 GMT
Status: translation complete
Last updated in system: 2021-06-08 18:23:23.054628
Title: How Tight Can PAC-Bayes be in the Small Data Regime?
Title (reference translation): How tight can PAC-Bayes be in the small data regime?
Authors: Andrew Y. K. Foong, Wessel P. Bruinsma, David R. Burt, Richard E. Turner
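Abstract summary: The paper asks how tight PAC-Bayes and test set bounds can be made for small datasets. It shows that PAC-Bayes bounds are surprisingly competitive with the commonly used Chernoff test set bounds. The sharpest test set bounds still lead to better guarantees on the generalisation error than the PAC-Bayes bounds considered.
Reference score (independently computed attention score): 39.15172162668061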
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we investigate the question: Given a small number of datapoints, for example N = 30, how tight can PAC-Bayes and test set bounds be made? For such small datasets, test set bounds adversely affect generalisation performance by discarding data. In this setting, PAC-Bayes bounds are especially attractive, due to their ability to use all the data to simultaneously learn a posterior and bound its generalisation risk. We focus on the case of i.i.d. data with a bounded loss and consider the generic PAC-Bayes theorem of Germain et al. (2009) and Begin et al. (2016). While their theorem is known to recover many existing PAC-Bayes bounds, it is unclear what the tightest bound derivable from their framework is. Surprisingly, we show that for a fixed learning algorithm and dataset, the tightest bound of this form coincides with the tightest bound of the more restrictive family of bounds considered in Catoni (2007). In contrast, in the more natural case of distributions over datasets, we give examples (both analytic and numerical) showing that the family of bounds in Catoni (2007) can be suboptimal. Within the proof framework of Germain et al. (2009) and Begin et al. (2016), we establish a lower bound on the best bound achievable in expectation, which recovers the Chernoff test set bound in the case when the posterior is equal to the prior. Finally, to illustrate how tight these bounds can potentially be, we study a synthetic one-dimensional classification task in which it is feasible to meta-learn both the prior and the form of the bound to obtain the tightest PAC-Bayes and test set bounds possible. We find that in this simple, controlled scenario, PAC-Bayes bounds are surprisingly competitive with comparable, commonly used Chernoff test set bounds. However, the sharpest test set bounds still lead to better guarantees on the generalisation error than the PAC-Bayes bounds we consider.
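Abstract (reference translation): Given a small number of datapoints, for example N = 30, how tight can PAC-Bayes and test set bounds be made? For such small datasets, test set bounds harm generalisation performance because they discard data. In this setting, PAC-Bayes bounds are especially attractive, because they can use all the data to learn a posterior and bound its generalisation risk at the same time. The paper focuses on i.i.d. data with a bounded loss and considers the generic PAC-Bayes theorem of Germain et al. (2009) and Begin et al. (2016). Although this theorem is known to recover many existing PAC-Bayes bounds, it is not clear what the tightest bound derivable from the framework is. Surprisingly, for a fixed learning algorithm and dataset, the tightest bound of this form coincides with the tightest bound in the more restrictive family of bounds considered in Catoni (2007). In contrast, in the more natural case of distributions over datasets, the authors give analytic and numerical examples showing that the family of bounds in Catoni (2007) can be suboptimal. Within the proof framework of Germain et al. (2009) and Begin et al. (2016), they establish a lower bound on the best bound achievable in expectation, which recovers the Chernoff test set bound when the posterior equals the prior. Finally, to illustrate how tight these bounds can potentially be, they study a synthetic one-dimensional classification task in which both the prior and the form of the bound can be meta-learned to obtain the tightest possible PAC-Bayes and test set bounds. In this simple, controlled scenario, PAC-Bayes bounds are surprisingly competitive with comparable, commonly used Chernoff test set bounds. However, the sharpest test set bounds still give better guarantees on the generalisation error than the PAC-Bayes bounds considered.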
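As a rough illustration of the two kinds of test set bound the abstract contrasts, the sketch below computes a Chernoff (kl-inversion) bound and a binomial tail inversion bound for a held-out set of n points. It is not the paper's code. It assumes a zero-one loss, k observed errors on n i.i.d. held-out points, and a confidence level 1 - delta; the function names (`binary_kl`, `chernoff_upper`, `binomial_upper`) are hypothetical and chosen only for this example. The binomial inversion is used here as a stand-in for a "sharpest" test set bound, which is an assumption about what the abstract means.

```python
import math


def binary_kl(q, p):
    """KL divergence between Bernoulli(q) and Bernoulli(p), in nats."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # clamp p to avoid log(0)
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _largest_feasible(feasible, lo, hi=1.0, iters=60):
    """Bisection for the boundary of {p in [lo, hi] : feasible(p)}.

    Assumes feasible(lo) holds and feasibility is monotone (true, then false).
    Returns the upper end of the final bracket, so the result is conservative.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return hi


def chernoff_upper(k, n, delta):
    """Chernoff test set bound: largest p with kl(k/n || p) <= log(1/delta) / n."""
    q = k / n
    budget = math.log(1 / delta) / n
    return _largest_feasible(lambda p: binary_kl(q, p) <= budget, q)


def binomial_upper(k, n, delta):
    """Binomial tail inversion: largest p with P(Binomial(n, p) <= k) >= delta."""
    return _largest_feasible(lambda p: binom_cdf(k, n, p) >= delta, k / n)


if __name__ == "__main__":
    n, k, delta = 30, 3, 0.05  # illustrative values, not results from the paper
    print("empirical error:", k / n)
    print("Chernoff upper bound:", chernoff_upper(k, n, delta))
    print("binomial inversion upper bound:", binomial_upper(k, n, delta))
```

Because the Chernoff inequality upper-bounds the binomial tail, the binomial inversion can never give a looser upper bound than the Chernoff bound on the same data, which is one way to read the abstract's remark that the sharpest test set bounds give better guarantees.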


Related papers