Fugu-MT Paper Translation (Abstract): Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation

Paper overview: Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation

arxiv url: http://arxiv.org/abs/2301.09696v1
Date: Mon, 23 Jan 2023 19:57:58 GMT
Status: translation complete
System update date: 2023-01-25 14:57:40.627739
Title: Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation
Title (reference translation): Optimizing the Noise in Self-Supervised Learning: From Importance Sampling to Noise-Contrastive Estimation
Authors: Omar Chehab and Alexandre Gramfort and Aapo Hyvarinen
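Abstract summary: It is widely assumed that the optimal noise distribution should equal the data distribution, as in Generative Adversarial Networks (GANs). We turn to Noise-Contrastive Estimation, which grounds this self-supervised task as an estimation problem of an energy-based model. We conclude that the optimal noise may be hard to sample from, and that the gain in efficiency can be modest compared to choosing a noise distribution equal to the data's.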
Reference score (independently computed attention score): 80.07065346699005
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Self-supervised learning is an increasingly popular approach to unsupervised learning, achieving state-of-the-art results. A prevalent approach consists in contrasting data points and noise points within a classification task: this requires a good noise distribution which is notoriously hard to specify. While a comprehensive theory is missing, it is widely assumed that the optimal noise distribution should in practice be made equal to the data distribution, as in Generative Adversarial Networks (GANs). We here empirically and theoretically challenge this assumption. We turn to Noise-Contrastive Estimation (NCE) which grounds this self-supervised task as an estimation problem of an energy-based model of the data. This ties the optimality of the noise distribution to the sample efficiency of the estimator, which is rigorously defined as its asymptotic variance, or mean-squared error. In the special case where the normalization constant only is unknown, we show that NCE recovers a family of Importance Sampling estimators for which the optimal noise is indeed equal to the data distribution. However, in the general case where the energy is also unknown, we prove that the optimal noise density is the data density multiplied by a correction term based on the Fisher score. In particular, the optimal noise distribution is different from the data distribution, and is even from a different family. Nevertheless, we soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
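Abstract (reference translation): Self-supervised learning is an increasingly popular approach to unsupervised learning that achieves state-of-the-art results. A prevalent approach consists of contrasting data points and noise points within a classification task. While a comprehensive theory is missing, it is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs). We challenge this assumption empirically and theoretically. We turn to Noise-Contrastive Estimation (NCE), which grounds this self-supervised task as an estimation problem of an energy-based model. This ties the optimality of the noise distribution to the sample efficiency of the estimator, rigorously defined as its asymptotic variance or mean-squared error. In the special case where only the normalization constant is unknown, we show that NCE recovers a family of Importance Sampling estimators for which the optimal noise is indeed equal to the data distribution. However, in the general case where the energy is also unknown, we prove that the optimal noise density is the data density multiplied by a correction term based on the Fisher score. In particular, the optimal noise distribution differs from the data distribution and even belongs to a different family. Nevertheless, the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing a noise distribution equal to the data's.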
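
The special case in the abstract, where only the normalization constant is unknown, can be illustrated with a toy sketch. This is not the authors' code. It assumes a one-dimensional unnormalized Gaussian model, exp(-x^2/2), and zero-mean Gaussian noise; the function names (`unnormalized_density`, `gaussian_pdf`, `importance_sampling_z`) are hypothetical and chosen only for illustration. It uses the plain importance sampling estimator of the normalization constant, which is one simple estimator of the kind the abstract refers to, and does not cover the general case involving the Fisher score.

```python
import numpy as np


def unnormalized_density(x):
    """Toy unnormalized model exp(-x^2 / 2); its true constant is sqrt(2*pi)."""
    return np.exp(-0.5 * x**2)


def gaussian_pdf(x, sigma):
    """Density of the zero-mean Gaussian noise with standard deviation sigma."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def importance_sampling_z(sigma_noise, n_samples, rng):
    """Estimate the normalization constant by averaging importance weights.

    Returns the estimate and the standard deviation of the weights,
    which controls the variance of the estimate.
    """
    x = rng.normal(0.0, sigma_noise, size=n_samples)
    weights = unnormalized_density(x) / gaussian_pdf(x, sigma_noise)
    return weights.mean(), weights.std()


rng = np.random.default_rng(0)
true_z = np.sqrt(2.0 * np.pi)

# Noise equal to the (normalized) data distribution: weights are constant.
z_matched, spread_matched = importance_sampling_z(1.0, 100_000, rng)

# Wider noise: the estimate is still consistent, but the weights vary.
z_wide, spread_wide = importance_sampling_z(2.0, 100_000, rng)

print(f"true Z        = {true_z:.4f}")
print(f"matched noise = {z_matched:.4f} (weight std {spread_matched:.2e})")
print(f"wide noise    = {z_wide:.4f} (weight std {spread_wide:.2e})")
```

When the noise matches the data distribution, every importance weight equals the normalization constant, so the weight spread is zero up to floating-point error. This matches the abstract's statement that, in this special case, the optimal noise equals the data distribution.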

Related papers