Fugu-MT Paper Translation (Abstract): Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

Paper overview: Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

arxiv url: http://arxiv.org/abs/2211.08583v3
Date: Mon, 5 Jun 2023 22:23:52 GMT
Status: Translation complete
System update date: 2023-06-07 21:36:12.345564
Title: Empirical Study on Optimizer Selection for Out-of-Distribution Generalization
Title (reference translation): An Empirical Study on Optimizer Selection for Out-of-Distribution Generalization
Authors: Hiroki Naganuma, Kartik Ahuja, Shiro Takagi, Tetsuya Motokawa, Rio Yokota, Kohta Ishikawa, Ikuro Sato, Ioannis Mitliagkas
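Abstract summary: Modern deep learning systems do not generalize well when the test data distribution differs slightly from the training data distribution. This study examines the performance of popular first-order optimizers across different classes of distribution shift.
Reference score (independently computed attention score): 16.386766049451317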
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern deep learning systems do not generalize well when the test data distribution is slightly different to the training data distribution. While much promising work has been accomplished to address this fragility, a systematic study of the role of optimizers and their out-of-distribution generalization performance has not been undertaken. In this study, we examine the performance of popular first-order optimizers for different classes of distributional shift under empirical risk minimization and invariant risk minimization. We address this question for image and text classification using DomainBed, WILDS, and Backgrounds Challenge as testbeds for studying different types of shifts -- namely correlation and diversity shift. We search over a wide range of hyperparameters and examine classification accuracy (in-distribution and out-of-distribution) for over 20,000 models. We arrive at the following findings, which we expect to be helpful for practitioners: i) adaptive optimizers (e.g., Adam) perform worse than non-adaptive optimizers (e.g., SGD, momentum SGD) on out-of-distribution performance. In particular, even though there is no significant difference in in-distribution performance, we show a measurable difference in out-of-distribution performance. ii) in-distribution performance and out-of-distribution performance exhibit three types of behavior depending on the dataset -- linear returns, increasing returns, and diminishing returns. For example, in the training of natural language data using Adam, fine-tuning the performance of in-distribution performance does not significantly contribute to the out-of-distribution generalization performance.
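Abstract (reference translation): Modern deep learning systems do not generalize well when the test data distribution differs slightly from the training data distribution. Although much promising work has addressed this fragility, no systematic study has examined the role of optimizers and their out-of-distribution generalization performance. In this study, we examine the performance of popular first-order optimizers across different classes of distribution shift under empirical risk minimization and invariant risk minimization. We address image and text classification, using DomainBed, WILDS, and Backgrounds Challenge as testbeds for studying different types of shift, namely correlation shift and diversity shift. We search over a wide range of hyperparameters and evaluate classification accuracy (in-distribution and out-of-distribution) for more than 20,000 models. We arrive at the following findings, which we expect to be helpful for practitioners. i) Adaptive optimizers (e.g., Adam) perform worse than non-adaptive optimizers (e.g., SGD, momentum SGD) on out-of-distribution performance. In particular, although in-distribution performance shows no significant difference, out-of-distribution performance shows a measurable difference. ii) In-distribution and out-of-distribution performance exhibit three types of behavior depending on the dataset: linear returns, increasing returns, and diminishing returns. For example, when training on natural language data with Adam, further tuning of in-distribution performance does not contribute significantly to out-of-distribution generalization performance.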
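
The comparison described in the abstract (several first-order optimizers, a hyperparameter search, and accuracy measured both in-distribution and out-of-distribution) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' protocol: the synthetic data with a spurious feature (a stand-in for correlation shift), the logistic-regression model, the learning-rate grid, the full-batch gradients, and the names `make_data`, `train`, and `compare` are all hypothetical. It does not use DomainBed, WILDS, or Backgrounds Challenge, and its numbers say nothing about the paper's findings.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n, spurious_agreement, rng):
    # Toy data (assumption): feature 0 is a noisy causal signal; feature 1
    # matches the label sign with probability `spurious_agreement`.
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        sign = 1.0 if y == 1 else -1.0
        causal = sign + rng.gauss(0.0, 1.0)
        spurious = sign if rng.random() < spurious_agreement else -sign
        data.append(([causal, spurious], y))
    return data


def gradient(w, data):
    # Mean gradient of the logistic loss over the full data set.
    g = [0.0, 0.0, 0.0]
    for x, y in data:
        err = sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) - y
        g[0] += err * x[0]
        g[1] += err * x[1]
        g[2] += err
    return [gi / len(data) for gi in g]


def train(data, optimizer, lr, steps=100):
    # Full-batch updates keep the sketch deterministic; only the update
    # rules (plain SGD step, heavy-ball momentum, Adam) differ.
    w, m, v = [0.0] * 3, [0.0] * 3, [0.0] * 3
    mu, beta1, beta2, eps = 0.9, 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        g = gradient(w, data)
        for i in range(3):
            if optimizer == "sgd":
                w[i] -= lr * g[i]
            elif optimizer == "momentum":
                m[i] = mu * m[i] + g[i]
                w[i] -= lr * m[i]
            elif optimizer == "adam":
                m[i] = beta1 * m[i] + (1 - beta1) * g[i]
                v[i] = beta2 * v[i] + (1 - beta2) * g[i] * g[i]
                m_hat = m[i] / (1 - beta1 ** t)  # bias-corrected moments
                v_hat = v[i] / (1 - beta2 ** t)
                w[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
            else:
                raise ValueError(f"unknown optimizer: {optimizer}")
    return w


def accuracy(w, data):
    hits = sum(
        (sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) >= 0.5) == (y == 1)
        for x, y in data
    )
    return hits / len(data)


def compare(seed=0, lrs=(0.01, 0.1, 1.0)):
    rng = random.Random(seed)
    train_set = make_data(300, 0.9, rng)
    id_test = make_data(300, 0.9, rng)   # same spurious correlation as training
    ood_test = make_data(300, 0.1, rng)  # spurious correlation reversed
    results = {}
    for name in ("sgd", "momentum", "adam"):
        best = None
        for lr in lrs:
            w = train(train_set, name, lr)
            id_acc = accuracy(w, id_test)
            # Model selection on in-distribution accuracy only (assumption).
            if best is None or id_acc > best["id_acc"]:
                best = {"id_acc": id_acc, "ood_acc": accuracy(w, ood_test), "lr": lr}
        results[name] = best
    return results


if __name__ == "__main__":
    for name, row in compare().items():
        print(name, row)
```

Running the script prints, for each optimizer, the learning rate selected by in-distribution accuracy together with the in-distribution and out-of-distribution accuracy of that model. Reporting both numbers side by side mirrors the kind of comparison the abstract describes, at a scale far too small to support any conclusion.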

Related papers