Fugu-MT Paper Translation (Abstract): Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution

Paper summary: Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution

arxiv url: http://arxiv.org/abs/2202.10054v1
Date: Mon, 21 Feb 2022 09:03:34 GMT
Status: translation complete
Last updated in system: 2022-02-22 17:39:21.546738
Title: Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution
Title (reference translation): Fine-tuning can distort pretrained features and underperform out-of-distribution
Authors: Ananya Kumar, Aditi Raghunathan, Robbie Jones, Tengyu Ma, Percy Liang
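Abstract summary: Fine-tuning can achieve worse out-of-distribution (OOD) accuracy than linear probing when the pretrained features are good and the distribution shift is large. We show theoretically that this tradeoff between ID and OOD accuracy arises even in a simple setting. Our analysis suggests that the easy two-step strategy of linear probing then full fine-tuning (LP-FT) combines the benefits of both fine-tuning and linear probing.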
Reference score (attention score computed by this site): 100.01469697743322
License: http://creativecommons.org/licenses/by/4.0/
Abstract: When transferring a pretrained model to a downstream task, two popular methods are full fine-tuning (updating all the model parameters) and linear probing (updating only the last linear layer -- the "head"). It is well known that fine-tuning leads to better accuracy in-distribution (ID). However, in this paper, we find that fine-tuning can achieve worse accuracy than linear probing out-of-distribution (OOD) when the pretrained features are good and the distribution shift is large. On 10 distribution shift datasets (Breeds-Living17, Breeds-Entity30, DomainNet, CIFAR $\to$ STL, CIFAR10.1, FMoW, ImageNetV2, ImageNet-R, ImageNet-A, ImageNet-Sketch), fine-tuning obtains on average 2% higher accuracy ID but 7% lower accuracy OOD than linear probing. We show theoretically that this tradeoff between ID and OOD accuracy arises even in a simple setting: fine-tuning overparameterized two-layer linear networks. We prove that the OOD error of fine-tuning is high when we initialize with a fixed or random head -- this is because while fine-tuning learns the head, the lower layers of the neural network change simultaneously and distort the pretrained features. Our analysis suggests that the easy two-step strategy of linear probing then full fine-tuning (LP-FT), sometimes used as a fine-tuning heuristic, combines the benefits of both fine-tuning and linear probing. Empirically, LP-FT outperforms both fine-tuning and linear probing on the above datasets (1% better ID, 10% better OOD than full fine-tuning).
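Abstract (reference translation): When transferring a pretrained model to a downstream task, two popular methods are full fine-tuning (updating all the model parameters) and linear probing (updating only the last linear layer, the "head"). Fine-tuning is known to give better accuracy in-distribution (ID). In this paper, however, we show that fine-tuning can be less accurate than linear probing out-of-distribution (OOD) when the pretrained features are good and the distribution shift is large. On 10 distribution shift datasets (Breeds-Living17, Breeds-Entity30, DomainNet, CIFAR $\to$ STL, CIFAR10.1, FMoW, ImageNetV2, ImageNet-R, ImageNet-A, ImageNet-Sketch), fine-tuning obtains on average 2% higher ID accuracy but 7% lower OOD accuracy than linear probing. We show theoretically that this tradeoff between ID and OOD accuracy arises even in a simple setting. This is because, while fine-tuning learns the head, the lower layers of the neural network change simultaneously and distort the pretrained features. Our analysis suggests that the easy two-step strategy of linear probing then full fine-tuning (LP-FT) combines the benefits of both fine-tuning and linear probing. Empirically, LP-FT outperforms both fine-tuning and linear probing on the above datasets (1% better ID and 10% better OOD than full fine-tuning).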
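
The feature-distortion argument in the abstract can be illustrated with a very small sketch. The code below is not the authors' implementation or experimental setup. It is a toy two-layer linear model with a single hidden unit, f(x) = v * (b . x), where the data, initial values, learning rate, step counts, and the names `train`, `mse`, `b_pre`, and `v_init` are all assumptions chosen for illustration. It compares linear probing (LP, head only), full fine-tuning (FT, head and features together from an untrained head), and LP-FT (linear probing first, then full fine-tuning).

```python
# Toy two-layer linear model f(x) = v * (b . x): feature vector b, scalar head v.
# Data, initial values, learning rate, and step counts are illustrative assumptions.

def features(b, x):
    """Single hidden feature: dot product of b and x."""
    return sum(bi * xi for bi, xi in zip(b, x))

def mse(v, b, data):
    """Mean squared error of the model (v, b) on a list of (x, y) pairs."""
    return sum((v * features(b, x) - y) ** 2 for x, y in data) / len(data)

def train(v, b, data, steps, lr, update_features):
    """Full-batch gradient descent on squared error.
    The head v is always trained; b is trained only if update_features is True."""
    b = list(b)
    n = len(data)
    for _ in range(steps):
        grad_v = 0.0
        grad_b = [0.0] * len(b)
        for x, y in data:
            feat = features(b, x)
            err = v * feat - y
            grad_v += 2.0 * err * feat / n
            for i, xi in enumerate(x):
                grad_b[i] += 2.0 * err * v * xi / n
        v -= lr * grad_v
        if update_features:
            b = [bi - lr * gi for bi, gi in zip(b, grad_b)]
    return v, b

# Target: y = x1 + x2. ID inputs vary only along x1; OOD inputs also use x2.
id_data = [((1.0, 0.0), 1.0), ((2.0, 0.0), 2.0), ((-1.0, 0.0), -1.0)]
ood_data = [((0.0, 1.0), 1.0), ((1.0, 1.0), 2.0), ((0.0, -2.0), -2.0)]

b_pre = [1.0, 1.0]  # "good" pretrained features (assumed to match the target)
v_init = 0.0        # untrained head

# Linear probing: head only. Fine-tuning: head and features from the untrained head.
lp_v, lp_b = train(v_init, b_pre, id_data, 2000, 0.01, update_features=False)
ft_v, ft_b = train(v_init, b_pre, id_data, 5000, 0.01, update_features=True)
# LP-FT: start full fine-tuning from the linear-probed head.
lpft_v, lpft_b = train(lp_v, lp_b, id_data, 5000, 0.01, update_features=True)

for name, v, b in [("LP", lp_v, lp_b), ("FT", ft_v, ft_b), ("LP-FT", lpft_v, lpft_b)]:
    print(name, "features:", b, "ID mse:", mse(v, b, id_data), "OOD mse:", mse(v, b, ood_data))
```

In this toy setup all three procedures fit the ID data, but fine-tuning from the untrained head changes only the feature coordinate that the ID inputs touch. The other coordinate stays at its pretrained value while the head shrinks to compensate, so head and features no longer agree off-distribution and the OOD error grows. Starting fine-tuning from the probed head leaves the features essentially unchanged. The sketch shows only the OOD side of the tradeoff. It does not reproduce the ID gains of fine-tuning or any of the reported percentages.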


Related papers