Fugu-MT paper translation (abstract): Inductive biases of multi-task learning and finetuning: multiple regimes of feature reuse

Paper summary: Inductive biases of multi-task learning and finetuning: multiple regimes of feature reuse

arxiv url: http://arxiv.org/abs/2310.02396v4
Date: Thu, 31 Oct 2024 19:22:01 GMT
Status: Translation complete
Last updated in system: 2024-11-28 17:07:29.464626
Title: Inductive biases of multi-task learning and finetuning: multiple regimes of feature reuse
Title (reference translation): Inductive biases of multi-task learning and fine-tuning: multiple ways of feature reuse
Authors: Samuel Lippl, Jack W. Lindsey
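Abstract summary: Neural networks are often trained on multiple tasks, either simultaneously (multi-task learning, MTL) or sequentially (pretraining and subsequent finetuning, PT+FT). Despite the prevalence of this approach, the inductive biases that arise from learning multiple tasks are poorly characterized. The paper describes novel implicit regularization penalties associated with MTL and PT+FT in diagonal linear networks and single-hidden-layer ReLU networks.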
Reference score (independently computed attention score): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neural networks are often trained on multiple tasks, either simultaneously (multi-task learning, MTL) or sequentially (pretraining and subsequent finetuning, PT+FT). In particular, it is common practice to pretrain neural networks on a large auxiliary task before finetuning on a downstream task with fewer samples. Despite the prevalence of this approach, the inductive biases that arise from learning multiple tasks are poorly characterized. In this work, we address this gap. We describe novel implicit regularization penalties associated with MTL and PT+FT in diagonal linear networks and single-hidden-layer ReLU networks. These penalties indicate that MTL and PT+FT induce the network to reuse features in different ways. 1) Both MTL and PT+FT exhibit biases towards feature reuse between tasks, and towards sparsity in the set of learned features. We show a "conservation law" that implies a direct tradeoff between these two biases. 2) PT+FT exhibits a novel "nested feature selection" regime, not described by either the "lazy" or "rich" regimes identified in prior work, which biases it to rely on a sparse subset of the features learned during pretraining. This regime is much narrower for MTL. 3) PT+FT (but not MTL) in ReLU networks benefits from features that are correlated between the auxiliary and main task. We confirm these findings empirically with teacher-student models, and introduce a technique -- weight rescaling following pretraining -- that can elicit the nested feature selection regime. Finally, we validate our theory in deep neural networks trained on image classification. We find that weight rescaling improves performance when it causes models to display signatures of nested feature selection. Our results suggest that nested feature selection may be an important inductive bias for finetuning neural networks.
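The abstract names a technique, weight rescaling following pretraining, without giving its details. The following is a minimal sketch of what that could look like in a diagonal linear network, not the authors' implementation. It assumes the parameterization beta = w_pos**2 - w_neg**2 that is common in the implicit-regularization literature, plain gradient descent on squared error, and a hypothetical scale factor `gamma` applied to all weights between pretraining and finetuning. The function names and hyperparameters are illustrative only.

```python
import numpy as np


def effective_weights(w_pos, w_neg):
    """Linear map computed by the (assumed) diagonal network: beta = w_pos^2 - w_neg^2."""
    return w_pos**2 - w_neg**2


def train(X, y, w_pos, w_neg, lr=0.01, steps=2000):
    """Gradient descent on the loss (1 / 2n) * ||X @ beta - y||^2."""
    w_pos, w_neg = w_pos.copy(), w_neg.copy()
    n = len(y)
    for _ in range(steps):
        residual = X @ effective_weights(w_pos, w_neg) - y
        grad_beta = X.T @ residual / n
        # Chain rule: d(beta)/d(w_pos) = 2 * w_pos, d(beta)/d(w_neg) = -2 * w_neg.
        w_pos -= lr * 2.0 * w_pos * grad_beta
        w_neg += lr * 2.0 * w_neg * grad_beta
    return w_pos, w_neg


def rescale(w_pos, w_neg, gamma):
    """Hypothetical rescaling step: multiply every weight by gamma after pretraining."""
    return gamma * w_pos, gamma * w_neg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 10
    X_aux = rng.normal(size=(200, d))   # large auxiliary task
    X_main = rng.normal(size=(20, d))   # small downstream task
    beta_aux = np.zeros(d)
    beta_aux[:4] = 1.0                  # auxiliary task uses four features
    beta_main = np.zeros(d)
    beta_main[:2] = 1.0                 # main task uses a subset of them

    init = np.full(d, 0.1)
    w_pos, w_neg = train(X_aux, X_aux @ beta_aux, init, init)       # pretraining
    w_pos, w_neg = rescale(w_pos, w_neg, gamma=0.5)                 # rescaling
    w_pos, w_neg = train(X_main, X_main @ beta_main, w_pos, w_neg)  # finetuning
    print(effective_weights(w_pos, w_neg))
```

Under this parameterization, scaling every weight by `gamma` scales the effective linear map by `gamma**2`, so the rescaling step changes the magnitude from which finetuning starts without changing which features the pretrained network uses.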


Related papers