Fugu-MT Paper Translation (Abstract): LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

Paper summary: LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

arxiv url: http://arxiv.org/abs/2206.06522v1
Date: Mon, 13 Jun 2022 23:51:56 GMT
Status: Translation complete
Last updated in system: 2022-06-15 13:29:52.939280
Title: LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
Title (reference translation): LST: Ladder Side-Tuning for Improved Parameter and Memory Efficiency
Authors: Yi-Lin Sung, Jaemin Cho, Mohit Bansal
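Abstract summary: Updating the entire parameter set of a large pre-trained model is costly. PETL techniques adapt a model to a new task by updating only a small subset of parameters inside the pre-trained backbone network. This paper proposes Ladder Side-Tuning (LST), a PETL technique that substantially reduces training memory requirements.
Reference score (independently computed attention score): 82.93130407930762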
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of domains recently. However, it is costly to update the entire parameter set of large pre-trained models. Although recently proposed parameter-efficient transfer learning (PETL) techniques allow updating a small subset of parameters (e.g. only using 2% of parameters) inside a pre-trained backbone network for a new task, they only reduce the training memory requirement by up to 30%. This is because the gradient computation for the trainable parameters still requires backpropagation through the large pre-trained backbone model. To address this, we propose Ladder Side-Tuning (LST), a new PETL technique that reduces training memory requirements by more substantial amounts. Unlike existing parameter-efficient methods that insert additional parameters inside backbone networks, we train a ladder side network, a small and separate network that takes intermediate activations as input via shortcut connections (ladders) from backbone networks and makes predictions. LST has significantly lower memory requirements than previous methods, because it does not require backpropagation through the backbone network, but instead only through the side network and ladder connections. We evaluate our method with various models (T5, CLIP-T5) on both NLP (GLUE) and vision-language (VQA, GQA, NLVR2, MSCOCO) tasks. LST saves 69% of the memory costs to fine-tune the whole network, while other methods only save 26% of that in similar parameter usages (hence, 2.7x more memory savings). Moreover, LST achieves higher accuracy than Adapter and LoRA in a low-memory regime. To further show the advantage of this better memory efficiency, we also apply LST to larger T5 models (T5-large, T5-3B), attaining better GLUE performance than full fine-tuning and other PETL methods. The exact same trend also holds in our experiments on VL tasks.
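The "2.7x more memory savings" figure follows from the two savings percentages quoted in the abstract. A minimal sketch of that arithmetic is below; the function name `savings_ratio` is hypothetical and introduced only for illustration.

```python
def savings_ratio(lst_saving: float, other_saving: float) -> float:
    """Ratio of two memory-saving fractions, each relative to full fine-tuning."""
    return lst_saving / other_saving


# Figures quoted in the abstract: LST saves 69%, other PETL methods save 26%.
ratio = savings_ratio(0.69, 0.26)
print(round(ratio, 1))  # 2.7
```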
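Abstract (reference translation): In recent years, fine-tuning large pre-trained models on downstream tasks has been adopted in a variety of domains. However, updating the entire parameter set of a large pre-trained model is costly. Recently proposed parameter-efficient transfer learning (PETL) techniques can update a small subset of parameters inside a pre-trained backbone network for a new task (for example, using only 2% of the parameters), but they reduce the training memory requirement by at most 30%. This is because computing gradients for the trainable parameters still requires backpropagation through the large pre-trained backbone model. To address this, the paper proposes Ladder Side-Tuning (LST), a new PETL technique that reduces training memory requirements by a much larger margin. Unlike existing parameter-efficient methods, which insert additional parameters inside the backbone network, LST trains a ladder side network: a small, separate network that takes intermediate activations from the backbone as input via shortcut connections (ladders) and makes the predictions. LST has significantly lower memory requirements than previous methods because it does not require backpropagation through the backbone network, only through the side network and the ladder connections. The method is evaluated with several models (T5, CLIP-T5) on both NLP (GLUE) and vision-language (VQA, GQA, NLVR2, MSCOCO) tasks. LST saves 69% of the memory cost of fine-tuning the whole network, whereas other methods save only 26% at similar parameter usage (hence 2.7x more memory savings). Moreover, LST achieves higher accuracy than Adapter and LoRA in a low-memory regime. To further demonstrate the benefit of this memory efficiency, LST is also applied to larger T5 models (T5-large, T5-3B), where it attains better GLUE performance than full fine-tuning and other PETL methods. The same trend also holds in the experiments on VL tasks.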
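The structure described in the abstract, a frozen backbone whose intermediate activations feed a small side network through ladder connections, can be sketched roughly as follows. This is a forward-pass illustration in plain Python, not the authors' implementation: the class name `LadderSideSketch`, all layer sizes, the tanh activations, and the fusion of ladder input and side state by simple addition are assumptions made for the example, since the abstract does not specify them.

```python
import math
import random


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def init_matrix(rows, cols, rng):
    """Small random matrix; the initialisation scheme is arbitrary here."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def count_params(matrices):
    """Total number of scalar weights in a list of matrices."""
    return sum(len(m) * len(m[0]) for m in matrices)


class LadderSideSketch:
    """Hypothetical toy model: frozen backbone plus a small side network."""

    def __init__(self, d_backbone=8, d_side=2, n_layers=3, n_classes=2, seed=0):
        rng = random.Random(seed)
        # Frozen backbone layers: never updated during side-tuning.
        self.backbone = [init_matrix(d_backbone, d_backbone, rng)
                         for _ in range(n_layers)]
        # Trainable ladders: project each backbone activation to the side width.
        self.ladders = [init_matrix(d_side, d_backbone, rng)
                        for _ in range(n_layers)]
        # Trainable side layers and prediction head.
        self.side = [init_matrix(d_side, d_side, rng) for _ in range(n_layers)]
        self.head = init_matrix(n_classes, d_side, rng)

    def backbone_activations(self, x):
        """Run the backbone and collect every intermediate activation.
        In an autograd framework these would be computed without tracking
        gradients, so nothing is stored for a backward pass through the backbone."""
        acts, h = [], x
        for w in self.backbone:
            h = [math.tanh(v) for v in linear(w, h)]
            acts.append(h)
        return acts

    def forward(self, x):
        acts = self.backbone_activations(x)
        s = [0.0] * len(self.side[0])  # side state starts at zero (assumption)
        for ladder, w_side, act in zip(self.ladders, self.side, acts):
            shortcut = linear(ladder, act)                # ladder connection
            fused = [a + b for a, b in zip(s, shortcut)]  # fusion by addition (assumption)
            s = [math.tanh(v) for v in linear(w_side, fused)]
        return linear(self.head, s)  # predictions come from the side network

    def trainable_params(self):
        return count_params(self.ladders + self.side + [self.head])

    def frozen_params(self):
        return count_params(self.backbone)


model = LadderSideSketch()
logits = model.forward([1.0] * 8)
print(len(logits), model.frozen_params(), model.trainable_params())
```

With these toy sizes, the backbone holds 192 frozen weights while the ladders, side layers, and head hold 64 trainable ones. The point of the sketch is that only the side path would need gradients, which is the property the abstract credits for the memory savings.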

Related papers