Fugu-MT Paper Translation (Abstract): OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

Paper overview: OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

arxiv url: http://arxiv.org/abs/2212.12017v1
Date: Thu, 22 Dec 2022 19:56:09 GMT
Status: Translation complete
Last updated in system: 2022-12-26 16:35:36.812239
Title: OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
Authors: Srinivasan Iyer and Xi Victoria Lin and Ramakanth Pasunuru and Todor Mihaylov and Daniel Simig and Ping Yu and Kurt Shuster and Tianlu Wang and Qing Liu and Punit Singh Koura and Xian Li and Brian O'Horo and Gabriel Pereyra and Jeff Wang and Christopher Dewan and Asli Celikyilmaz and Luke Zettlemoyer and Ves Stoyanov
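Abstract summary: The paper characterizes the effect of instruction-tuning decisions on downstream task performance when scaling both model size and benchmark size. It presents insights about instruction-tuning decisions as applied to OPT-30B, and uses these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT.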
Reference score (independently computed attention score): 101.37439352091612
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diversity of the instruction-tuning benchmark, different task sampling strategies, fine-tuning with and without demonstrations, training using specialized datasets for reasoning and dialogue, and finally, the fine-tuning objectives themselves. In this paper, we characterize the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT. OPT-IML demonstrates all three generalization abilities at both scales on four different evaluation benchmarks with diverse tasks and input formats -- PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. Not only does it significantly outperform OPT on all benchmarks but is also highly competitive with existing models fine-tuned on each specific benchmark. We release OPT-IML at both scales, together with the OPT-IML Bench evaluation framework.
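The three generalization levels named in the abstract (fully held-out categories, held-out tasks from seen categories, held-out instances from seen tasks) can be illustrated with a small partitioning sketch. This is not the OPT-IML Bench implementation: the function name `split_three_ways`, the dictionary keys `category`, `task`, and `text`, the every-fifth-instance holdout rule, and the toy data are all assumptions made for illustration only.

```python
from collections import defaultdict


def split_three_ways(examples, held_out_categories, held_out_tasks,
                     instance_holdout_every=5):
    """Partition examples into a training set and three evaluation sets.

    Each example is a dict with the hypothetical keys "category", "task",
    and "text". The holdout rule for instances (every N-th example of a
    seen task) is an arbitrary choice for this sketch.
    """
    splits = defaultdict(list)
    seen_per_task = defaultdict(int)
    for ex in examples:
        if ex["category"] in held_out_categories:
            # Level 1: the whole category is never seen during training.
            splits["held_out_category"].append(ex)
        elif ex["task"] in held_out_tasks:
            # Level 2: the category is seen, but this task is not.
            splits["held_out_task"].append(ex)
        else:
            # Level 3: the task is seen; hold out every N-th instance.
            seen_per_task[ex["task"]] += 1
            if seen_per_task[ex["task"]] % instance_holdout_every == 0:
                splits["held_out_instance"].append(ex)
            else:
                splits["train"].append(ex)
    return dict(splits)


# Toy data (hypothetical task and category names).
examples = (
    [{"category": "qa", "task": "qa_a", "text": f"qa_a-{i}"} for i in range(5)]
    + [{"category": "qa", "task": "qa_b", "text": f"qa_b-{i}"} for i in range(3)]
    + [{"category": "summarization", "task": "sum_a", "text": f"sum_a-{i}"}
       for i in range(2)]
)

splits = split_three_ways(
    examples,
    held_out_categories={"summarization"},
    held_out_tasks={"qa_b"},
)

for name, items in splits.items():
    print(name, [ex["text"] for ex in items])
```

In this toy run, all `summarization` examples land in the held-out-category set, all `qa_b` examples land in the held-out-task set, and the seen task `qa_a` contributes both training examples and one held-out instance.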
