Fugu-MT Paper Translation (Summary): InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

Paper summary: InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

arxiv url: http://arxiv.org/abs/2310.07713v3
Date: Wed, 29 May 2024 04:15:39 GMT
Status: Translation complete
Last updated in system: 2024-05-31 02:11:35.630316
Title: InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Title (reference translation): InstructRetro: Instruction Tuning after Retrieval-Augmented Pretraining
Authors: Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro
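Abstract summary: Retro 48B is the largest large language model pretrained with retrieval. InstructRetro shows significant improvement over the instruction-tuned GPT on a wide range of zero-shot tasks.
Reference score (independently computed attention metric): 47.60376031955207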
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pretraining auto-regressive large language models (LLMs) with retrieval demonstrates better perplexity and factual accuracy by leveraging external databases. However, the size of existing pretrained retrieval-augmented LLM is still limited (e.g., Retro has 7.5B parameters), which limits the effectiveness of instruction tuning and zero-shot generalization. In this work, we introduce Retro 48B, the largest LLM pretrained with retrieval. Specifically, we continue to pretrain a 43B GPT model on additional 100 billion tokens using the Retro augmentation method by retrieving from 1.2 trillion tokens. Notably, the obtained foundation model, Retro 48B, largely outperforms the counterpart GPT 43B trained on 1.2T tokens in terms of perplexity with only 2.58% additional GPU hours, demonstrating the significant scaling potential of the method. After instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction tuned GPT on a wide range of zero-shot tasks. Specifically, the average improvement of InstructRetro is 7% over its GPT counterpart across 8 short-form QA and reading comprehension tasks, 10% over GPT across 4 challenging long-form QA tasks, and 16% over GPT across 3 summarization tasks. Surprisingly, we find that one can ablate the encoder from InstructRetro architecture and directly use its decoder backbone, while achieving comparable results. Our results highlight the promising direction to obtain a better GPT decoder through continued pretraining with retrieval before instruction tuning. Our code and checkpoints are publicly available at: https://huggingface.co/nvidia/retro-48b-instruct-4k.
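Abstract (reference translation): Pretraining auto-regressive large language models (LLMs) with retrieval yields better perplexity and factual accuracy by leveraging external databases. However, the size of existing pretrained retrieval-augmented LLMs is still limited (for example, Retro has 7.5B parameters), which limits the effectiveness of instruction tuning and zero-shot generalization. This paper introduces Retro 48B, the largest LLM pretrained with retrieval. Specifically, a 43B GPT model is further pretrained on an additional 100 billion tokens using the Retro augmentation method, retrieving from 1.2 trillion tokens. Notably, the resulting foundation model, Retro 48B, outperforms the counterpart GPT 43B trained on 1.2T tokens in terms of perplexity with only 2.58% additional GPU hours, demonstrating the scaling potential of the method. After instruction tuning on Retro, InstructRetro shows significant improvement over the instruction-tuned GPT on a wide range of zero-shot tasks. Specifically, the average improvement of InstructRetro over its GPT counterpart is 7% across 8 short-form QA and reading comprehension tasks, 10% across 4 challenging long-form QA tasks, and 16% across 3 summarization tasks. Surprisingly, the encoder can be ablated from the InstructRetro architecture and its decoder backbone used directly, with comparable results. These results highlight a promising direction: obtaining a better GPT decoder through continued pretraining with retrieval before instruction tuning. The code and checkpoints are publicly available at https://huggingface.co/nvidia/retro-48b-instruct-4k.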

Related papers