Fugu-MT paper translation (abstract): How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition

Paper abstract: How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition

arxiv url: http://arxiv.org/abs/2310.05492v2
Date: Wed, 1 Nov 2023 07:11:37 GMT
Status: translation complete
System update date: 2023-11-02 16:51:30.658213
Title: How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition
Title (reference translation): Effects of Supervised Fine-tuning Data Composition on the Abilities of Large Language Models
Authors: Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, Jingren Zhou
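Abstract summary: Large language models (LLMs) with enormous numbers of pre-training tokens and parameters exhibit emergent abilities, including arithmetic reasoning, code generation, and instruction following. It is important to investigate how to unlock multiple abilities via supervised fine-tuning (SFT).
Reference score (independently computed attention score): 67.02182566213268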
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) with enormous numbers of pre-training tokens and parameters exhibit emergent abilities, including math reasoning, code generation, and instruction following. These abilities are further enhanced by supervised fine-tuning (SFT). The open-source community has studied ad-hoc SFT for each ability, while proprietary LLMs are versatile across all abilities. It is important to investigate how to unlock multiple abilities via SFT. In this study, we focus on the data composition between mathematical reasoning, code generation, and general human-aligning abilities during SFT. From a scaling perspective, we investigate the relationship between model abilities and various factors, including data amount, data composition ratio, model parameters, and SFT strategies. Our experiments reveal that different abilities exhibit different scaling patterns, and larger models generally show superior performance with the same amount of data. Mathematical reasoning and code generation improve consistently as the data amount increases, while the general ability is enhanced with about a thousand samples and then improves slowly. We find that data composition improves various abilities at low data amounts but causes conflicts between abilities at high data amounts. Our experiments further show that the amount of composition data impacts performance, while the influence of the composition ratio is insignificant. Regarding SFT strategies, we find that sequentially learning multiple abilities is prone to catastrophic forgetting. Our proposed Dual-stage Mixed Fine-tuning (DMT) strategy learns specialized abilities first and then learns general abilities with a small amount of specialized data to prevent forgetting, offering a promising solution for learning multiple abilities with different scaling patterns.
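Abstract (reference translation): Large language models (LLMs) with enormous numbers of pre-training tokens and parameters have abilities such as arithmetic reasoning, code generation, and instruction following. These abilities are further strengthened by supervised fine-tuning (SFT). The open-source community has studied ad-hoc SFT for each ability, whereas proprietary LLMs are versatile across all abilities. It is important to investigate how to unlock multiple abilities via SFT. This study focuses on the data composition among mathematical reasoning, code generation, and general human-alignment abilities in SFT. From a scaling perspective, we examine the relationship between model abilities and factors such as data amount, data composition ratio, model parameters, and SFT strategy. Our experiments show that different abilities exhibit different scaling patterns, and that larger models generally perform better with the same amount of data. Mathematical reasoning and code generation improve as the data amount increases consistently, while the general ability improves with about 1,000 samples and then improves slowly. Data composition yields improvements in various abilities at low data amounts, while conflicts between abilities arise at high data amounts. We further show that the amount of composition data affects performance, whereas the influence of the composition ratio is insignificant. Regarding SFT strategies, sequentially learning multiple abilities is prone to catastrophic forgetting. The proposed Dual-stage Mixed Fine-tuning (DMT) strategy first learns specialized abilities and then learns general abilities with a small amount of specialized data, offering a promising solution for learning multiple abilities with different scaling patterns.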
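
The two-stage data split described for DMT in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `build_dmt_stages`, the `replay_fraction` parameter and its default value, and the per-source uniform sampling are all assumptions made for this example, since the abstract only says that a "small amount" of specialized data is mixed into the second stage.

```python
import random
from typing import Dict, List, Sequence


def build_dmt_stages(
    specialized: Dict[str, Sequence[str]],
    general: Sequence[str],
    replay_fraction: float = 0.01,  # hypothetical value; the abstract gives no number
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Split SFT examples into two training stages, loosely following the DMT idea."""
    if not 0.0 <= replay_fraction <= 1.0:
        raise ValueError("replay_fraction must be in [0, 1]")
    rng = random.Random(seed)

    # Stage 1: all specialized data (e.g. math and code), shuffled together.
    stage1 = [ex for examples in specialized.values() for ex in examples]
    rng.shuffle(stage1)

    # Stage 2: all general data plus a small sample from each specialized
    # source, intended to reduce forgetting of the stage-1 abilities.
    stage2 = list(general)
    for examples in specialized.values():
        k = int(len(examples) * replay_fraction)
        stage2.extend(rng.sample(list(examples), k))
    rng.shuffle(stage2)

    return {"stage1": stage1, "stage2": stage2}
```

A model would then be fine-tuned on `stage1` first and on `stage2` afterwards; the training loop itself is outside the scope of this sketch.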

Related papers