Fugu-MT Paper Translation (Abstract): The Future of Large Language Model Pre-training is Federated

Paper abstract: The Future of Large Language Model Pre-training is Federated

arxiv url: http://arxiv.org/abs/2405.10853v3
Date: Mon, 14 Oct 2024 16:37:29 GMT
Status: Translation complete
Last updated in system: 2024-12-03 05:43:14.026459
Title: The Future of Large Language Model Pre-training is Federated
Title (reference translation): The Future of Large Language Model Pre-training
Authors: Lorenzo Sani, Alex Iacob, Zeyu Cao, Bill Marino, Yan Gao, Tomas Paulik, Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. Lane
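Abstract summary: We propose a scalable deployment system called Photon that enables the investigation and development of a new training paradigm for LLM pre-training. We show that organizations interested in collaborating with their private data sources and computational resources can use Photon to pre-train LLMs with billions of parameters. We further show that the effectiveness of federated training scales with model size, and we present an approach for training billion-scale federated LLMs using limited resources.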
Reference score (independently calculated attention level): 15.237418036900582
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative pre-trained large language models (LLMs) have demonstrated impressive performance over a wide range of tasks, thanks to the unprecedented amount of data they have been trained on. As established scaling laws indicate, LLMs' future performance improvement depends on the amount of computing and data sources they can leverage for pre-training. Federated learning (FL) has the potential to unleash the majority of the planet's data and computational resources, which are underutilized by the data-center-focused training methodology of current LLM practice. Our work presents a robust, flexible, reproducible FL approach that enables large-scale collaboration across institutions to train LLMs. We propose a scalable deployment system called Photon to enable the investigation and development of this new training paradigm for LLM pre-training. We show that Photon can be used by organizations interested in collaborating with their private data sources and computational resources for pre-training LLMs with billions of parameters. This paradigm would mobilize more computational and data resources while matching or potentially exceeding centralized performance. We further show the effectiveness of the federated training scales with model size and present our approach for training billion-scale federated LLMs using limited resources. Thus far, we have used Photon to train LLM models to the size of 7B parameters and anticipate larger models being completed in the near future. Finally, we show that LLM training is highly resilient to the classical challenges of federated statistical and hardware heterogeneity. Furthermore, we show that convergence is robust to partial participation, opening the avenue for compute-efficient collaborative training. Photon will help data-rich actors to become the protagonists of LLMs pre-training instead of leaving the stage to compute-rich actors alone.
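Abstract (reference translation): Generative pre-trained large language models (LLMs) have shown impressive performance across a wide range of tasks, thanks to the unprecedented amount of data they were trained on. As established scaling laws indicate, future LLM performance improvements depend on the amount of compute and data sources available for pre-training. Federated learning (FL) has the potential to unlock most of the planet's data and computational resources, which remain underutilized by the data-center-centric training methodology of current LLM practice. Our work presents a robust, flexible, and reproducible FL approach that enables large-scale collaboration across institutions to train LLMs. We propose a scalable deployment system called Photon that enables the investigation and development of this new training paradigm for LLM pre-training. We show that organizations interested in collaborating with their private data sources and computational resources can use Photon to pre-train LLMs with billions of parameters. This paradigm would mobilize more computational and data resources while matching or potentially exceeding centralized performance. We further show that the effectiveness of federated training scales with model size, and we present an approach for training billion-scale federated LLMs using limited resources. So far, we have used Photon to train LLMs of up to 7B parameters, and we expect larger models to be completed in the near future. Finally, we show that LLM training is highly resilient to the classical federated challenges of statistical and hardware heterogeneity. Furthermore, we show that convergence is robust to partial participation, opening the way to compute-efficient collaborative training. Photon will help data-rich actors become protagonists of LLM pre-training instead of leaving the stage to compute-rich actors alone.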

Related papers