Fugu-MT Paper Translation (Abstract): LLM Pruning and Distillation in Practice: The Minitron Approach

Paper overview: LLM Pruning and Distillation in Practice: The Minitron Approach

arxiv url: http://arxiv.org/abs/2408.11796v3
Date: Sat, 30 Nov 2024 22:01:07 GMT
Status: Translation complete
Last updated in system: 2024-12-03 21:01:15.711376
Title: LLM Pruning and Distillation in Practice: The Minitron Approach
Authors: Sharath Turuvekere Sreenivas, Saurav Muralidharan, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Pavlo Molchanov, Mohammad Shoeybi, Jan Kautz, Ameya Sunil Mahabaleshwarkar, Gerald Shen, Jiaqi Zeng, Oleksii Kuchaiev, Zijia Chen, Yoshi Suhara, Shizhe Diao, Chenhan Yu, Wei-Chun Chen, Hayley Ross, Daniel Korzekwa, Oluwatobi Olabiyi, Ashwath Aithal, Bryan Catanzaro
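Abstract summary: Compresses the Llama 3.1 8B and Mistral NeMo 12B models to 4B and 8B parameters, respectively. Explores two distinct pruning strategies: (1) depth pruning and (2) joint hidden/attention/MLP (width) pruning. The approach produces a compelling 4B model from Llama 3.1 8B and a state-of-the-art Mistral-NeMo-Minitron-8B model from Mistral NeMo 12B.
Reference score (attention metric computed by the site): 57.57486238643575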
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a comprehensive report on compressing the Llama 3.1 8B and Mistral NeMo 12B models to 4B and 8B parameters, respectively, using pruning and distillation. We explore two distinct pruning strategies: (1) depth pruning and (2) joint hidden/attention/MLP (width) pruning, and evaluate the results on common benchmarks from the LM Evaluation Harness. The models are then aligned with NeMo Aligner and tested in instruct-tuned versions. This approach produces a compelling 4B model from Llama 3.1 8B and a state-of-the-art Mistral-NeMo-Minitron-8B (MN-Minitron-8B for brevity) model from Mistral NeMo 12B. We found that with no access to the original data, it is beneficial to slightly fine-tune teacher models on the distillation dataset. We open-source our base model weights on Hugging Face with a permissive license.
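The abstract names distillation as the step that recovers accuracy after pruning but does not state the training objective. As a rough illustration only, the sketch below computes one common formulation of logit distillation: the forward KL divergence between the teacher's and the student's next-token distributions at a single position. The function names, the `temperature` parameter, and the choice of forward KL are assumptions made here for illustration and are not taken from the abstract.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Return log-probabilities for a list of logits (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # log-sum-exp with the maximum subtracted to avoid overflow
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Forward KL divergence KL(teacher || student) for one token position.

    Hypothetical helper: assumes both models share the same vocabulary,
    so the two logit lists have equal length.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student logits must have the same length")
    log_p = log_softmax(teacher_logits, temperature)  # teacher distribution
    log_q = log_softmax(student_logits, temperature)  # student distribution
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

When the student reproduces the teacher's logits exactly, the loss is zero up to floating-point error, and it grows as the two distributions diverge. In an actual training loop this quantity would be averaged over all token positions in a batch.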
