Fugu-MT Paper Translation (Abstract): Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language

Paper overview: Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language

arxiv url: http://arxiv.org/abs/2603.11881v1
Date: Thu, 12 Mar 2026 12:57:03 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:26.083199
Title: Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language
Authors: Remigiusz Kinas, Paweł Kiszczak, Sergio P. Perez, Krzysztof Ociepa, Łukasz Flis, Krzysztof Wróbel, Adrian Gwoździej
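Abstract summary: This report details the creation of Bielik-Minitron-7B, a compressed 7.35B-parameter version of the Bielik-11B-v3.0 model. Using a two-stage compression methodology inspired by the NVIDIA Minitron approach, the authors combine structured hybrid pruning with knowledge distillation to reduce the model's parameter count by 33.4%. The final model recovers approximately 90% of the baseline model's performance and achieves up to a 50% speedup.
Reference score (independently computed attention metric): 1.5944225617726497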
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This report details the creation of Bielik-Minitron-7B, a compressed 7.35B parameter version of the Bielik-11B-v3.0 model, specifically optimized for European languages. By leveraging a two-stage compression methodology inspired by the NVIDIA Minitron approach, we combined structured hybrid pruning and knowledge distillation to reduce the model's parameter count by 33.4%, from 11.04B to 7.35B. We utilized the NVIDIA Model Optimizer for structural pruning and the NVIDIA NeMo Framework for logit-based distillation for quality recovery. Following distillation, the model underwent a rigorous alignment pipeline consisting of Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO-P), and Reinforcement Learning (GRPO). Our final model successfully recovered approximately 90% of the baseline model's performance while providing up to 50% inference speedup. This approach demonstrates an efficient pathway to create language models for less-represented languages, preserving the original model quality while reducing inference deployment costs.
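The two quantitative ideas in the abstract, the 33.4% parameter reduction and logit-based distillation, can be illustrated with a minimal sketch. The abstract does not state which distillation loss was used, so the forward KL divergence between teacher and student token distributions below is an assumption (a common choice for logit-based distillation), and the function names `reduction_percent`, `softmax`, and `kd_loss` are hypothetical, not part of NVIDIA Model Optimizer or the NeMo Framework.

```python
import math


def reduction_percent(before_b: float, after_b: float) -> float:
    """Relative parameter reduction in percent (sizes in billions)."""
    return (before_b - after_b) / before_b * 100.0


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one token's vocabulary logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=1.0):
    """Forward KL(teacher || student) for a single token position.

    Assumption: the paper's exact loss is not given in the abstract;
    this is only one common form of logit-based distillation.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Parameter counts quoted in the abstract: 11.04B before, 7.35B after.
print(round(reduction_percent(11.04, 7.35), 1))  # 33.4

# The loss is zero when the student matches the teacher, positive otherwise.
print(kd_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))
print(kd_loss([2.0, 0.5, -1.0], [0.0, 0.0, 0.0]) > 0.0)
```

In an actual training run this per-token loss would be averaged over all positions in a batch and minimized with respect to the pruned student's weights while the teacher stays frozen.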


Related papers