Fugu-MT Paper Translation (Abstract): BitNet Distillation

Paper summary: BitNet Distillation

arxiv url: http://arxiv.org/abs/2510.13998v1
Date: Wed, 15 Oct 2025 18:28:12 GMT
Status: Translation complete
Last updated in system: 2025-10-17 21:15:14.58119
Title: BitNet Distillation
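Title (reference translation): BitNet Distillation
Authors: Xun Wu, Shaohan Huang, Wenhui Wang, Ting Song, Li Dong, Yan Xia, Furu Wei
Abstract summary: We propose BitNet Distillation (BitDistill), a lightweight pipeline that fine-tunes off-the-shelf full-precision LLMs to 1.58-bit precision. BitDistill achieves strong task-specific performance with minimal computational cost.
Reference score (independently computed attention metric): 90.71353956177705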
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we present BitNet Distillation (BitDistill), a lightweight pipeline that fine-tunes off-the-shelf full-precision LLMs (e.g., Qwen) into 1.58-bit precision (i.e., ternary weights {-1, 0, 1}) for specific downstream tasks, achieving strong task-specific performance with minimal computational cost. Specifically, BitDistill incorporates three key techniques: the SubLN module, as introduced in BitNet; multi-head attention distillation, based on MiniLM; and continual pre-training, which serves as a crucial warm-up step to mitigate the scalability issue of the performance gap between finetuned full-precision and 1.58-bit LLMs on specific tasks. Experimental results show that BitDistill achieves performance comparable to the full-precision counterpart models across model size, while enabling up to 10x memory savings and 2.65x faster inference on CPUs. Code is available at https://github.com/microsoft/BitNet.
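Abstract (reference translation): In this paper, we propose BitNet Distillation (BitDistill), a lightweight pipeline that fine-tunes off-the-shelf full-precision LLMs (e.g., Qwen) to 1.58-bit precision (ternary weights {-1, 0, 1}) while keeping computational cost to a minimum. Specifically, BitDistill incorporates three key techniques: the SubLN module introduced in BitNet; multi-head attention distillation based on MiniLM; and continual pre-training, which serves as a crucial warm-up step to mitigate the scalability issue of the performance gap between fine-tuned full-precision and 1.58-bit LLMs on specific tasks. Experiments show that BitDistill achieves performance comparable to its full-precision counterparts across model sizes, while enabling up to 10x memory savings and 2.65x faster inference on CPUs. Code is available at https://github.com/microsoft/BitNet.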
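
The "1.58-bit" figure in the abstract follows from the ternary weight set: a value drawn from {-1, 0, 1} carries log2(3) ≈ 1.585 bits of information. The sketch below illustrates that arithmetic together with one possible way of mapping full-precision weights to ternary values. The absmean-style rounding rule, the `ternary_quantize` helper, the sample weights, and the 16-bit baseline are all assumptions made for illustration; the abstract does not specify the quantizer or the baseline precision behind its "up to 10x" figure.

```python
import math


def ternary_quantize(weights, eps=1e-8):
    """Map weights to {-1, 0, 1} with an absmean-style rule.

    Assumed scheme for illustration only; not taken from the abstract.
    """
    # Scale factor: mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights)
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


# Information content of one ternary weight, in bits.
bits_per_ternary_weight = math.log2(3)

# Idealized compression ratio against an assumed 16-bit baseline.
ideal_ratio_vs_fp16 = 16 / bits_per_ternary_weight

# Hypothetical sample weights.
weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]
quantized, scale = ternary_quantize(weights)

print(quantized, round(scale, 3))
print(round(bits_per_ternary_weight, 3), round(ideal_ratio_vs_fp16, 2))
```

Under the assumed 16-bit baseline, the idealized ratio comes out at roughly 10.1x, which is in line with the memory savings quoted in the abstract. Real savings depend on how the ternary values are packed and on which tensors stay in higher precision.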

Related papers