BitNet Distillation
- URL: http://arxiv.org/abs/2510.13998v1
- Date: Wed, 15 Oct 2025 18:28:12 GMT
- Title: BitNet Distillation
- Authors: Xun Wu, Shaohan Huang, Wenhui Wang, Ting Song, Li Dong, Yan Xia, Furu Wei
- Abstract summary: We present BitNet Distillation (BitDistill), a lightweight pipeline that fine-tunes off-the-shelf full-precision LLMs into 1.58-bit precision. BitDistill achieves strong task-specific performance with minimal computational cost.
- Score: 90.71353956177705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present BitNet Distillation (BitDistill), a lightweight pipeline that fine-tunes off-the-shelf full-precision LLMs (e.g., Qwen) into 1.58-bit precision (i.e., ternary weights {-1, 0, 1}) for specific downstream tasks, achieving strong task-specific performance with minimal computational cost. Specifically, BitDistill incorporates three key techniques: the SubLN module, as introduced in BitNet; multi-head attention distillation, based on MiniLM; and continual pre-training, which serves as a crucial warm-up step to mitigate the scalability issue of the performance gap between finetuned full-precision and 1.58-bit LLMs on specific tasks. Experimental results show that BitDistill achieves performance comparable to the full-precision counterpart models across model size, while enabling up to 10x memory savings and 2.65x faster inference on CPUs. Code is available at https://github.com/microsoft/BitNet.
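The abstract describes 1.58-bit precision as ternary weights {-1, 0, 1}. The name comes from log2(3) ≈ 1.585 bits of information per ternary weight, which also accounts for the rough 10x figure: 16 / 1.585 ≈ 10.1 relative to 16-bit weights. The sketch below assumes the absmean weight quantizer described for BitNet b1.58 (divide by the mean absolute weight, round, clip to [-1, 1]). It is not taken from the BitDistill code, and the function names are hypothetical. It leaves out the straight-through estimator used during training, activation quantization, and the storage cost of scales and bit packing, so the 10.1x ratio is an idealized upper bound rather than a measured result.

```python
import math


def absmean_ternary(weights, eps=1e-5):
    """Quantize a flat list of float weights to {-1, 0, 1} with one per-tensor scale.

    Assumed absmean scheme (as described for BitNet b1.58): scale by the mean
    absolute value, round to the nearest integer, then clip to [-1, 1].
    Note: Python's round() sends exact .5 ties to the nearest even integer.
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return q, gamma


def dequantize(q, gamma):
    """Approximate reconstruction W ~= gamma * W_q used in the forward pass."""
    return [gamma * v for v in q]


# Information content of one ternary weight compared with a 16-bit weight.
BITS_PER_TERNARY = math.log2(3)         # ~1.585, hence "1.58-bit"
FP16_RATIO = 16 / BITS_PER_TERNARY      # ~10.1x, idealized weight-memory saving

if __name__ == "__main__":
    w = [0.42, -0.07, -0.9, 0.15, 0.0, -0.33]
    q, gamma = absmean_ternary(w)
    print(q, round(gamma, 4))           # -> [1, 0, -1, 0, 0, -1] 0.3117
    print(round(BITS_PER_TERNARY, 3))   # -> 1.585
```

Small weights such as -0.07 and 0.15 collapse to 0, which is why ternary matrices tend to contain many zeros; real deployments usually pack each weight into 2 bits, so practical savings sit below the 10.1x bound.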
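Multi-head attention distillation in the MiniLM line of work transfers self-attention relations rather than hidden states. For per-token vectors A (the queries, keys, or values of one relation head), it forms softmax(A A^T / sqrt(d)) and minimizes the KL divergence between the teacher's and the student's relation matrices. Both matrices are sequence-by-sequence, so the teacher and student head widths need not match. The pure-Python sketch below covers a single relation head, with the full-precision model as teacher and the 1.58-bit model as student. Which layers, relation types, and loss weights BitDistill actually uses are not stated in this summary, and all names here are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def relation_matrix(vectors):
    """Self-relation softmax(A A^T / sqrt(d)) for per-token vectors A (seq_len x d)."""
    d = len(vectors[0])
    scale = 1.0 / math.sqrt(d)
    return [
        softmax([scale * sum(a * b for a, b in zip(u, v)) for v in vectors])
        for u in vectors
    ]


def kl_rows(p_rows, q_rows):
    """Mean over rows (tokens) of KL(p || q)."""
    total = 0.0
    for p, q in zip(p_rows, q_rows):
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(p_rows)


def attention_relation_loss(teacher_states, student_states):
    """KL(teacher relation || student relation) for one relation head.

    teacher_states: full-precision model's per-token Q, K, or V vectors.
    student_states: 1.58-bit model's vectors of the same kind; widths may differ.
    """
    return kl_rows(relation_matrix(teacher_states), relation_matrix(student_states))


if __name__ == "__main__":
    teacher = [[1.0, 0.0, 0.5, -0.5], [0.0, 1.0, -0.5, 0.5], [1.0, 1.0, 0.0, 0.0]]
    student = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.7]]  # narrower head, same 3 tokens
    print(attention_relation_loss(teacher, teacher))  # -> 0.0
    print(attention_relation_loss(teacher, student))  # positive: relations differ
```

In a real training loop, this loss would be computed with a tensor framework over every relation head and added to the task loss. Matching relations instead of raw states is what lets a student with different internal widths learn from the teacher.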
Related papers
- Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity [100.07626315557599]
We show that 1.58-bit BitNet is naturally more compatible with N:M sparsity than full-precision models. We propose Sparse-BitNet, a unified framework that jointly applies 1.58-bit quantization and dynamic N:M sparsification.
arXiv Detail & Related papers (2026-03-05T13:37:50Z)
- BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs [95.73339037243105]
BitNet v2 is a framework enabling native 4-bit activation quantization for 1-bit Large Language Models. H-BitLinear is a module applying an online Hadamard transformation prior to activation quantization. Experiments show BitNet v2 trained from scratch with 8-bit activations matches BitNet b1.58 performance.
arXiv Detail & Related papers (2025-04-25T15:17:52Z)
- BitNet b1.58 2B4T Technical Report [118.78752947128682]
We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Trained on a corpus of 4 trillion tokens, the model has been rigorously evaluated across benchmarks covering language understanding, mathematical reasoning, coding proficiency, and conversational ability.
arXiv Detail & Related papers (2025-04-16T17:51:43Z)
- Bitnet.cpp: Efficient Edge Inference for Ternary LLMs [71.5759603658299]
We introduce Bitnet.cpp, an inference system optimized for BitNet b1.58 and ternary LLMs. Bitnet.cpp incorporates a novel mpGEMM library to facilitate sub-2-bits-per-weight, efficient and lossless inference. Our experiments show that Bitnet.cpp achieves up to a 6.25x increase in speed over full-precision baselines and up to 2.32x over low-bit baselines.
arXiv Detail & Related papers (2025-02-17T15:06:28Z)
- 1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs [81.7388752468953]
We introduce bitnet.cpp, a tailored software stack designed to unlock the full potential of 1-bit Large Language Models. In experiments, bitnet.cpp achieves speedups ranging from 2.37x to 6.17x on x86 CPUs and from 1.37x to 5.07x on ARM CPUs.
arXiv Detail & Related papers (2024-10-21T16:14:57Z)
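The Sparse-BitNet summary rests on the observation that ternary weights already contain many zeros. An N:M constraint (at most N non-zeros in every group of M consecutive weights, e.g. 2:4) therefore removes little extra information. The sketch below applies plain magnitude-based N:M pruning to a ternary vector and includes a helper that measures how many groups already satisfy the pattern. It illustrates the intuition only and is not Sparse-BitNet's dynamic sparsification method. The helper names are hypothetical. Because ternary non-zero magnitudes all tie, this sketch keeps the earliest positions in a tie; a real implementation may break ties using other information.

```python
def n_m_prune(weights, n=2, m=4):
    """Magnitude-based N:M pruning: keep the n largest |w| in each group of m.

    Ties are broken by position (Python's sort is stable, even with
    reverse=True), which matters for ternary weights whose non-zero
    magnitudes are all equal.
    """
    if len(weights) % m:
        raise ValueError("length must be a multiple of m")
    out = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        keep = set(sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n])
        out.extend(w if i in keep else 0 for i, w in enumerate(group))
    return out


def fraction_already_n_m(weights, n=2, m=4):
    """Share of m-sized groups that already have at most n non-zeros."""
    groups = [weights[i:i + m] for i in range(0, len(weights), m)]
    ok = sum(1 for g in groups if sum(1 for w in g if w != 0) <= n)
    return ok / len(groups)


if __name__ == "__main__":
    ternary = [1, 0, -1, 0, 1, 1, -1, 0, 0, 0, 0, 1]
    print(fraction_already_n_m(ternary))  # 2 of 3 groups already satisfy 2:4
    print(n_m_prune(ternary))             # -> [1, 0, -1, 0, 1, 1, 0, 0, 0, 0, 0, 1]
```

Only the middle group, with three non-zeros, is changed by pruning. This is the sense in which ternary weights are "naturally friendly" to semi-structured sparsity.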