BiBERT: Accurate Fully Binarized BERT
- URL: http://arxiv.org/abs/2203.06390v1
- Date: Sat, 12 Mar 2022 09:46:13 GMT
- Title: BiBERT: Accurate Fully Binarized BERT
- Authors: Haotong Qin, Yifu Ding, Mingyuan Zhang, Qinghua Yan, Aishan Liu,
Qingqing Dang, Ziwei Liu, Xianglong Liu
- Abstract summary: BiBERT is an accurate fully binarized BERT designed to eliminate
the performance bottlenecks of full binarization.
Our method yields an impressive 56.3x saving in FLOPs and a 31.2x saving in model size.
- Score: 69.35727280997617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The large pre-trained BERT has achieved remarkable performance on Natural
Language Processing (NLP) tasks but is also expensive in computation and memory.
As one of the most powerful compression approaches, binarization drastically reduces
computation and memory consumption by using 1-bit parameters and bitwise
operations. Unfortunately, the full binarization of BERT (i.e., 1-bit
weights, embeddings, and activations) usually suffers a significant performance
drop, and few studies have addressed this problem. In this paper, through
theoretical justification and empirical analysis, we identify that the severe
performance drop is mainly attributable to information degradation in the forward
propagation and optimization direction mismatch in the backward propagation, and
we propose BiBERT, an accurate fully binarized BERT, to eliminate these
performance bottlenecks. Specifically, BiBERT introduces an efficient
Bi-Attention structure for maximizing representation information statistically
and a Direction-Matching Distillation (DMD) scheme to optimize the fully
binarized BERT accurately. Extensive experiments show that BiBERT outperforms
both the straightforward baseline and existing state-of-the-art quantized BERTs
with ultra-low-bit activations by convincing margins on NLP benchmarks. As
the first fully binarized BERT, our method yields an impressive 56.3x saving in
FLOPs and a 31.2x saving in model size, demonstrating the vast advantages
and potential of fully binarized BERT models in real-world
resource-constrained scenarios.
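To make the "fully binarized" setting concrete, the sketch below shows a generic 1-bit linear layer trained with a straight-through estimator, assuming PyTorch. The names (BinarizeSTE, BinaryLinear) and the per-channel scaling factor are illustrative choices of mine; this is a minimal sketch of 1-bit weights and activations, not BiBERT's Bi-Attention structure or its DMD distillation scheme.

```python
# Minimal, generic sketch of 1-bit weight/activation binarization with a
# straight-through estimator (STE). Illustrative only; NOT BiBERT's method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass; clipped straight-through gradient in backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Zero the gradient where |x| > 1, the usual STE clipping approximation.
        return grad_output * (x.abs() <= 1).float()


class BinaryLinear(nn.Module):
    """Linear layer with 1-bit weights and 1-bit input activations."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        # Per-output-channel scaling factor recovers some dynamic range.
        self.alpha = nn.Parameter(torch.ones(out_features, 1))

    def forward(self, x):
        bin_x = BinarizeSTE.apply(x)            # activations in {-1, +1}
        bin_w = BinarizeSTE.apply(self.weight)  # weights in {-1, +1}
        # The float matmul here stands in for XNOR/popcount bitwise kernels.
        return F.linear(bin_x, self.alpha * bin_w)


if __name__ == "__main__":
    layer = BinaryLinear(768, 768)           # hidden size typical of BERT-base
    out = layer(torch.randn(2, 16, 768))     # (batch, seq_len, hidden)
    print(out.shape)                         # torch.Size([2, 16, 768])
```

At deployment time, the float matrix multiplication between {-1, +1} tensors would be replaced by XNOR and popcount bitwise kernels, which is where the FLOPs and model-size savings reported in the abstract come from.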
Related papers
- BiDense: Binarization for Dense Prediction [62.70804353158387]
BiDense is a generalized binary neural network (BNN) designed for efficient and accurate dense prediction tasks.
BiDense incorporates two key techniques: the Distribution-adaptive Binarizer (DAB) and the Channel-adaptive Full-precision Bypass (CFB).
arXiv Detail & Related papers (2024-11-15T16:46:04Z)
- DB-LLM: Accurate Dual-Binarization for Efficient LLMs [83.70686728471547]
Large language models (LLMs) have significantly advanced the field of natural language processing.
Existing ultra-low-bit quantization always causes severe accuracy drops.
We propose a novel Dual-Binarization method for LLMs, namely DB-LLM.
arXiv Detail & Related papers (2024-02-19T09:04:30Z)
- DPBERT: Efficient Inference for BERT based on Dynamic Planning [11.680840266488884]
Existing input-adaptive inference methods fail to take full advantage of the structure of BERT.
We propose Dynamic Planning in BERT, a novel fine-tuning strategy that can accelerate the inference process of BERT.
Our method reduces latency to 75% while maintaining 98% accuracy, yielding a better accuracy-speed trade-off compared to state-of-the-art input-adaptive methods.
arXiv Detail & Related papers (2023-07-26T07:18:50Z)
- Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization [58.90989478049686]
Bi-Drop is a fine-tuning strategy that selectively updates model parameters using gradients from various sub-nets.
Experiments on the GLUE benchmark demonstrate that Bi-Drop consistently outperforms previous fine-tuning methods.
arXiv Detail & Related papers (2023-05-24T06:09:26Z)
- BEBERT: Efficient and robust binary ensemble BERT [12.109371576500928]
Binarization of pre-trained BERT models can alleviate their heavy computation and memory cost but comes with a severe accuracy drop compared with their full-precision counterparts.
We propose an efficient and robust binary ensemble BERT (BEBERT) to bridge the accuracy gap.
arXiv Detail & Related papers (2022-10-28T08:15:26Z)
- BERTVision -- A Parameter-Efficient Approach for Question Answering [0.0]
We present a highly parameter-efficient approach for Question Answering that significantly reduces the need for extended BERT fine-tuning.
Our method uses information from the hidden state activations of each BERT transformer layer, which is discarded during typical BERT inference.
Our experiments show that this approach works well not only for span QA but also for classification, suggesting that it may be applicable to a wider range of tasks.
arXiv Detail & Related papers (2022-02-24T17:16:25Z)
- BinaryBERT: Pushing the Limit of BERT Quantization [74.65543496761553]
We propose BinaryBERT, which pushes BERT quantization to the limit with weight binarization.
We find that a binary BERT is harder to train directly than a ternary counterpart due to its complex and irregular loss landscape.
Empirical results show that BinaryBERT has negligible performance drop compared to the full-precision BERT-base.
arXiv Detail & Related papers (2020-12-31T16:34:54Z)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
arXiv Detail & Related papers (2020-09-27T10:17:28Z)