Can persistent homology whiten Transformer-based black-box models? A
case study on BERT compression
- URL: http://arxiv.org/abs/2312.10702v1
- Date: Sun, 17 Dec 2023 12:33:50 GMT
- Title: Can persistent homology whiten Transformer-based black-box models? A
case study on BERT compression
- Authors: Luis Balderas, Miguel Lastra and José M. Benítez
- Abstract summary: We propose Optimus BERT Compression and Explainability (OBCE) to bring explainability to BERT models.
Our methodology can "whiten" BERT models by providing explainability for their neurons and reducing the model's size.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) like BERT have gained significant prominence due
to their remarkable performance in various natural language processing tasks.
However, they come with substantial computational and memory costs.
Additionally, they are essentially black-box models, challenging to explain and
interpret. In this article, we propose Optimus BERT Compression and
Explainability (OBCE), a methodology to bring explainability to BERT models
using persistent homology, aiming to measure the importance of each neuron by
studying the topological characteristics of their outputs. As a result, we can
compress BERT significantly by reducing the number of parameters (58.47% of the
original parameters for BERT Base, 52.3% for BERT Large). We evaluated our
methodology on the standard GLUE Benchmark, comparing the results with
state-of-the-art techniques and achieving outstanding results. Consequently,
our methodology can "whiten" BERT models by providing explainability for their
neurons and reducing the model's size, making it more suitable for deployment
on resource-constrained devices.
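The abstract does not spell out the scoring rule, but the core idea, ranking neurons by the persistence of topological features in their outputs and treating low-scoring neurons as pruning candidates, can be illustrated with a short sketch. The use of the gudhi library, the total-persistence score, and the neuron_importance helper below are illustrative assumptions, not the authors' exact OBCE procedure.

```python
# Hypothetical sketch: score each neuron by the total persistence of the
# point cloud formed by its activations over a sample of inputs.
# Library choice (gudhi) and scoring rule are assumptions; the paper's
# actual OBCE algorithm may differ.
import numpy as np
import gudhi


def total_persistence(points: np.ndarray, max_dim: int = 1) -> float:
    """Sum of finite (death - birth) intervals of a Vietoris-Rips filtration."""
    rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
    tree = rips.create_simplex_tree(max_dimension=max_dim)
    diagram = tree.persistence()  # list of (dimension, (birth, death)) pairs
    return sum(d - b for _, (b, d) in diagram if d != float("inf"))


def neuron_importance(activations: np.ndarray) -> np.ndarray:
    """activations: (n_samples, n_neurons) outputs of one layer on a data sample.

    Each neuron is scored by the persistent structure of its activation values,
    embedded here as a 1-D point cloud; low scores mark pruning candidates.
    """
    scores = []
    for j in range(activations.shape[1]):
        cloud = activations[:, j].reshape(-1, 1)
        scores.append(total_persistence(cloud))
    return np.asarray(scores)
```

Under such a scheme, neurons whose activation clouds carry little persistent structure would be the natural candidates for removal.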
Related papers
- EELBERT: Tiny Models through Dynamic Embeddings [0.28675177318965045]
EELBERT is an approach for compressing transformer-based models (e.g., BERT).
It is achieved by replacing the input embedding layer of the model with dynamic, i.e. on-the-fly, embedding computations.
We develop our smallest model UNO-EELBERT, which achieves a GLUE score within 4% of fully trained BERT-tiny.
arXiv Detail & Related papers (2023-10-31T03:28:08Z) - BiBERT: Accurate Fully Binarized BERT [69.35727280997617]
BiBERT is an accurate, fully binarized BERT designed to eliminate the performance bottlenecks of binarization.
Our method yields impressive savings of 56.3 times on FLOPs and 31.2 times on model size.
arXiv Detail & Related papers (2022-03-12T09:46:13Z) - Automatic Mixed-Precision Quantization Search of BERT [62.65905462141319]
Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks.
These models usually contain millions of parameters, which prevents them from practical deployment on resource-constrained devices.
We propose an automatic mixed-precision quantization framework designed for BERT that can simultaneously conduct quantization and pruning at a subgroup-wise level.
arXiv Detail & Related papers (2021-12-30T06:32:47Z) - Deploying a BERT-based Query-Title Relevance Classifier in a Production
System: a View from the Trenches [3.1219977244201056]
The Bidirectional Encoder Representations from Transformers (BERT) model has radically improved the performance of many Natural Language Processing (NLP) tasks.
It is challenging to scale BERT for low-latency and high-throughput industrial use cases due to its enormous size.
We successfully optimize a Query-Title Relevance (QTR) classifier for deployment via a compact model, which we name BERT Bidirectional Long Short-Term Memory (BertBiLSTM).
BertBiLSTM exceeds the off-the-shelf BERT model's performance in terms of accuracy and efficiency for the aforementioned real-world production task.
arXiv Detail & Related papers (2021-08-23T14:28:23Z) - ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques [10.983311133796745]
Pre-trained language models of the BERT family have defined the state of the art in a wide range of NLP tasks.
The performance of BERT-based models is mainly driven by their enormous number of parameters, which hinders their application in resource-limited scenarios.
We introduce three kinds of compression methods (weight pruning, low-rank factorization and knowledge distillation) and explore a range of designs concerning model architecture.
Our best compressed model, dubbed Refined BERT cOmpreSsion with InTegrAted techniques (ROSITA), is 7.5 times smaller than BERT.
arXiv Detail & Related papers (2021-03-21T11:33:33Z) - BinaryBERT: Pushing the Limit of BERT Quantization [74.65543496761553]
We propose BinaryBERT, which pushes BERT quantization to the limit with weight binarization.
We find that a binary BERT is harder to train directly than a ternary counterpart due to its complex and irregular loss landscape.
Empirical results show that BinaryBERT has negligible performance drop compared to the full-precision BERT-base.
arXiv Detail & Related papers (2020-12-31T16:34:54Z) - Incorporating BERT into Parallel Sequence Decoding with Adapters [82.65608966202396]
We propose to take two different BERT models as the encoder and decoder respectively, and fine-tune them by introducing simple and lightweight adapter modules.
We obtain a flexible and efficient model which is able to jointly leverage the information contained in the source-side and target-side BERT models.
Our framework is based on a parallel sequence decoding algorithm named Mask-Predict, considering the bidirectional and conditionally independent nature of BERT.
arXiv Detail & Related papers (2020-10-13T03:25:15Z) - TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
arXiv Detail & Related papers (2020-09-27T10:17:28Z) - DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
We propose a simple but effective method, DeeBERT, to accelerate BERT inference.
Experiments show that DeeBERT is able to save up to 40% inference time with minimal degradation in model quality.
arXiv Detail & Related papers (2020-04-27T17:58:05Z)
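As a rough illustration of the early-exit idea behind DeeBERT (not its actual implementation), the sketch below attaches a lightweight classifier, an "off-ramp", to every encoder layer and stops as soon as the prediction entropy falls below a threshold. The module names, the [CLS]-token pooling, the per-batch exit decision, and the threshold value are assumptions for the example.

```python
# Hypothetical sketch of confidence-based early exiting in the spirit of DeeBERT:
# an off-ramp classifier after each encoder layer lets easy inputs stop early.
# Assumes each layer maps (batch, seq, hidden) -> (batch, seq, hidden).
import torch
import torch.nn as nn


class EarlyExitEncoder(nn.Module):
    def __init__(self, layers: nn.ModuleList, hidden: int, n_classes: int,
                 entropy_threshold: float = 0.3):
        super().__init__()
        self.layers = layers                          # stack of encoder layers
        self.exits = nn.ModuleList(
            [nn.Linear(hidden, n_classes) for _ in layers])  # one off-ramp per layer
        self.entropy_threshold = entropy_threshold

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        for layer, exit_head in zip(self.layers, self.exits):
            hidden_states = layer(hidden_states)
            logits = exit_head(hidden_states[:, 0])   # classify on the [CLS] position
            probs = torch.softmax(logits, dim=-1)
            entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean()
            if entropy < self.entropy_threshold:      # confident enough (per batch): stop
                return logits
        return logits                                 # fall through to the last layer
```

DeeBERT trains such off-ramps in a separate fine-tuning stage; at inference, the entropy threshold then trades accuracy against latency.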