Deploying a BERT-based Query-Title Relevance Classifier in a Production
System: a View from the Trenches
- URL: http://arxiv.org/abs/2108.10197v1
- Date: Mon, 23 Aug 2021 14:28:23 GMT
- Title: Deploying a BERT-based Query-Title Relevance Classifier in a Production
System: a View from the Trenches
- Authors: Leonard Dahlmann, Tomer Lancewicki
- Abstract summary: The Bidirectional Encoder Representations from Transformers (BERT) model has been radically improving the performance of many Natural Language Processing (NLP) tasks.
It is challenging to scale BERT for low-latency and high-throughput industrial use cases due to its enormous size.
We successfully optimize a Query-Title Relevance (QTR) classifier for deployment via a compact model, which we name BERT Bidirectional Long Short-Term Memory (BertBiLSTM).
BertBiLSTM exceeds the off-the-shelf BERT model's performance in terms of accuracy and efficiency for the aforementioned real-world production task.
- Score: 3.1219977244201056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Bidirectional Encoder Representations from Transformers (BERT) model has
been radically improving the performance of many Natural Language Processing
(NLP) tasks such as Text Classification and Named Entity Recognition (NER)
applications. However, it is challenging to scale BERT for low-latency and
high-throughput industrial use cases due to its enormous size. We successfully
optimize a Query-Title Relevance (QTR) classifier for deployment via a compact
model, which we name BERT Bidirectional Long Short-Term Memory (BertBiLSTM).
The model is capable of performing inference on an input in at most 0.2 ms on CPU. BertBiLSTM
exceeds the off-the-shelf BERT model's performance in terms of accuracy and
efficiency for the aforementioned real-world production task. We achieve this
result in two phases. First, we create a pre-trained model, called eBERT, which
is the original BERT architecture trained with our unique item title corpus. We
then fine-tune eBERT for the QTR task. Second, we train the BertBiLSTM model to
mimic the eBERT model's performance through a process called Knowledge
Distillation (KD), and show the effect of data augmentation in achieving this
resemblance goal. Experimental results show that the proposed model outperforms
other compact and production-ready models.
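The abstract names the two ingredients of the second phase, a compact BiLSTM student and a Knowledge Distillation (KD) objective, but this listing carries no implementation detail. The sketch below is a minimal, assumed PyTorch illustration of that pattern: a small BiLSTM classifier over the concatenated query-title tokens, trained against a blend of the teacher's softened logits and the hard QTR labels. Layer sizes, the temperature, and the loss weighting are placeholders, not the authors' BertBiLSTM or eBERT settings.

```python
# Minimal sketch (not the authors' implementation): a BiLSTM student for
# query-title relevance, trained to mimic a fine-tuned teacher via
# Knowledge Distillation. All hyperparameters below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMStudent(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) for the concatenated query/title pair
        embedded = self.embedding(token_ids)
        _, (h_n, _) = self.bilstm(embedded)
        # Concatenate the final forward and backward hidden states.
        pooled = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.classifier(pooled)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft-target loss: match the teacher's tempered output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard-target loss: the usual cross-entropy on the QTR labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In such a setup, teacher_logits would come from the fine-tuned eBERT model scored over the same (optionally augmented) query-title pairs, consistent with the abstract's note that data augmentation helps the student resemble the teacher.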
Related papers
- Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
arXiv Detail & Related papers (2022-12-01T17:31:42Z)
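The entry above only names its mechanism. As a rough, generic illustration of the adapter-plus-fusion pattern (not the paper's architecture; all dimensions are assumptions), an adapter is a small bottleneck added to a frozen language-model layer, and fusion mixes in features from the other modalities:

```python
# Generic adapter-plus-fusion sketch (illustrative only, not the paper's code).
import torch
import torch.nn as nn

class AdapterWithFusion(nn.Module):
    def __init__(self, hidden_dim=768, bottleneck_dim=64, av_dim=128):
        super().__init__()
        # Bottleneck adapter: down-project, non-linearity, up-project.
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        # Layer-wise fusion of audio-visual features into the text stream.
        self.fuse = nn.Linear(hidden_dim + av_dim, hidden_dim)

    def forward(self, text_hidden, av_features):
        # text_hidden: (batch, seq_len, hidden_dim) from a frozen BERT layer
        # av_features: (batch, seq_len, av_dim) aligned audio-visual features
        adapted = text_hidden + self.up(torch.relu(self.down(text_hidden)))
        return self.fuse(torch.cat([adapted, av_features], dim=-1))
```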
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
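As a generic sketch of the Mixture-of-Experts idea referenced above (top-1 routing over several feed-forward experts; nothing here reflects MoEBERT's importance-guided adaptation or its distillation):

```python
# Generic Mixture-of-Experts feed-forward sketch (illustrative only).
# Each token is routed to a single expert.
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    def __init__(self, hidden_dim=768, ffn_dim=3072, num_experts=4):
        super().__init__()
        self.router = nn.Linear(hidden_dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_dim, ffn_dim), nn.GELU(),
                          nn.Linear(ffn_dim, hidden_dim))
            for _ in range(num_experts)
        ])

    def forward(self, hidden):
        # hidden: (batch, seq_len, hidden_dim)
        flat = hidden.reshape(-1, hidden.size(-1))
        expert_idx = self.router(flat).argmax(dim=-1)  # top-1 routing
        out = torch.zeros_like(flat)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Only one expert runs per token, which is how capacity can
                # grow without a matching increase in per-token compute.
                out[mask] = expert(flat[mask])
        return out.reshape_as(hidden)
```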
- Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models [3.303435360096988]
We perform a knowledge distillation benchmark from task-specific BERT-base teacher models to various student models.
Our experiment involves 12 datasets grouped into two tasks: text classification and sequence labeling in the Indonesian language.
Our experiments show that, despite the rising popularity of Transformer-based models, using BiLSTM and CNN student models provides the best trade-off between performance and computational resources.
arXiv Detail & Related papers (2022-01-03T10:07:13Z)
- AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z)
- Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks [4.955649816620742]
This paper explores sentence embedding models for BERT and ALBERT.
We take a modified BERT network with siamese and triplet network structures called Sentence-BERT (SBERT) and replace BERT with ALBERT to create Sentence-ALBERT (SALBERT).
arXiv Detail & Related papers (2021-01-26T09:14:06Z)
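The SBERT/SALBERT entry above rests on pooling token states from a shared encoder into one sentence vector and comparing vectors directly. Below is a minimal sketch with the Hugging Face transformers library, using a generic BERT checkpoint rather than the exact SBERT or SALBERT models evaluated in the paper:

```python
# Mean-pooled sentence embeddings in the Sentence-BERT style (sketch only;
# the actual SBERT/SALBERT checkpoints and pooling choices may differ).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    inputs = tokenizer(sentences, padding=True, truncation=True,
                       return_tensors="pt")
    with torch.no_grad():
        token_states = encoder(**inputs).last_hidden_state  # (batch, seq, dim)
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    # Mean over real (non-padding) tokens gives one vector per sentence.
    return (token_states * mask).sum(dim=1) / mask.sum(dim=1)

a, b = embed(["red running shoes", "blue denim jacket"])
print(F.cosine_similarity(a, b, dim=0))
```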
- BinaryBERT: Pushing the Limit of BERT Quantization [74.65543496761553]
We propose BinaryBERT, which pushes BERT quantization to the limit with weight binarization.
We find that a binary BERT is harder to train directly than a ternary counterpart due to its complex and irregular loss landscape.
Empirical results show that BinaryBERT has negligible performance drop compared to the full-precision BERT-base.
arXiv Detail & Related papers (2020-12-31T16:34:54Z)
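As a rough illustration of what the weight binarization above means, and of the ternary counterpart that the BinaryBERT summary says is easier to train, the functions below quantize a weight tensor to two or three values with a per-tensor scale; the threshold ratio and scaling choices are common heuristics, not the papers' exact schemes:

```python
# Minimal weight binarization/ternarization sketch (illustrative; the
# BinaryBERT and TernaryBERT training procedures are more involved).
import torch

def binarize(weight: torch.Tensor):
    # Scale by the mean absolute value, a common choice that keeps the
    # binarized matrix close to the original in an L1 sense.
    scale = weight.abs().mean()
    return scale * torch.sign(weight)

def ternarize(weight: torch.Tensor, threshold_ratio=0.7):
    # Ternary counterpart: values near zero are set to exactly zero.
    threshold = threshold_ratio * weight.abs().mean()
    mask = (weight.abs() > threshold).float()
    scale = (weight.abs() * mask).sum() / mask.sum().clamp(min=1.0)
    return scale * torch.sign(weight) * mask

w = torch.randn(768, 768)
# Typically 2 and 3 distinct values remain, respectively.
print(binarize(w).unique().numel(), ternarize(w).unique().numel())
```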
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
arXiv Detail & Related papers (2020-09-27T10:17:28Z)
- DSC IIT-ISM at SemEval-2020 Task 6: Boosting BERT with Dependencies for Definition Extraction [9.646922337783133]
We explore the performance of Bidirectional Encoder Representations from Transformers (BERT) at definition extraction.
We propose a joint model of BERT and Text Level Graph Convolutional Network so as to incorporate dependencies into the model.
arXiv Detail & Related papers (2020-09-17T09:48:59Z)
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
We propose a simple but effective method, DeeBERT, to accelerate BERT inference.
Experiments show that DeeBERT is able to save up to 40% inference time with minimal degradation in model quality.
arXiv Detail & Related papers (2020-04-27T17:58:05Z)
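DeeBERT's speed-up above comes from attaching classifiers to intermediate layers and stopping once a prediction is confident. The loop below is a generic sketch of that inference pattern; the encoder layers, exit heads, and entropy threshold are placeholders rather than the DeeBERT release:

```python
# Generic early-exit inference sketch (illustrative; not the DeeBERT code).
# `layers` is a list of transformer blocks and `exit_heads[i]` is a small
# classifier attached after layer i; both are assumed to exist already.
import torch
import torch.nn.functional as F

def early_exit_predict(hidden, layers, exit_heads, entropy_threshold=0.3):
    # hidden: (batch=1, seq_len, hidden_dim) embedding output for one input
    for layer, head in zip(layers, exit_heads):
        hidden = layer(hidden)
        probs = F.softmax(head(hidden[:, 0]), dim=-1)  # classify from [CLS]
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
        if entropy.item() < entropy_threshold:
            # Confident enough: skip the remaining layers entirely, which is
            # where the inference-time savings come from.
            return probs
    return probs  # fell through to the final layer
```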
- TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval [11.923682816611716]
We present the TwinBERT model for effective and efficient retrieval.
It has twin-structured BERT-like encoders to represent query and document respectively.
It allows document embeddings to be pre-computed offline and cached in memory.
arXiv Detail & Related papers (2020-02-14T22:44:36Z)
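The retrieval pattern in the TwinBERT entry above, encode documents offline and cache their embeddings, then encode only the query at serving time and score cheaply, can be sketched as follows; the encoders and the dot-product scorer are generic stand-ins rather than TwinBERT's crossing layer:

```python
# Generic twin-encoder retrieval sketch (illustrative; the pooling and
# scoring are stand-ins, not TwinBERT's crossing layer).
import torch
import torch.nn as nn

class TwinEncoderRetriever(nn.Module):
    def __init__(self, query_encoder: nn.Module, doc_encoder: nn.Module):
        super().__init__()
        # Both encoders are assumed to map one input tensor to a 1-D embedding.
        self.query_encoder = query_encoder
        self.doc_encoder = doc_encoder
        self.doc_cache = {}  # doc_id -> precomputed document embedding

    def index_documents(self, docs):
        # Offline phase: document embeddings are computed once and cached,
        # so only the (short) query is encoded at serving time.
        with torch.no_grad():
            for doc_id, doc_tensor in docs.items():
                self.doc_cache[doc_id] = self.doc_encoder(doc_tensor)

    def retrieve(self, query_tensor, top_k=10):
        with torch.no_grad():
            q = self.query_encoder(query_tensor)
        scores = {doc_id: torch.dot(q, d).item()
                  for doc_id, d in self.doc_cache.items()}
        return sorted(scores, key=scores.get, reverse=True)[:top_k]
```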