BERTVision -- A Parameter-Efficient Approach for Question Answering
- URL: http://arxiv.org/abs/2202.12210v1
- Date: Thu, 24 Feb 2022 17:16:25 GMT
- Title: BERTVision -- A Parameter-Efficient Approach for Question Answering
- Authors: Siduo Jiang, Cristopher Benge, William Casey King
- Abstract summary: We present a highly parameter-efficient approach for Question Answering that significantly reduces the need for extended BERT fine-tuning.
Our method uses information from the hidden state activations of each BERT transformer layer, which is discarded during typical BERT inference.
Our experiments show that this approach works well not only for span QA, but also for classification, suggesting that it may be extensible to a wider range of tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a highly parameter-efficient approach for Question Answering that
significantly reduces the need for extended BERT fine-tuning. Our method uses
information from the hidden state activations of each BERT transformer layer,
which is discarded during typical BERT inference. Our best model achieves
maximal BERT performance at a fraction of the training time and GPU or TPU
expense. Performance is further improved by ensembling our model with BERT's
predictions. Furthermore, we find that near-optimal performance can be achieved
for QA span annotation using less training data. Our experiments show that this
approach works well not only for span annotation, but also for classification,
suggesting that it may be extensible to a wider range of tasks.
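As a rough illustration of the approach described in the abstract, the sketch below shows one way to expose the per-layer hidden state activations with the Hugging Face transformers library and feed them to a small trainable head. The model name ("bert-base-uncased"), the SpanHead module, and its layer-mixing scheme are illustrative assumptions, not the authors' implementation.

    # Minimal sketch (not the BERTVision code): collect every layer's hidden
    # states from a frozen BERT and train only a tiny span-QA head on top.
    import torch
    from transformers import BertModel, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
    model.eval()

    question = "Who wrote Hamlet?"
    context = "Hamlet is a tragedy written by William Shakespeare."
    inputs = tokenizer(question, context, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # Tuple of 13 tensors (embeddings + 12 layers), each (batch, seq_len, 768).
    # Standard inference keeps only the final layer; here the full stack is reused.
    hidden = torch.stack(outputs.hidden_states, dim=1)  # (batch, 13, seq_len, 768)

    class SpanHead(torch.nn.Module):
        """Hypothetical lightweight head: mix the layer axis, predict span logits."""
        def __init__(self, num_layers=13, hidden_size=768):
            super().__init__()
            self.layer_mix = torch.nn.Linear(num_layers, 1)    # learned mix over layers
            self.qa_outputs = torch.nn.Linear(hidden_size, 2)  # start / end logits

        def forward(self, stacked):                             # (B, L, T, H)
            mixed = self.layer_mix(stacked.permute(0, 2, 3, 1)).squeeze(-1)  # (B, T, H)
            return self.qa_outputs(mixed)                       # (B, T, 2)

    start_logits, end_logits = SpanHead()(hidden).split(1, dim=-1)

Only the head's parameters would be trained in this setup, which is what makes it parameter-efficient relative to fine-tuning all of BERT.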
Related papers
- SetBERT: Enhancing Retrieval Performance for Boolean Logic and Set Operation Queries [0.8192907805418583]
We introduce SetBERT, a fine-tuned BERT-based model to enhance query embeddings for set operations and Boolean logic queries.
Our experiments reveal that SetBERT-base not only significantly outperforms BERT-base but also achieves performance comparable to the much larger BERT-large model.
arXiv Detail & Related papers (2024-06-25T05:14:54Z) - DPBERT: Efficient Inference for BERT based on Dynamic Planning [11.680840266488884]
Existing input-adaptive inference methods fail to take full advantage of the structure of BERT.
We propose Dynamic Planning in BERT, a novel fine-tuning strategy that can accelerate the inference process of BERT.
Our method reduces latency to 75% while maintaining 98% accuracy, yielding a better accuracy-speed trade-off compared to state-of-the-art input-adaptive methods.
arXiv Detail & Related papers (2023-07-26T07:18:50Z) - BiBERT: Accurate Fully Binarized BERT [69.35727280997617]
BiBERT is an accurate, fully binarized BERT designed to eliminate performance bottlenecks.
Our method yields impressive savings of 56.3 times in FLOPs and 31.2 times in model size.
arXiv Detail & Related papers (2022-03-12T09:46:13Z) - PromptBERT: Improving BERT Sentence Embeddings with Prompts [95.45347849834765]
We propose a prompt-based sentence embedding method that reduces token embedding biases and makes the original BERT layers more effective.
We also propose a novel unsupervised training objective based on template denoising, which substantially narrows the performance gap between the supervised and unsupervised settings.
Our fine-tuned method outperforms the state-of-the-art method SimCSE in both unsupervised and supervised settings.
arXiv Detail & Related papers (2022-01-12T06:54:21Z) - Finding the Winning Ticket of BERT for Binary Text Classification via
Adaptive Layer Truncation before Fine-tuning [7.797987384189306]
We construct a series of BERT-based models of different sizes and compare their predictions on 8 binary classification tasks.
The results show that there indeed exist smaller sub-networks that perform better than the full model.
arXiv Detail & Related papers (2021-11-22T02:22:47Z) - To BERT or Not to BERT: Comparing Task-specific and Task-agnostic
Semi-Supervised Approaches for Sequence Tagging [46.62643525729018]
We study the task-specific semi-supervised approach Cross-View Training (CVT), comparing it with task-agnostic BERT in multiple settings that include domain- and task-relevant English data.
We show that CVT achieves performance similar to BERT on a set of sequence tagging tasks, with a smaller financial and environmental impact.
arXiv Detail & Related papers (2020-10-27T04:03:47Z) - TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
arXiv Detail & Related papers (2020-09-27T10:17:28Z) - DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
We propose a simple but effective method, DeeBERT, to accelerate BERT inference.
Experiments show that DeeBERT is able to save up to 40% inference time with minimal degradation in model quality.
arXiv Detail & Related papers (2020-04-27T17:58:05Z) - Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation [84.64004917951547]
Fine-tuning pre-trained language models like BERT has become an effective approach in NLP.
In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation.
arXiv Detail & Related papers (2020-02-24T16:17:12Z)