SplaXBERT: Leveraging Mixed Precision Training and Context Splitting for Question Answering
- URL: http://arxiv.org/abs/2412.05499v1
- Date: Sat, 07 Dec 2024 02:01:27 GMT
- Title: SplaXBERT: Leveraging Mixed Precision Training and Context Splitting for Question Answering
- Authors: Zhu Yufan, Hao Zeyu, Li Siqi, Niu Boqian
- Abstract summary: Built on ALBERT-xlarge with context-splitting and mixed precision training, SplaXBERT achieves high efficiency in question-answering tasks on lengthy texts.
Tested on SQuAD v1.1, it attains an Exact Match of 85.95% and an F1 Score of 92.97%.
- Abstract: SplaXBERT, built on ALBERT-xlarge with context-splitting and mixed precision training, achieves high efficiency in question-answering tasks on lengthy texts. Tested on SQuAD v1.1, it attains an Exact Match of 85.95% and an F1 Score of 92.97%, outperforming traditional BERT-based models in both accuracy and resource efficiency.
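As a rough sketch of the two ingredients the abstract names, the snippet below splits a long context into overlapping chunks with Hugging Face's overflowing-token tokenization and runs one mixed-precision training step with PyTorch AMP. The checkpoint name, chunk length, stride, and dummy span labels are illustrative assumptions, not SplaXBERT's actual configuration.
```python
# Sketch only: sliding-window context splitting + a mixed-precision training step.
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

tokenizer = AutoTokenizer.from_pretrained("albert-xlarge-v2")
model = AutoModelForQuestionAnswering.from_pretrained("albert-xlarge-v2").to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

question = "What does SplaXBERT build on?"
context = "SplaXBERT is built on ALBERT-xlarge. " * 200  # stand-in for a lengthy passage

# Context splitting: one long context becomes several overlapping chunks.
enc = tokenizer(
    question,
    context,
    max_length=384,           # assumed chunk length
    stride=128,               # assumed overlap between chunks
    truncation="only_second",
    return_overflowing_tokens=True,
    padding="max_length",
    return_tensors="pt",
)
input_ids = enc["input_ids"].to(device)
attention_mask = enc["attention_mask"].to(device)
print(f"long context split into {input_ids.shape[0]} chunks")

# Dummy span labels; real training maps the character-level answer span to
# token positions inside each chunk (or to the CLS token when absent).
start_positions = torch.zeros(input_ids.shape[0], dtype=torch.long, device=device)
end_positions = torch.zeros(input_ids.shape[0], dtype=torch.long, device=device)

# Mixed precision: forward in fp16 where safe, scaled backward pass.
optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_amp):
    out = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        start_positions=start_positions,
        end_positions=end_positions,
    )
scaler.scale(out.loss).backward()
scaler.step(optimizer)
scaler.update()
```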
Related papers
- Larger models yield better results? Streamlined severity classification of ADHD-related concerns using BERT-based knowledge distillation [0.6793286055326242]
We create a lightweight yet powerful BERT-based model for natural language processing applications.
We apply the resulting model, LastBERT, to a real-world task: classifying severity levels of Attention Deficit Hyperactivity Disorder (ADHD)-related concerns from social media text data.
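For reference, a minimal sketch of the standard knowledge-distillation objective this line of work builds on: a temperature-softened KL term against the teacher plus cross-entropy on the hard labels. The temperature, weighting, and 5-way head below are assumptions, not LastBERT's actual recipe.
```python
# Minimal knowledge-distillation loss sketch (assumed recipe, not LastBERT's
# exact configuration): soft targets from a frozen teacher plus hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # KL between temperature-softened distributions, scaled by T^2 as usual.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits for a 5-way severity classification head.
student_logits = torch.randn(8, 5, requires_grad=True)
teacher_logits = torch.randn(8, 5)
labels = torch.randint(0, 5, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```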
arXiv Detail & Related papers (2024-10-30T17:57:44Z)
- oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes [82.99830498937729]
oBERTa is an easy-to-use set of language models for Natural Language Processing.
It allows NLP practitioners to obtain models that are 3.8 to 24.3 times faster without expertise in model compression.
We explore the use of oBERTa on seven representative NLP tasks.
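The summary does not spell out the compression recipe; as a generic illustration of the pruning ingredient named in the title, the sketch below applies unstructured magnitude pruning to the linear layers of a small Transformer encoder. The sparsity level and layer choice are assumptions, not oBERTa's schedule.
```python
# Generic unstructured magnitude-pruning sketch (an assumed illustration of
# "pruning regimes", not oBERTa's actual schedule).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

encoder_layer = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

# Prune 90% of the smallest-magnitude weights in every linear layer.
for module in encoder.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # bake the mask into the weights

zeros = sum((p == 0).sum().item() for p in encoder.parameters())
total = sum(p.numel() for p in encoder.parameters())
print(f"overall sparsity: {zeros / total:.1%}")

# The sparse encoder still runs a normal forward pass.
out = encoder(torch.randn(2, 16, 256))
print(out.shape)
```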
arXiv Detail & Related papers (2023-03-30T01:37:19Z)
- BEBERT: Efficient and robust binary ensemble BERT [12.109371576500928]
Binarization of pre-trained BERT models can reduce their compute and memory cost but comes with a severe accuracy drop compared with their full-precision counterparts.
We propose an efficient and robust binary ensemble BERT (BEBERT) to bridge the accuracy gap.
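A toy sketch of the binary-ensemble idea, under the assumption that ensembling means averaging the logits of several independently initialized, weight-binarized copies of a model; BEBERT's actual training and ensembling procedure may differ.
```python
# Assumed illustration of a binary ensemble: binarize several copies of a model
# and average their logits at inference time.
import copy
import torch
import torch.nn as nn

def binarize_(model):
    """In-place 1-bit weight quantization: sign(w) scaled by the mean |w|."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Linear):
                w = module.weight
                w.copy_(w.sign() * w.abs().mean())
    return model

base = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 2))

# In practice each ensemble member would be trained; here we just binarize
# differently initialized copies to show the ensembling step.
members = []
for seed in range(3):
    torch.manual_seed(seed)
    m = copy.deepcopy(base)
    for p in m.parameters():
        nn.init.normal_(p, std=0.02)
    members.append(binarize_(m).eval())

x = torch.randn(4, 128)
with torch.no_grad():
    ensemble_logits = torch.stack([m(x) for m in members]).mean(dim=0)
print(ensemble_logits.shape)  # averaged predictions of the binary ensemble
```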
arXiv Detail & Related papers (2022-10-28T08:15:26Z)
- RoChBert: Towards Robust BERT Fine-tuning for Chinese [31.573147796706223]
RoChBERT is a framework to build more Robust BERT-based models.
It fuses Chinese phonetic and glyph features into pre-trained representations during fine-tuning.
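A minimal fusion sketch, assuming fusion means concatenating per-token phonetic and glyph embeddings with the contextual hidden states and projecting back to the hidden size; the embedding tables, ID spaces, and pooling below are hypothetical, not RoChBERT's architecture.
```python
# Hypothetical fusion head: concatenate phonetic and glyph embeddings with
# contextual hidden states, project, pool, classify.
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, hidden=768, pho_vocab=500, glyph_vocab=3000, feat=64, num_labels=2):
        super().__init__()
        self.pho_emb = nn.Embedding(pho_vocab, feat)      # hypothetical pinyin-ID table
        self.glyph_emb = nn.Embedding(glyph_vocab, feat)  # hypothetical glyph-ID table
        self.proj = nn.Linear(hidden + 2 * feat, hidden)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, hidden_states, pho_ids, glyph_ids):
        fused = torch.cat(
            [hidden_states, self.pho_emb(pho_ids), self.glyph_emb(glyph_ids)], dim=-1
        )
        pooled = torch.tanh(self.proj(fused)).mean(dim=1)  # simple mean pooling
        return self.classifier(pooled)

# Toy usage with random "BERT" hidden states and random phonetic/glyph IDs.
head = FusionHead()
hidden_states = torch.randn(2, 32, 768)
pho_ids = torch.randint(0, 500, (2, 32))
glyph_ids = torch.randint(0, 3000, (2, 32))
print(head(hidden_states, pho_ids, glyph_ids).shape)  # (2, 2)
```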
arXiv Detail & Related papers (2022-10-28T07:08:00Z)
- BiBERT: Accurate Fully Binarized BERT [69.35727280997617]
BiBERT is an accurate, fully binarized BERT designed to eliminate the performance bottlenecks of binarization.
The method yields an impressive 56.3x saving in FLOPs and 31.2x saving in model size.
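For orientation, a generic 1-bit weight binarization with a straight-through estimator is sketched below; it illustrates full binarization in general, not BiBERT's specific techniques.
```python
# Textbook-style 1-bit weight binarization with a straight-through estimator
# (assumed sketch, not BiBERT's method).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return w.sign() * w.abs().mean()  # 1-bit weights with a single scale factor

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Straight-through estimator: pass gradients where |w| <= 1, clip elsewhere.
        return grad_out * (w.abs() <= 1).to(grad_out.dtype)

class BinaryLinear(nn.Linear):
    def forward(self, x):
        return F.linear(x, BinarizeSTE.apply(self.weight), self.bias)

layer = BinaryLinear(768, 768)
out = layer(torch.randn(4, 768))
out.sum().backward()                      # gradients still reach the latent fp weights
print(layer.weight.grad.abs().sum() > 0)  # tensor(True)
```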
arXiv Detail & Related papers (2022-03-12T09:46:13Z)
- DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing [117.41016786835452]
This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model.
We show that vanilla embedding sharing in ELECTRA hurts training efficiency and model performance.
We propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics.
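A rough sketch of the gradient-disentangled sharing idea: the discriminator reuses the generator's embedding table through a stop-gradient and learns only a residual delta, so discriminator gradients never pull on the shared table. Sizes and module names are illustrative, not DeBERTaV3's configuration.
```python
# Rough sketch of gradient-disentangled embedding sharing; vocabulary and
# hidden sizes are illustrative.
import torch
import torch.nn as nn

class SharedEmbeddings(nn.Module):
    def __init__(self, vocab_size=1000, hidden=128):
        super().__init__()
        self.generator_emb = nn.Embedding(vocab_size, hidden)       # updated by the MLM loss
        self.delta = nn.Parameter(torch.zeros(vocab_size, hidden))  # updated by the RTD loss

    def generator_embed(self, ids):
        return self.generator_emb(ids)

    def discriminator_embed(self, ids):
        table = self.generator_emb.weight.detach() + self.delta     # stop-gradient sharing
        return table[ids]

emb = SharedEmbeddings()
ids = torch.randint(0, 1000, (2, 16))

disc_loss = emb.discriminator_embed(ids).pow(2).mean()  # stand-in for a discriminator loss
disc_loss.backward()
print(emb.generator_emb.weight.grad is None)  # True: no tug-of-war on the shared table
print(emb.delta.grad is not None)             # True: the residual delta still learns
```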
arXiv Detail & Related papers (2021-11-18T06:48:00Z)
- Online Ensemble Model Compression using Knowledge Distillation [51.59021417947258]
This paper presents a knowledge-distillation-based model compression framework consisting of a student ensemble.
It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models.
We provide comprehensive experiments using state-of-the-art classification models to validate our framework's effectiveness.
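A compact sketch of online ensemble distillation, assuming each student matches both the hard labels and the ensemble's averaged soft predictions in a single training pass; the paper's exact framework may differ.
```python
# Assumed online-ensemble-distillation step: the averaged (detached) student
# predictions act as the teacher for every student simultaneously.
import torch
import torch.nn as nn
import torch.nn.functional as F

students = nn.ModuleList(
    [nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)) for _ in range(3)]
)
optimizer = torch.optim.SGD(students.parameters(), lr=0.1)

x = torch.randn(16, 32)
labels = torch.randint(0, 10, (16,))
T = 2.0

logits = [s(x) for s in students]
ensemble = torch.stack(logits).mean(dim=0).detach()  # ensemble teacher signal

loss = 0.0
for out in logits:
    ce = F.cross_entropy(out, labels)
    kd = F.kl_div(
        F.log_softmax(out / T, dim=-1),
        F.softmax(ensemble / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    loss = loss + ce + kd

optimizer.zero_grad()
loss.backward()
optimizer.step()
```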
arXiv Detail & Related papers (2020-11-15T04:46:29Z)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
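A sketch of ternary weight quantization using a generic threshold rule, offered as an assumed illustration rather than TernaryBERT's distillation-aware method.
```python
# Generic ternarization sketch: map weights to {-alpha, 0, +alpha} with a
# magnitude threshold (assumed rule, not TernaryBERT's exact procedure).
import torch

def ternarize(w, ratio=0.7):
    delta = ratio * w.abs().mean()                            # magnitude threshold
    mask = (w.abs() > delta).to(w.dtype)                      # 1 where the weight survives
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1)  # scale of surviving weights
    return alpha * w.sign() * mask

w = torch.randn(768, 768)
tw = ternarize(w)
print(torch.unique(tw).numel())   # 3 distinct values: -alpha, 0, +alpha
print((tw == 0).float().mean())   # fraction of weights set to zero
```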
arXiv Detail & Related papers (2020-09-27T10:17:28Z)
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention [119.77305080520718]
We propose a new model architecture, DeBERTa, that improves the BERT and RoBERTa models using two novel techniques.
We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks.
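A simplified single-head sketch of disentangled attention, the technique named in the title: content and relative-position representations contribute separate content-to-content, content-to-position, and position-to-content terms. Relative-position bucketing, multiple heads, and the enhanced mask decoder are omitted, and all dimensions are illustrative.
```python
# Simplified single-head disentangled-attention sketch; sizes are illustrative.
import torch
import torch.nn as nn

hidden, seq, k = 64, 16, 8                     # k = max relative distance
Wq_c, Wk_c, Wv = nn.Linear(hidden, hidden), nn.Linear(hidden, hidden), nn.Linear(hidden, hidden)
Wq_r, Wk_r = nn.Linear(hidden, hidden), nn.Linear(hidden, hidden)
rel_emb = nn.Embedding(2 * k, hidden)          # shared relative-position embeddings

x = torch.randn(1, seq, hidden)                # content representations
pos = torch.arange(seq)
rel = (pos[:, None] - pos[None, :]).clamp(-k, k - 1) + k   # relative distances -> indices
P = rel_emb(rel)                               # (seq, seq, hidden) relative-position reps

Qc, Kc, V = Wq_c(x), Wk_c(x), Wv(x)            # content queries/keys/values
Qr, Kr = Wq_r(P), Wk_r(P)                      # position queries/keys

c2c = Qc @ Kc.transpose(-1, -2)                            # content-to-content
c2p = torch.einsum("bid,ijd->bij", Qc, Kr)                 # content-to-position
p2c = torch.einsum("bjd,jid->bij", Kc, Qr)                 # position-to-content
scores = (c2c + c2p + p2c) / (3 * hidden) ** 0.5           # scale over the three terms
out = scores.softmax(dim=-1) @ V
print(out.shape)   # (1, 16, 64)
```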
arXiv Detail & Related papers (2020-06-05T19:54:34Z)