BERTer: The Efficient One
- URL: http://arxiv.org/abs/2407.14039v1
- Date: Fri, 19 Jul 2024 05:33:09 GMT
- Title: BERTer: The Efficient One
- Authors: Pradyumna Saligram, Andrew Lanpouthakoun
- Abstract summary: We explore advanced fine-tuning techniques to boost BERT's performance in sentiment analysis, paraphrase detection, and semantic textual similarity.
Our findings reveal substantial improvements in model efficiency and effectiveness when combining multiple fine-tuning architectures.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore advanced fine-tuning techniques to boost BERT's performance in sentiment analysis, paraphrase detection, and semantic textual similarity. Our approach leverages SMART regularization to combat overfitting, improves hyperparameter choices, employs a cross-embedding Siamese architecture for improved sentence embeddings, and introduces innovative early exiting methods. Our findings reveal substantial improvements in model efficiency and effectiveness when combining multiple fine-tuning architectures, achieving state-of-the-art performance on the test set, surpassing current benchmarks and highlighting BERT's adaptability in multifaceted linguistic tasks.
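The abstract names SMART regularization without implementation detail. A minimal sketch of a smoothness-inducing (SMART-style) regularizer in PyTorch is shown below; the single ascent step, the hyperparameter values, and the assumption that `model_forward` accepts input embeddings are illustrative choices, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def smart_regularizer(model_forward, embeds, clean_logits, eps=1e-3, step_size=1e-3):
    """Smoothness term: KL divergence between predictions on clean and
    adversarially perturbed input embeddings (one ascent step for brevity)."""
    noise = (torch.randn_like(embeds) * eps).requires_grad_()
    adv_logits = model_forward(embeds + noise)
    kl = F.kl_div(F.log_softmax(adv_logits, dim=-1),
                  F.softmax(clean_logits.detach(), dim=-1),
                  reduction="batchmean")
    grad, = torch.autograd.grad(kl, noise)          # ascend the KL surface
    noise = (noise + step_size * grad / (grad.norm() + 1e-8)).detach()
    adv_logits = model_forward(embeds + noise)
    return F.kl_div(F.log_softmax(adv_logits, dim=-1),
                    F.softmax(clean_logits.detach(), dim=-1),
                    reduction="batchmean")
```

In training, this term would be added to the task loss with a weighting coefficient; the cross-embedding Siamese head and the early-exit mechanism mentioned in the abstract are separate components not shown here.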
Related papers
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
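For context, plain best-of-N sampling (the procedure that BOND-style distillation trains the policy to imitate) can be sketched in a few lines; `generate` and `reward` are hypothetical callables, and the iterative distillation and self-play machinery that is the paper's actual contribution is not shown.

```python
def best_of_n(prompt, generate, reward, n=8):
    """Draw n candidate responses and keep the one the reward model scores
    highest; distillation aims to make a single sample match this selection."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))
```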
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
- LegalTurk Optimized BERT for Multi-Label Text Classification and NER [0.0]
We introduce a modified pre-training approach that combines diverse masking strategies.
In this work, we focus on two essential downstream tasks in the legal domain: named entity recognition and multi-label text classification.
Our modified approach demonstrated significant improvements in both NER and multi-label text classification tasks compared to the original BERT model.
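As a rough illustration of combining masking strategies, the toy function below mixes single-token masking with occasional short-span masking; the particular strategies, probabilities, and span lengths are assumptions rather than the paper's recipe.

```python
import random

def diverse_mask(tokens, mask_token="[MASK]", p_token=0.10, p_span=0.05, max_span=3):
    """Toy masking mix: each position is either left alone, masked on its own,
    or starts a short masked span."""
    out = list(tokens)
    i = 0
    while i < len(out):
        r = random.random()
        if r < p_span:                     # start a short masked span
            span = random.randint(2, max_span)
            for j in range(i, min(i + span, len(out))):
                out[j] = mask_token
            i += span
        elif r < p_span + p_token:         # mask a single token
            out[i] = mask_token
            i += 1
        else:
            i += 1
    return out
```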
arXiv Detail & Related papers (2024-06-30T10:19:54Z)
- Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
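A minimal sketch of the two ingredients named here, assuming standard bottleneck adapters and a gated, per-layer fusion of projected audio-visual features into the text stream (dimensions and gating are illustrative, not the paper's exact design):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter inserted after a (frozen) BERT layer."""
    def __init__(self, hidden=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))   # residual bottleneck

class GatedFusion(nn.Module):
    """Layer-wise fusion: gate projected audio-visual features into the text states."""
    def __init__(self, hidden=768, av_dim=128):
        super().__init__()
        self.proj = nn.Linear(av_dim, hidden)
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, text, av):
        av = self.proj(av)
        g = torch.sigmoid(self.gate(torch.cat([text, av], dim=-1)))
        return text + g * av
```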
arXiv Detail & Related papers (2022-12-01T17:31:42Z)
- PromptBERT: Improving BERT Sentence Embeddings with Prompts [95.45347849834765]
We propose a prompt-based sentence embedding method that reduces token embedding biases and makes the original BERT layers more effective.
We also propose a novel unsupervised training objective based on template denoising, which substantially narrows the performance gap between the supervised and unsupervised settings.
Our fine-tuned method outperforms the state-of-the-art method SimCSE in both unsupervised and supervised settings.
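A minimal sketch of the prompt-based idea with Hugging Face Transformers: wrap the sentence in a template and read the hidden state at the [MASK] position as its embedding. The template wording is illustrative, and the template-denoising objective is not shown.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def prompt_embedding(sentence: str) -> torch.Tensor:
    # Wrap the sentence in a prompt template and use the [MASK] token's
    # hidden state as the sentence embedding.
    text = f'This sentence : "{sentence}" means {tokenizer.mask_token} .'
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state        # (1, seq_len, 768)
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    return hidden[0, mask_pos]
```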
arXiv Detail & Related papers (2022-01-12T06:54:21Z)
- Context-gloss Augmentation for Improving Word Sense Disambiguation [0.0]
The goal of Word Sense Disambiguation (WSD) is to identify the sense of a polysemous word in a specific context.
We show that both sentence-level and word-level augmentation methods are effective strategies for WSD.
We also find that performance can be improved by adding hypernym glosses obtained from a lexical knowledge base.
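A small sketch of the gloss-augmentation idea using NLTK's WordNet interface: pair the context sentence with each candidate sense's gloss, extended with the gloss of its first hypernym. Using only the first hypernym and this particular pairing format are simplifying assumptions.

```python
from nltk.corpus import wordnet as wn  # requires the WordNet data package

def context_gloss_pairs(context: str, target: str):
    """Build (context, gloss) pairs for every WordNet sense of the target,
    appending the first hypernym's gloss to each sense gloss."""
    pairs = []
    for synset in wn.synsets(target):
        gloss = synset.definition()
        hypernyms = synset.hypernyms()
        if hypernyms:
            gloss += " ; " + hypernyms[0].definition()
        pairs.append((context, f"{target} : {gloss}"))
    return pairs
```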
arXiv Detail & Related papers (2021-10-14T06:27:19Z)
- Explaining and Improving BERT Performance on Lexical Semantic Change Detection [22.934650688233734]
The recent success of type-based models in SemEval-2020 Task 1 has raised the question of why the success of token-based models does not translate to this field.
We investigate the influence of a range of variables on clusterings of BERT vectors and show that BERT's low performance is largely due to orthographic information on the target word.
arXiv Detail & Related papers (2021-03-12T13:29:30Z)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
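The mechanism can be sketched independently of BERT: attach a classifier to every layer and stop as soon as a fixed number of consecutive internal classifiers agree. The per-layer classifiers are assumed to exist and are not shown.

```python
import torch

def patience_early_exit(layer_logits, patience=3):
    """Return a prediction once `patience` consecutive internal classifiers
    agree; otherwise fall back to the last layer.
    `layer_logits` is a list of per-layer logit tensors, shape (num_classes,)."""
    streak, prev = 0, None
    for depth, logits in enumerate(layer_logits):
        pred = int(torch.argmax(logits))
        streak = streak + 1 if pred == prev else 1
        prev = pred
        if streak >= patience:
            return pred, depth + 1      # exited early after this many layers
    return prev, len(layer_logits)      # no early exit; used every layer
```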
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
- A Dependency Syntactic Knowledge Augmented Interactive Architecture for End-to-End Aspect-based Sentiment Analysis [73.74885246830611]
We propose a novel dependency syntactic knowledge augmented interactive architecture with multi-task learning for end-to-end ABSA.
This model is capable of fully exploiting the syntactic knowledge (dependency relations and types) by leveraging a well-designed Dependency Relation Embedded Graph Convolutional Network (DreGcn).
Extensive experimental results on three benchmark datasets demonstrate the effectiveness of our approach.
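As a generic stand-in for graph convolution over dependency arcs (not the paper's exact DreGcn, which additionally embeds dependency relation types), a single mean-aggregating GCN layer might look like:

```python
import torch
import torch.nn as nn

class DepGCNLayer(nn.Module):
    """One graph-convolution step over a dependency adjacency matrix."""
    def __init__(self, hidden=768):
        super().__init__()
        self.linear = nn.Linear(hidden, hidden)

    def forward(self, h, adj):
        # h:   (batch, seq, hidden) token representations
        # adj: (batch, seq, seq) dependency arcs plus self-loops (0/1 entries)
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)   # neighbor counts
        return torch.relu(self.linear(adj @ h) / deg)
```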
arXiv Detail & Related papers (2020-04-04T14:59:32Z)
- Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation [84.64004917951547]
Fine-tuning pre-trained language models like BERT has become an effective approach in NLP.
In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation.
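A compact sketch of both mechanisms, assuming the teacher is an exponential moving average of the student's weights (one common way to realize self-ensembling) and self-distillation is a KL term toward the teacher's softened predictions; the decay, mixing weight, and temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def ema_update(teacher, student, decay=0.999):
    """Self-ensemble: keep the teacher as a moving average of student weights."""
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(decay).add_(s, alpha=1 - decay)

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, T=1.0):
    """Self-distillation: cross-entropy on labels plus KL toward the teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits.detach() / T, dim=-1),
                  reduction="batchmean") * T * T
    return (1 - alpha) * ce + alpha * kl
```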
arXiv Detail & Related papers (2020-02-24T16:17:12Z)