On Robustness of Finetuned Transformer-based NLP Models
- URL: http://arxiv.org/abs/2305.14453v2
- Date: Wed, 8 Nov 2023 16:46:20 GMT
- Title: On Robustness of Finetuned Transformer-based NLP Models
- Authors: Pavan Kalyan Reddy Neerudu, Subba Reddy Oota, Mounika Marreddy,
Venkateswara Rao Kagita, Manish Gupta
- Abstract summary: We characterize changes between pretrained and finetuned language model representations across layers using two metrics: CKA and STIR.
GPT-2 representations are more robust than BERT and T5 across multiple types of input perturbations.
This study provides valuable insights into perturbation-specific weaknesses of popular Transformer-based models.
- Score: 11.063628128069736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based pretrained models like BERT, GPT-2 and T5 have been
finetuned for a large number of natural language processing (NLP) tasks, and
have been shown to be very effective. However, what changes across layers in
these models during finetuning, relative to their pretrained checkpoints, is
under-studied. Further, how robust are these models to perturbations in input
text? Does the robustness vary depending on the NLP task for which the models
have been finetuned? While there exists some work on studying the robustness of
BERT finetuned for a few NLP tasks, there is no rigorous study that compares
this robustness across encoder-only, decoder-only and encoder-decoder models.
In this paper, we characterize changes between pretrained and finetuned
language model representations across layers using two metrics: CKA and STIR.
Further, we study the robustness of three language models (BERT, GPT-2 and T5)
with eight different text perturbations on classification tasks from the
General Language Understanding Evaluation (GLUE) benchmark, and generation
tasks like summarization, free-form generation and question generation. GPT-2
representations are more robust than those of BERT and T5 across multiple types of
input perturbations. Although the models are broadly robust, dropping nouns or verbs
and changing characters are the most impactful perturbations. Overall, this study
provides valuable insights into perturbation-specific weaknesses of popular
Transformer-based models, which practitioners should keep in mind when preparing inputs. We
make the code and models publicly available
[https://github.com/PavanNeerudu/Robustness-of-Transformers-models].
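As a rough illustration of the two ingredients above (a sketch, not code from the linked repository), linear CKA between two representation matrices and one simple character-level perturbation could look as follows; the function names and the per-word swap probability p are illustrative choices:

    import numpy as np
    import random

    def linear_cka(x, y):
        # Linear CKA between two representation matrices of shape
        # (n_examples, hidden_dim), e.g. the same layer of a pretrained
        # and a finetuned model run on the same inputs.
        x = x - x.mean(axis=0, keepdims=True)   # center features
        y = y - y.mean(axis=0, keepdims=True)
        cross = np.linalg.norm(y.T @ x, ord='fro') ** 2
        return cross / (np.linalg.norm(x.T @ x, ord='fro')
                        * np.linalg.norm(y.T @ y, ord='fro'))

    def swap_adjacent_chars(sentence, p=0.1, seed=0):
        # Character-level perturbation: with probability p per word,
        # swap two adjacent characters inside the word.
        rng = random.Random(seed)
        out = []
        for w in sentence.split():
            if len(w) > 3 and rng.random() < p:
                i = rng.randrange(1, len(w) - 2)
                w = w[:i] + w[i + 1] + w[i] + w[i + 2:]
            out.append(w)
        return " ".join(out)

Robustness can then be gauged by comparing a finetuned model's predictions (or its layer-wise CKA similarity) on original versus perturbed inputs.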
Related papers
- Making the Most of your Model: Methods for Finetuning and Applying Pretrained Transformers [0.21756081703276003]
This thesis provides methods and analyses of models that make progress on this goal.
We introduce two new finetuning methods which add new capabilities to the models they are used on.
We provide theoretical and empirical insights on the divergence of model-likelihood and output quality.
arXiv Detail & Related papers (2024-08-29T03:50:24Z)
- A Text-to-Text Model for Multilingual Offensive Language Identification [19.23565690468299]
This study presents the first pre-trained encoder-decoder model for offensive language identification, built on text-to-text transformers (T5).
Our pre-trained T5 model outperforms other transformer-based models fine-tuned for offensive language detection, such as fBERT and HateBERT, in multiple English benchmarks.
Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5.
arXiv Detail & Related papers (2023-12-06T09:37:27Z)
- Efficient GPT Model Pre-training using Tensor Train Matrix Representation [65.96485282393361]
Large-scale transformer models feature billions of parameters, making their deployment difficult and training from scratch prohibitively expensive.
To reduce the number of parameters in the GPT-2 architecture, we replace the matrices of fully-connected layers with the corresponding Tensor Train Matrix (TTM) structure.
The resulting GPT-based model stores up to 40% fewer parameters while showing perplexity comparable to the original model.
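As a minimal sketch of this idea, assuming a two-core TT-matrix with arbitrary mode and rank choices (not the configuration used in the paper), a fully-connected layer could be parameterized like this:

    import torch
    import torch.nn as nn

    class TTLinear(nn.Module):
        # Linear layer whose (out1*out2) x (in1*in2) weight matrix is never
        # stored explicitly; it is contracted from two small TT cores.
        def __init__(self, in_modes=(32, 24), out_modes=(64, 48), rank=16):
            super().__init__()
            i1, i2 = in_modes
            o1, o2 = out_modes
            self.in_features, self.out_features = i1 * i2, o1 * o2
            self.core1 = nn.Parameter(0.02 * torch.randn(o1, i1, rank))
            self.core2 = nn.Parameter(0.02 * torch.randn(rank, o2, i2))
            self.bias = nn.Parameter(torch.zeros(self.out_features))

        def forward(self, x):
            # Rebuild the full weight from the cores for clarity; an optimized
            # implementation would contract x with the cores directly.
            w = torch.einsum('air,rbj->abij', self.core1, self.core2)
            w = w.reshape(self.out_features, self.in_features)
            return x @ w.t() + self.bias

With the modes and rank above, a 768-to-3072 projection needs roughly 61K core parameters instead of about 2.4M dense weights; the TT rank controls the compression/accuracy trade-off.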
arXiv Detail & Related papers (2023-06-05T08:38:25Z)
- Transformer-based approaches to Sentiment Detection [55.41644538483948]
We examined the performance of four different types of state-of-the-art transformer models for text classification.
The RoBERTa transformer model performs best on the test dataset with a score of 82.6% and is recommended for high-quality predictions.
arXiv Detail & Related papers (2023-03-13T17:12:03Z)
- Leveraging Pre-trained Models for Failure Analysis Triplets Generation [0.0]
We leverage the attention mechanism of pre-trained causal Transformer language models for the downstream task of generating Failure Analysis Triplets (FATs).
We observe that Generative Pre-trained Transformer 2 (GPT2) outperforms other transformer models on the failure analysis triplet generation (FATG) task.
In particular, GPT2 (with 1.5B parameters) outperforms pre-trained BERT, BART and GPT3 by a large margin on ROUGE.
arXiv Detail & Related papers (2022-10-31T17:21:15Z)
- PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance [114.1541203743303]
We propose PLATON, which captures the uncertainty of importance scores via an upper confidence bound (UCB) on the importance estimates.
We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering and image classification.
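As a generic sketch of uncertainty-aware importance scoring in this spirit (PLATON's exact update rules and hyperparameters differ), a per-weight pruning score could be maintained as follows:

    import torch

    def ucb_importance(weight, grad, state, beta1=0.85, beta2=0.95):
        # Sensitivity-based importance |w * grad|, smoothed over training steps;
        # the deviation from the running average serves as an uncertainty term.
        # Weights with the lowest scores would be masked first during pruning.
        sens = (weight * grad).abs()
        if 'imp' not in state:
            state['imp'] = torch.zeros_like(sens)
            state['unc'] = torch.zeros_like(sens)
        state['imp'] = beta1 * state['imp'] + (1 - beta1) * sens
        state['unc'] = beta2 * state['unc'] + (1 - beta2) * (sens - state['imp']).abs()
        # UCB-style score: mean importance plus an uncertainty bonus.
        return state['imp'] + state['unc']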
arXiv Detail & Related papers (2022-06-25T05:38:39Z)
- Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers [57.931830650323]
This paper presents scaling insights from pretraining and finetuning Transformers.
We show that, beyond model size alone, model shape matters for downstream fine-tuning.
We present improved scaling protocols whereby our redesigned models achieve similar downstream fine-tuning quality.
arXiv Detail & Related papers (2021-09-22T12:29:15Z)
- Non-Autoregressive Translation by Learning Target Categorical Codes [59.840510037250944]
We propose CNAT, which implicitly learns categorical codes as latent variables for non-autoregressive decoding.
Experiment results show that our model achieves comparable or better performance in machine translation tasks.
arXiv Detail & Related papers (2021-03-21T14:12:34Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks [3.2442879131520126]
This work evaluates three Transformer-based models (RoBERTa, XLNet, and BERT) in Natural Language Inference (NLI) and Question Answering (QA) tasks.
Our experiments reveal that RoBERTa, XLNet and BERT are more robust than recurrent neural network models to stress tests for both NLI and QA tasks.
arXiv Detail & Related papers (2020-02-14T21:52:41Z)