Language Representation Models for Fine-Grained Sentiment Classification
- URL: http://arxiv.org/abs/2005.13619v1
- Date: Wed, 27 May 2020 20:01:56 GMT
- Title: Language Representation Models for Fine-Grained Sentiment Classification
- Authors: Brian Cheang, Bailey Wei, David Kogan, Howey Qiu, Masud Ahmed
- Abstract summary: We show that AlBERT suffers significantly more accuracy loss than reported on other tasks, while DistilBERT's accuracy loss is similar to that reported on other tasks.
We conclude that RoBERTa reaches a new state-of-the-art accuracy for prediction on the SST-5 root level (60.2%).
- Score: 2.1664197735413824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentiment classification is a quickly advancing field of study with
applications in almost any domain. While various models and datasets have shown
high accuracy in the task of binary classification, the task of fine-grained
sentiment classification is still an area with room for significant
improvement. Analyzing the SST-5 dataset, previous work by Munikar et al. (2019)
showed that the embedding tool BERT allowed a simple model to achieve
state-of-the-art accuracy. Since that paper, several BERT alternatives have
been published, with three primary ones being AlBERT (Lan et al., 2019),
DistilBERT (Sanh et al., 2019), and RoBERTa (Liu et al., 2019). While these models
report some improvement over BERT on the popular benchmarks GLUE, SQuAD, and
RACE, they have not been applied to the fine-grained classification task. In
this paper, we examine whether the improvements hold true when applied to a
novel task, by replicating the BERT model from Munikar et al., and swapping the
embedding layer to the alternative models. Across the experiments, we found that
AlBERT suffers significantly more accuracy loss than reported on other tasks,
DistilBERT has accuracy loss similar to its reported loss on other tasks
while being the fastest model to train, and RoBERTa reaches a new
state-of-the-art accuracy for prediction on the SST-5 root level (60.2%).
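The setup described above, keeping a simple classification head fixed and swapping only the underlying pretrained encoder, can be sketched roughly as follows. This is a minimal illustration assuming the HuggingFace transformers library and standard checkpoint names; the pooling choice and classifier head are simplified stand-ins, not the authors' released code.

```python
# Minimal sketch (not the authors' implementation): a pretrained encoder under a
# simple 5-class head for SST-5 root-level sentiment, where only the encoder
# checkpoint changes between runs. Assumes `torch` and `transformers` are installed.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

ENCODERS = {
    "bert": "bert-base-uncased",            # baseline used by Munikar et al.
    "albert": "albert-base-v2",             # ALBERT alternative
    "distilbert": "distilbert-base-uncased",
    "roberta": "roberta-base",
}

class FineGrainedSentimentClassifier(nn.Module):
    """Pretrained encoder + dropout + linear layer over the 5 SST-5 labels."""

    def __init__(self, encoder_name: str, num_labels: int = 5, dropout: float = 0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Use the first token's hidden state as the sentence representation;
        # this works uniformly for BERT, ALBERT, DistilBERT, and RoBERTa.
        cls = out.last_hidden_state[:, 0]
        return self.classifier(self.dropout(cls))

# Example: score one sentence with RoBERTa as the drop-in encoder.
name = ENCODERS["roberta"]
tokenizer = AutoTokenizer.from_pretrained(name)
model = FineGrainedSentimentClassifier(name)
batch = tokenizer(["A gorgeous, witty, seductive movie."],
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.softmax(dim=-1))  # probabilities over the 5 sentiment classes
```

Training on SST-5 would then proceed with a standard cross-entropy loss over the five labels; comparing encoders amounts to rerunning the same loop with a different checkpoint name.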
Related papers
- Fine-tuning BERT with Bidirectional LSTM for Fine-grained Movie Reviews Sentiment Analysis [0.0]
We fine-tune the pre-trained BERT model with Bidirectional LSTM (BiLSTM) to enhance both binary and fine-grained SA for movie reviews.
We present our findings on binary classification as well as fine-grained classification utilizing benchmark datasets.
arXiv Detail & Related papers (2025-02-28T03:30:48Z)
- Multitask Fine-Tuning and Generative Adversarial Learning for Improved Auxiliary Classification [0.0]
We implement a novel BERT architecture for multitask fine-tuning on three downstream tasks.
Our model, Multitask BERT, incorporates layer sharing and a triplet architecture, custom sentence pair tokenization, loss pairing, and gradient surgery.
We also apply generative adversarial learning to BERT, constructing a conditional generator model that maps from latent space to create fake embeddings.
arXiv Detail & Related papers (2024-08-11T20:05:54Z)
- CEEBERT: Cross-Domain Inference in Early Exit BERT [5.402030962296633]
CeeBERT learns optimal thresholds from domain-specific confidence observed at intermediate layers on the fly.
CeeBERT can speed up the BERT/ALBERT models by $2\times$ - $3.5\times$ with minimal drop in accuracy.
arXiv Detail & Related papers (2024-05-23T20:36:10Z)
- Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT [0.0]
Transformer-based models, specifically BERT, have propelled research in various NLP tasks.
BERT models are limited to a maximum of 512 tokens, which makes them non-trivial to apply in practical settings with long inputs.
We propose a relatively simple extension to the vanilla BERT architecture, called ChunkBERT, that allows fine-tuning of any pretrained BERT model to perform inference on arbitrarily long text.
arXiv Detail & Related papers (2023-10-31T15:41:08Z)
- Gradient-Free Structured Pruning with Unlabeled Data [57.999191898036706]
We propose a gradient-free structured pruning framework that uses only unlabeled data.
Up to 40% of the original FLOP count can be reduced with less than a 4% accuracy loss across all tasks considered.
arXiv Detail & Related papers (2023-03-07T19:12:31Z)
- Pretraining Without Attention [114.99187017618408]
This work explores pretraining without attention by using recent advances in sequence routing based on state-space models (SSMs).
BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4096 tokens without approximation.
arXiv Detail & Related papers (2022-12-20T18:50:08Z)
- TeST: Test-time Self-Training under Distribution Shift [99.68465267994783]
Test-Time Self-Training (TeST) is a technique that takes as input a model trained on some source data and a novel data distribution at test time.
We find that models adapted using TeST significantly improve over baseline test-time adaptation algorithms.
arXiv Detail & Related papers (2022-09-23T07:47:33Z)
- Finding the Winning Ticket of BERT for Binary Text Classification via Adaptive Layer Truncation before Fine-tuning [7.797987384189306]
We construct a series of BERT-based models of different sizes and compare their predictions on 8 binary classification tasks.
The results show that smaller sub-networks do exist that perform better than the full model.
arXiv Detail & Related papers (2021-11-22T02:22:47Z)
- Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
arXiv Detail & Related papers (2020-09-27T10:17:28Z)
- On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines [31.807628937487927]
Fine-tuning pre-trained language models such as BERT has become a common practice dominating leaderboards across various NLP benchmarks.
Previous literature identified two potential reasons for the observed instability: catastrophic forgetting and the small size of the fine-tuning datasets.
We show that both hypotheses fail to explain the fine-tuning instability.
arXiv Detail & Related papers (2020-06-08T19:06:24Z)
- TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task [80.38130122127882]
TACRED is one of the largest, most widely used crowdsourced datasets in Relation Extraction (RE).
In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement?
We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled.
arXiv Detail & Related papers (2020-04-30T15:07:37Z)