Language Representation Models for Fine-Grained Sentiment Classification
- URL: http://arxiv.org/abs/2005.13619v1
- Date: Wed, 27 May 2020 20:01:56 GMT
- Title: Language Representation Models for Fine-Grained Sentiment Classification
- Authors: Brian Cheang, Bailey Wei, David Kogan, Howey Qiu, Masud Ahmed
- Abstract summary: We show that AlBERT suffers significantly more accuracy loss than reported on other tasks, while DistilBERT's accuracy loss is similar to that reported on other tasks.
We conclude that RoBERTa reaches a new state-of-the-art accuracy for prediction on the SST-5 root level (60.2%).
- Score: 2.1664197735413824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentiment classification is a quickly advancing field of study with
applications in almost any domain. While various models and datasets have shown
high accuracy in the task of binary classification, the task of fine-grained
sentiment classification is still an area with room for significant
improvement. Analyzing the SST-5 dataset, previous work by Munikar et al. (2019)
showed that the embedding tool BERT allowed a simple model to achieve
state-of-the-art accuracy. Since that paper, several BERT alternatives have
been published, with three primary ones being AlBERT (Lan et al., 2019),
DistilBERT (Sanh et al., 2019), and RoBERTa (Liu et al., 2019). While these models
report some improvement over BERT on the popular benchmarks GLUE, SQuAD, and
RACE, they have not been applied to the fine-grained classification task. In
this paper, we examine whether the improvements hold true when applied to a
novel task, by replicating the BERT model from Munikar et al., and swapping the
embedding layer to the alternative models. Across the experiments, we found that
AlBERT suffers significantly more accuracy loss than reported on other tasks,
DistilBERT has accuracy loss similar to its reported loss on other tasks
while being the fastest model to train, and RoBERTa reaches a new
state-of-the-art accuracy for prediction on the SST-5 root level (60.2%).
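The setup described above, keeping a simple classification head fixed and swapping only the underlying pretrained encoder, can be sketched roughly as follows. This is a minimal illustration assuming the HuggingFace transformers library and standard checkpoint names; the pooling choice and classifier head are simplified stand-ins, not the authors' released code.

```python
# Minimal sketch (not the authors' implementation): a pretrained encoder under a
# simple 5-class head for SST-5 root-level sentiment, where only the encoder
# checkpoint changes between runs. Assumes `torch` and `transformers` are installed.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

ENCODERS = {
    "bert": "bert-base-uncased",            # baseline used by Munikar et al.
    "albert": "albert-base-v2",             # ALBERT alternative
    "distilbert": "distilbert-base-uncased",
    "roberta": "roberta-base",
}

class FineGrainedSentimentClassifier(nn.Module):
    """Pretrained encoder + dropout + linear layer over the 5 SST-5 labels."""

    def __init__(self, encoder_name: str, num_labels: int = 5, dropout: float = 0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Use the first token's hidden state as the sentence representation;
        # this works uniformly for BERT, ALBERT, DistilBERT, and RoBERTa.
        cls = out.last_hidden_state[:, 0]
        return self.classifier(self.dropout(cls))

# Example: score one sentence with RoBERTa as the drop-in encoder.
name = ENCODERS["roberta"]
tokenizer = AutoTokenizer.from_pretrained(name)
model = FineGrainedSentimentClassifier(name)
batch = tokenizer(["A gorgeous, witty, seductive movie."],
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.softmax(dim=-1))  # probabilities over the 5 sentiment classes
```

Training on SST-5 would then proceed with a standard cross-entropy loss over the five labels; comparing encoders amounts to rerunning the same loop with a different checkpoint name.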
Related papers
- Fine-tuning BERT with Bidirectional LSTM for Fine-grained Movie Reviews Sentiment Analysis [0.0]
We fine-tune the pre-trained BERT model with Bidirectional LSTM (BiLSTM) to enhance both binary and fine-grained SA for movie reviews.
We present our findings on binary classification as well as fine-grained classification utilizing benchmark datasets.
arXiv Detail & Related papers (2025-02-28T03:30:48Z)
- Multitask Fine-Tuning and Generative Adversarial Learning for Improved Auxiliary Classification [0.0]
We implement a novel BERT architecture for multitask fine-tuning on three downstream tasks.
Our model, Multitask BERT, incorporates layer sharing and a triplet architecture, custom sentence pair tokenization, loss pairing, and gradient surgery.
We also apply generative adversarial learning to BERT, constructing a conditional generator model that maps from latent space to create fake embeddings.
arXiv Detail & Related papers (2024-08-11T20:05:54Z)
- CEEBERT: Cross-Domain Inference in Early Exit BERT [5.402030962296633]
CeeBERT learns optimal thresholds from domain-specific confidence observed at intermediate layers on the fly.
CeeBERT can speed up the BERT/ALBERT models by $2\times$ - $3.5\times$ with minimal drop in accuracy.
arXiv Detail & Related papers (2024-05-23T20:36:10Z)
- Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT [0.0]
Transformer-based models, specifically BERT, have propelled research in various NLP tasks.
BERT models are limited to a maximum of 512 tokens, which makes them non-trivial to apply in practical settings with long inputs.
We propose a relatively simple extension to the vanilla BERT architecture, called ChunkBERT, that allows fine-tuning of any pretrained BERT model to perform inference on arbitrarily long text.
arXiv Detail & Related papers (2023-10-31T15:41:08Z)
- Gradient-Free Structured Pruning with Unlabeled Data [57.999191898036706]
We propose a gradient-free structured pruning framework that uses only unlabeled data.
Up to 40% of the original FLOP count can be reduced with less than a 4% accuracy loss across all tasks considered.
arXiv Detail & Related papers (2023-03-07T19:12:31Z)
- Pretraining Without Attention [114.99187017618408]
This work explores pretraining without attention by using recent advances in sequence routing based on state-space models (SSMs).
BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4096 tokens without approximation.
arXiv Detail & Related papers (2022-12-20T18:50:08Z)
- TeST: Test-time Self-Training under Distribution Shift [99.68465267994783]
Test-Time Self-Training (TeST) is a technique that takes as input a model trained on some source data and a novel data distribution at test time.
We find that models adapted using TeST significantly improve over baseline test-time adaptation algorithms.
arXiv Detail & Related papers (2022-09-23T07:47:33Z)
- Finding the Winning Ticket of BERT for Binary Text Classification via Adaptive Layer Truncation before Fine-tuning [7.797987384189306]
We construct a series of BERT-based models of different sizes and compare their predictions on 8 binary classification tasks.
The results show that smaller sub-networks do exist that perform better than the full model.
arXiv Detail & Related papers (2021-11-22T02:22:47Z)
- Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
arXiv Detail & Related papers (2020-09-27T10:17:28Z)
- On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines [31.807628937487927]
Fine-tuning pre-trained language models such as BERT has become a common practice dominating leaderboards across various NLP benchmarks.
Previous literature identified two potential reasons for the observed instability: catastrophic forgetting and the small size of the fine-tuning datasets.
We show that both hypotheses fail to explain the fine-tuning instability.
arXiv Detail & Related papers (2020-06-08T19:06:24Z)
- TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task [80.38130122127882]
TACRED is one of the largest, most widely used crowdsourced datasets in Relation Extraction (RE).
In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement?
We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled.
arXiv Detail & Related papers (2020-04-30T15:07:37Z)