A Comparison of Automatic Labelling Approaches for Sentiment Analysis
- URL: http://arxiv.org/abs/2211.02976v1
- Date: Sat, 5 Nov 2022 21:41:44 GMT
- Title: A Comparison of Automatic Labelling Approaches for Sentiment Analysis
- Authors: Sumana Biswas, Karen Young, and Josephine Griffith
- Abstract summary: The accuracy of supervised machine learning models is strongly related to the quality of the labelled data on which they train.
We have compared three automatic sentiment labelling techniques: TextBlob, Vader, and Afinn.
Results show that the Afinn labelling technique obtains the highest accuracy of 80.17% (DS-1) and 80.05% (DS-2) using a BiLSTM deep learning model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Labelling a large quantity of social media data for the task of supervised
machine learning is not only time-consuming but also difficult and expensive.
On the other hand, the accuracy of supervised machine learning models is
strongly related to the quality of the labelled data on which they train, and
automatic sentiment labelling techniques could reduce the time and cost of
human labelling. We have compared three automatic sentiment labelling
techniques, TextBlob, Vader, and Afinn, for assigning sentiments to tweets
without any human assistance. We compare three scenarios: the first uses
training and testing datasets with existing ground truth labels; the second
uses automatically generated labels for both training and testing; and the
third uses each of the three automatic labelling techniques to label the
training dataset and uses the ground truth labels for testing. The
experiments were conducted on two
Twitter datasets: SemEval-2013 (DS-1) and SemEval-2016 (DS-2). Results show
that the Afinn labelling technique obtains the highest accuracy of 80.17%
(DS-1) and 80.05% (DS-2) using a BiLSTM deep learning model. These findings
imply that automatic text labelling could provide significant benefits, and
suggest that it is a feasible alternative to time-consuming and costly human
labelling efforts.
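For concreteness, below is a minimal sketch of this kind of automatic labelling in Python, assuming the open-source textblob, vaderSentiment, and afinn packages; the thresholds used to map scores to labels are illustrative assumptions rather than the paper's exact settings.

```python
# Minimal sketch: assigning sentiment labels to tweets with TextBlob,
# VADER, and Afinn. Label thresholds are illustrative assumptions.
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from afinn import Afinn

vader = SentimentIntensityAnalyzer()
afinn = Afinn()

def label_textblob(tweet: str) -> str:
    # TextBlob polarity ranges from -1.0 (negative) to 1.0 (positive).
    polarity = TextBlob(tweet).sentiment.polarity
    return "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"

def label_vader(tweet: str) -> str:
    # VADER's compound score; +/-0.05 is the threshold suggested in its docs.
    compound = vader.polarity_scores(tweet)["compound"]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

def label_afinn(tweet: str) -> str:
    # Afinn returns a summed word-valence score; its sign gives the label.
    score = afinn.score(tweet)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

tweet = "I love this new phone, but the battery life is disappointing."
print(label_textblob(tweet), label_vader(tweet), label_afinn(tweet))
```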
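The reported accuracies come from a BiLSTM classifier. As a rough illustration only, such a model could be shaped as follows in Keras (TensorFlow assumed); the vocabulary size, layer widths, and other hyperparameters are assumptions, not the architecture used in the paper.

```python
# Illustrative BiLSTM sentiment classifier (not the paper's exact model).
# Vocabulary size, dimensions, and other hyperparameters are assumptions.
import tensorflow as tf

VOCAB_SIZE = 20_000  # assumed tokenizer vocabulary
NUM_CLASSES = 3      # positive / negative / neutral

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),               # token embeddings
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),  # BiLSTM encoder
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Depending on the scenario, y_train holds ground truth labels or labels
# produced by TextBlob, Vader, or Afinn:
# model.fit(x_train, y_train, validation_data=(x_test, y_test))
```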
Related papers
- TrajSSL: Trajectory-Enhanced Semi-Supervised 3D Object Detection [59.498894868956306]
Pseudo-labeling approaches to semi-supervised learning adopt a teacher-student framework.
We leverage pre-trained motion-forecasting models to generate object trajectories on pseudo-labeled data.
Our approach improves pseudo-label quality in two distinct ways.
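As background, a generic teacher-student pseudo-labeling step (an illustration of the general framework, not TrajSSL's method) might look like the sketch below; the scikit-learn-style classifier interface and the 0.9 confidence threshold are assumptions.

```python
# Generic teacher-student pseudo-labeling step (illustration of the general
# framework, not TrajSSL's method). `teacher` is assumed to be any fitted
# scikit-learn-style classifier; 0.9 is an assumed confidence threshold.
import numpy as np

def pseudo_label(teacher, x_unlabeled, threshold=0.9):
    probs = teacher.predict_proba(x_unlabeled)  # teacher's class probabilities
    confidence = probs.max(axis=1)              # per-sample confidence
    keep = confidence >= threshold              # keep only confident samples
    return x_unlabeled[keep], probs[keep].argmax(axis=1)

# The student then trains on real labels plus the confident pseudo labels:
# x_pl, y_pl = pseudo_label(teacher, x_unlabeled)
# student.fit(np.concatenate([x_train, x_pl]), np.concatenate([y_train, y_pl]))
```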
arXiv Detail & Related papers (2024-09-17T05:35:00Z)
- Automatic Labels are as Effective as Manual Labels in Biomedical Images Classification with Deep Learning [32.71326861917332]
One of the main limitations in training deep learning algorithms to perform a specific task is the need for medical experts to label data.
This paper investigates under which circumstances automatic labels can be adopted to train a deep learning (DL) model on the classification of Whole Slide Images (WSIs).
The application of the Semantic Knowledge Extractor Tool (SKET) algorithm to generate automatic labels leads to performance comparable to that obtained with manual labels.
arXiv Detail & Related papers (2024-06-20T14:20:50Z)
- LABELMAKER: Automatic Semantic Label Generation from RGB-D Trajectories [59.14011485494713]
This work introduces a fully automated 2D/3D labeling framework that can generate labels for RGB-D scans at an equal (or better) level of accuracy compared to manual annotation.
We demonstrate the effectiveness of our LabelMaker pipeline by generating significantly better labels for the ScanNet datasets and automatically labelling the previously unlabeled ARKitScenes dataset.
arXiv Detail & Related papers (2023-11-20T20:40:24Z)
- Doubly Robust Self-Training [46.168395767948965]
We introduce doubly robust self-training, a novel semi-supervised algorithm.
We demonstrate the superiority of the doubly robust loss over the standard self-training baseline.
arXiv Detail & Related papers (2023-06-01T00:57:16Z)
- Teachers in concordance for pseudo-labeling of 3D sequential data [1.1610573589377013]
We propose to leverage sequences of point clouds to boost the pseudo-labeling technique in a teacher-student setup by training multiple teachers.
This set of teachers, dubbed Concordance, provides higher quality pseudo-labels for student training than standard methods.
Our approach, which uses only 20% manual labels, outperforms some fully supervised methods.
arXiv Detail & Related papers (2022-07-13T09:40:22Z)
- Semi-supervised Learning using Robust Loss [0.0]
We suggest a semi-supervised training strategy for leveraging both manually labeled data and extra unlabeled data.
In contrast to the existing approaches, we apply a robust loss to the automatically labeled data to compensate for its uneven quality.
We show that our proposed strategy improves model performance on image classification by compensating for the uneven quality of the labels.
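For illustration, one widely used robust loss for noisy labels is the generalized cross entropy of Zhang & Sabuncu (2018); the sketch below uses it as a stand-in, since the exact loss used in the paper is not given here.

```python
# Illustrative robust loss for noisy (automatically generated) labels: the
# generalized cross entropy of Zhang & Sabuncu (2018), used here as a
# stand-in; the paper above may use a different robust loss.
import numpy as np

def generalized_cross_entropy(probs, labels, q=0.7):
    # probs: (N, C) predicted class probabilities; labels: (N,) int labels.
    # As q -> 0 this recovers cross entropy; q = 1 gives MAE, which is
    # more tolerant of mislabeled examples.
    p_true = probs[np.arange(len(labels)), labels]  # prob. of labeled class
    return np.mean((1.0 - p_true ** q) / q)
```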
arXiv Detail & Related papers (2022-03-03T05:34:32Z)
- Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning [104.00026716576546]
We propose to learn saliency from synthetic but clean labels, which naturally have higher pixel-labeling quality without the effort of manual annotation.
We show that our proposed method outperforms the existing state-of-the-art deep unsupervised SOD methods on several benchmark datasets.
arXiv Detail & Related papers (2022-02-26T16:03:55Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
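A minimal sketch of this decoupling idea (two heads on a shared backbone, with pseudo-label generation cut off from the pseudo-label training signal) is shown below in PyTorch; the architecture and losses are assumptions, not the paper's exact design.

```python
# Sketch of decoupling pseudo-label generation from utilization with two
# independent heads on a shared backbone (an illustration of the idea, not
# the paper's exact design). PyTorch and a flat 784-dim input are assumed.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
head_pseudo = nn.Linear(256, 10)  # only generates pseudo labels
head_main = nn.Linear(256, 10)    # trained on labeled + pseudo-labeled data
loss_fn = nn.CrossEntropyLoss()

def training_step(x_labeled, y_labeled, x_unlabeled):
    feats_l = backbone(x_labeled)
    feats_u = backbone(x_unlabeled)
    with torch.no_grad():  # generation is decoupled: no gradient flows back
        pseudo = head_pseudo(feats_u).argmax(dim=1)
    # The main head learns from both real and pseudo labels; the pseudo
    # head is updated from the supervised loss only.
    return (loss_fn(head_main(feats_l), y_labeled)
            + loss_fn(head_main(feats_u), pseudo)
            + loss_fn(head_pseudo(feats_l), y_labeled))
```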
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Weakly Supervised Pseudo-Label assisted Learning for ALS Point Cloud Semantic Segmentation [1.4620086904601473]
Competitive point cloud semantic segmentation results usually rely on a large amount of labeled data.
In this study, we propose a pseudo-labeling strategy to obtain accurate results with limited ground truth.
arXiv Detail & Related papers (2021-05-05T08:07:21Z)
- Self-Tuning for Data-Efficient Deep Learning [75.34320911480008]
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its semi-supervised learning (SSL) and transfer learning (TL) counterparts on five tasks by sharp margins.
arXiv Detail & Related papers (2021-02-25T14:56:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.