A Comparison of Automatic Labelling Approaches for Sentiment Analysis
- URL: http://arxiv.org/abs/2211.02976v1
- Date: Sat, 5 Nov 2022 21:41:44 GMT
- Title: A Comparison of Automatic Labelling Approaches for Sentiment Analysis
- Authors: Sumana Biswas, Karen Young, and Josephine Griffith
- Abstract summary: The accuracy of supervised machine learning models is strongly related to the quality of the labelled data on which they train.
We have compared three automatic sentiment labelling techniques: TextBlob, Vader, and Afinn.
Results show that the Afinn labelling technique obtains the highest accuracy of 80.17% (DS-1) and 80.05% (DS-2) using a BiLSTM deep learning model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Labelling a large quantity of social media data for the task of supervised
machine learning is not only time-consuming but also difficult and expensive.
On the other hand, the accuracy of supervised machine learning models is
strongly related to the quality of the labelled data on which they train, and
automatic sentiment labelling techniques could reduce the time and cost of
human labelling. We have compared three automatic sentiment labelling
techniques, TextBlob, Vader, and Afinn, for assigning sentiments to tweets
without any human assistance. We compare three scenarios: the first uses
training and testing datasets with existing ground truth labels; the second
uses automatically generated labels for both training and testing; and the
third uses each of the three automatic labelling techniques to label the
training dataset and uses the ground truth labels for testing. The
experiments were conducted on two
Twitter datasets: SemEval-2013 (DS-1) and SemEval-2016 (DS-2). Results show
that the Afinn labelling technique obtains the highest accuracy of 80.17%
(DS-1) and 80.05% (DS-2) using a BiLSTM deep learning model. These findings
imply that automatic text labelling could provide significant benefits, and
suggest that it is a feasible alternative to time-consuming and costly human
labelling efforts.
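For concreteness, below is a minimal sketch of this kind of automatic labelling in Python, assuming the open-source textblob, vaderSentiment, and afinn packages; the thresholds used to map scores to labels are illustrative assumptions rather than the paper's exact settings.

```python
# Minimal sketch: assigning sentiment labels to tweets with TextBlob,
# VADER, and Afinn. Label thresholds are illustrative assumptions.
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from afinn import Afinn

vader = SentimentIntensityAnalyzer()
afinn = Afinn()

def label_textblob(tweet: str) -> str:
    # TextBlob polarity ranges from -1.0 (negative) to 1.0 (positive).
    polarity = TextBlob(tweet).sentiment.polarity
    return "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"

def label_vader(tweet: str) -> str:
    # VADER's compound score; +/-0.05 is the threshold suggested in its docs.
    compound = vader.polarity_scores(tweet)["compound"]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

def label_afinn(tweet: str) -> str:
    # Afinn returns a summed word-valence score; its sign gives the label.
    score = afinn.score(tweet)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

tweet = "I love this new phone, but the battery life is disappointing."
print(label_textblob(tweet), label_vader(tweet), label_afinn(tweet))
```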
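The reported accuracies come from a BiLSTM classifier. As a rough illustration only, such a model could be shaped as follows in Keras (TensorFlow assumed); the vocabulary size, layer widths, and other hyperparameters are assumptions, not the architecture used in the paper.

```python
# Illustrative BiLSTM sentiment classifier (not the paper's exact model).
# Vocabulary size, dimensions, and other hyperparameters are assumptions.
import tensorflow as tf

VOCAB_SIZE = 20_000  # assumed tokenizer vocabulary
NUM_CLASSES = 3      # positive / negative / neutral

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),               # token embeddings
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),  # BiLSTM encoder
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Depending on the scenario, y_train holds ground truth labels or labels
# produced by TextBlob, Vader, or Afinn:
# model.fit(x_train, y_train, validation_data=(x_test, y_test))
```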
Related papers
- TrajSSL: Trajectory-Enhanced Semi-Supervised 3D Object Detection [59.498894868956306]
Pseudo-labeling approaches to semi-supervised learning adopt a teacher-student framework.
We leverage pre-trained motion-forecasting models to generate object trajectories on pseudo-labeled data.
Our approach improves pseudo-label quality in two distinct ways.
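As background, a generic teacher-student pseudo-labeling step (an illustration of the general framework, not TrajSSL's method) might look like the sketch below; the scikit-learn-style classifier interface and the 0.9 confidence threshold are assumptions.

```python
# Generic teacher-student pseudo-labeling step (illustration of the general
# framework, not TrajSSL's method). `teacher` is assumed to be any fitted
# scikit-learn-style classifier; 0.9 is an assumed confidence threshold.
import numpy as np

def pseudo_label(teacher, x_unlabeled, threshold=0.9):
    probs = teacher.predict_proba(x_unlabeled)  # teacher's class probabilities
    confidence = probs.max(axis=1)              # per-sample confidence
    keep = confidence >= threshold              # keep only confident samples
    return x_unlabeled[keep], probs[keep].argmax(axis=1)

# The student then trains on real labels plus the confident pseudo labels:
# x_pl, y_pl = pseudo_label(teacher, x_unlabeled)
# student.fit(np.concatenate([x_train, x_pl]), np.concatenate([y_train, y_pl]))
```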
arXiv Detail & Related papers (2024-09-17T05:35:00Z)
- Automatic Labels are as Effective as Manual Labels in Biomedical Images Classification with Deep Learning [32.71326861917332]
One of the main limitations in training deep learning algorithms to perform a specific task is the need for medical experts to label data.
This paper investigates under which circumstances automatic labels can be adopted to train a deep learning (DL) model on the classification of Whole Slide Images (WSIs).
The application of the Semantic Knowledge Extractor Tool (SKET) algorithm to generate automatic labels leads to performance comparable to that obtained with manual labels.
arXiv Detail & Related papers (2024-06-20T14:20:50Z)
- LABELMAKER: Automatic Semantic Label Generation from RGB-D Trajectories [59.14011485494713]
This work introduces a fully automated 2D/3D labeling framework that can generate labels for RGB-D scans at an equal (or better) level of accuracy compared to manual annotation.
We demonstrate the effectiveness of our LabelMaker pipeline by generating significantly better labels for the ScanNet datasets and automatically labelling the previously unlabeled ARKitScenes dataset.
arXiv Detail & Related papers (2023-11-20T20:40:24Z)
- Doubly Robust Self-Training [46.168395767948965]
We introduce doubly robust self-training, a novel semi-supervised algorithm.
We demonstrate the superiority of the doubly robust loss over the standard self-training baseline.
arXiv Detail & Related papers (2023-06-01T00:57:16Z)
- Teachers in concordance for pseudo-labeling of 3D sequential data [1.1610573589377013]
We propose to leverage sequences of point clouds to boost the pseudo-labeling technique in a teacher-student setup by training multiple teachers.
This set of teachers, dubbed Concordance, provides higher quality pseudo-labels for student training than standard methods.
Our approach, which uses only 20% manual labels, outperforms some fully supervised methods.
arXiv Detail & Related papers (2022-07-13T09:40:22Z)
- Semi-supervised Learning using Robust Loss [0.0]
We suggest a semi-supervised training strategy for leveraging both manually labeled data and extra unlabeled data.
In contrast to the existing approaches, we apply a robust loss to the automatically labeled data to compensate for its uneven quality.
We show that our proposed strategy improves model performance on image classification by compensating for the uneven quality of the labels.
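For illustration, one widely used robust loss for noisy labels is the generalized cross entropy of Zhang & Sabuncu (2018); the sketch below uses it as a stand-in, since the exact loss used in the paper is not given here.

```python
# Illustrative robust loss for noisy (automatically generated) labels: the
# generalized cross entropy of Zhang & Sabuncu (2018), used here as a
# stand-in; the paper above may use a different robust loss.
import numpy as np

def generalized_cross_entropy(probs, labels, q=0.7):
    # probs: (N, C) predicted class probabilities; labels: (N,) int labels.
    # As q -> 0 this recovers cross entropy; q = 1 gives MAE, which is
    # more tolerant of mislabeled examples.
    p_true = probs[np.arange(len(labels)), labels]  # prob. of labeled class
    return np.mean((1.0 - p_true ** q) / q)
```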
arXiv Detail & Related papers (2022-03-03T05:34:32Z)
- Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning [104.00026716576546]
We propose to learn saliency from synthetic but clean labels, which naturally have higher pixel-labeling quality without the effort of manual annotation.
We show that our proposed method outperforms the existing state-of-the-art deep unsupervised SOD methods on several benchmark datasets.
arXiv Detail & Related papers (2022-02-26T16:03:55Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
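A minimal sketch of this decoupling idea (two heads on a shared backbone, with pseudo-label generation cut off from the pseudo-label training signal) is shown below in PyTorch; the architecture and losses are assumptions, not the paper's exact design.

```python
# Sketch of decoupling pseudo-label generation from utilization with two
# independent heads on a shared backbone (an illustration of the idea, not
# the paper's exact design). PyTorch and a flat 784-dim input are assumed.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
head_pseudo = nn.Linear(256, 10)  # only generates pseudo labels
head_main = nn.Linear(256, 10)    # trained on labeled + pseudo-labeled data
loss_fn = nn.CrossEntropyLoss()

def training_step(x_labeled, y_labeled, x_unlabeled):
    feats_l = backbone(x_labeled)
    feats_u = backbone(x_unlabeled)
    with torch.no_grad():  # generation is decoupled: no gradient flows back
        pseudo = head_pseudo(feats_u).argmax(dim=1)
    # The main head learns from both real and pseudo labels; the pseudo
    # head is updated from the supervised loss only.
    return (loss_fn(head_main(feats_l), y_labeled)
            + loss_fn(head_main(feats_u), pseudo)
            + loss_fn(head_pseudo(feats_l), y_labeled))
```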
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Weakly Supervised Pseudo-Label assisted Learning for ALS Point Cloud Semantic Segmentation [1.4620086904601473]
Competitive point cloud semantic segmentation results usually rely on a large amount of labeled data.
In this study, we propose a pseudo-labeling strategy to obtain accurate results with limited ground truth.
arXiv Detail & Related papers (2021-05-05T08:07:21Z)
- Self-Tuning for Data-Efficient Deep Learning [75.34320911480008]
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its semi-supervised learning (SSL) and transfer learning (TL) counterparts on five tasks by sharp margins.
arXiv Detail & Related papers (2021-02-25T14:56:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.