Self-training Strategies for Sentiment Analysis: An Empirical Study
- URL: http://arxiv.org/abs/2309.08777v2
- Date: Sun, 4 Feb 2024 00:52:03 GMT
- Title: Self-training Strategies for Sentiment Analysis: An Empirical Study
- Authors: Haochen Liu, Sai Krishna Rallabandi, Yijing Wu, Parag Pravin Dakle,
Preethi Raghavan
- Abstract summary: Self-training is an economical and efficient technique for developing sentiment analysis models.
We compare several self-training strategies with the intervention of large language models.
- Score: 7.416913210816592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentiment analysis is a crucial task in natural language processing that
involves identifying and extracting subjective sentiment from text.
Self-training has recently emerged as an economical and efficient technique for
developing sentiment analysis models by leveraging a small amount of labeled
data and a large amount of unlabeled data. However, given a set of training
data, how it is used to conduct self-training makes a significant difference
in the final performance of the model. We refer to this methodology as the
self-training strategy. In this paper, we present an empirical study of
various self-training strategies for sentiment analysis. First, we investigate
the influence of the self-training strategy and hyper-parameters on the
performance of traditional small language models (SLMs) in various few-shot
settings. Second, we explore the feasibility of leveraging large language
models (LLMs) to assist self-training. We propose and empirically compare several
self-training strategies with the intervention of LLMs. Extensive experiments
are conducted on three real-world sentiment analysis datasets.
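To make the notion of a self-training strategy concrete, the sketch below shows one common instantiation: iterative pseudo-labeling with a confidence threshold. It is a minimal illustration under assumed choices (TF-IDF features, logistic regression, a fixed threshold, toy data), not the models, datasets, or strategies evaluated in the paper; an LLM-assisted strategy would, for example, label or verify the selected examples inside the loop.

```python
# Minimal self-training (pseudo-labeling) sketch for sentiment analysis.
# All modeling choices here (TF-IDF features, logistic regression, a fixed
# 0.8 confidence threshold, three rounds) are illustrative assumptions,
# not the setup used in the paper.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled_texts = ["great movie", "terrible plot", "loved it", "boring and slow"]
labels = np.array([1, 0, 1, 0])  # 1 = positive, 0 = negative
unlabeled_texts = ["what a fantastic film", "awful acting", "not bad at all"]

vectorizer = TfidfVectorizer()
X_labeled = vectorizer.fit_transform(labeled_texts)
X_unlabeled = vectorizer.transform(unlabeled_texts)

model = LogisticRegression()
for _ in range(3):  # a few self-training rounds
    model.fit(X_labeled, labels)
    if X_unlabeled.shape[0] == 0:
        break
    probs = model.predict_proba(X_unlabeled)
    confidence = probs.max(axis=1)
    pseudo_labels = model.classes_[probs.argmax(axis=1)]

    # Strategy choice: keep only pseudo-labels above a confidence threshold.
    # An LLM-assisted variant could instead ask an LLM to label or verify
    # the selected examples before they enter the training pool.
    keep = confidence >= 0.8
    if not keep.any():
        break

    # Move the confident pseudo-labeled examples into the labeled pool.
    X_labeled = vstack([X_labeled, X_unlabeled[keep]])
    labels = np.concatenate([labels, pseudo_labels[keep]])
    X_unlabeled = X_unlabeled[~keep]
```

In the few-shot settings the paper studies, choices such as the confidence threshold, the number of rounds, and how pseudo-labeled examples are selected are exactly the kind of strategy and hyper-parameter decisions the empirical study compares.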
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
- Self-training Large Language Models through Knowledge Detection [26.831873737733737]
Large language models (LLMs) often necessitate extensive labeled datasets and training compute to achieve impressive performance across downstream tasks.
This paper explores a self-training paradigm, where the LLM autonomously curates its own labels and selectively trains on unknown data samples.
Empirical evaluations demonstrate significant improvements in reducing hallucination in generation across multiple subjects.
arXiv Detail & Related papers (2024-06-17T07:25:09Z)
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- An Analysis of Initial Training Strategies for Exemplar-Free Class-Incremental Learning [36.619804184427245]
Class-Incremental Learning (CIL) aims to build classification models from data streams.
Due to catastrophic forgetting, CIL is particularly challenging when examples from past classes cannot be stored.
The use of models pre-trained in a self-supervised way on large amounts of data has recently gained momentum.
arXiv Detail & Related papers (2023-08-22T14:06:40Z)
- Comparative layer-wise analysis of self-supervised speech models [29.258085176788097]
We measure acoustic, phonetic, and word-level properties encoded in individual layers, using a lightweight analysis tool based on canonical correlation analysis (CCA).
We find that these properties evolve across layers differently depending on the model, and the variations relate to the choice of pre-training objective.
We discover that CCA trends provide reliable guidance to choose layers of interest for downstream tasks and that single-layer performance often matches or improves upon using all layers, suggesting implications for more efficient use of pre-trained models.
arXiv Detail & Related papers (2022-11-08T00:59:05Z)
- Transfer Learning with Joint Fine-Tuning for Multimodal Sentiment Analysis [0.6091702876917281]
We introduce a transfer learning approach using joint fine-tuning for sentiment analysis.
Our proposal allows flexibility when incorporating any pre-trained model for texts and images during the joint fine-tuning stage.
arXiv Detail & Related papers (2022-10-11T21:16:14Z)
- Self-training with Few-shot Rationalization: Teacher Explanations Aid Student in Few-shot NLU [88.8401599172922]
We develop a framework based on self-training language models with limited task-specific labels and rationales.
We show that the neural model's performance can be significantly improved by making it aware of its rationalized predictions.
arXiv Detail & Related papers (2021-09-17T00:36:46Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- On Learning Text Style Transfer with Direct Rewards [101.97136885111037]
Lack of parallel corpora makes it impossible to directly train supervised models for the text style transfer task.
We leverage semantic similarity metrics originally used for fine-tuning neural machine translation models.
Our model provides significant gains in both automatic and human evaluation over strong baselines.
arXiv Detail & Related papers (2020-10-24T04:30:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.