The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in
Classification Tasks
- URL: http://arxiv.org/abs/2304.13861v2
- Date: Mon, 5 Feb 2024 14:41:35 GMT
- Title: The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in
Classification Tasks
- Authors: Anders Giovanni Møller, Jacob Aarup Dalsgaard, Arianna Pera, Luca
Maria Aiello
- Abstract summary: We compare the use of human-labeled data with synthetically generated data from GPT-4 and Llama-2 in ten distinct CSS classification tasks.
Our findings reveal that models trained on human-labeled data consistently exhibit superior or comparable performance compared to their synthetically augmented counterparts.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of Computational Social Science (CSS), practitioners often
navigate complex, low-resource domains and face the costly and time-intensive
challenges of acquiring and annotating data. We aim to establish a set of
guidelines to address such challenges, comparing the use of human-labeled data
with synthetically generated data from GPT-4 and Llama-2 in ten distinct CSS
classification tasks of varying complexity. Additionally, we examine the impact
of training data sizes on performance. Our findings reveal that models trained
on human-labeled data consistently exhibit superior or comparable performance
compared to their synthetically augmented counterparts. Nevertheless, synthetic
augmentation proves beneficial, particularly in improving performance on rare
classes within multi-class tasks. Furthermore, we leverage GPT-4 and Llama-2
for zero-shot classification and find that, while they generally display strong
performance, they often fall short when compared to specialized classifiers
trained on moderately sized training sets.
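The augmentation setup the abstract describes can be sketched as follows: combine human-labeled examples with LLM-generated ones and train a classifier on the union. This is a minimal, self-contained illustration, not the paper's implementation; `generate_synthetic` is a hypothetical stand-in for a real GPT-4/Llama-2 generation call, and the nearest-centroid bag-of-words classifier stands in for the specialized classifiers the paper trains.

```python
# Sketch: synthetic augmentation for text classification.
# generate_synthetic() is a placeholder for an LLM call (assumption,
# not from the paper); the classifier is a toy bag-of-words centroid model.
from collections import Counter

def generate_synthetic(label, n):
    """Stand-in for an LLM that writes n new examples for a given label."""
    templates = {"pos": "this is great and helpful",
                 "neg": "this is bad and useless"}
    return [(templates[label], label) for _ in range(n)]

def train_centroid(examples):
    """Build per-label word-frequency centroids from (text, label) pairs."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(text.split())
    return centroids

def classify(centroids, text):
    """Assign the label whose centroid shares the most word mass with text."""
    words = text.split()
    return max(centroids, key=lambda lbl: sum(centroids[lbl][w] for w in words))

# Human-labeled seed set, augmented with synthetic examples per class.
human = [("great movie loved it", "pos"), ("terrible waste of time", "neg")]
augmented = human + generate_synthetic("pos", 3) + generate_synthetic("neg", 3)
model = train_centroid(augmented)
print(classify(model, "great and helpful"))  # prints "pos"
```

The paper's comparison then amounts to training one model on `human` alone and another on `augmented`, and evaluating both on a held-out test set across training-set sizes.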
Related papers
- SCAR: Efficient Instruction-Tuning for Large Language Models via Style Consistency-Aware Response Ranking [56.93151679231602]
This research decomposes response style into presentation and composition styles.
We introduce Style Consistency-Aware Response Ranking (SCAR), which prioritizes instruction-response pairs in the training set based on the stylistic consistency of their responses.
arXiv Detail & Related papers (2024-06-16T10:10:37Z)
- Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations [21.583825474908334]
We study how the performance of models trained on synthetic data may vary with the subjectivity of classification.
Our results indicate that subjectivity, at both the task level and instance level, is negatively associated with the performance of the model trained on synthetic data.
arXiv Detail & Related papers (2023-10-11T19:51:13Z)
- Improving GANs with A Dynamic Discriminator [106.54552336711997]
We argue that a discriminator with an on-the-fly adjustment on its capacity can better accommodate such a time-varying task.
A comprehensive empirical study confirms that the proposed training strategy, termed DynamicD, improves synthesis performance without incurring any additional cost or training objectives.
arXiv Detail & Related papers (2022-09-20T17:57:33Z)
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
- Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification [59.698811329287174]
We leverage GPT-2 for generating artificial training instances in order to improve classification performance.
Our results show that fine-tuning GPT-2 on a handful of labeled instances leads to consistent classification improvements.
arXiv Detail & Related papers (2021-11-17T12:10:03Z)
- Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)
- LSTM and GPT-2 Synthetic Speech Transfer Learning for Speaker Recognition to Overcome Data Scarcity [3.1428836133120543]
In speech recognition problems, data scarcity often poses an issue due to the unwillingness of humans to provide large amounts of data for learning and classification.
In this work, we take a set of 5 spoken Harvard sentences from 7 subjects and consider their MFCC attributes.
Using character level LSTMs and OpenAI's attention-based GPT-2 models, synthetic MFCCs are generated by learning from the data provided on a per-subject basis.
arXiv Detail & Related papers (2020-07-01T13:52:58Z)
- Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art, and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.