Quality Assurance of A GPT-based Sentiment Analysis System: Adversarial Review Data Generation and Detection
- URL: http://arxiv.org/abs/2310.05312v1
- Date: Mon, 9 Oct 2023 00:01:05 GMT
- Title: Quality Assurance of A GPT-based Sentiment Analysis System: Adversarial Review Data Generation and Detection
- Authors: Tinghui Ouyang, Hoang-Quoc Nguyen-Son, Huy H. Nguyen, Isao Echizen, Yoshiki Seo
- Abstract summary: A GPT-based sentiment analysis model is first constructed and studied as the reference in AI quality analysis.
Quality analysis related to data adequacy is implemented, including a content-based approach to generating plausible adversarial review comments.
Experiments were conducted on Amazon.com review data with a fine-tuned GPT model.
- Score: 10.567108680774782
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Large Language Models (LLMs) have been garnering significant attention from AI
researchers, especially following the widespread popularity of ChatGPT.
However, due to LLMs' intricate architecture and vast parameters, several
concerns and challenges regarding their quality assurance need to be
addressed. In this paper, a fine-tuned GPT-based sentiment analysis model is
first constructed and studied as the reference in AI quality analysis. Then,
quality analysis related to data adequacy is implemented, including
employing a content-based approach to generate plausible adversarial review
comments as wrongly annotated data, and developing surprise adequacy
(SA)-based techniques to detect these abnormal data. Experiments were
conducted on Amazon.com review data with a fine-tuned GPT model. The results
are discussed from the perspective of AI quality assurance, covering both the
quality of the LLM under generated adversarial textual data and the
effectiveness of SA for anomaly detection in data quality assurance.
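The SA-based detection mentioned in the abstract follows a published general recipe (likelihood-based surprise adequacy): model the density of network activations on training data and flag inputs whose activations are improbable. A minimal sketch of that recipe follows; `get_activations`, the chosen layer, and the threshold are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of likelihood-based surprise adequacy (LSA): fit a density
# model over activations produced on training data, then score new inputs
# by how "surprising" their activations are. `get_activations` and
# `threshold` are assumptions, not details taken from the paper.
import numpy as np
from scipy.stats import gaussian_kde

def fit_lsa(train_activations):
    # train_activations: shape (n_samples, n_dims); in practice the
    # dimensionality is reduced (e.g., a subset of neurons) before fitting.
    return gaussian_kde(train_activations.T)

def lsa_score(kde, activation):
    # Surprise = negative log-density under the training-set KDE; higher
    # means the input activates the model unlike any training example.
    density = kde(activation.reshape(-1, 1))[0]
    return -np.log(density + 1e-12)

# kde = fit_lsa(get_activations(train_reviews))
# Flag a review as abnormal when its surprise exceeds a threshold chosen
# on held-out clean data:
# is_abnormal = lsa_score(kde, get_activations([review])[0]) > threshold
```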
Related papers
- Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts [0.0]
A huge number of detectors and of text collections containing AI-generated fragments have emerged, and several detection methods report recognition quality of up to 99.9%.
Are these detectors actually trustworthy, or do their high benchmark scores stem from the poor quality of the evaluation datasets?
We present a systematic review of datasets from competitions dedicated to AI-generated content detection and propose methods for evaluating the quality of datasets containing AI-generated fragments.
arXiv Detail & Related papers (2024-10-18T17:59:57Z)
- Analysis of Socially Unacceptable Discourse with Zero-shot Learning [2.3999111269325266]
Socially Unacceptable Discourse (SUD) analysis is crucial for maintaining positive online environments.
We investigate the effectiveness of entailment-based zero-shot text classification (an unsupervised method) for SUD detection and characterization, leveraging pre-trained transformer models and prompting techniques.
The results demonstrate good generalization capabilities of these models to unseen data and highlight the promising nature of this approach for generating labeled datasets for the analysis and characterization of extremist narratives.
arXiv Detail & Related papers (2024-09-10T07:32:00Z)
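The entailment-based zero-shot approach above is reproducible with off-the-shelf tooling. Here is a minimal sketch using the Hugging Face zero-shot-classification pipeline with a public NLI model; the label set is an illustrative assumption, not the paper's SUD taxonomy.

```python
# Minimal sketch of entailment-based zero-shot classification using an
# off-the-shelf NLI model. The labels are illustrative assumptions.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

labels = ["hate speech", "offensive", "neutral"]  # assumed label set
result = classifier("an example social media post", candidate_labels=labels)
print(result["labels"][0], result["scores"][0])  # top label and its score
```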
- Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to the scarcity of high-quality data for training large language models (LLMs).
Our work delves into the specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
- Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models [52.368110271614285]
We introduce AdvEval, a novel black-box adversarial framework against NLG evaluators.
AdvEval is specially tailored to generate data that yield strong disagreements between human and victim evaluators.
We conduct experiments on 12 victim evaluators and 11 NLG datasets, spanning tasks including dialogue, summarization, and question evaluation.
arXiv Detail & Related papers (2024-05-23T14:48:15Z)
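As a reading aid, here is a hedged sketch of the kind of black-box search such a framework performs: keep candidate texts that maximize the score gap between a reference (human-proxy) scorer and the victim evaluator. All three callables are hypothetical stand-ins, not AdvEval's actual components.

```python
# Hedged sketch of a black-box adversarial search against an NLG evaluator:
# iteratively keep candidates on which a strong reference scorer and the
# victim evaluator disagree most. The callables are hypothetical stand-ins.
def adversarial_search(seed_text, generate_variants, gold_score, victim_score,
                       rounds=5, keep=4):
    pool = [seed_text]
    for _ in range(rounds):
        candidates = [v for t in pool for v in generate_variants(t)]
        # Rank by disagreement between the reference and victim evaluators.
        candidates.sort(key=lambda t: abs(gold_score(t) - victim_score(t)),
                        reverse=True)
        pool = candidates[:keep]
    return pool[0]  # most disagreement-inducing example found
```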
- Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance [7.002143951776267]
The study delves into stability issues related to both the operation and robustness of the expansive AI model on which ChatGPT is based.
The results reveal that the constructed ChatGPT-based sentiment analysis system exhibits uncertainty, which is attributed to various operational factors.
arXiv Detail & Related papers (2024-01-15T03:00:39Z)
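One simple way to expose the kind of operational uncertainty the study reports is to query the same system repeatedly and measure label agreement. A minimal sketch, assuming a hypothetical `classify` wrapper around the ChatGPT-based sentiment system:

```python
# Minimal sketch of a repeated-query stability check; `classify` is a
# hypothetical wrapper around the ChatGPT-based sentiment system.
from collections import Counter

def label_stability(classify, review, n_trials=20):
    """Fraction of repeated queries that return the modal label;
    1.0 means perfectly stable, lower values indicate uncertainty."""
    votes = Counter(classify(review) for _ in range(n_trials))
    return votes.most_common(1)[0][1] / n_trials
```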
- CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation [87.44350003888646]
Eval-Instruct can acquire pointwise grading critiques with pseudo references and revise these critiques via multi-path prompting.
CritiqueLLM is empirically shown to outperform ChatGPT and all the open-source baselines.
arXiv Detail & Related papers (2023-11-30T16:52:42Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
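The psychometric machinery behind such adaptive testing is typically an item response theory (IRT) model. Below is a hedged sketch of the standard two-parameter logistic (2PL) model, with illustrative parameters rather than anything from the paper: the next test item is the one maximizing Fisher information at the current ability estimate.

```python
# Hedged sketch of 2PL item response theory, the usual basis for adaptive
# testing. Parameters are illustrative, not taken from the paper.
import math

def p_correct(theta, a, b):
    """2PL probability that a model of ability theta answers an item
    with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information; adaptive testing picks the item maximizing this."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Usage: given a current ability estimate theta_hat and an item bank of
# (a, b) pairs, administer the most informative remaining item:
# next_item = max(item_bank, key=lambda ab: item_information(theta_hat, *ab))
```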
- Quality In / Quality Out: Assessing Data quality in an Anomaly Detection Benchmark [0.13764085113103217]
We show that relatively minor modifications to the same benchmark dataset (UGR'16, a flow-based real-traffic dataset for anomaly detection) have a significantly greater impact on model performance than the specific machine learning technique considered.
Our findings illustrate the need to devote more attention to (automatic) data quality assessment and optimization techniques in the context of autonomous networks.
arXiv Detail & Related papers (2023-05-31T12:03:12Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
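A minimal sketch of the training paradigm as summarized above: prompt an LLM for labeled synthetic examples, then train a downstream model on them. The `llm` callable, prompt, and label set are hypothetical stand-ins, not the paper's exact setup.

```python
# Hedged sketch of synthetic-data generation for clinical text mining:
# prompt an LLM per label, collect the outputs as training examples.
# `llm` is a hypothetical text-completion callable; prompt and labels
# are illustrative assumptions.
def generate_synthetic_examples(llm, label, n):
    prompt = (f"Write one short, de-identified clinical note that mentions "
              f"{label}. Return only the note text.")
    return [llm(prompt) for _ in range(n)]

# synthetic = {lbl: generate_synthetic_examples(llm, lbl, 500)
#              for lbl in ["diabetes", "hypertension"]}
# The resulting (text, label) pairs then train the downstream model.
```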
- Hybrid Deep Learning Model using SPCAGAN Augmentation for Insider Threat Analysis [7.576808824987132]
Anomaly detection using deep learning requires comprehensive data, but insider threat data is not readily available due to confidentiality concerns.
We propose a linear manifold learning-based generative adversarial network, SPCAGAN, that takes input from heterogeneous data sources.
We show that our proposed approach has a lower error, is more accurate, and generates substantially superior synthetic insider threat data than previous models.
arXiv Detail & Related papers (2022-03-06T02:08:48Z)
- Generalized Visual Quality Assessment of GAN-Generated Face Images [79.47386781978531]
We study the subjective and objective quality of GAN-generated face images (GFIs) towards generalized quality assessment.
We develop a quality assessment model that is able to deliver accurate quality predictions for GFIs from both available and unseen GAN algorithms.
arXiv Detail & Related papers (2022-01-28T07:54:49Z)