Quality Assurance of A GPT-based Sentiment Analysis System: Adversarial
Review Data Generation and Detection
- URL: http://arxiv.org/abs/2310.05312v1
- Date: Mon, 9 Oct 2023 00:01:05 GMT
- Title: Quality Assurance of A GPT-based Sentiment Analysis System: Adversarial
Review Data Generation and Detection
- Authors: Tinghui Ouyang, Hoang-Quoc Nguyen-Son, Huy H. Nguyen, Isao Echizen,
Yoshiki Seo
- Abstract summary: A GPT-based sentiment analysis model is first constructed and studied as the reference in AI quality analysis.
Quality analysis related to data adequacy is implemented, including employing a content-based approach to generate plausible adversarial review comments.
Experiments were conducted on Amazon.com review data and a fine-tuned GPT model.
- Score: 10.567108680774782
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Large Language Models (LLMs) have been garnering significant attention from AI
researchers, especially following the widespread popularity of ChatGPT.
However, due to LLMs' intricate architecture and vast parameters, several
concerns and challenges regarding their quality assurance need to be
addressed. In this paper, a fine-tuned GPT-based sentiment analysis model is
first constructed and studied as the reference in AI quality analysis. Then,
quality analysis related to data adequacy is implemented, including
employing a content-based approach to generate plausible adversarial review
comments as wrongly annotated data, and developing surprise adequacy
(SA)-based techniques to detect these abnormal data. Experiments were
conducted on Amazon.com review data and a fine-tuned GPT model. The results
are discussed from the perspective of AI quality assurance, covering the
quality analysis of an LLM on generated adversarial textual data and the
effectiveness of SA for anomaly detection in data quality assurance.
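Surprise adequacy scores how "surprising" a test input is relative to the training data by comparing neural activation traces. A minimal likelihood-based sketch is below; the bandwidth, layer choice, and all function and variable names are illustrative assumptions, not the paper's exact SA variant:

```python
import numpy as np

def likelihood_sa(train, test, bandwidth=0.5):
    """Likelihood-based surprise adequacy (LSA) sketch: negative log-density
    of each test activation trace under a Gaussian KDE fitted to the
    training traces. Higher scores mean more surprising inputs."""
    d = train.shape[1]
    # Pairwise squared distances, shape (n_test, n_train), via broadcasting.
    sq = ((test[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    density = np.exp(-sq / (2.0 * bandwidth**2)).mean(axis=1)
    density /= (2.0 * np.pi * bandwidth**2) ** (d / 2.0)
    # Clamp to avoid -log(0) for far-out-of-distribution traces.
    return -np.log(np.maximum(density, 1e-300))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 4))    # in-distribution traces
normal = rng.normal(0.0, 1.0, size=(10, 4))    # unseen but similar traces
abnormal = rng.normal(6.0, 1.0, size=(10, 4))  # adversarial/abnormal traces

sa_normal = likelihood_sa(train, normal)
sa_abnormal = likelihood_sa(train, abnormal)
print(sa_abnormal.mean() > sa_normal.mean())  # → True
```

Thresholding such scores is one simple way to flag abnormal (e.g. adversarial or wrongly annotated) inputs for data quality assurance.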
Related papers
- Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs).
Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
- Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models [52.368110271614285]
We introduce AdvEval, a novel black-box adversarial framework against NLG evaluators.
AdvEval is specially tailored to generate data that yield strong disagreements between human and victim evaluators.
We conduct experiments on 12 victim evaluators and 11 NLG datasets, spanning tasks including dialogue, summarization, and question evaluation.
arXiv Detail & Related papers (2024-05-23T14:48:15Z)
- Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets [19.128392861461297]
We conduct meticulous analyses of two popular dermatological image datasets: DermaMNIST and Fitzpatrick17k.
We uncover data quality issues, measure the effects of these problems on the benchmark results, and propose corrections to the datasets.
arXiv Detail & Related papers (2024-01-25T20:29:01Z)
- Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance [7.002143951776267]
The study delves into stability issues related to both the operation and robustness of the expansive AI model on which ChatGPT is based.
The results reveal that the constructed ChatGPT-based sentiment analysis system exhibits uncertainty, which is attributed to various operational factors.
arXiv Detail & Related papers (2024-01-15T03:00:39Z)
- Conformalised data synthesis with statistical quality guarantees [0.0]
Data synthesis is a promising technique to address the demand of data-hungry models.
However, reliably assessing the quality of a 'synthesiser' model's output is an open research question.
We have designed a unique confident data synthesis algorithm that introduces statistical confidence guarantees.
arXiv Detail & Related papers (2023-12-14T14:44:08Z)
- CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation [87.44350003888646]
Eval-Instruct can acquire pointwise grading critiques with pseudo references and revise these critiques via multi-path prompting.
CritiqueLLM is empirically shown to outperform ChatGPT and all the open-source baselines.
arXiv Detail & Related papers (2023-11-30T16:52:42Z)
- Quality In / Quality Out: Assessing Data quality in an Anomaly Detection Benchmark [0.13764085113103217]
We show that relatively minor modifications on the same benchmark dataset (UGR'16, a flow-based real-traffic dataset for anomaly detection) cause significantly more impact on model performance than the specific Machine Learning technique considered.
Our findings illustrate the need to devote more attention into (automatic) data quality assessment and optimization techniques in the context of autonomous networks.
arXiv Detail & Related papers (2023-05-31T12:03:12Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Hybrid Deep Learning Model using SPCAGAN Augmentation for Insider Threat Analysis [7.576808824987132]
Anomaly detection using deep learning requires comprehensive data, but insider threat data is not readily available due to confidentiality concerns.
We propose a linear manifold learning-based generative adversarial network, SPCAGAN, that takes input from heterogeneous data sources.
We show that our proposed approach has a lower error, is more accurate, and generates substantially superior synthetic insider threat data than previous models.
arXiv Detail & Related papers (2022-03-06T02:08:48Z)
- Generalized Visual Quality Assessment of GAN-Generated Face Images [79.47386781978531]
We study subjective and objective quality towards generalized quality assessment of GAN-generated face images (GFIs).
We develop a quality assessment model that is able to deliver accurate quality predictions for GFIs from both available and unseen GAN algorithms.
arXiv Detail & Related papers (2022-01-28T07:54:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.