Quality Assurance of A GPT-based Sentiment Analysis System: Adversarial
Review Data Generation and Detection
- URL: http://arxiv.org/abs/2310.05312v1
- Date: Mon, 9 Oct 2023 00:01:05 GMT
- Title: Quality Assurance of A GPT-based Sentiment Analysis System: Adversarial
Review Data Generation and Detection
- Authors: Tinghui Ouyang, Hoang-Quoc Nguyen-Son, Huy H. Nguyen, Isao Echizen,
Yoshiki Seo
- Abstract summary: A GPT-based sentiment analysis model is first constructed and studied as the reference in AI quality analysis.
Quality analysis related to data adequacy is implemented, including employing a content-based approach to generate plausible adversarial review comments.
Experiments were conducted on Amazon.com review data and a fine-tuned GPT model.
- Score: 10.567108680774782
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Large Language Models (LLMs) have been garnering significant attention from AI
researchers, especially following the widespread popularity of ChatGPT.
However, due to LLMs' intricate architecture and vast parameters, several
concerns and challenges regarding their quality assurance need to be
addressed. In this paper, a fine-tuned GPT-based sentiment analysis model is
first constructed and studied as the reference in AI quality analysis. Then,
quality analysis related to data adequacy is implemented, including
employing a content-based approach to generate plausible adversarial review
comments as wrongly annotated data, and developing surprise adequacy
(SA)-based techniques to detect these abnormal data. Experiments were
conducted on Amazon.com review data and a fine-tuned GPT model. The results
are discussed from the perspective of AI quality assurance, covering the
quality analysis of an LLM on generated adversarial textual data and the
effectiveness of SA for anomaly detection in data quality assurance.
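Surprise adequacy scores how "surprising" a test input is relative to the training data by comparing neural activation traces. A minimal likelihood-based sketch is below; the bandwidth, layer choice, and all function and variable names are illustrative assumptions, not the paper's exact SA variant:

```python
import numpy as np

def likelihood_sa(train, test, bandwidth=0.5):
    """Likelihood-based surprise adequacy (LSA) sketch: negative log-density
    of each test activation trace under a Gaussian KDE fitted to the
    training traces. Higher scores mean more surprising inputs."""
    d = train.shape[1]
    # Pairwise squared distances, shape (n_test, n_train), via broadcasting.
    sq = ((test[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    density = np.exp(-sq / (2.0 * bandwidth**2)).mean(axis=1)
    density /= (2.0 * np.pi * bandwidth**2) ** (d / 2.0)
    # Clamp to avoid -log(0) for far-out-of-distribution traces.
    return -np.log(np.maximum(density, 1e-300))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 4))    # in-distribution traces
normal = rng.normal(0.0, 1.0, size=(10, 4))    # unseen but similar traces
abnormal = rng.normal(6.0, 1.0, size=(10, 4))  # adversarial/abnormal traces

sa_normal = likelihood_sa(train, normal)
sa_abnormal = likelihood_sa(train, abnormal)
print(sa_abnormal.mean() > sa_normal.mean())  # → True
```

Thresholding such scores is one simple way to flag abnormal (e.g. adversarial or wrongly annotated) inputs for data quality assurance.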
Related papers
- Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs).
Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
- Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models [52.368110271614285]
We introduce AdvEval, a novel black-box adversarial framework against NLG evaluators.
AdvEval is specially tailored to generate data that yield strong disagreements between human and victim evaluators.
We conduct experiments on 12 victim evaluators and 11 NLG datasets, spanning tasks including dialogue, summarization, and question evaluation.
arXiv Detail & Related papers (2024-05-23T14:48:15Z)
- Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets [19.128392861461297]
We conduct meticulous analyses of two popular dermatological image datasets: DermaMNIST and Fitzpatrick17k.
We uncover data quality issues, measure the effects of these problems on the benchmark results, and propose corrections to the datasets.
arXiv Detail & Related papers (2024-01-25T20:29:01Z)
- Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance [7.002143951776267]
The study delves into stability issues related to both the operation and robustness of the expansive AI model on which ChatGPT is based.
The results reveal that the constructed ChatGPT-based sentiment analysis system exhibits uncertainty, which is attributed to various operational factors.
arXiv Detail & Related papers (2024-01-15T03:00:39Z)
- Conformalised data synthesis with statistical quality guarantees [0.0]
Data synthesis is a promising technique to address the demand of data-hungry models.
However, reliably assessing the quality of a 'synthesiser' model's output is an open research question.
We have designed a unique confident data synthesis algorithm that introduces statistical confidence guarantees.
arXiv Detail & Related papers (2023-12-14T14:44:08Z)
- CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation [87.44350003888646]
Eval-Instruct can acquire pointwise grading critiques with pseudo references and revise these critiques via multi-path prompting.
CritiqueLLM is empirically shown to outperform ChatGPT and all the open-source baselines.
arXiv Detail & Related papers (2023-11-30T16:52:42Z)
- Quality In / Quality Out: Assessing Data quality in an Anomaly Detection Benchmark [0.13764085113103217]
We show that relatively minor modifications on the same benchmark dataset (UGR'16, a flow-based real-traffic dataset for anomaly detection) cause significantly more impact on model performance than the specific Machine Learning technique considered.
Our findings illustrate the need to devote more attention into (automatic) data quality assessment and optimization techniques in the context of autonomous networks.
arXiv Detail & Related papers (2023-05-31T12:03:12Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Hybrid Deep Learning Model using SPCAGAN Augmentation for Insider Threat Analysis [7.576808824987132]
Anomaly detection using deep learning requires comprehensive data, but insider threat data is not readily available due to confidentiality concerns.
We propose a linear manifold learning-based generative adversarial network, SPCAGAN, that takes input from heterogeneous data sources.
We show that our proposed approach has a lower error, is more accurate, and generates substantially superior synthetic insider threat data than previous models.
arXiv Detail & Related papers (2022-03-06T02:08:48Z)
- Generalized Visual Quality Assessment of GAN-Generated Face Images [79.47386781978531]
We study subjective and objective quality towards generalized quality assessment of GAN-generated face images (GFIs).
We develop a quality assessment model that is able to deliver accurate quality predictions for GFIs from both available and unseen GAN algorithms.
arXiv Detail & Related papers (2022-01-28T07:54:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.