Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality
Assurance
- URL: http://arxiv.org/abs/2401.07441v1
- Date: Mon, 15 Jan 2024 03:00:39 GMT
- Title: Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality
Assurance
- Authors: Tinghui Ouyang, AprilPyone MaungMaung, Koichi Konishi, Yoshiki Seo,
and Isao Echizen
- Abstract summary: The study delves into stability issues related to both the operation and robustness of the expansive AI model on which ChatGPT is based.
The results reveal that the constructed ChatGPT-based sentiment analysis system exhibits uncertainty, which is attributed to various operational factors.
- Score: 7.002143951776267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the era of large AI models, e.g. large language models (LLMs),
the complex architecture and vast parameters present substantial challenges for
effective AI quality management (AIQM). This paper focuses on investigating the quality
assurance of a specific LLM-based AI product--a ChatGPT-based sentiment
analysis system. The study delves into stability issues related to both the
operation and robustness of the expansive AI model on which ChatGPT is based.
Experimental analysis is conducted using benchmark datasets for sentiment
analysis. The results reveal that the constructed ChatGPT-based sentiment
analysis system exhibits uncertainty, which is attributed to various
operational factors. It is also demonstrated that the system exhibits stability
issues, in terms of robustness, when handling conventional small text attacks.
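The "small text attacks" mentioned above are typically simple character- or word-level perturbations of the input. As a hypothetical illustration (this is not the paper's actual attack code; the function name and example sentence are invented for this sketch), such a perturbation can be as minimal as swapping two adjacent characters inside one word:

```python
import random

def small_text_attack(text: str, seed: int = 0) -> str:
    """Swap two adjacent characters inside one word -- a minimal
    character-level perturbation of the kind used to probe robustness."""
    rng = random.Random(seed)  # seeded for reproducibility
    words = text.split()
    # Only perturb words long enough to swap interior characters.
    candidates = [i for i, w in enumerate(words) if len(w) > 3]
    if not candidates:
        return text  # nothing safe to perturb; return the input unchanged
    i = rng.choice(candidates)
    w = list(words[i])
    j = rng.randrange(1, len(w) - 2)  # interior position, keeps first/last char
    w[j], w[j + 1] = w[j + 1], w[j]
    words[i] = "".join(w)
    return " ".join(words)

original = "The battery life of this phone is absolutely fantastic"
perturbed = small_text_attack(original)
# A stable sentiment classifier should assign the same label to both strings;
# the stability issues reported above mean this is not guaranteed in practice.
```

A human reader still recognizes the perturbed review as positive, so any label flip under such a perturbation is evidence of the robustness instability the study reports.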
Related papers
- Robustness Analysis of AI Models in Critical Energy Systems [17.13189303615842]
This paper analyzes the robustness of state-of-the-art AI-based models for power grid operations under the $N-1$ security criterion.
Our results highlight a significant loss in accuracy following the disconnection of a line.
arXiv Detail & Related papers (2024-06-20T14:34:36Z)
- ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute error in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z)
- Quality Assurance of A GPT-based Sentiment Analysis System: Adversarial Review Data Generation and Detection [10.567108680774782]
GPT-based sentiment analysis model is first constructed and studied as the reference in AI quality analysis.
Quality analysis related to data adequacy is implemented, including employing the content-based approach to generate reasonable adversarial review comments.
Experiments based on Amazon.com review data and a fine-tuned GPT model were implemented.
arXiv Detail & Related papers (2023-10-09T00:01:05Z)
- On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training [109.9218185711916]
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews.
We propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training.
arXiv Detail & Related papers (2023-04-19T11:07:43Z)
- Safety Analysis in the Era of Large Language Models: A Case Study of STPA using ChatGPT [11.27440170845105]
Using ChatGPT without human intervention may be inadequate due to reliability-related issues, but with careful design, it may outperform human experts.
No statistically significant differences are found when varying the semantic complexity or using common prompt guidelines.
arXiv Detail & Related papers (2023-04-03T16:46:49Z)
- Consistency Analysis of ChatGPT [65.268245109828]
This paper investigates the trustworthiness of ChatGPT and GPT-4 regarding logically consistent behaviour.
Our findings suggest that while both models appear to show an enhanced language understanding and reasoning ability, they still frequently fall short of generating logically consistent predictions.
arXiv Detail & Related papers (2023-03-11T01:19:01Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z)
- Causal Intervention Improves Implicit Sentiment Analysis [67.43379729099121]
We propose a causal intervention model for Implicit Sentiment Analysis using an Instrumental Variable (ISAIV).
We first review sentiment analysis from a causal perspective and analyze the confounders existing in this task.
Then, we introduce an instrumental variable to eliminate the confounding causal effects, thus extracting the pure causal effect between sentence and sentiment.
arXiv Detail & Related papers (2022-08-19T13:17:57Z)
- Aspect-Based Sentiment Analysis using Local Context Focus Mechanism with DeBERTa [23.00810941211685]
Aspect-Based Sentiment Analysis (ABSA) is a fine-grained task in the field of sentiment analysis.
The recent DeBERTa model (Decoding-enhanced BERT with disentangled attention) is applied to solve the Aspect-Based Sentiment Analysis problem.
arXiv Detail & Related papers (2022-07-06T03:50:31Z)
- Differential privacy and robust statistics in high dimensions [49.50869296871643]
High-dimensional Propose-Test-Release (HPTR) builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism.
We show that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
arXiv Detail & Related papers (2021-11-12T06:36:40Z)
- Statistical Perspectives on Reliability of Artificial Intelligence Systems [6.284088451820049]
We provide statistical perspectives on the reliability of AI systems.
We introduce a so-called SMART statistical framework for AI reliability research.
We discuss recent developments in modeling and analysis of AI reliability.
arXiv Detail & Related papers (2021-11-09T20:00:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.