Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality
Assurance
- URL: http://arxiv.org/abs/2401.07441v1
- Date: Mon, 15 Jan 2024 03:00:39 GMT
- Title: Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality
Assurance
- Authors: Tinghui Ouyang, AprilPyone MaungMaung, Koichi Konishi, Yoshiki Seo,
and Isao Echizen
- Abstract summary: The study delves into stability issues related to both the operation and robustness of the expansive AI model on which ChatGPT is based.
The results reveal that the constructed ChatGPT-based sentiment analysis system exhibits uncertainty, which is attributed to various operational factors.
- Score: 7.002143951776267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the era of large AI models, e.g. large language models (LLMs),
the complex architecture and vast parameters present substantial challenges for
effective AI quality management (AIQM). This paper focuses on investigating the quality
assurance of a specific LLM-based AI product--a ChatGPT-based sentiment
analysis system. The study delves into stability issues related to both the
operation and robustness of the expansive AI model on which ChatGPT is based.
Experimental analysis is conducted using benchmark datasets for sentiment
analysis. The results reveal that the constructed ChatGPT-based sentiment
analysis system exhibits uncertainty, which is attributed to various
operational factors. It is also demonstrated that the system exhibits stability
issues, in terms of robustness, when handling conventional small text attacks.
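The "small text attacks" mentioned above are typically simple character- or word-level perturbations of the input. As a hypothetical illustration (this is not the paper's actual attack code; the function name and example sentence are invented for this sketch), such a perturbation can be as minimal as swapping two adjacent characters inside one word:

```python
import random

def small_text_attack(text: str, seed: int = 0) -> str:
    """Swap two adjacent characters inside one word -- a minimal
    character-level perturbation of the kind used to probe robustness."""
    rng = random.Random(seed)  # seeded for reproducibility
    words = text.split()
    # Only perturb words long enough to swap interior characters.
    candidates = [i for i, w in enumerate(words) if len(w) > 3]
    if not candidates:
        return text  # nothing safe to perturb; return the input unchanged
    i = rng.choice(candidates)
    w = list(words[i])
    j = rng.randrange(1, len(w) - 2)  # interior position, keeps first/last char
    w[j], w[j + 1] = w[j + 1], w[j]
    words[i] = "".join(w)
    return " ".join(words)

original = "The battery life of this phone is absolutely fantastic"
perturbed = small_text_attack(original)
# A stable sentiment classifier should assign the same label to both strings;
# the stability issues reported above mean this is not guaranteed in practice.
```

A human reader still recognizes the perturbed review as positive, so any label flip under such a perturbation is evidence of the robustness instability the study reports.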
Related papers
- Robustness Analysis of AI Models in Critical Energy Systems [17.13189303615842]
This paper analyzes the robustness of state-of-the-art AI-based models for power grid operations under the $N-1$ security criterion.
Our results highlight a significant loss in accuracy following the disconnection of a line.
arXiv Detail & Related papers (2024-06-20T14:34:36Z)
- ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute error in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z)
- Quality Assurance of A GPT-based Sentiment Analysis System: Adversarial Review Data Generation and Detection [10.567108680774782]
GPT-based sentiment analysis model is first constructed and studied as the reference in AI quality analysis.
Quality analysis related to data adequacy is implemented, including employing the content-based approach to generate reasonable adversarial review comments.
Experiments based on Amazon.com review data and a fine-tuned GPT model were implemented.
arXiv Detail & Related papers (2023-10-09T00:01:05Z)
- On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training [109.9218185711916]
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews.
We propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training.
arXiv Detail & Related papers (2023-04-19T11:07:43Z)
- Safety Analysis in the Era of Large Language Models: A Case Study of STPA using ChatGPT [11.27440170845105]
Using ChatGPT without human intervention may be inadequate due to reliability-related issues, but with careful design, it may outperform human experts.
No statistically significant differences are found when varying the semantic complexity or using common prompt guidelines.
arXiv Detail & Related papers (2023-04-03T16:46:49Z)
- Consistency Analysis of ChatGPT [65.268245109828]
This paper investigates the trustworthiness of ChatGPT and GPT-4 regarding logically consistent behaviour.
Our findings suggest that while both models appear to show an enhanced language understanding and reasoning ability, they still frequently fall short of generating logically consistent predictions.
arXiv Detail & Related papers (2023-03-11T01:19:01Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z)
- Causal Intervention Improves Implicit Sentiment Analysis [67.43379729099121]
We propose a causal intervention model for Implicit Sentiment Analysis using an Instrumental Variable (ISAIV).
We first review sentiment analysis from a causal perspective and analyze the confounders existing in this task.
Then, we introduce an instrumental variable to eliminate the confounding causal effects, thus extracting the pure causal effect between sentence and sentiment.
arXiv Detail & Related papers (2022-08-19T13:17:57Z)
- Aspect-Based Sentiment Analysis using Local Context Focus Mechanism with DeBERTa [23.00810941211685]
Aspect-Based Sentiment Analysis (ABSA) is a fine-grained task in the field of sentiment analysis.
The recent DeBERTa model (Decoding-enhanced BERT with disentangled attention) is applied to solve the Aspect-Based Sentiment Analysis problem.
arXiv Detail & Related papers (2022-07-06T03:50:31Z)
- Differential privacy and robust statistics in high dimensions [49.50869296871643]
High-dimensional Propose-Test-Release (HPTR) builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism.
We show that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
arXiv Detail & Related papers (2021-11-12T06:36:40Z)
- Statistical Perspectives on Reliability of Artificial Intelligence Systems [6.284088451820049]
We provide statistical perspectives on the reliability of AI systems.
We introduce a so-called SMART statistical framework for AI reliability research.
We discuss recent developments in modeling and analysis of AI reliability.
arXiv Detail & Related papers (2021-11-09T20:00:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.