On Prompt Sensitivity of ChatGPT in Affective Computing
- URL: http://arxiv.org/abs/2403.14006v1
- Date: Wed, 20 Mar 2024 22:11:01 GMT
- Title: On Prompt Sensitivity of ChatGPT in Affective Computing
- Authors: Mostafa M. Amin, Björn W. Schuller
- Abstract summary: We introduce a method to evaluate and investigate the sensitivity of the performance of foundation models based on different prompts or generation parameters.
We perform our evaluation on ChatGPT within the scope of affective computing on three major problems, namely sentiment analysis, toxicity detection, and sarcasm detection.
- Score: 46.93320580613236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have demonstrated the emerging capabilities of foundation models like ChatGPT in several fields, including affective computing. However, accessing these emerging capabilities is facilitated through prompt engineering. Despite the existence of some prompting techniques, the field is still rapidly evolving and many prompting ideas still require investigation. In this work, we introduce a method to evaluate and investigate the sensitivity of the performance of foundation models based on different prompts or generation parameters. We perform our evaluation on ChatGPT within the scope of affective computing on three major problems, namely sentiment analysis, toxicity detection, and sarcasm detection. First, we carry out a sensitivity analysis on pivotal parameters in auto-regressive text generation, specifically the temperature parameter $T$ and the top-$p$ parameter in nucleus sampling, which dictate how conservative or creative the model should be during generation. Furthermore, we explore the efficacy of several prompting ideas, examining how giving different incentives or structures affects the performance. Our evaluation takes into consideration performance measures on the affective computing tasks, as well as the model's effectiveness in following the stated instructions and hence in generating easy-to-parse responses that can be used smoothly in downstream applications.
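The abstract names two decoding parameters: the temperature $T$ and the top-$p$ threshold of nucleus sampling. Below is a minimal Python sketch of how these two parameters are commonly applied to a vector of next-token logits, using their standard textbook definitions. It is not ChatGPT's actual decoder. `sample_next_token` is a hypothetical helper, and treating $T = 0$ as greedy decoding is an assumed convention.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Draw one token index from raw logits using temperature scaling
    followed by nucleus (top-p) filtering.

    Hypothetical helper for illustration only; not ChatGPT's decoder.
    """
    rng = rng or random.Random()

    if temperature <= 0:
        # Assumed convention: treat T = 0 as greedy decoding (argmax).
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature: divide the logits by T before the softmax.
    # T < 1 sharpens the distribution (more conservative output),
    # T > 1 flattens it (more diverse, "creative" output).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus sampling: keep the smallest set of most likely tokens whose
    # cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices accepts relative weights, so the kept probabilities
    # are effectively renormalised over the nucleus.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up logits for a 4-token vocabulary
rng = random.Random(0)
print(sample_next_token(logits, temperature=0.0))  # greedy: 0
print(sample_next_token(logits, temperature=0.7, top_p=0.9, rng=rng))  # 0 or 1
```

With these made-up logits, $T = 0.7$ and $p = 0.9$ leave only the two most likely tokens in the nucleus. Raising either parameter widens the set of tokens that can be drawn. This is the "conservative versus creative" trade-off that a sensitivity sweep over $T$ and top-$p$ would probe.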
 
      
Related papers
        - A Closer Look at System Prompt Robustness [2.5525497052179995]
 Developers depend on system prompts to specify important context, output format, personalities, guardrails, content policies, and safety countermeasures.
In practice, models often forget to consider relevant guardrails or fail to resolve conflicting demands between the system and the user.
We create realistic new evaluation and fine-tuning datasets based on prompts collected from OpenAI's GPT Store and HuggingFace's HuggingChat.
 arXiv (2025-02-15T18:10:45Z)
- Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
 We show that likelihoods serve as an effective gauge for language model performance.
We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance.
 arXiv (2024-11-12T13:14:09Z)
- Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications [0.0]
 Run to run variability in parallel programs caused by floating-point non-associativity has been known to significantly affect algorithms.
We investigate the statistical properties of floating-point non-associativity within modern parallel programming models.
We examine the recently-added deterministic options in PyTorch within the context of GPU deployment for deep learning.
 arXiv (2024-08-09T16:07:37Z)
- ASEM: Enhancing Empathy in Chatbot through Attention-based Sentiment and Emotion Modeling [0.0]
 We present a novel solution by employing a mixture of experts, multiple encoders, to offer distinct perspectives on the emotional state of the user's utterance.
We propose an end-to-end model architecture called ASEM that performs emotion analysis on top of sentiment analysis for open-domain chatbots.
 arXiv (2024-02-25T20:36:51Z)
- How are Prompts Different in Terms of Sensitivity? [50.67313477651395]
 We present a comprehensive prompt analysis based on the sensitivity of a function.
We use gradient-based saliency scores to empirically demonstrate how different prompts affect the relevance of input tokens to the output.
We introduce sensitivity-aware decoding which incorporates sensitivity estimation as a penalty term in the standard greedy decoding.
 arXiv (2023-11-13T10:52:01Z)
- Automatic Sensor-free Affect Detection: A Systematic Literature Review [0.0]
 This paper provides a comprehensive literature review on sensor-free affect detection.
Despite the field's evident maturity, demonstrated by the consistent performance of the models, there is ample scope for future research.
There is also a need to refine model development practices and methods.
 arXiv (2023-10-11T13:24:27Z)
- A Wide Evaluation of ChatGPT on Affective Computing Tasks [32.557383931586266]
 We study the capabilities of the ChatGPT models, namely GPT-4 and GPT-3.5, on 13 affective computing problems.
We compare ChatGPT against more traditional NLP methods, such as end-to-end recurrent neural networks and transformers.
The results demonstrate the emergent abilities of the ChatGPT models on a wide range of affective computing problems.
 arXiv (2023-08-26T16:10:30Z)
- To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
 This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
 arXiv (2023-04-04T03:04:28Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
 We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
 arXiv (2022-10-21T15:12:37Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
 We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect samples causing oversensitivity and overstability with high accuracy.
 arXiv (2021-09-24T03:49:38Z)
- Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
 We propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA).
We empirically show that the features derived from the BERT model outperform count- and neural-based baselines by up to 10% on three common datasets.
The probing analysis of the features reveals their sensitivity to the surface and syntactic properties.
 arXiv (2021-09-10T12:13:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
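The related paper on floating-point non-associativity attributes run-to-run variability to the fact that regrouping a sum changes how it is rounded. The snippet below is a minimal, language-level illustration of that property with IEEE 754 doubles in Python. It is not taken from that paper, and it does not model GPU reductions or PyTorch's deterministic options. The "two parallel workers" framing in the comment is a hypothetical analogy.

```python
import math

a, b, c = 0.1, 0.2, 0.3

# The same three numbers, grouped differently, as two parallel workers
# (hypothetically) might combine partial sums in different orders.
left = (a + b) + c
right = a + (b + c)

print(left, right, left == right)
# 0.6000000000000001 0.6 False

# math.fsum tracks the lost low-order bits exactly, so its correctly
# rounded result does not depend on how the inputs are ordered.
print(math.fsum([a, b, c]) == math.fsum([c, b, a]))
# True
```

Exact summation such as `math.fsum` removes the order dependence at extra cost. This is one reason deterministic modes in parallel frameworks typically fix the reduction order instead.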
       
     