Reliability Check: An Analysis of GPT-3's Response to Sensitive Topics and Prompt Wording
- URL: http://arxiv.org/abs/2306.06199v1
- Date: Fri, 9 Jun 2023 19:07:31 GMT
- Title: Reliability Check: An Analysis of GPT-3's Response to Sensitive Topics and Prompt Wording
- Authors: Aisha Khatun and Daniel G. Brown
- Abstract summary: We analyze what confuses GPT-3: how the model responds to certain sensitive topics and what effects the prompt wording has on the model response.
We find that GPT-3 correctly disagrees with obvious Conspiracies and Stereotypes but makes mistakes with common Misconceptions and Controversies.
The model responses are inconsistent across prompts and settings, highlighting GPT-3's unreliability.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have become mainstream technology with their
versatile use cases and impressive performance. Despite the countless
out-of-the-box applications, LLMs are still not reliable. A lot of work is
being done to improve the factual accuracy, consistency, and ethical standards
of these models through fine-tuning, prompting, and Reinforcement Learning from
Human Feedback (RLHF), but no systematic analysis of the responses of these
models to different categories of statements, or of their potential
vulnerabilities to simple prompting changes, is available. In this work, we
analyze what confuses GPT-3: how the model responds to certain sensitive topics
and what effects the prompt wording has on the model response. We find that
GPT-3 correctly disagrees with obvious Conspiracies and Stereotypes but makes
mistakes with common Misconceptions and Controversies. The model responses are
inconsistent across prompts and settings, highlighting GPT-3's unreliability.
The dataset and code for our analysis are available at
https://github.com/tanny411/GPT3-Reliability-Check.
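
A minimal sketch of the kind of probe the abstract describes, assuming a generic completion API (query_model below is a hypothetical placeholder, not the authors' released code): the same statement is wrapped in several prompt wordings, the model is queried with each wording, and its yes/no verdicts are compared for consistency.

    # Sketch only: probe prompt-wording sensitivity for category-labelled statements.
    # `query_model` is a hypothetical stand-in for a GPT-3 completion call.
    from collections import Counter

    def query_model(prompt: str) -> str:
        """Hypothetical wrapper around a completion API; replace with a real call."""
        return "no"  # canned answer so the sketch runs end to end

    # Statements drawn from the kinds of categories the paper studies.
    statements = {
        "Conspiracy": "The moon landing was staged.",
        "Misconception": "Humans use only 10% of their brains.",
    }

    # Different wordings of the same underlying question.
    templates = [
        "Is the following statement true? {s} Answer yes or no:",
        "Do you agree with the following? {s} Answer yes or no:",
        "{s} Is this statement correct? Answer yes or no:",
    ]

    for category, statement in statements.items():
        answers = [query_model(t.format(s=statement)).strip().lower() for t in templates]
        consistent = len(Counter(answers)) == 1
        print(f"{category}: answers={answers}, consistent={consistent}")

Per-category accuracy would then follow by comparing the majority verdict against the expected answer for each category, which is how inconsistency across wordings becomes visible.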
Related papers
- WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models [35.088946378980914]
We run our benchmark on three state-of-the-art chat LLMs (GPT-3.5, GPT-4, and Llama2-chat).
We show that these models make errors even with as few as three objects.
Errors persist even with chain-of-thought prompting and in-context learning.
arXiv Detail & Related papers (2023-11-27T15:38:17Z)
- Negated Complementary Commonsense using Large Language Models [3.42658286826597]
This work focuses on finding answers to negated complementary questions in commonsense scenarios.
We propose a model-agnostic methodology to improve performance in negated complementary scenarios.
arXiv Detail & Related papers (2023-07-13T15:03:48Z)
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models [55.60306377044225]
"SelfCheckGPT" is a simple sampling-based approach to fact-check the responses of black-box models; a simplified sketch of this sampling idea appears after this list.
We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset.
arXiv Detail & Related papers (2023-03-15T19:31:21Z)
- Consistency Analysis of ChatGPT [65.268245109828]
This paper investigates the trustworthiness of ChatGPT and GPT-4 regarding logically consistent behaviour.
Our findings suggest that while both models appear to show an enhanced language understanding and reasoning ability, they still frequently fall short of generating logically consistent predictions.
arXiv Detail & Related papers (2023-03-11T01:19:01Z)
- Prompting GPT-3 To Be Reliable [117.23966502293796]
This work decomposes reliability into four facets: generalizability, fairness, calibration, and factuality.
We find that GPT-3 outperforms smaller-scale supervised models by large margins on all these facets.
arXiv Detail & Related papers (2022-10-17T14:52:39Z)
- News Summarization and Evaluation in the Era of GPT-3 [73.48220043216087]
We study how GPT-3 compares against fine-tuned models trained on large summarization datasets.
We show that not only do humans overwhelmingly prefer GPT-3 summaries, prompted using only a task description, but these summaries also avoid common dataset-specific issues such as poor factuality.
arXiv Detail & Related papers (2022-09-26T01:04:52Z)
- The Unreliability of Explanations in Few-Shot In-Context Learning [50.77996380021221]
We focus on two NLP tasks that involve reasoning over text, namely question answering and natural language inference.
We show that explanations judged as good by humans (those that are logically consistent with the input) usually indicate more accurate predictions.
We present a framework for calibrating model predictions based on the reliability of the explanations.
arXiv Detail & Related papers (2022-05-06T17:57:58Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite being trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
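
The SelfCheckGPT entry above refers to a sampling-based consistency check. The toy sketch below is an assumption-laden simplification, not the released SelfCheckGPT scorer: it flags sentences of a main response that are poorly supported by additional stochastic samples, using plain token overlap as a stand-in for the learned consistency measures used in that paper.

    # Toy sketch of sampling-based self-consistency checking (simplified assumption,
    # not the SelfCheckGPT library): low support across samples suggests hallucination.
    import re

    def token_overlap(sentence: str, passage: str) -> float:
        """Fraction of the sentence's word tokens that also appear in the passage."""
        sent = set(re.findall(r"\w+", sentence.lower()))
        passage_tokens = set(re.findall(r"\w+", passage.lower()))
        return len(sent & passage_tokens) / max(len(sent), 1)

    def self_check(main_response: str, samples: list[str], threshold: float = 0.5):
        """Return (sentence, support) pairs whose mean support across samples is below threshold."""
        flagged = []
        for sentence in re.split(r"(?<=[.!?])\s+", main_response.strip()):
            support = sum(token_overlap(sentence, s) for s in samples) / len(samples)
            if support < threshold:
                flagged.append((sentence, round(support, 2)))
        return flagged

    # Toy usage: the samples would normally be extra completions drawn from the
    # same model, for the same prompt, at non-zero temperature.
    main = "Alan Turing was born in 1912. He won the Nobel Prize in Physics."
    samples = [
        "Alan Turing was born in London in 1912 and is known for his work on computability.",
        "Born in 1912, Alan Turing pioneered theoretical computer science.",
    ]
    print(self_check(main, samples))  # the unsupported Nobel Prize sentence is flagged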