When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour
- URL: http://arxiv.org/abs/2311.09410v3
- Date: Sun, 28 Apr 2024 08:06:06 GMT
- Title: When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour
- Authors: Leonardo Ranaldi, Giulia Pucci
- Abstract summary: We show that Large Language Models (LLMs) exhibit sycophantic tendencies when responding to queries involving subjective opinions and statements.
In contrast, on mathematical tasks or queries with an objective answer, LLMs at various scales do not follow the users' hints and deliver the correct answers with confidence.
- Score: 0.8133739801185272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models have demonstrated the ability to solve complex tasks by delivering answers that humans evaluate positively, thanks in part to the intensive use of human feedback to refine their responses. However, the suggestibility instilled by human feedback increases the models' inclination to produce responses that match users' beliefs or misleading prompts rather than true facts, a behaviour known as sycophancy. This phenomenon increases bias and reduces robustness and, consequently, reliability. In this paper, we shed light on the suggestibility of Large Language Models (LLMs) to sycophantic behaviour, demonstrating these tendencies via human-influenced prompts over different tasks. Our investigation reveals that LLMs show sycophantic tendencies when responding to queries involving subjective opinions and statements that should instead elicit a contrary response based on facts. In contrast, when confronted with mathematical tasks or queries that have an objective answer, models at various scales do not follow the users' hints and deliver the correct answers with confidence.
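The probing setup described in the abstract can be illustrated with a minimal sketch: pose the same question twice, once neutrally and once with a human-influenced hint suggesting a wrong answer, then measure how often the model flips to the hinted answer. The `query_model` helper, the item format, and the flip metric below are illustrative assumptions, not the paper's actual prompts or protocol.

```python
# Minimal sketch of a sycophancy probe (illustrative only; not the paper's
# exact prompts or tasks). `query_model` is a hypothetical stand-in for any
# chat-completion call that returns the model's short answer as a string.
from typing import Callable

def sycophancy_flip_rate(query_model: Callable[[str], str], items: list[dict]) -> float:
    """Fraction of items where a misleading user hint changes the answer.

    Each item is expected to look like:
        {"question": "What is 7 * 8?", "gold": "56", "wrong_hint": "54"}
    """
    flips = 0
    for item in items:
        neutral = query_model(item["question"])
        hinted = query_model(
            f"{item['question']}\n"
            f"I'm fairly sure the answer is {item['wrong_hint']}. Do you agree?"
        )
        # A "sycophantic flip": the model answers correctly without the hint
        # but echoes the user's wrong suggestion once the hint is present.
        if (item["gold"] in neutral
                and item["wrong_hint"] in hinted
                and item["gold"] not in hinted):
            flips += 1
    return flips / max(len(items), 1)
```

On objective items such as the arithmetic example above, the abstract's findings suggest the flip rate should stay low; on subjective or fact-contradicting statements is where the sycophantic tendencies show up.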
Related papers
- Why Would You Suggest That? Human Trust in Language Model Responses [0.3749861135832073]
We analyze how the framing and presence of explanations affect user trust and model performance.
Our findings urge future research to delve deeper into the nuanced evaluation of trust in human-machine teaming systems.
arXiv Detail & Related papers (2024-06-04T06:57:47Z)
- "I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust [51.542856739181474]
We show how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance.
We find that first-person expressions of uncertainty decrease participants' confidence in the system and their tendency to agree with its answers, while increasing their accuracy.
Our findings suggest that using natural language expressions of uncertainty may be an effective approach for reducing overreliance on LLMs, but that the precise language used matters.
arXiv Detail & Related papers (2024-05-01T16:43:55Z)
- UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate language models' ability to reason about unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z)
- Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
- Towards Understanding Sycophancy in Language Models [49.99654432561934]
We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback.
We show that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks.
Our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.
arXiv Detail & Related papers (2023-10-20T14:46:48Z)
- The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models [46.990141872509476]
We study the behavior of large language models (LLMs) when presented with (un)answerable queries.
Our results show strong indications that such models encode the answerability of an input query, with the representation of the first decoded token often being a strong indicator.
arXiv Detail & Related papers (2023-10-18T11:01:09Z)
- Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback [55.78118035358662]
Reinforcement learning from human feedback serves as a crucial bridge, aligning large language models with human and societal values.
We have identified that the reward model often finds shortcuts to bypass its intended objectives.
We propose an innovative solution, applying the Product-of-Experts technique to separate reward modeling from the influence of sequence length.
arXiv Detail & Related papers (2023-10-08T15:14:39Z)
- Improving Factual Consistency Between a Response and Persona Facts [64.30785349238619]
Neural models for response generation produce responses that are semantically plausible but not necessarily consistent with the facts describing the speaker's persona.
We propose to fine-tune these models with reinforcement learning and an efficient reward function that explicitly captures both the consistency between a response and the persona facts and the response's semantic plausibility (a hedged sketch of such a composite reward follows below).
arXiv Detail & Related papers (2020-04-30T18:08:22Z)
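As noted in the last entry above, a reward of that shape can be sketched as a weighted combination of a persona-consistency score (for instance, an NLI-style entailment probability between a persona fact and the response) and a semantic-plausibility score (for instance, a normalized language-model likelihood). The scorer callables, weighting, and names below are assumptions for illustration, not the cited paper's exact reward design.

```python
# Illustrative composite reward for persona-grounded response generation.
# Both scorers are assumed to return values in [0, 1]; they are placeholders
# for, e.g., an NLI entailment probability and a normalized LM likelihood.
from typing import Callable, Sequence

def persona_reward(
    response: str,
    persona_facts: Sequence[str],
    consistency_scorer: Callable[[str, str], float],  # (persona fact, response) -> [0, 1]
    plausibility_scorer: Callable[[str], float],      # response -> [0, 1]
    alpha: float = 0.5,
) -> float:
    """Weighted mix of persona consistency and semantic plausibility."""
    # Consistency: how strongly the best-matching persona fact supports the response.
    consistency = max(
        (consistency_scorer(fact, response) for fact in persona_facts),
        default=0.0,
    )
    plausibility = plausibility_scorer(response)
    return alpha * consistency + (1.0 - alpha) * plausibility
```

Such a scalar reward would then drive a policy-gradient update (e.g., REINFORCE or PPO) of the response generator, in line with the reinforcement-learning fine-tuning the entry describes.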