Related papers: When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour

When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour

URL: http://arxiv.org/abs/2311.09410v3
Date: Sun, 28 Apr 2024 08:06:06 GMT
Title: When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour
Authors: Leonardo Ranaldi, Giulia Pucci,
Abstract summary: We show that Large Language Models (LLMs) show sycophantic tendencies when responding to queries involving subjective opinions and statements. LLMs at various scales seem not to follow the users' hints by demonstrating confidence in delivering the correct answers.
Score: 0.8133739801185272
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models have been demonstrating the ability to solve complex tasks by delivering answers that are positively evaluated by humans due in part to the intensive use of human feedback that refines responses. However, the suggestibility transmitted through human feedback increases the inclination to produce responses that correspond to the users' beliefs or misleading prompts as opposed to true facts, a behaviour known as sycophancy. This phenomenon decreases the bias, robustness, and, consequently, their reliability. In this paper, we shed light on the suggestibility of Large Language Models (LLMs) to sycophantic behaviour, demonstrating these tendencies via human-influenced prompts over different tasks. Our investigation reveals that LLMs show sycophantic tendencies when responding to queries involving subjective opinions and statements that should elicit a contrary response based on facts. In contrast, when confronted with mathematical tasks or queries that have an objective answer, these models at various scales seem not to follow the users' hints by demonstrating confidence in delivering the correct answers.

Related papers

Can Reasoning Help Large Language Models Capture Human Annotator Disagreement? [84.32752330104775]
Variation in human annotation (i.e., disagreements) is common in NLP.<n>We evaluate the influence of different reasoning settings on Large Language Model disagreement modeling.<n>Surprisingly, our results show that RLVR-style reasoning degrades performance in disagreement modeling.
arXiv Detail & Related papers (2025-06-24T09:49:26Z)
Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies [66.30619782227173]
Large language models (LLMs) can produce erroneous responses that sound fluent and convincing. We identify several features of LLM responses that shape users' reliance. We find that explanations increase reliance on both correct and incorrect responses. We observe less reliance on incorrect responses when sources are provided or when explanations exhibit inconsistencies.
arXiv Detail & Related papers (2025-02-12T16:35:41Z)
Flattering to Deceive: The Impact of Sycophantic Behavior on User Trust in Large Language Model [0.0]
Sycophancy refers to the tendency of a large language model to align its outputs with the user's perceived preferences, beliefs, or opinions, in order to look favorable. This study explores whether sycophantic tendencies negatively impact user trust in large language models or, conversely, whether users consider such behavior as favorable.
arXiv Detail & Related papers (2024-12-03T20:07:41Z)
Why Would You Suggest That? Human Trust in Language Model Responses [0.3749861135832073]
We analyze how the framing and presence of explanations affect user trust and model performance. Our findings urge future research to delve deeper into the nuanced evaluation of trust in human-machine teaming systems.
arXiv Detail & Related papers (2024-06-04T06:57:47Z)
"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust [51.542856739181474]
We show how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance. We find that first-person expressions decrease participants' confidence in the system and tendency to agree with the system's answers, while increasing participants' accuracy. Our findings suggest that using natural language expressions of uncertainty may be an effective approach for reducing overreliance on LLMs, but that the precise language used matters.
arXiv Detail & Related papers (2024-05-01T16:43:55Z)
Evaluating Consistency and Reasoning Capabilities of Large Language Models [0.0]
Large Language Models (LLMs) are extensively used today across various sectors, including academia, research, business, and finance. Despite their widespread adoption, these models often produce incorrect and misleading information, exhibiting a tendency to hallucinate. This paper aims to evaluate and compare the consistency and reasoning capabilities of both public and proprietary LLMs.
arXiv Detail & Related papers (2024-04-25T10:03:14Z)
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability to model unusual, unexpected, and unlikely situations. Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation. We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z)
Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all. We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
Towards Understanding Sycophancy in Language Models [49.99654432561934]
We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback. We show that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks. Our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.
arXiv Detail & Related papers (2023-10-20T14:46:48Z)
The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models [46.990141872509476]
We study the behavior of large language models (LLMs) when presented with (un)answerable queries. Our results show strong indications that such models encode the answerability of an input query, with the representation of the first decoded token often being a strong indicator.
arXiv Detail & Related papers (2023-10-18T11:01:09Z)
Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback [55.78118035358662]
Reinforcement learning from human feedback serves as a crucial bridge, aligning large language models with human and societal values. We have identified that the reward model often finds shortcuts to bypass its intended objectives. We propose an innovative solution, applying the Product-of-Experts technique to separate reward modeling from the influence of sequence length.
arXiv Detail & Related papers (2023-10-08T15:14:39Z)
Improving Factual Consistency Between a Response and Persona Facts [64.30785349238619]
Neural models for response generation produce responses that are semantically plausible but not necessarily factually consistent with facts describing the speaker's persona. We propose to fine-tune these models by reinforcement learning and an efficient reward function that explicitly captures the consistency between a response and persona facts as well as semantic plausibility.
arXiv Detail & Related papers (2020-04-30T18:08:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.