Related papers: Towards a Psychoanalytic Perspective on VLM Behaviour: A First-step Interpretation with Intriguing Observations

Towards a Psychoanalytic Perspective on VLM Behaviour: A First-step Interpretation with Intriguing Observations

URL: http://arxiv.org/abs/2507.03123v1
Date: Thu, 03 Jul 2025 19:03:16 GMT
Title: Towards a Psychoanalytic Perspective on VLM Behaviour: A First-step Interpretation with Intriguing Observations
Authors: Xiangrui Liu, Man Luo, Agneet Chatterjee, Hua Wei, Yezhou Yang,
Abstract summary: Hallucination is a long-standing problem that has been actively investigated in Vision-Language Models (VLMs)<n>Existing research commonly attributes hallucinations to technical limitations or sycophancy bias, where the latter means the models tend to generate incorrect answers to align with user expectations.<n>We introduce a psychological taxonomy, categorizing hallucination behaviours, including sycophancy, logical inconsistency, and a newly identified VLMs behaviour: authority bias.
Score: 31.682344633194383
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Hallucination is a long-standing problem that has been actively investigated in Vision-Language Models (VLMs). Existing research commonly attributes hallucinations to technical limitations or sycophancy bias, where the latter means the models tend to generate incorrect answers to align with user expectations. However, these explanations primarily focus on technical or externally driven factors, may have neglected the possibility that hallucination behaviours might mirror cognitive biases observed in human psychology. In this work, we introduce a psychological taxonomy, categorizing VLMs' hallucination behaviours, including sycophancy, logical inconsistency, and a newly identified VLMs behaviour: authority bias. To systematically analyze these behaviours, we design AIpsych, a scalable benchmark that reveals psychological tendencies in model response patterns. Leveraging this benchmark, we investigate how variations in model architecture and parameter size influence model behaviour when responding to strategically manipulated questions. Our experiments reveal that as model size increases, VLMs exhibit stronger sycophantic tendencies but reduced authority bias, suggesting increasing competence but a potential erosion of response integrity. A human subject study further validates our hypotheses and highlights key behavioural differences between VLMs and human respondents. This work suggests a new perspective for understanding hallucination in VLMs and highlights the importance of integrating psychological principles into model evaluation.The benchmark is available at https://github.com/lxrswdd/AIpsych.

Related papers

AI Through the Human Lens: Investigating Cognitive Theories in Machine Psychology [0.0]
We investigate whether Large Language Models (LLMs) exhibit human-like cognitive patterns under four established frameworks from psychology.<n>Our findings reveal that these models often produce coherent narratives, show susceptibility to positive framing, exhibit moral judgments aligned with Liberty/Oppression concerns, and demonstrate self-contradictions tempered by extensive rationalization.
arXiv Detail & Related papers (2025-06-22T19:58:19Z)
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation [49.885797244626694]
hallucination of large multimodal models (LMMs) provides responses that appear correct but are actually incorrect.<n>This paper aims to study the hallucination problem of LMMs in video modality, which is dynamic and more challenging compared to static modalities like images and text.
arXiv Detail & Related papers (2025-03-25T13:12:17Z)
Explore the Hallucination on Low-level Perception for MLLMs [83.12180878559295]
We aim to define and evaluate the self-awareness of MLLMs in low-level visual perception and understanding tasks. We present QL-Bench, a benchmark settings to simulate human responses to low-level vision. We demonstrate that while some models exhibit robust low-level visual capabilities, their self-awareness remains relatively underdeveloped.
arXiv Detail & Related papers (2024-09-15T14:38:29Z)
Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants. This paper presents a framework for investigating psychology dimension in LLMs, including psychological identification, assessment dataset curation, and assessment with results validation. We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [61.04606493712002]
Susceptibility to misinformation describes the degree of belief in unverifiable claims that is not observable. Existing susceptibility studies heavily rely on self-reported beliefs. We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z)
Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks. This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z)
Investigating Large Language Models' Perception of Emotion Using Appraisal Theory [3.0902630634005797]
Large Language Models (LLM) have significantly advanced in recent years and are now being used by the general public. In this work, we investigate their emotion perception through the lens of appraisal and coping theory. We applied SCPQ to three recent LLMs from OpenAI, davinci-003, ChatGPT, and GPT-4 and compared the results with predictions from the appraisal theory and human data.
arXiv Detail & Related papers (2023-10-03T16:34:47Z)
Assessing the nature of large language models: A caution against anthropocentrism [0.0]
We assessed several LLMs, primarily GPT 3.5, using standard, normed, and validated cognitive and personality measures. Our results indicate that LLMs are unlikely to have developed sentience, although its ability to respond to personality inventories is interesting. GPT3.5 did display large variability in both cognitive and personality measures over repeated observations.
arXiv Detail & Related papers (2023-09-14T12:58:30Z)
Exploring the psychology of LLMs' Moral and Legal Reasoning [0.0]
Large language models (LLMs) exhibit expert-level performance in tasks across a wide range of different domains. Ethical issues raised by LLMs and the need to align future versions makes it important to know how state of the art models reason about moral and legal issues. We replicate eight studies from the experimental literature with instances of Google's Gemini Pro, Anthropic's Claude 2.1, OpenAI's GPT-4, and Meta's Llama 2 Chat 70b. We find that alignment with human responses shifts from one experiment to another, and that models differ amongst themselves as to their overall
arXiv Detail & Related papers (2023-08-02T16:36:58Z)
Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology. We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table. It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.