Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs
- URL: http://arxiv.org/abs/2410.11302v1
- Date: Tue, 15 Oct 2024 05:48:14 GMT
- Title: Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs
- Authors: Shuo Li, Tao Ji, Xiaoran Fan, Linsheng Lu, Leyi Yang, Yuming Yang, Zhiheng Xi, Rui Zheng, Yuran Wang, Xiaohui Zhao, Tao Gui, Qi Zhang, Xuanjing Huang
- Abstract summary: Sycophancy is a prevalent hallucination that poses significant challenges to visual language models (VLMs).
We propose a synthetic dataset for training and employ mitigation methods based on prompting, supervised fine-tuning (SFT), and direct preference optimization (DPO).
Our findings indicate that the ability to prevent sycophancy is predominantly observed in higher layers of the model.
- Score: 44.56018149475948
- License:
- Abstract: In the study of LLMs, sycophancy represents a prevalent hallucination that poses significant challenges to these models. Specifically, LLMs often fail to adhere to original correct responses, instead blindly agreeing with users' opinions, even when those opinions are incorrect or malicious. However, research on sycophancy in visual language models (VLMs) has been scarce. In this work, we extend the exploration of sycophancy from LLMs to VLMs, introducing the MM-SY benchmark to evaluate this phenomenon. We present evaluation results from multiple representative models, addressing the gap in sycophancy research for VLMs. To mitigate sycophancy, we propose a synthetic dataset for training and employ methods based on prompts, supervised fine-tuning, and DPO. Our experiments demonstrate that these methods effectively alleviate sycophancy in VLMs. Additionally, we probe VLMs to assess the semantic impact of sycophancy and analyze the attention distribution of visual tokens. Our findings indicate that the ability to prevent sycophancy is predominantly observed in higher layers of the model. The lack of attention to image knowledge in these higher layers may contribute to sycophancy, and enhancing image attention at high layers proves beneficial in mitigating this issue.
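The finding that weak attention to image tokens in higher layers correlates with sycophancy suggests a simple intervention: up-weight the attention placed on visual tokens in the upper decoder layers. Below is a minimal sketch of what such image-attention amplification could look like inside a scaled dot-product attention step; the function name, tensor shapes, layer range, and boost factor are illustrative assumptions rather than the authors' implementation.

```python
import torch

def attend_with_image_boost(q, k, v, image_mask, layer_idx, boost_layers, boost=1.5):
    """Scaled dot-product attention that up-weights attention to image-token keys
    in selected (higher) decoder layers.

    q, k, v:      [batch, heads, seq, dim] query/key/value tensors
    image_mask:   [batch, seq] bool, True where the key position is an image token
    layer_idx:    index of the current decoder layer
    boost_layers: set of layer indices where image attention is amplified
    boost:        multiplicative factor applied to attention on image tokens
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # [b, h, seq, seq]
    weights = torch.softmax(scores, dim=-1)

    if layer_idx in boost_layers:
        # Amplify probability mass on image-token keys, then renormalize.
        mask = image_mask[:, None, None, :].to(weights.dtype)
        weights = weights * (1.0 + (boost - 1.0) * mask)
        weights = weights / weights.sum(dim=-1, keepdim=True)

    return weights @ v

# Toy usage: a 16-token sequence whose first 8 positions are image tokens,
# boosted only in hypothetical layers 24-31 of a 32-layer decoder.
b, h, seq, d = 1, 4, 16, 32
q, k, v = (torch.randn(b, h, seq, d) for _ in range(3))
image_mask = torch.zeros(b, seq, dtype=torch.bool)
image_mask[:, :8] = True
out = attend_with_image_boost(q, k, v, image_mask, layer_idx=30,
                              boost_layers=set(range(24, 32)))
print(out.shape)  # torch.Size([1, 4, 16, 32])
```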
Related papers
- Sycophancy in Large Language Models: Causes and Mitigations [0.0]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing tasks.
Their tendency to exhibit sycophantic behavior poses significant risks to their reliability and ethical deployment.
This paper provides a technical survey of sycophancy in LLMs, analyzing its causes, impacts, and potential mitigation strategies.
arXiv Detail & Related papers (2024-11-22T16:56:49Z) - Investigating Layer Importance in Large Language Models [28.156622049937216]
Large language models (LLMs) have gained increasing attention due to their strong ability to understand and process text.
The limited understanding of LLMs' inner workings has obstructed their deployment in safety-critical scenarios and hindered the development of better models.
This study identifies cornerstone layers in LLMs and underscores their critical role for future research.
arXiv Detail & Related papers (2024-09-22T09:53:13Z) - From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning [89.9648814145473]
Large Language Models (LLMs) tend to prioritize adherence to user prompts over providing veracious responses.
Recent works propose to employ supervised fine-tuning (SFT) to mitigate the sycophancy issue.
We propose a novel supervised pinpoint tuning (SPT), where the region-of-interest modules are tuned for a given objective.
arXiv Detail & Related papers (2024-09-03T07:01:37Z) - Towards Analyzing and Mitigating Sycophancy in Large Vision-Language Models [22.658792167014624]
Large Vision-Language Models (LVLMs) have shown significant capability in vision-language understanding.
Sycophancy, wherein models are unduly influenced by leading or deceptive prompts, results in biased outputs and hallucinations.
We propose a text contrastive decoding method for mitigation.
arXiv Detail & Related papers (2024-08-21T01:03:21Z) - MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection [107.15164718585666]
We investigate the root cause of VLMs' biased predictions in the open-vocabulary detection setting.
Our observations lead to a simple yet effective paradigm, named MarvelOVD, that generates significantly better training targets.
Our method outperforms other state-of-the-art approaches by significant margins.
arXiv Detail & Related papers (2024-07-31T09:23:57Z) - Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends [38.86240794422485]
We evaluate the faithfulness of large language models for dialogue summarization.
Our evaluation reveals subtleties as to what constitutes a hallucination.
We introduce two prompt-based approaches for fine-grained error detection that outperform existing metrics.
arXiv Detail & Related papers (2024-06-05T17:49:47Z) - PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics [51.17512229589]
PoLLMgraph is a model-based white-box detection and forecasting approach for large language models.
We show that hallucination can be effectively detected by analyzing the LLM's internal state transition dynamics.
Our work paves a new way for model-based white-box analysis of LLMs, motivating the research community to further explore, understand, and refine the intricate dynamics of LLM behaviors.
arXiv Detail & Related papers (2024-04-06T20:02:20Z) - Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance [56.04768229686853]
Large Vision-Language Models (LVLMs) tend to hallucinate non-existent objects in images.
We introduce a framework called Mitigating hallucinAtion via classifieR-Free guIdaNcE (MARINE).
MARINE is both training-free and API-free, and can effectively and efficiently reduce object hallucinations during the generation process.
arXiv Detail & Related papers (2024-02-13T18:59:05Z) - Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models [61.28463542324576]
Vision-language models (VLMs) have recently demonstrated strong efficacy as visual assistants that can generate human-like outputs.
We evaluate existing state-of-the-art VLMs and find that even the best-performing model is unable to demonstrate strong visual reasoning capabilities and consistency.
We propose a two-stage training framework aimed at improving both the reasoning performance and consistency of VLMs.
arXiv Detail & Related papers (2023-09-08T17:49:44Z) - CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning [8.217445461627797]
Vision-Language Models (VLMs) may generate incorrect perceptual information in downstream applications, for example by captioning a non-existent entity.
To address the hallucination phenomenon, we introduce a Contrastive Instruction Evaluation Method (CIEM) and Contrastive Instruction Tuning (CIT).
We pinpoint the hallucination issues commonly present in existing VLMs, the inability of current instruction-tuning datasets to handle the hallucination phenomenon, and the superiority of CIT-tuned VLMs on both CIEM and public datasets.
arXiv Detail & Related papers (2023-09-05T15:06:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.