Silkie: Preference Distillation for Large Visual Language Models
- URL: http://arxiv.org/abs/2312.10665v1
- Date: Sun, 17 Dec 2023 09:44:27 GMT
- Title: Silkie: Preference Distillation for Large Visual Language Models
- Authors: Lei Li, Zhihui Xie, Mukai Li, Shunian Chen, Peiyi Wang, Liang Chen,
Yazheng Yang, Benyou Wang, Lingpeng Kong
- Abstract summary: This paper explores preference distillation for large vision language models (LVLMs).
We first build a vision-language feedback dataset utilizing AI annotation.
We adopt GPT-4V to assess the generated outputs regarding helpfulness, visual faithfulness, and ethical considerations.
The resulting model, Silkie, achieves 6.9% and 9.5% relative improvements on the MME benchmark for perception and cognition, respectively.
- Score: 56.10697821410489
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper explores preference distillation for large vision language models
(LVLMs), improving their ability to generate helpful and faithful responses
anchored in the visual context. We first build a vision-language feedback
(VLFeedback) dataset utilizing AI annotation. Specifically, responses are
generated by models sampled from 12 LVLMs, conditioned on multi-modal
instructions sourced from various datasets. We adopt GPT-4V to assess the
generated outputs regarding helpfulness, visual faithfulness, and ethical
considerations. Furthermore, the preference supervision is distilled into
Qwen-VL-Chat through the direct preference optimization (DPO) method. The
resulting model, Silkie, achieves 6.9% and 9.5% relative improvements on the MME
benchmark regarding the perception and cognition capabilities, respectively.
Silkie also demonstrates reduced hallucination by setting a new
state-of-the-art score of 3.02 on the MMHal-Bench benchmark. Further analysis
shows that DPO with our VLFeedback dataset mainly boosts the fine-grained
perception and complex cognition abilities of LVLMs, leading to more
comprehensive improvements compared to human-annotated preference datasets.
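The abstract describes distilling preference supervision into Qwen-VL-Chat via direct preference optimization (DPO). Below is a minimal, illustrative sketch of the standard DPO objective over preference pairs, assuming per-sequence log-probabilities for a policy and a frozen reference model; tensor names and the beta value are assumptions for illustration, not the authors' implementation.

```python
# Minimal DPO-loss sketch (illustrative; not the paper's code).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of preference pairs.

    Each input is a 1-D tensor of summed token log-probabilities for the
    preferred ("chosen") or dispreferred ("rejected") response.
    """
    # Log-ratio of policy to reference model for each response.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Encourage a positive margin between chosen and rejected log-ratios.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()

# Toy usage with random log-probabilities for a batch of 4 pairs.
if __name__ == "__main__":
    b = 4
    loss = dpo_loss(torch.randn(b), torch.randn(b),
                    torch.randn(b), torch.randn(b))
    print(loss.item())
```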
Related papers
- Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis [55.65459867300319]
LLMs demonstrate remarkable capabilities in following natural language instructions, largely due to instruction-tuning on high-quality datasets.
Recent approaches incorporate feedback to improve data quality, but typically operate at the sample level, generating and applying feedback for each response individually.
We propose Reference-Level Feedback, a novel methodology that instead collects feedback based on high-quality reference samples from carefully curated seed data.
arXiv Detail & Related papers (2025-02-06T21:29:00Z)
- Multimodal Preference Data Synthetic Alignment with Reward Model [23.978820500281213]
We propose a new framework for generating synthetic data using a reward model as a proxy for human preference, enabling effective multimodal alignment with DPO training (see the sketch after this list).
Experimental results indicate that integrating selected synthetic data, such as data from generative and reward models, can effectively reduce reliance on human-annotated data.
arXiv Detail & Related papers (2024-12-23T09:29:40Z)
- EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation [58.546205554954454]
We propose Enhancing Alignment in MLLMs via Critical Observation (EACO).
EACO aligns MLLMs with self-generated preference data, using only 5k images at low cost.
EACO reduces the overall hallucinations by 65.6% on HallusionBench and improves the reasoning ability by 21.8% on MME-Cognition.
arXiv Detail & Related papers (2024-12-06T09:59:47Z)
- VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment [55.7956150385255]
We investigate the efficacy of AI feedback to scale supervision for aligning vision-language models.
We introduce VLFeedback, the first large-scale vision-language feedback dataset.
We train Silkie, an LVLM fine-tuned via direct preference optimization on VLFeedback.
arXiv Detail & Related papers (2024-10-12T07:56:47Z)
- Aligning Large Language Models with Self-generated Preference Data [72.99676237703099]
We propose a new framework that boosts the alignment of large language models (LLMs) with human preferences.
Our key idea is to leverage the human prior knowledge within the small (seed) data.
We introduce a noise-aware preference learning algorithm to mitigate the risk of low quality within generated preference data.
arXiv Detail & Related papers (2024-06-06T18:01:02Z)
- RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness [102.06442250444618]
We introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm.
RLAIF-V maximally explores open-source MLLMs from two perspectives, including high-quality feedback data generation.
Experiments on six benchmarks in both automatic and human evaluation show that RLAIF-V substantially enhances the trustworthiness of models.
arXiv Detail & Related papers (2024-05-27T14:37:01Z)
- Calibrated Self-Rewarding Vision Language Models [27.686545023186852]
Large Vision-Language Models (LVLMs) have made substantial progress by integrating pre-trained large language models (LLMs) and vision models through instruction tuning.
LVLMs often exhibit the hallucination phenomenon, where generated text responses appear linguistically plausible but contradict the input image.
We propose the Calibrated Self-Rewarding (CSR) approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning.
arXiv Detail & Related papers (2024-05-23T14:30:33Z)
- ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models [45.040292339670096]
Large vision-language models (LVLMs) have shown promise in a broad range of vision-language tasks with their strong reasoning and generalization capabilities.
This study aims to bridge the performance gap between traditional-scale LVLMs and resource-friendly lite versions by adopting high-quality training data.
arXiv Detail & Related papers (2024-02-18T19:26:49Z)
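The "Multimodal Preference Data Synthetic Alignment with Reward Model" entry above describes using a reward model as a proxy for human preference when assembling DPO training pairs. The sketch below shows one plausible way such pairs could be constructed under that idea; score_fn and the data structures are hypothetical and not taken from any of the papers listed.

```python
# Hypothetical sketch: build chosen/rejected pairs by scoring candidate
# responses with a reward model used as a proxy for human preference.
from typing import Callable, Dict, List

def build_preference_pairs(
    samples: List[Dict],                     # each: {"prompt": str, "responses": [str, ...]}
    score_fn: Callable[[str, str], float],   # reward-model proxy: (prompt, response) -> score
) -> List[Dict]:
    """Rank candidate responses per prompt and keep the best/worst as a pair."""
    pairs = []
    for sample in samples:
        scored = sorted(
            sample["responses"],
            key=lambda r: score_fn(sample["prompt"], r),
            reverse=True,
        )
        # Keep a pair only if there are at least two distinct candidates.
        if len(scored) >= 2 and scored[0] != scored[-1]:
            pairs.append({
                "prompt": sample["prompt"],
                "chosen": scored[0],     # highest-scoring response
                "rejected": scored[-1],  # lowest-scoring response
            })
    return pairs
```

The resulting pairs could then be fed to a DPO-style objective such as the one sketched after the abstract above.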