Related papers: Aligning Large Multimodal Models with Factually Augmented RLHF

URL: http://arxiv.org/abs/2309.14525v1
Date: Mon, 25 Sep 2023 20:59:33 GMT
Title: Aligning Large Multimodal Models with Factually Augmented RLHF
Authors: Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell
Abstract summary: Large Multimodal Models (LMM) are built across modalities, and misalignment between two modalities can result in "hallucination". We adapt Reinforcement Learning from Human Feedback (RLHF) from the text domain to the task of vision-language alignment. We propose a new alignment algorithm called Factually Augmented RLHF that augments the reward model with additional factual information. Our approach achieves remarkable improvement on the LLaVA-Bench dataset, reaching 94% of the performance level of the text-only GPT-4.
Score: 176.54751941088819
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Multimodal Models (LMM) are built across modalities and the misalignment between two modalities can result in "hallucination", generating textual outputs that are not grounded by the multimodal information in context. To address the multimodal misalignment issue, we adapt the Reinforcement Learning from Human Feedback (RLHF) from the text domain to the task of vision-language alignment, where human annotators are asked to compare two responses and pinpoint the more hallucinated one, and the vision-language model is trained to maximize the simulated human rewards. We propose a new alignment algorithm called Factually Augmented RLHF that augments the reward model with additional factual information such as image captions and ground-truth multi-choice options, which alleviates the reward hacking phenomenon in RLHF and further improves the performance. We also enhance the GPT-4-generated training data (for vision instruction tuning) with previously available human-written image-text pairs to improve the general capabilities of our model. To evaluate the proposed approach in real-world scenarios, we develop a new evaluation benchmark MMHAL-BENCH with a special focus on penalizing hallucinations. As the first LMM trained with RLHF, our approach achieves remarkable improvement on the LLaVA-Bench dataset with the 94% performance level of the text-only GPT-4 (while previous best methods can only achieve the 87% level), and an improvement by 60% on MMHAL-BENCH over other baselines. We opensource our code, model, data at https://llava-rlhf.github.io.
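
The reward-model idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_reward_input` and `pairwise_preference_loss` are hypothetical, the prompt layout is invented for illustration, and the Bradley-Terry style pairwise loss is a common choice in RLHF reward modeling that the abstract itself does not confirm.

```python
import math


def build_reward_input(prompt, response, facts):
    """Prepend factual context (e.g. image captions) to the text the reward model scores.

    Hypothetical layout: the abstract only says the reward model is augmented
    with factual information, not how that information is formatted.
    """
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    return f"Facts:\n{fact_block}\n\nPrompt: {prompt}\nResponse: {response}"


def pairwise_preference_loss(reward_preferred, reward_rejected):
    """Assumed Bradley-Terry style loss: -log(sigmoid(r_preferred - r_rejected))."""
    margin = reward_preferred - reward_rejected
    # log(1 + exp(-margin)) is algebraically equal to -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))


if __name__ == "__main__":
    text = build_reward_input(
        prompt="What is on the table?",
        response="A red mug and a laptop.",
        facts=["A laptop sits on a wooden table next to a red mug."],
    )
    print(text)
    # The loss shrinks as the less hallucinated response is scored higher.
    print(pairwise_preference_loss(2.0, 0.0))
    print(pairwise_preference_loss(0.0, 2.0))
```

Under these assumptions, the facts give the reward model something to check a response against, which is how the abstract describes alleviating reward hacking.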

Related papers

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation [60.164968941945645]
We introduce LLaVA-Reward, an efficient reward model designed to automatically evaluate text-to-image (T2I) generations across multiple perspectives. LLaVA-Reward directly utilizes the hidden states of multimodal large language models (MLLMs). We train LLaVA-Reward on four evaluation perspectives: text-image alignment, fidelity/artifact, safety, and overall ranking.
arXiv Detail & Related papers (2025-07-28T23:52:53Z)
Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization [40.68121267969432]
Existing preference alignment methods focus on aligning model responses with human preferences while neglecting image-text modality alignment. We propose Entity-centric Multimodal Preference Optimization (EMPO), which achieves enhanced modality alignment. EMPO reduces hallucination rates by 85.9% on Object-HalBench and 49.8% on MM-HalBench.
arXiv Detail & Related papers (2025-06-04T15:03:50Z)
Generative RLHF-V: Learning Principles from Multi-modal Human Preference [15.068452240642884]
We introduce Generative RLHF-V, a novel alignment framework that integrates GRMs with multi-modal RLHF. We propose a two-stage pipeline: multi-modal generative reward modeling from RL, where RL guides GRMs to actively capture human intention, then predict the correct pair-wise scores. Our framework improves 4 MLLMs' performance across 7 benchmarks by 18.1%, while baseline RLHF improves it by only 5.3%.
arXiv Detail & Related papers (2025-05-24T05:50:07Z)
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent [57.622821649679786]
Reinforcement learning from human feedback (RLHF) has demonstrated remarkable effectiveness in aligning large language models (LLMs) with human preferences. In this paper, we drop the Bradley-Terry (BT) model assumption and study LLM alignment under general preferences, formulated as a two-player game. We show that our approach achieves an $O(T^{-1})$ bound on the duality gap, improving upon the previous $O(T^{-1/2})$ result.
arXiv Detail & Related papers (2025-02-24T05:24:52Z)
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment [59.536850459059856]
We introduce MM-RLHF, a dataset containing 120k fine-grained, human-annotated preference comparison pairs. We propose several key innovations to improve the quality of reward models and the efficiency of alignment algorithms. Our approach is rigorously evaluated across 10 distinct dimensions and 27 benchmarks.
arXiv Detail & Related papers (2025-02-14T18:59:51Z)
Multimodal Preference Data Synthetic Alignment with Reward Model [23.978820500281213]
We propose a new framework for generating synthetic data, using a reward model as a proxy for human preference, for effective multimodal alignment with DPO training. Experimental results indicate that integrating selected synthetic data, such as data from generative and reward models, can effectively reduce reliance on human-annotated data.
arXiv Detail & Related papers (2024-12-23T09:29:40Z)
Self-Evolved Reward Learning for LLMs [45.6910747154447]
Reinforcement Learning from Human Feedback (RLHF) is a crucial technique for aligning language models with human preferences. We propose Self-Evolved Reward Learning (SER), a novel approach where the RM generates additional training data to iteratively improve itself. Our results demonstrate that even with limited human-annotated data, learning from self-feedback can robustly enhance RM performance.
arXiv Detail & Related papers (2024-11-01T07:29:03Z)
Modality-Fair Preference Optimization for Trustworthy MLLM Alignment [11.796170286878056]
Direct Preference Optimization (DPO) is effective for aligning large language models (LLMs). However, it often favors text over image information, leading to unreliable outputs and visual hallucinations. We propose Modality-Fair Preference Optimization (MFPO) to balance text and image preferences.
arXiv Detail & Related papers (2024-10-20T08:56:52Z)
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment [55.7956150385255]
We investigate the efficacy of AI feedback to scale supervision for aligning vision-language models. We introduce VLFeedback, the first large-scale vision-language feedback dataset. We train Silkie, an LVLM fine-tuned via direct preference optimization on VLFeedback.
arXiv Detail & Related papers (2024-10-12T07:56:47Z)
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness [94.03511733306296]
We introduce RLAIF-V, a framework that aligns MLLMs in a fully open-source paradigm for super GPT-4V trustworthiness. RLAIF-V maximally exploits the open-source feedback from two perspectives, including high-quality feedback data and online feedback learning algorithm. Experiments show that RLAIF-V substantially enhances the trustworthiness of models without sacrificing performance on other tasks.
arXiv Detail & Related papers (2024-05-27T14:37:01Z)
Multi-turn Reinforcement Learning from Preference Human Feedback [41.327438095745315]
Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models with human preferences. Existing methods work by emulating the preferences at the single decision (turn) level. We develop novel methods for Reinforcement Learning from preference feedback between two full multi-turn conversations.
arXiv Detail & Related papers (2024-05-23T14:53:54Z)
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning [67.62925151837675]
In this work, we frame the hallucination problem as an alignment issue and tackle it with preference tuning. Specifically, we propose POVID to generate feedback data with AI models. We use ground-truth instructions as the preferred response and a two-stage approach to generate dispreferred data. In experiments across broad benchmarks, we show that we can not only reduce hallucinations but also improve model performance across standard benchmarks, outperforming prior approaches.
arXiv Detail & Related papers (2024-02-18T00:56:16Z)
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble [67.4269821365504]
Reinforcement Learning from Human Feedback (RLHF) is a widely adopted approach for aligning large language models with human values. However, RLHF relies on a reward model that is trained with a limited amount of human preference data. We contribute a reward ensemble method that allows the reward model to make more accurate predictions.
arXiv Detail & Related papers (2024-01-30T00:17:37Z)
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback [103.08766858584049]
We present RLHF-V, which enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback. Experiments on five benchmarks, in both automatic and human evaluation, show that RLHF-V can enable substantially more trustworthy MLLM behaviors.
arXiv Detail & Related papers (2023-12-01T11:36:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.