RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from
Fine-grained Correctional Human Feedback
- URL: http://arxiv.org/abs/2312.00849v2
- Date: Fri, 8 Mar 2024 06:42:37 GMT
- Title: RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from
Fine-grained Correctional Human Feedback
- Authors: Tianyu Yu and Yuan Yao and Haoye Zhang and Taiwen He and Yifeng Han
and Ganqu Cui and Jinyi Hu and Zhiyuan Liu and Hai-Tao Zheng and Maosong Sun
and Tat-Seng Chua
- Abstract summary: We present RLHF-V, which enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback.
Experiments on five benchmarks, in both automatic and human evaluation, show that RLHF-V enables substantially more trustworthy MLLM behaviors.
- Score: 103.08766858584049
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Large Language Models (MLLMs) have recently demonstrated
impressive capabilities in multimodal understanding, reasoning, and
interaction. However, existing MLLMs prevalently suffer from serious
hallucination problems, generating text that is not factually grounded in
associated images. The problem makes existing MLLMs untrustworthy and thus
impractical in real-world (especially high-stakes) applications. To address the
challenge, we present RLHF-V, which enhances MLLM trustworthiness via behavior
alignment from fine-grained correctional human feedback. Specifically, RLHF-V
collects human preference in the form of segment-level corrections on
hallucinations, and performs dense direct preference optimization over the
human feedback. Comprehensive experiments on five benchmarks in both automatic
and human evaluation show that, RLHF-V can enable substantially more
trustworthy MLLM behaviors with promising data and computation efficiency.
Remarkably, using 1.4k annotated data samples, RLHF-V significantly reduces the
hallucination rate of the base MLLM by 34.8%, outperforming the concurrent
LLaVA-RLHF trained on 10k annotated data. The final model achieves
state-of-the-art performance in trustworthiness among open-source MLLMs, and
shows better robustness than GPT-4V in preventing hallucinations arising from
over-generalization. We open-source our code, model, and data at
https://github.com/RLHF-V/RLHF-V.
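The dense direct preference optimization described in the abstract can be pictured as a per-token DPO loss in which tokens inside corrected segments are up-weighted, so that feedback is credited densely rather than once per response. A minimal pure-Python sketch, assuming a simple scalar up-weighting scheme (`gamma` is an illustrative hyperparameter, not the paper's exact formulation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense_dpo_loss(pi_w, ref_w, mask_w, pi_l, ref_l, mask_l,
                   beta=0.1, gamma=5.0):
    """Illustrative sketch of dense DPO over segment-level corrections.

    pi_* / ref_* are per-token log-probs of the preferred (corrected)
    and rejected (hallucinated) responses under the policy and a frozen
    reference model; mask_* flags tokens inside the corrected or
    hallucinated segments, which are up-weighted by gamma.
    """
    def log_ratio(pi, ref, mask):
        # Weighted sum of per-token log-prob ratios vs. the reference.
        return sum((gamma if m else 1.0) * (p - r)
                   for p, r, m in zip(pi, ref, mask))

    margin = beta * (log_ratio(pi_w, ref_w, mask_w)
                     - log_ratio(pi_l, ref_l, mask_l))
    return -math.log(sigmoid(margin))
```

When the margin is zero the loss equals log 2; as the policy assigns higher probability than the reference to corrected segments (and lower to hallucinated ones), the loss falls toward zero.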
Related papers
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness [94.03511733306296]
We introduce RLAIF-V, a framework that aligns MLLMs in a fully open-source paradigm for super GPT-4V trustworthiness.
RLAIF-V maximally exploits open-source feedback from two perspectives: high-quality feedback data and an online feedback learning algorithm.
Experiments show that RLAIF-V substantially enhances the trustworthiness of models without sacrificing performance on other tasks.
arXiv Detail & Related papers (2024-05-27T14:37:01Z)
- RLHF Workflow: From Reward Modeling to Online RLHF [79.83927049253924]
We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report.
RLHF is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature.
We show that supervised fine-tuning (SFT) and iterative RLHF can obtain state-of-the-art performance with fully open-source datasets.
arXiv Detail & Related papers (2024-05-13T15:50:39Z)
- ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback [86.87638927637005]
ChatGLM is a free-to-use AI service powered by large language models (LLMs).
We present the ChatGLM-RLHF pipeline, designed to enhance ChatGLM's alignment with human preferences.
arXiv Detail & Related papers (2024-04-01T05:39:36Z)
- Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models [84.78513908768011]
We propose a novel and efficient method for MLLMs, termed Mixture-of-Resolution Adaptation (MRA).
MRA adopts two visual pathways for images with different resolutions, where high-resolution visual information is embedded into the low-resolution pathway.
To validate MRA, we apply it to a recent MLLM called LLaVA, and term the new model LLaVA-HR.
arXiv Detail & Related papers (2024-03-05T14:31:24Z)
- The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs [36.42188183017291]
We identify a typical class of inputs that baffles MLLMs: images that are highly relevant to the question but inconsistent with the correct answer.
To quantify the effect, we propose CorrelationQA, the first benchmark that assesses the hallucination level given spurious images.
We conduct a thorough analysis on 9 mainstream MLLMs, illustrating that they universally suffer from this instinctive bias to varying degrees.
arXiv Detail & Related papers (2024-02-06T06:48:46Z)
- Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models [8.15890412446096]
Large language models (LLMs) fine-tuned by reinforcement learning from human feedback are becoming more widely deployed.
We coin the term "Implicit Reward Model" (IRM) to refer to the changes that occur to an LLM that result in high-reward generations.
arXiv Detail & Related papers (2023-10-12T09:36:03Z)
- SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF [19.43122743768123]
We propose SteerLM, a supervised fine-tuning method that empowers end-users to control responses during inference.
SteerLM conditions responses to conform to an explicitly defined multi-dimensional set of attributes, thereby empowering a steerable AI capable of generating helpful and high-quality responses.
arXiv Detail & Related papers (2023-10-09T02:11:21Z)
- Aligning Large Multimodal Models with Factually Augmented RLHF [176.54751941088819]
Large Multimodal Models (LMMs) are built across modalities, and misalignment between the two modalities can result in "hallucination".
We adapt the Reinforcement Learning from Human Feedback (RLHF) from the text domain to the task of vision-language alignment.
We propose a new alignment algorithm called Factually Augmented RLHF that augments the reward model with additional factual information.
Our approach achieves remarkable improvement on the LLaVA-Bench dataset, reaching 94% of the performance level of the text-only GPT-4.
arXiv Detail & Related papers (2023-09-25T20:59:33Z)
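The factual augmentation idea in the last entry, reduced to a toy: a learned preference score is combined with a penalty for response claims unsupported by ground-truth image information. The fact representation and the linear penalty below are illustrative assumptions, not the paper's actual reward model:

```python
def factually_augmented_reward(preference_score, response_facts, image_facts,
                               penalty=1.0):
    """Toy sketch: the reward model's preference score is reduced by a
    fixed penalty for each claim in the response that is not supported
    by the image's ground-truth facts (e.g., from captions)."""
    unsupported = [f for f in response_facts if f not in set(image_facts)]
    return preference_score - penalty * len(unsupported)
```

With this shape, a hallucinated claim ("a red ball" when the image only shows a dog) directly lowers the reward, discouraging the policy from inventing image content during RLHF.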
This list is automatically generated from the titles and abstracts of the papers in this site.