CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs
- URL: http://arxiv.org/abs/2501.16629v1
- Date: Tue, 28 Jan 2025 02:05:38 GMT
- Title: CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs
- Authors: Jinlan Fu, Shenzhen Huangfu, Hao Fei, Xiaoyu Shen, Bryan Hooi, Xipeng Qiu, See-Kiong Ng,
- Abstract summary: Multimodal Large Language Models (MLLMs) still struggle with hallucinations despite their impressive capabilities.
We propose a Cross-modal Hierarchical Direct Preference Optimization (CHiP) to address these limitations.
We evaluate CHiP through both quantitative and qualitative analyses, with results across multiple benchmarks demonstrating its effectiveness in reducing hallucinations.
- Score: 107.21334626890713
- License:
- Abstract: Multimodal Large Language Models (MLLMs) still struggle with hallucinations despite their impressive capabilities. Recent studies have attempted to mitigate this by applying Direct Preference Optimization (DPO) to multimodal scenarios using preference pairs from text-based responses. However, our analysis of representation distributions reveals that multimodal DPO struggles to align image and text representations and to distinguish between hallucinated and non-hallucinated descriptions. To address these challenges, in this work, we propose a Cross-modal Hierarchical Direct Preference Optimization (CHiP) to address these limitations. We introduce a visual preference optimization module within the DPO framework, enabling MLLMs to learn from both textual and visual preferences simultaneously. Furthermore, we propose a hierarchical textual preference optimization module that allows the model to capture preferences at multiple granular levels, including response, segment, and token levels. We evaluate CHiP through both quantitative and qualitative analyses, with results across multiple benchmarks demonstrating its effectiveness in reducing hallucinations. On the Object HalBench dataset, CHiP outperforms DPO in hallucination reduction, achieving improvements of 52.7% and 55.5% relative points based on the base model Muffin and LLaVA models, respectively. We make all our datasets and code publicly available: https://github.com/LVUGAI/CHiP.
Related papers
- Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment [74.25832963097658]
Multi-Objective Alignment (MOA) aims to align responses with multiple human preference objectives.
We find that DPO-based MOA approaches suffer from widespread preference conflicts in the data.
arXiv Detail & Related papers (2025-02-20T08:27:00Z) - Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization [19.37373012848517]
Large Vision Language Models (VLMs) are prone to significant hallucinations, particularly in the form of cross-modal inconsistencies.
We introduce Re-Align, a novel alignment framework that leverages image retrieval to construct a dual-preference dataset.
We also introduce rDPO, an extension of the standard direct preference optimization that incorporates an additional visual preference objective during fine-tuning.
arXiv Detail & Related papers (2025-02-18T18:59:57Z) - Multimodal Preference Data Synthetic Alignment with Reward Model [23.978820500281213]
We propose a new framework in generating synthetic data using a reward model as a proxy of human preference for effective multimodal alignment with DPO training.
Experiment results indicate that integrating selected synthetic data, such as from generative and rewards models can effectively reduce reliance on human-annotated data.
arXiv Detail & Related papers (2024-12-23T09:29:40Z) - MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models [85.30735602813093]
Multi-Image Augmented Direct Preference Optimization (MIA-DPO) is a visual preference alignment approach that effectively handles multi-image inputs.
MIA-DPO mitigates the scarcity of diverse multi-image training data by extending single-image data with unrelated images arranged in grid collages or pic-in-pic formats.
arXiv Detail & Related papers (2024-10-23T07:56:48Z) - Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness [27.43137305486112]
We propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss.
The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods to achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-09-26T12:37:26Z) - Understanding Alignment in Multimodal LLMs: A Comprehensive Study [46.33812471516309]
We analyze each aspect of preference alignment in Multimodal Large Language Models (MLLMs)
We show that combining offline and online methods can improve the performance of the model in certain scenarios.
We introduce a novel way of creating multimodal preference data called Bias-Driven Hallucination Sampling (BDHS)
arXiv Detail & Related papers (2024-07-02T17:55:03Z) - mDPO: Conditional Preference Optimization for Multimodal Large Language Models [52.607764280030196]
Direct preference optimization (DPO) has shown to be an effective method for large language model (LLM) alignment.
Recent works have attempted to apply DPO to multimodal scenarios but have found it challenging to achieve consistent improvement.
We propose mDPO, a multimodal DPO objective that prevents the over-prioritization of language-only preferences by also optimizing image preference.
arXiv Detail & Related papers (2024-06-17T17:59:58Z) - Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models.
The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models.
Our experiments demonstrate that LLMs finetuned with MRPO generalize better in various preference data, regardless of data scarcity or abundance.
arXiv Detail & Related papers (2024-05-26T00:29:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.