MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization
- URL: http://arxiv.org/abs/2603.03192v1
- Date: Tue, 03 Mar 2026 17:50:24 GMT
- Title: MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization
- Authors: Ashutosh Chaubey, Jiacheng Pang, Mohammad Soleymani
- Abstract summary: We propose Modality-Decoupled Direct Preference Optimization (MoD-DPO) for improving modality grounding in omni LLMs. MoD-DPO introduces modality-aware regularization terms that explicitly enforce invariance to corruptions in irrelevant modalities and sensitivity to perturbations in relevant modalities. Experiments demonstrate that MoD-DPO consistently improves perception accuracy and hallucination resistance, outperforming previous preference optimization baselines.
- Score: 4.088161686930475
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Omni-modal large language models (omni LLMs) have recently achieved strong performance across audiovisual understanding tasks, yet they remain highly susceptible to cross-modal hallucinations arising from spurious correlations and dominant language priors. In this work, we propose Modality-Decoupled Direct Preference Optimization (MoD-DPO), a simple and effective framework for improving modality grounding in omni LLMs. MoD-DPO introduces modality-aware regularization terms that explicitly enforce invariance to corruptions in irrelevant modalities and sensitivity to perturbations in relevant modalities, thereby reducing unintended cross-modal interactions. To further mitigate over-reliance on textual priors, we incorporate a language-prior debiasing penalty that discourages hallucination-prone text-only responses. Extensive experiments across multiple audiovisual hallucination benchmarks demonstrate that MoD-DPO consistently improves perception accuracy and hallucination resistance, outperforming previous preference optimization baselines under similar training budgets. Our findings underscore the importance of modality-faithful alignment and demonstrate a scalable path toward more reliable and resilient multimodal foundation models.
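The abstract does not spell out the objective, but the pieces it describes map naturally onto a DPO loss plus three regularizers. Below is a minimal PyTorch sketch under that reading; the function names, dictionary keys, and loss weights (`lam_inv`, `lam_sens`, `lam_text`) are illustrative assumptions, not the authors' implementation.

```python
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO objective on (chosen, rejected) response log-probs."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()

def mod_dpo_style_loss(logp, beta=0.1, lam_inv=1.0, lam_sens=1.0,
                       lam_text=1.0, sens_margin=0.1):
    """Hypothetical MoD-DPO-style objective; every key name is an assumption.

    `logp` maps condition names to per-example log-prob tensors of the
    chosen (w) and rejected (l) responses:
      w_clean / l_clean              original audiovisual input (policy)
      ref_w_clean / ref_l_clean      same inputs, frozen reference model
      w_irrel_corrupt                irrelevant modality corrupted
      w_rel_corrupt                  relevant modality perturbed
      w_text_only / ref_w_text_only  all non-text modalities dropped
    """
    # Base preference objective on clean inputs.
    loss = dpo_loss(logp['w_clean'], logp['l_clean'],
                    logp['ref_w_clean'], logp['ref_l_clean'], beta)

    # Invariance: corrupting an irrelevant modality should leave the
    # chosen response's likelihood unchanged.
    loss = loss + lam_inv * (logp['w_clean'] - logp['w_irrel_corrupt']).pow(2).mean()

    # Sensitivity: perturbing the relevant modality should lower the
    # chosen response's likelihood by at least `sens_margin`.
    loss = loss + lam_sens * F.relu(
        logp['w_rel_corrupt'] - logp['w_clean'] + sens_margin).mean()

    # Language-prior debiasing: penalize the policy for rating the answer
    # higher than the reference does when it sees text alone.
    text_gap = beta * (logp['w_text_only'] - logp['ref_w_text_only'])
    loss = loss - lam_text * F.logsigmoid(-text_gap).mean()

    return loss
```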
Related papers
- DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations [22.299736215070343]
Multimodal Large Language Models (MLLMs) tend to overemphasize easily distinguishable preference pairs. We propose Difficulty-Aware Direct Preference Optimization (DA-DPO), a cost-effective framework designed to balance the learning process.
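As a rough illustration of the difficulty-aware idea (the exact DA-DPO weighting rule is not given here), one can scale each pair's DPO loss by how poorly the reference model already separates it; the weighting scheme below is an assumption, not the paper's formula.

```python
import torch
import torch.nn.functional as F

def difficulty_weighted_dpo(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Down-weight easy preference pairs (an illustration, not DA-DPO's rule).

    A pair counts as 'easy' when the frozen reference model already
    separates chosen from rejected by a large margin; its loss weight
    shrinks accordingly, so training focuses on harder pairs.
    """
    ref_gap = ref_logp_w - ref_logp_l          # how separable the pair already is
    weight = torch.sigmoid(-ref_gap).detach()  # large gap -> small weight
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -(weight * F.logsigmoid(margin)).mean()
```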
arXiv Detail & Related papers (2026-01-02T09:41:54Z)
- Multimodal Large Language Models with Adaptive Preference Optimization for Sequential Recommendation [60.33386541343322]
We propose a Multimodal Large Language Model framework that integrates Hardness-aware and Noise-regularized preference optimization for Recommendation (HaNoRec). Specifically, HaNoRec dynamically adjusts optimization weights based on both the estimated hardness of each training sample and the policy model's real-time responsiveness.
arXiv Detail & Related papers (2025-11-24T04:10:46Z)
- Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection [54.10252086842123]
Multimodal Sentiment Analysis (MSA) aims to predict sentiment from language, acoustic, and visual data in videos. This paper proposes a modality optimization and dynamic primary modality selection framework (MODS). Experiments on four benchmark datasets demonstrate that MODS outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-11-09T11:13:32Z)
- Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization [68.64764778089229]
We propose MISP-DPO, the first framework to incorporate multiple, semantically diverse negative images in multimodal DPO. Our method embeds prompts and candidate images in CLIP space and applies a sparse autoencoder to decompose semantic deviations into interpretable factors. Experiments across five benchmarks demonstrate that MISP-DPO consistently improves multimodal alignment over prior methods.
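The exact MISP-DPO objective is not reproduced here, but a generic multi-negative DPO term can be written as a softmax over the chosen response and several weighted negatives; the sketch below, including the optional `weights` argument standing in for importance sampling, is an assumption.

```python
import torch

def multi_negative_dpo(h_w, h_neg, weights=None, beta=0.1):
    """Generic multi-negative DPO term (a sketch; not MISP-DPO's exact loss).

    h_w:     (B,)   log-ratio log pi(y_w|x) - log pi_ref(y_w|x), chosen response
    h_neg:   (B, K) log-ratios for K semantically diverse negatives
    weights: optional (B, K) positive importance weights per negative
    """
    if weights is None:
        weights = torch.ones_like(h_neg)
    logits = torch.cat([beta * h_w.unsqueeze(1), beta * h_neg], dim=1)  # (B, 1+K)
    logw = torch.cat([torch.zeros_like(h_w).unsqueeze(1), weights.log()], dim=1)
    # Softmax cross-entropy with the chosen response as the target class.
    return -(logits[:, 0] - torch.logsumexp(logits + logw, dim=1)).mean()
```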
arXiv Detail & Related papers (2025-09-30T03:24:09Z)
- Mitigating Visual Hallucinations via Semantic Curriculum Preference Optimization in MLLMs [21.509992905027023]
Multimodal Large Language Models (MLLMs) have significantly improved performance on various tasks, but continue to suffer from visual hallucinations. We propose Semantic Curriculum Preference Optimization (SCPO), a novel framework for MLLM alignment. SCPO employs a progressive, easy-to-hard curriculum built upon our Semantic Curriculum Preference Pairs dataset, which provides fine-grained semantic contrasts sorted by difficulty.
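The curriculum itself reduces to ordering preference pairs by a difficulty score before training; the helper below sketches that ordering step only, with hypothetical argument names.

```python
def curriculum_order(pairs, difficulty):
    """Order preference pairs easy-to-hard (illustrates the idea only).

    pairs:      list of (prompt, chosen, rejected) tuples
    difficulty: parallel list of difficulty scores (higher = harder)
    """
    order = sorted(range(len(pairs)), key=lambda i: difficulty[i])
    return [pairs[i] for i in order]
```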
arXiv Detail & Related papers (2025-09-29T09:03:36Z)
- Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization [69.05600758833471]
Direct Preference Optimization (DPO) has emerged as an effective approach for mitigating hallucination in Multimodal Large Language Models (MLLMs). We propose Symmetric Multimodal Preference Optimization (SymMPO), which conducts symmetric preference learning with direct preference supervision (i.e., response pairs). In addition to conventional ordinal preference learning, SymMPO introduces a preference margin consistency loss to quantitatively regulate the preference gap between symmetric preference pairs.
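One plausible reading of the margin consistency idea is a squared penalty tying together the implicit reward gaps of the two halves of a symmetric pair; the sketch below is that reading, not SymMPO's published formula.

```python
def margin_consistency(margin_a, margin_b):
    """Sketch of a preference-margin consistency term (not SymMPO's exact form).

    margin_a / margin_b: implicit reward gaps (beta-scaled log-ratio
    differences between chosen and rejected) for the two halves of a
    symmetric preference pair; a squared penalty keeps the gaps aligned.
    """
    return (margin_a - margin_b).pow(2).mean()
```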
arXiv Detail & Related papers (2025-06-13T12:29:15Z)
- Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization [45.55180760002661]
Large Video Language Models (LVLMs) struggle with fine-grained temporal understanding, hallucinate, and often make mistakes on even simple video question-answering tasks. We propose a self-alignment framework that enables LVLMs to learn from their own errors.
arXiv Detail & Related papers (2025-04-16T13:43:56Z)
- On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
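In the general gradient-modulation recipe, a per-modality performance score is monitored during training and the dominant modality's gradients are scaled down after each backward pass; the helper below sketches that recipe with assumed names, not the paper's exact OPM/OGM rules.

```python
def modulate_gradients(encoders, scores, eps=1e-8):
    """Scale per-modality encoder gradients after backward() (a sketch of
    the general gradient-modulation recipe, not the paper's exact rule).

    encoders: dict modality -> nn.Module whose gradients were just computed
    scores:   dict modality -> scalar tracking how well that modality is
              currently doing (e.g., its uni-modal confidence)
    """
    mean_score = sum(scores.values()) / len(scores)
    for name, enc in encoders.items():
        # A dominant modality (score above the mean) gets its gradients
        # shrunk, so the weaker modality is no longer under-optimized.
        coeff = min(1.0, mean_score / (scores[name] + eps))
        for p in enc.parameters():
            if p.grad is not None:
                p.grad.mul_(coeff)
```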
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
- Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models.
The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models.
Our experiments demonstrate that LLMs finetuned with MRPO generalize better in various preference data, regardless of data scarcity or abundance.
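A simple way to use multiple reference models in a DPO loss is to pool their log-probabilities into one effective reference; the log-mean-exp pooling below is an illustrative choice, not necessarily MRPO's closed form.

```python
import math
import torch
import torch.nn.functional as F

def multi_reference_dpo(logp_w, logp_l, ref_logps_w, ref_logps_l, beta=0.1):
    """DPO against several reference models (an illustration; MRPO's
    closed form may aggregate differently).

    ref_logps_w / ref_logps_l: (B, R) log-probs of the chosen/rejected
    responses under R reference models, pooled here with a log-mean-exp.
    """
    R = ref_logps_w.shape[1]
    ref_w = torch.logsumexp(ref_logps_w, dim=1) - math.log(R)
    ref_l = torch.logsumexp(ref_logps_l, dim=1) - math.log(R)
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    return -F.logsigmoid(margin).mean()
```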
arXiv Detail & Related papers (2024-05-26T00:29:04Z)