Addressing Missing and Noisy Modalities in One Solution: Unified Modality-Quality Framework for Low-quality Multimodal Data
- URL: http://arxiv.org/abs/2603.02695v1
- Date: Tue, 03 Mar 2026 07:38:13 GMT
- Title: Addressing Missing and Noisy Modalities in One Solution: Unified Modality-Quality Framework for Low-quality Multimodal Data
- Authors: Sijie Mai, Shiqin Han, Haifeng Hu
- Abstract summary: We propose a unified modality-quality (UMQ) framework to enhance low-quality representations for multimodal affective computing. UMQ consistently outperforms state-of-the-art baselines on multiple datasets under the settings of complete, missing, and noisy modalities.
- Score: 15.181276098822737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal data encountered in real-world scenarios are typically of low quality, with noisy modalities and missing modalities being typical forms that severely hinder model performance and robustness. Prior works, however, often handle noisy and missing modalities separately. In contrast, we jointly address missing and noisy modalities to enhance model robustness in low-quality data scenarios. We regard both noisy and missing modalities as a unified low-quality modality problem, and propose a unified modality-quality (UMQ) framework to enhance low-quality representations for multimodal affective computing. First, we train a quality estimator with explicit supervised signals via a rank-guided training strategy that compares the relative quality of different representations through a ranking constraint, avoiding the training noise caused by inaccurate absolute quality labels. Then, a quality enhancer is constructed for each modality; it uses the sample-specific information provided by other modalities and the modality-specific information provided by the defined modality baseline representation to enhance the quality of unimodal representations. Finally, we propose a quality-aware mixture-of-experts module with a dedicated routing mechanism so that the various modality-quality problems can be addressed more specifically. UMQ consistently outperforms state-of-the-art baselines on multiple datasets under the settings of complete, missing, and noisy modalities.
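The rank-guided training strategy described in the abstract (supervising a quality estimator through relative comparisons rather than absolute quality labels) can be sketched with a margin ranking loss. The following is a minimal illustrative sketch, not the authors' implementation: the linear scorer, the synthetic clean/degraded pairs, and all hyperparameters are assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumption): each higher-quality representation differs from
# its degraded counterpart along a fixed "quality direction" u.
dim, n = 8, 64
u = np.zeros(dim)
u[0] = 1.0
low = rng.normal(size=(n, dim))           # degraded representations
gap = rng.uniform(0.5, 1.5, size=(n, 1))  # per-pair quality gap
high = low + gap * u                      # higher-quality counterparts

def scores(reps, w):
    """Linear quality estimator: one scalar score per representation."""
    return reps @ w

def rank_loss(s_high, s_low, margin=0.5):
    """Margin ranking loss: the better representation should score at
    least `margin` above the worse one; no absolute labels needed."""
    return np.maximum(0.0, margin - (s_high - s_low)).mean()

w = rng.normal(scale=0.1, size=dim)
lr, margin = 0.1, 0.5
initial = rank_loss(scores(high, w), scores(low, w), margin)

for _ in range(200):
    diff = scores(high, w) - scores(low, w)
    active = diff < margin                # pairs still violating the margin
    # Subgradient of the mean hinge w.r.t. w over violating pairs
    grad = -(high - low)[active].sum(axis=0) / n
    w -= lr * grad

final = rank_loss(scores(high, w), scores(low, w), margin)
```

With a learnable quality direction as above, the estimator quickly drives the ranking loss toward zero; the point of the ranking constraint is that only the *ordering* of representation quality is supervised, sidestepping noisy absolute labels.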
Related papers
- BriMA: Bridged Modality Adaptation for Multi-Modal Continual Action Quality Assessment [25.689906499244533]
Action Quality Assessment (AQA) aims to score how well an action is performed and is widely used in sports analysis, rehabilitation assessment, and human skill evaluation. We introduce Bridged Modality Adaptation (BriMA), an innovative approach to multi-modal continual AQA under modality-missing conditions. BriMA consists of a memory-guided bridging imputation module that reconstructs missing modalities using both task-agnostic and task-specific representations, and a modality-aware replay mechanism that prioritizes informative samples based on modality distortion and distribution drift.
arXiv Detail & Related papers (2026-02-22T13:00:52Z) - Test-time Adaptive Hierarchical Co-enhanced Denoising Network for Reliable Multimodal Classification [55.56234913868664]
We propose the Test-time Adaptive Hierarchical Co-enhanced Denoising Network (TAHCD) for reliable learning on multimodal data. The proposed method achieves superior classification performance, robustness, and generalization compared with state-of-the-art reliable multimodal learning approaches.
arXiv Detail & Related papers (2026-01-12T03:14:12Z) - Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models [26.005367102695317]
Multimodal Large Language Models can exhibit difficulty in distinguishing task-relevant from irrelevant signals. We show that spurious information from irrelevant modalities often leads to significant performance degradation. We propose a novel framework to finetune MLLMs, including perturbation-based data augmentation with adversarial perturbations.
arXiv Detail & Related papers (2025-05-26T07:31:32Z) - Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness [61.87055159919641]
Multi-modal semantic segmentation (MMSS) addresses the limitations of single-modality data by integrating complementary information across modalities. Despite notable progress, a significant gap persists between research and real-world deployment due to variability and uncertainty in multi-modal data quality. We introduce a robustness benchmark that evaluates MMSS models under three scenarios: Entire-Missing Modality (EMM), Random-Missing Modality (RMM), and Noisy Modality (NM).
arXiv Detail & Related papers (2025-03-24T08:46:52Z) - Asymmetric Reinforcing against Multi-modal Representation Bias [59.685072206359855]
We propose an Asymmetric Reinforcing method against Multimodal representation bias (ARM). ARM dynamically reinforces the weak modalities while maintaining the ability to represent dominant modalities through conditional mutual information. It significantly improves the performance of multimodal learning, making notable progress in mitigating imbalanced multimodal learning.
arXiv Detail & Related papers (2025-01-02T13:00:06Z) - Enhancing Unimodal Latent Representations in Multimodal VAEs through Iterative Amortized Inference [20.761803725098005]
Multimodal variational autoencoders (VAEs) aim to capture shared latent representations by integrating information from different data modalities.
A significant challenge is accurately inferring representations from any subset of modalities without training an impractical number of inference networks for all possible modality combinations.
We introduce multimodal iterative amortized inference, an iterative refinement mechanism within the multimodal VAE framework.
arXiv Detail & Related papers (2024-10-15T08:49:38Z) - Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z) - Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges.
This often leads to the issue of missing modalities, where data for certain modalities are absent.
We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
arXiv Detail & Related papers (2024-07-17T14:44:25Z) - Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity [55.399230250413986]
We propose a Quality-Aware Feature Matching IQA Metric (QFM-IQM) to remove harmful semantic noise features from the upstream task. Our approach achieves superior performance to state-of-the-art NR-IQA methods on eight standard IQA datasets.
arXiv Detail & Related papers (2023-12-11T06:50:27Z) - Self-attention fusion for audiovisual emotion recognition with incomplete data [103.70855797025689]
We consider the problem of multimodal data analysis with a use case of audiovisual emotion recognition.
We propose an architecture capable of learning from raw data and describe three variants of it with distinct modality fusion mechanisms.
arXiv Detail & Related papers (2022-01-26T18:04:29Z) - Quality-Aware Multimodal Biometric Recognition [30.322429033099688]
We develop a quality-aware framework for fusing representations of input modalities by weighting their importance using quality scores estimated in a weakly-supervised fashion.
This framework utilizes two fusion blocks, each represented by a set of quality-aware and aggregation networks.
We evaluate the performance by considering three multimodal datasets consisting of face, iris, and fingerprint modalities.
arXiv Detail & Related papers (2021-12-10T20:48:55Z)
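The quality-aware fusion idea in the last entry above (weighting each modality's representation by an estimated quality score) can be sketched as a softmax-weighted sum. This is an illustrative sketch, not the paper's method: the one-hot toy representations and the softmax weighting are assumptions, and the quality scores are taken as given rather than estimated.

```python
import numpy as np

def quality_weighted_fusion(reps, qualities):
    """Fuse modality representations using softmax-normalized quality
    scores, so low-quality modalities contribute less to the result."""
    q = np.asarray(qualities, dtype=float)
    weights = np.exp(q - q.max())          # numerically stable softmax
    weights /= weights.sum()
    stacked = np.stack(reps)               # (num_modalities, dim)
    return weights @ stacked               # (dim,)

# Toy one-hot representations for three biometric modalities
face = np.array([1.0, 0.0, 0.0])
iris = np.array([0.0, 1.0, 0.0])
fingerprint = np.array([0.0, 0.0, 1.0])

# Assume face is high quality, fingerprint degraded (e.g., smudged sensor)
fused = quality_weighted_fusion([face, iris, fingerprint], [2.0, 0.0, -2.0])
```

Because the toy representations are one-hot, the fused vector is exactly the softmax weight vector, making it easy to verify that the degraded modality is down-weighted.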
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.