MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains
- URL: http://arxiv.org/abs/2511.06452v2
- Date: Fri, 14 Nov 2025 08:32:41 GMT
- Title: MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains
- Authors: Leyan Xue, Changqing Zhang, Kecheng Xue, Xiaohong Liu, Guangyu Wang, Zongbo Han,
- Abstract summary: We have developed a large-scale, domain-adaptive benchmark for multimodal evaluation. This benchmark integrates over 30 datasets, encompassing 15 modalities and 20 predictive tasks. We have also developed an open-source, unified, and automated evaluation pipeline.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although multimodal fusion has made significant progress, its advancement is severely hindered by the lack of adequate evaluation benchmarks. Current fusion methods are typically evaluated on a small selection of public datasets, a limited scope that inadequately represents the complexity and diversity of real-world scenarios, potentially leading to biased evaluations. This issue presents a twofold challenge. On one hand, models may overfit to the biases of specific datasets, hindering their generalization to broader practical applications. On the other hand, the absence of a unified evaluation standard makes fair and objective comparisons between different fusion methods difficult. Consequently, a truly universal and high-performance fusion model has yet to emerge. To address these challenges, we have developed a large-scale, domain-adaptive benchmark for multimodal evaluation. This benchmark integrates over 30 datasets, encompassing 15 modalities and 20 predictive tasks across key application domains. To complement this, we have also developed an open-source, unified, and automated evaluation pipeline that includes standardized implementations of state-of-the-art models and diverse fusion paradigms. Leveraging this platform, we have conducted large-scale experiments, successfully establishing new performance baselines across multiple tasks. This work provides the academic community with a crucial platform for rigorous and reproducible assessment of multimodal models, aiming to propel the field of multimodal artificial intelligence to new heights.
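The "diverse fusion paradigms" the abstract refers to can be made concrete with the two canonical approaches most fusion toolkits standardize: early (feature-level) and late (decision-level) fusion. The sketch below is a minimal, hypothetical PyTorch illustration of those two paradigms; the class names, feature dimensions, and logit-averaging rule are illustrative assumptions, not the MULTIBENCH++ implementation or API.

```python
# Minimal, hypothetical sketch (PyTorch) of two canonical fusion paradigms that a
# unified benchmark would standardize. Names and dimensions are illustrative only,
# not the MULTIBENCH++ implementation.
import torch
import torch.nn as nn


class EarlyFusion(nn.Module):
    """Feature-level fusion: concatenate per-modality features, then classify jointly."""

    def __init__(self, dims, num_classes):
        super().__init__()
        self.head = nn.Linear(sum(dims), num_classes)

    def forward(self, feats):  # feats: list of [batch, dim_i] tensors
        return self.head(torch.cat(feats, dim=-1))


class LateFusion(nn.Module):
    """Decision-level fusion: classify each modality separately, then average the logits."""

    def __init__(self, dims, num_classes):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d, num_classes) for d in dims)

    def forward(self, feats):
        logits = [head(f) for head, f in zip(self.heads, feats)]
        return torch.stack(logits, dim=0).mean(dim=0)


# Example: a 512-d image feature and a 768-d text feature, batch of 4, 10 classes.
image_feat, text_feat = torch.randn(4, 512), torch.randn(4, 768)
print(EarlyFusion([512, 768], 10)([image_feat, text_feat]).shape)  # torch.Size([4, 10])
print(LateFusion([512, 768], 10)([image_feat, text_feat]).shape)   # torch.Size([4, 10])
```

In a benchmark setting, the point of standardizing such modules is that every fusion method sees identical per-modality features and identical evaluation splits, so reported differences reflect the fusion strategy rather than the surrounding pipeline.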
Related papers
- Towards Multimodal Domain Generalization with Few Labels [37.21678123296403]
We introduce and study a new problem, Semi-Supervised Multimodal Domain Generalization (SSMDG). SSMDG aims to learn robust multimodal models from multi-source data with few labeled samples. We propose a unified framework featuring three key components: Consensus-Driven Consistency Regularization, Disagreement-Aware Regularization, and Cross-Modal Prototype Alignment.
arXiv Detail & Related papers (2026-02-26T12:05:56Z) - MissMAC-Bench: Building Solid Benchmark for Missing Modality Issue in Robust Multimodal Affective Computing [21.70459049925545]
MissMAC-Bench is a comprehensive benchmark designed to establish fair and unified evaluation standards. Two guiding principles are proposed, including no missing-modality prior during training. Our benchmark integrates evaluation protocols with both fixed and random missing patterns at the dataset and instance levels.
arXiv Detail & Related papers (2026-01-31T16:39:34Z) - SERM: Self-Evolving Relevance Model with Agent-Driven Learning from Massive Query Streams [53.78257200138774]
We propose a Self-Evolving Relevance Model approach (SERM), which comprises two complementary multi-agent modules. We evaluate SERM in a large-scale industrial setting, which serves billions of user requests daily.
arXiv Detail & Related papers (2026-01-14T14:31:16Z) - REVELIO -- Universal Multimodal Task Load Estimation for Cross-Domain Generalization [2.689067085628911]
This paper introduces a new multimodal dataset that extends established cognitive load detection benchmarks with a real-world gaming application. Task load annotations are derived from objective performance, subjective NASA-TLX ratings, and task-level design. State-of-the-art end-to-end models, including xLSTM, ConvNeXt, and Transformer architectures, are systematically trained and evaluated.
arXiv Detail & Related papers (2025-09-01T17:36:09Z) - Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models [54.196385799229006]
This survey provides the first comprehensive review of advances from traditional approaches to foundation models. It covers: (1) Multimodal domain adaptation; (2) Multimodal test-time adaptation; (3) Multimodal domain generalization; (4) Domain adaptation and generalization with the help of multimodal foundation models; and (5) Adaptation of multimodal foundation models.
arXiv Detail & Related papers (2025-01-30T18:59:36Z) - Towards Modality Generalization: A Benchmark and Prospective Analysis [68.20973671493203]
This paper introduces Modality Generalization (MG), which focuses on enabling models to generalize to unseen modalities. We propose a comprehensive benchmark featuring multi-modal algorithms and adapt existing methods that focus on generalization. Our work provides a foundation for advancing robust and adaptable multi-modal models, enabling them to handle unseen modalities in realistic scenarios.
arXiv Detail & Related papers (2024-12-24T08:38:35Z) - MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models [71.36392373876505]
We introduce MMIE, a large-scale benchmark for evaluating interleaved multimodal comprehension and generation in Large Vision-Language Models (LVLMs). MMIE comprises 20K meticulously curated multimodal queries, spanning 3 categories, 12 fields, and 102 subfields, including mathematics, coding, physics, literature, health, and arts. It supports both interleaved inputs and outputs, offering a mix of multiple-choice and open-ended question formats to evaluate diverse competencies.
arXiv Detail & Related papers (2024-10-14T04:15:00Z) - MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning [110.54752872873472]
MultiZoo is a public toolkit consisting of standardized implementations of more than 20 core multimodal algorithms.
MultiBench is a benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas.
arXiv Detail & Related papers (2023-06-28T17:59:10Z) - MultiBench: Multiscale Benchmarks for Multimodal Representation Learning [87.23266008930045]
MultiBench is a systematic and unified benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas.
It provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation; a minimal sketch of this kind of evaluation loop appears after this list.
It introduces impactful challenges for future research, including robustness to large-scale multimodal datasets and robustness to realistic imperfections.
arXiv Detail & Related papers (2021-07-15T17:54:36Z)
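Both MultiBench above and the MULTIBENCH++ abstract describe an automated, end-to-end pipeline that standardizes data loading, experimental setup, and model evaluation. The following is a minimal sketch of what such an evaluation loop could look like; every identifier (run_benchmark, load_dataset, build_model, metric) is a hypothetical placeholder rather than either benchmark's actual API.

```python
# Hypothetical sketch of a standardized, automated evaluation loop of the kind both
# MultiBench and MULTIBENCH++ describe: every fusion method is run on every dataset
# under one protocol, so the resulting metrics are directly comparable. All identifiers
# below are illustrative placeholders, not either benchmark's actual API.
from itertools import product


def run_benchmark(datasets, fusion_methods, load_dataset, build_model, metric):
    """Evaluate every (dataset, fusion method) pair under the same protocol."""
    results = {}
    for data_name, method_name in product(datasets, fusion_methods):
        train_split, test_split = load_dataset(data_name)   # standardized data loading
        model = build_model(method_name, train_split)        # standardized setup and training
        results[(data_name, method_name)] = metric(model, test_split)
    return results


# Usage (with user-supplied loaders, model builders, and metrics):
# results = run_benchmark(
#     datasets=["dataset_a", "dataset_b"],
#     fusion_methods=["early_fusion", "late_fusion"],
#     load_dataset=my_loader, build_model=my_builder, metric=my_accuracy,
# )
```

Collecting one metric per (dataset, fusion method) pair under a shared protocol is what enables the fair, reproducible comparisons the MULTIBENCH++ abstract argues are currently missing.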