Interventional Imbalanced Multi-Modal Representation Learning via $β$-Generalization Front-Door Criterion
- URL: http://arxiv.org/abs/2406.11490v1
- Date: Mon, 17 Jun 2024 12:55:56 GMT
- Title: Interventional Imbalanced Multi-Modal Representation Learning via $β$-Generalization Front-Door Criterion
- Authors: Yi Li, Jiangmeng Li, Fei Song, Qingmeng Zhu, Changwen Zheng, Wenwen Qiang
- Abstract summary: Multi-modal methods establish comprehensive superiority over uni-modal methods.
The imbalanced contributions of different modalities to task-dependent predictions constantly degrade the discriminative performance of canonical multi-modal methods.
Benchmark methods propose a tractable solution: augmenting the auxiliary modality, which makes only a minor contribution, during training.
- Score: 17.702549833449435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-modal methods establish comprehensive superiority over uni-modal methods. However, the imbalanced contributions of different modalities to task-dependent predictions constantly degrade the discriminative performance of canonical multi-modal methods. Based on their contribution to task-dependent predictions, modalities can be identified as predominant and auxiliary modalities. Benchmark methods propose a tractable solution: augmenting the auxiliary modality with a minor contribution during training. However, our empirical explorations challenge the fundamental idea behind such behavior, and we further conclude that benchmark approaches suffer from certain defects: insufficient theoretical interpretability and limited exploration capability of discriminative knowledge. To this end, we revisit multi-modal representation learning from a causal perspective and build the Structural Causal Model. Following the empirical explorations, we aim to capture the true causality between the discriminative knowledge of the predominant modality and the predictive label while considering the auxiliary modality. Thus, we introduce the $\beta$-generalization front-door criterion. Furthermore, we propose a novel network for sufficiently exploring multi-modal discriminative knowledge. Rigorous theoretical analyses and various empirical evaluations are provided to support the effectiveness of the innate mechanism behind our proposed method.
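- Note: The exact form of the $\beta$-generalization front-door criterion is not given in the abstract above; as background only, the classical front-door adjustment (Pearl) for a mediator $M$ between a cause $X$ and an outcome $Y$ identifies the interventional distribution as $P(y \mid do(x)) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x')$, which holds when $M$ intercepts all directed paths from $X$ to $Y$, there is no unblocked back-door path from $X$ to $M$, and all back-door paths from $M$ to $Y$ are blocked by $X$. In this paper's setting, the discriminative knowledge of the predominant modality plausibly plays the role of the mediator, but how the $\beta$ term generalizes this adjustment is specific to the paper itself.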
Related papers
- Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study [44.35454088618666]
We investigate popular sampling-based and tree search-based inference-time scaling methods on 10 challenging tasks spanning various domains.
Results show that multi-modal thought promotes better performance than conventional text-only thought.
Despite these advantages, multi-modal thoughts necessitate higher token consumption for processing richer visual inputs.
arXiv Detail & Related papers (2025-02-17T07:29:01Z) - Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage.
Models may behave unreliably due to poorly explored failure modes.
Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z) - Asymmetric Reinforcing against Multi-modal Representation Bias [59.685072206359855]
We propose an Asymmetric Reinforcing method against Multimodal representation bias (ARM).
Our ARM dynamically reinforces the weak modalities while maintaining the ability to represent dominant modalities through conditional mutual information.
We have significantly improved the performance of multimodal learning, making notable progress in mitigating imbalanced multimodal learning.
arXiv Detail & Related papers (2025-01-02T13:00:06Z) - Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion [3.66486428341988]
Multimodal AI models are increasingly used in fields like healthcare, finance, and autonomous driving.
Uncertainty arising from noise, insufficient evidence, or conflicts between modalities is crucial for reliable decision-making.
We propose a novel multimodal learning method with order-invariant evidence fusion and introduce a conflict-based discounting mechanism.
arXiv Detail & Related papers (2024-12-23T22:37:18Z) - Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models [58.58594658683919]
Large multimodal models (LMMs) have shown transformative potential across various research tasks.
Our findings indicate LMMs possess advantages in zero-shot learning, interpretability, and handling uncurated 'in-the-wild' inputs.
We propose a Chain-of-Thought augmented prompting approach, which effectively mitigates the off-target prediction issue.
arXiv Detail & Related papers (2024-05-24T16:26:56Z) - A Theory of Multimodal Learning [3.4991031406102238]
The study of multimodality remains relatively under-explored within the field of machine learning.
An intriguing finding is that a model trained on multiple modalities can outperform a fine-tuned unimodal model, even on unimodal tasks.
This paper provides a theoretical framework that explains this phenomenon, by studying generalization properties of multimodal learning algorithms.
arXiv Detail & Related papers (2023-09-21T20:05:49Z) - Synergies between Disentanglement and Sparsity: Generalization and
Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z) - Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The vulnerability of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z)