On Modality Bias Recognition and Reduction
- URL: http://arxiv.org/abs/2202.12690v1
- Date: Fri, 25 Feb 2022 13:47:09 GMT
- Title: On Modality Bias Recognition and Reduction
- Authors: Yangyang Guo, Liqiang Nie, Harry Cheng, Zhiyong Cheng, Mohan
Kankanhalli, Alberto Del Bimbo
- Abstract summary: We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
- Score: 70.69194431713825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Making each modality in multi-modal data contribute is of vital
importance to learning a versatile multi-modal model. Existing methods,
however, are often dominated by one or a few modalities during model
training, resulting in sub-optimal performance. In this paper, we refer to
this problem as modality bias and attempt to study it systematically and
comprehensively in the context of multi-modal classification. After
conducting several empirical analyses, we recognize that one modality can
affect the model prediction more simply because it is spuriously correlated
with the instance labels. To facilitate the evaluation of the modality bias
problem, we construct two datasets, for colored digit recognition and video
action recognition respectively, in line with the Out-of-Distribution (OoD)
protocol. Together with the benchmarks for the visual question answering
task, we empirically demonstrate the performance degradation of existing
methods on these OoD datasets, which serves as evidence of modality bias
learning. To overcome this problem, we propose a plug-and-play loss function
method, whereby the feature space for each label is adaptively learned
according to the training set statistics. We then apply this method to eight
baselines in total to test its effectiveness. The results on four datasets
covering the above three tasks show that our method yields remarkable
performance improvements over the baselines, demonstrating its superiority
in reducing the modality bias problem.
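The abstract describes the loss only at a high level. Below is a minimal
sketch of one plausible instantiation, assuming the adaptively learned
per-label feature space is realized as a class-frequency-dependent margin in
a cosine-softmax loss; the margin schedule, hyperparameters, and names are
illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrequencyAwareMarginLoss(nn.Module):
    """Hypothetical plug-and-play loss: the decision region for each
    label is shaped by a margin derived from training-set label
    frequencies (an assumed, LDAM-style instantiation)."""

    def __init__(self, class_counts, scale=16.0, max_margin=0.5):
        super().__init__()
        counts = torch.as_tensor(class_counts, dtype=torch.float)
        # Rarer labels get larger margins so their features must be more
        # separable; frequent (potentially spuriously correlated) labels
        # get smaller ones. The 1/4 power is a heuristic assumption.
        margins = counts.pow(-0.25)
        margins = margins / margins.max() * max_margin
        self.register_buffer("margins", margins)
        self.scale = scale

    def forward(self, features, weight, labels):
        # Cosine logits between L2-normalized features and the
        # per-class weight vectors of the classification head.
        logits = F.normalize(features, dim=1) @ F.normalize(weight, dim=1).t()
        # Subtract each sample's class-dependent margin from the target
        # logit only, then apply a scaled cross-entropy.
        target = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
        target = target - self.margins[labels]
        logits = logits.scatter(1, labels.unsqueeze(1), target.unsqueeze(1))
        return F.cross_entropy(self.scale * logits, labels)
```

In this sketch, `class_counts` would come from the training-split label
statistics (e.g., per-answer frequencies in VQA) and `weight` would be the
classifier head's weight matrix; both names are hypothetical.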
Related papers
- Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z)
- Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity [9.811378971225727]
This paper extends the current research into missing modalities to the low-data regime.
It is often expensive to get full-modality data and sufficient annotated training samples.
We propose to use retrieval-augmented in-context learning to address these two crucial issues.
arXiv Detail & Related papers (2024-03-14T14:19:48Z)
- Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
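As a concrete illustration of such a projection module, here is a minimal,
generic sketch; the module name, fusion rule, and dimensions are assumptions
rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class SharedSpaceProjector(nn.Module):
    """Illustrative module: maps features of different modalities, each
    with its own dimensionality, into one common space (a generic
    sketch, not the paper's exact architecture)."""

    def __init__(self, modality_dims, common_dim=512):
        super().__init__()
        self.projectors = nn.ModuleDict({
            name: nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, common_dim))
            for name, dim in modality_dims.items()
        })

    def forward(self, inputs):
        # inputs: dict mapping modality name -> (batch, dim) features.
        # Only the modalities present at inference need to be supplied,
        # so unseen modality combinations can still be embedded.
        projected = [self.projectors[name](x) for name, x in inputs.items()]
        return torch.stack(projected, dim=0).mean(dim=0)  # simple mean fusion
```

For example, `SharedSpaceProjector({"audio": 128, "video": 2048})` could fuse
any subset of those two (hypothetical) modalities into one 512-d embedding.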
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- Self-attention fusion for audiovisual emotion recognition with incomplete data [103.70855797025689]
We consider the problem of multimodal data analysis with a use case of audiovisual emotion recognition.
We propose an architecture capable of learning from raw data and describe three variants of it with distinct modality fusion mechanisms.
arXiv Detail & Related papers (2022-01-26T18:04:29Z)
- End-to-End Training of CNN Ensembles for Person Re-Identification [0.0]
We propose an end-to-end ensemble method for person re-identification (ReID) to address the problem of overfitting in discriminative models.
Our proposed ensemble learning framework produces several diverse and accurate base learners in a single DenseNet.
Experiments on several benchmark datasets demonstrate that our method achieves state-of-the-art results.
arXiv Detail & Related papers (2020-10-03T12:40:13Z)
- Rank-Based Multi-task Learning for Fair Regression [9.95899391250129]
We develop a novel learning approach for multi-task regression models based on a biased dataset.
We use a popular rank-based non-parametric independence test to measure fairness.
arXiv Detail & Related papers (2020-09-23T22:32:57Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.