Balanced Audiovisual Dataset for Imbalance Analysis
- URL: http://arxiv.org/abs/2302.10912v2
- Date: Thu, 8 Jun 2023 06:58:05 GMT
- Title: Balanced Audiovisual Dataset for Imbalance Analysis
- Authors: Wenke Xia, Xu Zhao, Xincheng Pang, Changqing Zhang, Di Hu
- Abstract summary: The imbalance problem is widespread in machine learning and also arises in multimodal learning.
Recent works have attempted to solve the modality imbalance problem from an algorithmic perspective; however, they do not fully analyze the influence of modality bias in datasets.
- Score: 31.510912639133014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The imbalance problem is widespread in the field of machine learning and
also arises in multimodal learning, caused by the intrinsic discrepancy
between the modalities of samples. Recent works have attempted to solve the
modality imbalance problem from an algorithmic perspective; however, they do
not fully analyze the influence of modality bias in datasets. Concretely,
existing multimodal datasets are usually collected for specific tasks, where
one modality tends to perform better than the others in most conditions. In this
work, to comprehensively explore the influence of modality bias, we first split
existing datasets into different subsets by estimating sample-wise modality
discrepancy. Surprisingly, we find that multimodal models with existing
imbalance algorithms consistently perform worse than unimodal ones on
specific subsets, in accordance with the modality bias. To further explore the
influence of modality bias and analyze the effectiveness of existing imbalance
algorithms, we build a balanced audiovisual dataset, with uniformly distributed
modality discrepancy over the whole dataset. We then conduct extensive
experiments to re-evaluate existing imbalance algorithms and draw some
interesting findings: existing algorithms only provide a compromise between
modalities and suffer from the large modality discrepancy of samples. We hope
that these findings could facilitate future research on the modality imbalance
problem.
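The abstract's subset construction can be sketched in a few lines: estimate a per-sample modality discrepancy from the two unimodal models' confidences, then cut the sorted samples into subsets. This is a minimal illustration under assumed inputs (per-sample ground-truth-class confidences from pretrained audio and visual classifiers); the function name and signature are hypothetical, not the authors' code.

```python
import numpy as np

def modality_discrepancy_subsets(audio_conf, visual_conf, n_subsets=4):
    """Split sample indices into subsets ordered by sample-wise modality
    discrepancy (positive = audio-dominant).

    audio_conf / visual_conf: per-sample confidence of each unimodal
    model on the ground-truth class.
    """
    discrepancy = np.asarray(audio_conf) - np.asarray(visual_conf)
    # Sort samples from most visual-dominant to most audio-dominant and
    # cut the ordering into equally sized, disjoint subsets.
    order = np.argsort(discrepancy)
    return np.array_split(order, n_subsets), discrepancy
```

Each returned subset can then be used to re-evaluate multimodal versus unimodal models at a controlled level of modality discrepancy.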
Related papers
- BalanceBenchmark: A Survey for Imbalanced Learning [9.858467766666223]
Multimodal learning has gained attention for its capacity to integrate information from different modalities.
It is often hindered by the multimodal imbalance problem, where one modality dominates while others remain underutilized.
We systematically categorize various mainstream multimodal imbalance algorithms into four groups based on the strategies they employ to mitigate imbalance.
arXiv Detail & Related papers (2025-02-15T14:42:42Z)
- Enhancing multimodal cooperation via sample-level modality valuation [10.677997431505815]
We introduce a sample-level modality valuation metric to evaluate the contribution of each modality for each sample.
Via modality valuation, we observe that modality discrepancy can indeed differ at the sample level, beyond the global contribution discrepancy at the dataset level.
Our method reasonably estimates the fine-grained uni-modal contributions and achieves considerable improvement.
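A sample-level valuation of each modality can be approximated with a simple leave-one-modality-out comparison: a modality's contribution is the true-class confidence the fused model loses when only the other modality is available. This is a simplified stand-in for the paper's metric; the function and its inputs are hypothetical.

```python
import numpy as np

def modality_contributions(p_full, p_audio_only, p_visual_only, y):
    """Leave-one-modality-out valuation per sample.

    p_*: (N, C) softmax outputs; y: (N,) ground-truth class indices.
    Returns the per-sample value added by audio and by visual.
    """
    idx = np.arange(len(y))
    full = p_full[idx, y]                         # fused true-class confidence
    audio_contrib = full - p_visual_only[idx, y]  # value added by audio
    visual_contrib = full - p_audio_only[idx, y]  # value added by visual
    return audio_contrib, visual_contrib
```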
arXiv Detail & Related papers (2023-09-12T14:16:34Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
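The minority-majority mixing idea can be sketched as a mixup-style generator: each synthetic point is a convex combination of a random minority sample and a random majority sample. This is an illustrative sketch of the general idea only; the paper's exact iterative scheme may differ, and the function name is hypothetical.

```python
import numpy as np

def mix_minority_majority(X_min, X_maj, n_new, alpha=0.8, seed=None):
    """Generate n_new synthetic minority samples by convexly mixing
    random minority/majority pairs.
    """
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_maj), size=n_new)
    # Clamp the mixing weight to >= 0.5 so each synthetic point stays
    # closer to its minority parent and keeps the minority label.
    lam = np.maximum(rng.beta(alpha, alpha, size=n_new), 0.5)[:, None]
    return lam * X_min[i] + (1.0 - lam) * X_maj[j]
```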
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Simplicity Bias Leads to Amplified Performance Disparities [8.60453031364566]
We show that SGD-trained models have a bias towards simplicity, leading them to prioritize learning a majority class.
A model may prioritize any class or group of the dataset that it finds simple, at the expense of what it finds complex.
arXiv Detail & Related papers (2022-12-13T15:24:41Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Handling Imbalanced Data: A Case Study for Binary Class Problems [0.0]
A major challenge in solving classification problems is imbalanced data.
This paper covers synthetic oversampling techniques and manually computes synthetic data points to make the algorithms easier to comprehend.
We analyze the application of these synthetic oversampling techniques to binary classification problems with different imbalance ratios and sample sizes.
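The manual computation of a synthetic data point follows the textbook SMOTE formula: interpolate between a minority sample and one of its minority-class nearest neighbours with a random weight in [0, 1]. A minimal sketch (standard SMOTE, not this paper's code):

```python
import numpy as np

def smote_point(x_i, x_nn, lam):
    """One SMOTE synthetic point: x_i + lam * (x_nn - x_i), where x_nn
    is a minority-class nearest neighbour of minority sample x_i and
    lam is drawn uniformly from [0, 1].
    """
    x_i, x_nn = np.asarray(x_i, float), np.asarray(x_nn, float)
    return x_i + lam * (x_nn - x_i)
```

For example, with x_i = (0, 0), x_nn = (2, 4), and lam = 0.5, the synthetic point lies at the midpoint (1, 2).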
arXiv Detail & Related papers (2020-10-09T02:04:14Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization [55.278153228758434]
Real-world datasets are heteroskedastic and imbalanced.
Addressing heteroskedasticity and imbalance simultaneously is under-explored.
We propose a data-dependent regularization technique for heteroskedastic datasets.
arXiv Detail & Related papers (2020-06-29T01:09:50Z)
- Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
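The combination step of such an ensemble can be sketched as a uniform average of each expert's softmax output. This shows only the fusion; how each expert is trained on its class-balanced subset, and any confidence gating, is omitted, and the function is a hypothetical illustration rather than the paper's method.

```python
import numpy as np

def combine_experts(expert_probs):
    """Fuse class-balanced experts by averaging their (N, C) softmax
    outputs, then predict the argmax class per sample.
    """
    fused = np.stack(expert_probs).mean(axis=0)  # (E, N, C) -> (N, C)
    return fused.argmax(axis=1), fused
```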
arXiv Detail & Related papers (2020-04-07T20:57:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.