Revisiting Modality Imbalance In Multimodal Pedestrian Detection
- URL: http://arxiv.org/abs/2302.12589v2
- Date: Fri, 7 Jul 2023 08:22:17 GMT
- Title: Revisiting Modality Imbalance In Multimodal Pedestrian Detection
- Authors: Arindam Das, Sudip Das, Ganesh Sistu, Jonathan Horgan, Ujjwal
Bhattacharya, Edward Jones, Martin Glavin, and Ciar\'an Eising
- Abstract summary: We introduce a novel training setup with regularizer in the multimodal architecture to resolve the problem of this disparity between the modalities.
Specifically, our regularizer term helps to make the feature fusion method more robust by considering both the feature extractors equivalently important during the training.
- Score: 6.7841188753203046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal learning, particularly for pedestrian detection, has recently
received emphasis due to its capability to function equally well in several
critical autonomous driving scenarios such as low-light, night-time, and
adverse weather conditions. However, in most cases, the training distribution
largely emphasizes the contribution of one specific input that makes the
network biased towards one modality. Hence, the generalization of such models
becomes a significant problem where the non-dominant input modality during
training could be contributing more to the course of inference. Here, we
introduce a novel training setup with regularizer in the multimodal
architecture to resolve the problem of this disparity between the modalities.
Specifically, our regularizer term helps to make the feature fusion method more
robust by considering both the feature extractors equivalently important during
the training to extract the multimodal distribution which is referred to as
removing the imbalance problem. Furthermore, our decoupling concept of output
stream helps the detection task by sharing the spatial sensitive information
mutually. Extensive experiments of the proposed method on KAIST and UTokyo
datasets shows improvement of the respective state-of-the-art performance.
Related papers
- Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges.
This often leads to the issue of missing modalities, where data for certain modalities are absent.
We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
arXiv Detail & Related papers (2024-07-17T14:44:25Z) - Robust Multimodal Learning via Representation Decoupling [6.7678581401558295]
Multimodal learning has attracted increasing attention due to its practicality.
Existing methods tend to address it by learning a common subspace representation for different modality combinations.
We propose a novel Decoupled Multimodal Representation Network (DMRNet) to assist robust multimodal learning.
arXiv Detail & Related papers (2024-07-05T12:09:33Z) - Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z) - Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles [95.49699178874683]
We propose DiffDiv, an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs)
We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features.
We show that DPM-guided diversification is sufficient to remove dependence on shortcut cues, without a need for additional supervised signals.
arXiv Detail & Related papers (2023-11-23T15:47:33Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - One-stage Modality Distillation for Incomplete Multimodal Learning [7.791488931628906]
This paper presents a one-stage modality distillation framework that unifies the privileged knowledge transfer and modality information fusion.
The proposed framework can overcome the problem of incomplete modality input in various scenes and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-09-15T07:12:27Z) - Adaptive Contrastive Learning on Multimodal Transformer for Review
Helpfulness Predictions [40.70793282367128]
We propose Multimodal Contrastive Learning for Multimodal Review Helpfulness Prediction (MRHP) problem.
In addition, we introduce Adaptive Weighting scheme for our contrastive learning approach.
Finally, we propose Multimodal Interaction module to address the unalignment nature of multimodal data.
arXiv Detail & Related papers (2022-11-07T13:05:56Z) - Correlation Information Bottleneck: Towards Adapting Pretrained
Multimodal Models for Robust Visual Question Answering [63.87200781247364]
Correlation Information Bottleneck (CIB) seeks a tradeoff between compression and redundancy in representations.
We derive a tight theoretical upper bound for the mutual information between multimodal inputs and representations.
arXiv Detail & Related papers (2022-09-14T22:04:10Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z) - Decentralized Local Stochastic Extra-Gradient for Variational
Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains with the problem data that is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z) - Rethinking the constraints of multimodal fusion: case study in
Weakly-Supervised Audio-Visual Video Parsing [5.395800183719964]
We show that selecting the optimal feature extraction network collocation is a very important subproblem in multimodal tasks.
A novel method is proposed to convert the optimization problem into an issue of comparative upper bounds by referring to the general practice of extreme value conversion in mathematics.
arXiv Detail & Related papers (2021-05-30T05:13:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.