Informative Data Selection with Uncertainty for Multi-modal Object
Detection
- URL: http://arxiv.org/abs/2304.11697v1
- Date: Sun, 23 Apr 2023 16:36:13 GMT
- Title: Informative Data Selection with Uncertainty for Multi-modal Object
Detection
- Authors: Xinyu Zhang, Zhiwei Li, Zhenhong Zou, Xin Gao, Yijin Xiong, Dafeng
Jin, Jun Li, and Huaping Liu
- Abstract summary: We propose a universal uncertainty-aware multi-modal fusion model.
Our model reduces the randomness in fusion and generates reliable output.
Our fusion model is proven to resist severe noise interference such as Gaussian noise, motion blur, and frost, with only slight degradation.
- Score: 25.602915381482468
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Noise has always been a non-negligible source of trouble in object
detection: it creates confusion in model reasoning and thereby reduces the
informativeness of the data. It can lead to inaccurate recognition due to
shifts in the observed patterns, which calls for robust generalization of the
models. To implement a general vision model, we need to develop deep learning
models that can adaptively select valid information from multi-modal data.
This is mainly for two reasons: multi-modal learning can overcome the inherent
defects of single-modal data, and adaptive information selection can reduce
the chaos in multi-modal data. To tackle this problem, we propose a universal
uncertainty-aware multi-modal fusion model. It adopts a multi-pipeline,
loosely coupled architecture to combine the features and results from point
clouds and images. To quantify the correlation in multi-modal information, we
model uncertainty, as the inverse of data information, in the different
modalities and embed it in bounding box generation. In this way, our model
reduces the randomness in fusion and generates reliable output. Moreover, we
conducted a thorough investigation on the KITTI 2D object detection dataset
and its derived dirty data. Our fusion model is shown to resist severe noise
interference such as Gaussian noise, motion blur, and frost, with only slight
degradation. The experimental results demonstrate the benefits of our adaptive
fusion. Our analysis of the robustness of multi-modal fusion will provide
further insights for future research.
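As a rough illustration of how inverse-uncertainty weighting can be embedded in bounding box generation, the sketch below fuses box estimates from an image branch and a point-cloud branch with a precision-weighted average. All names (`fuse_boxes`, `var_img`, `var_pc`) are hypothetical, and this is a generic formulation assumed for illustration, not necessarily the paper's exact fusion rule.

```python
import numpy as np

def fuse_boxes(box_img, var_img, box_pc, var_pc):
    """Fuse two bounding-box estimates (x1, y1, x2, y2) by weighting each
    modality with the inverse of its predicted variance.

    Generic precision-weighted average used only as a sketch; the paper's
    actual uncertainty model may differ.
    """
    w_img = 1.0 / (var_img + 1e-9)  # image branch: low variance -> high weight
    w_pc = 1.0 / (var_pc + 1e-9)    # point-cloud branch weight
    fused = (w_img * box_img + w_pc * box_pc) / (w_img + w_pc)
    # The fused variance shrinks, reflecting reduced randomness after fusion.
    fused_var = 1.0 / (w_img + w_pc)
    return fused, fused_var

# Example: the image branch is degraded (e.g. motion blur), LiDAR is clean.
box_img = np.array([100.0, 80.0, 220.0, 200.0])
box_pc = np.array([110.0, 85.0, 230.0, 205.0])
var_img = np.array([25.0, 25.0, 25.0, 25.0])
var_pc = np.array([4.0, 4.0, 4.0, 4.0])
print(fuse_boxes(box_img, var_img, box_pc, var_pc))
```

Under this weighting, the modality whose observations carry more information (lower variance) dominates the fused box, which is the adaptive-selection behaviour the abstract describes.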
Related papers
- RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z)
- Towards Precision Healthcare: Robust Fusion of Time Series and Image Data [8.579651833717763]
We introduce a new method that uses two separate encoders, one for each type of data, allowing the model to understand complex patterns in both visual and time-based information.
We also deal with imbalanced datasets and use an uncertainty loss function, yielding improved results.
Our experiments show that our method is effective in improving multimodal deep learning for clinical applications.
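For context on the "uncertainty loss function" mentioned above, a common formulation (Kendall-and-Gal-style heteroscedastic weighting) is sketched below. This is an assumed, generic example with illustrative names, and it may differ from the loss actually used in that paper.

```python
import numpy as np

def uncertainty_weighted_loss(task_losses, log_vars):
    """Heteroscedastic weighting: each per-branch (or per-sample) loss is
    attenuated by its predicted log-variance, and the log-variance is
    penalised so the model cannot claim unlimited uncertainty.

    Generic sketch only; not the cited paper's exact formulation.
    """
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.mean(np.exp(-log_vars) * task_losses + log_vars))

# Example: two modality branches; the noisier branch is down-weighted.
print(uncertainty_weighted_loss([0.8, 0.3], [1.2, -0.5]))
```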
arXiv Detail & Related papers (2024-05-24T11:18:13Z)
- Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data may stem from biases in data acquisition rather than from the underlying task.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Defending Multimodal Fusion Models against Single-Source Adversaries [6.019777076722421]
We show that standard multimodal fusion models are vulnerable to single-source adversaries.
An attack on any single modality can overcome the correct information from multiple unperturbed modalities and cause the model to fail.
Motivated by this finding, we propose an adversarially robust fusion strategy.
arXiv Detail & Related papers (2022-06-25T18:57:02Z)
- Discriminative Multimodal Learning via Conditional Priors in Generative Models [21.166519800652047]
This research studies the realistic scenario in which all modalities and class labels are available for model training.
We show, in this scenario, that the variational lower bound limits mutual information between joint representations and missing modalities.
arXiv Detail & Related papers (2021-10-09T17:22:24Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
- Learning Disentangled Latent Factors from Paired Data in Cross-Modal Retrieval: An Implicit Identifiable VAE Approach [33.61751393224223]
We deal with the problem of learning the underlying disentangled latent factors that are shared between the paired bi-modal data in cross-modal retrieval.
We propose a novel idea of the implicit decoder, which completely removes the ambient data decoding module from a latent variable model.
Our model is shown to identify the factors accurately, significantly outperforming conventional encoder-decoder latent variable models.
arXiv Detail & Related papers (2020-12-01T17:47:50Z)
- Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models use a feature fusion strategy but are limited by low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
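For intuition, a minimal cross-modal ("mutual") attention sketch is shown below, in which each modality's features query the other's. The selective-attention and contrast components of the cited model are not reproduced here, and all names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mutual_attention(rgb_feat, depth_feat):
    """Generic mutual (cross-modal) attention: RGB tokens attend to depth
    tokens and vice versa, then each stream is updated with a residual.

    rgb_feat, depth_feat: (N, d) feature arrays. Sketch only; not the cited
    paper's exact selective mutual attention module.
    """
    d = rgb_feat.shape[-1]
    rgb_from_depth = softmax(rgb_feat @ depth_feat.T / np.sqrt(d)) @ depth_feat
    depth_from_rgb = softmax(depth_feat @ rgb_feat.T / np.sqrt(d)) @ rgb_feat
    return rgb_feat + rgb_from_depth, depth_feat + depth_from_rgb

rgb = np.random.randn(16, 64)
depth = np.random.randn(16, 64)
out_rgb, out_depth = mutual_attention(rgb, depth)
print(out_rgb.shape, out_depth.shape)
```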
arXiv Detail & Related papers (2020-10-12T08:50:10Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)