Robust Object Detection with Multi-input Multi-output Faster R-CNN
- URL: http://arxiv.org/abs/2111.13065v1
- Date: Thu, 25 Nov 2021 12:59:34 GMT
- Title: Robust Object Detection with Multi-input Multi-output Faster R-CNN
- Authors: Sebastian Cygert, Andrzej Czyzewski
- Abstract summary: This work applies the multi-input multi-output architecture (MIMO) to the task of object detection using the general-purpose Faster R-CNN model.
MIMO builds a strong feature representation and obtains very competitive accuracy with just two input/output pairs.
It adds only 0.5% additional model parameters and increases the inference time by 15.9% compared to the standard Faster R-CNN.
- Score: 2.9823962001574182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have seen impressive progress in visual recognition on many
benchmarks; however, generalization to real-world, out-of-distribution settings
remains a significant challenge. A state-of-the-art method for robust visual
recognition is model ensembling; however, it was recently shown that similarly
competitive results can be achieved at a much smaller cost by using the
multi-input multi-output (MIMO) architecture. In this work, a generalization of
the MIMO approach is applied to the task of object detection using the
general-purpose Faster R-CNN model. It is shown that the MIMO framework allows
building a strong feature representation and obtains very competitive accuracy
with just two input/output pairs. Furthermore, it adds only 0.5% additional
model parameters and increases the inference time by 15.9% compared to the
standard Faster R-CNN. It also matches or outperforms the Deep Ensemble approach
in terms of model accuracy, robustness in out-of-distribution settings, and
uncertainty calibration when the same number of predictions is used. This work
opens up avenues for applying the MIMO approach to other high-level tasks such
as semantic segmentation and depth estimation.
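The abstract can be made concrete with a minimal PyTorch-style sketch of the MIMO idea (not the authors' implementation): during training, two independent images are stacked along the channel dimension and processed by one shared backbone, while two separate heads each predict the targets of their own input; at inference the same image is repeated twice and the two predictions are aggregated, giving an implicit ensemble at close to single-model cost. The toy backbone, head, and class count below are illustrative assumptions and stand in for the RPN/ROI heads of Faster R-CNN.

```python
# Minimal sketch of a multi-input multi-output (MIMO) predictor.
# Illustrative only: the backbone/head are toy stand-ins, not Faster R-CNN.
import torch
import torch.nn as nn

K = 2  # number of input/output pairs

class MIMODetector(nn.Module):
    def __init__(self, num_classes: int = 91, k: int = K):
        super().__init__()
        self.k = k
        # Shared backbone consumes k RGB images stacked channel-wise (3*k channels).
        self.backbone = nn.Sequential(
            nn.Conv2d(3 * k, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # One lightweight prediction head per input; in the paper these would be
        # the detection heads of Faster R-CNN, here just class logits for brevity.
        self.heads = nn.ModuleList([nn.Linear(64, num_classes) for _ in range(k)])

    def forward(self, images: torch.Tensor) -> list[torch.Tensor]:
        # images: (batch, 3*k, H, W) -- k independent images during training,
        # the same image repeated k times at inference.
        feats = self.backbone(images)
        return [head(feats) for head in self.heads]

model = MIMODetector()
# Training: two different images per example, one supervised loss per head.
x_train = torch.cat([torch.randn(4, 3, 256, 256), torch.randn(4, 3, 256, 256)], dim=1)
outputs = model(x_train)
# Inference: repeat one image and average the k predictions (the implicit ensemble).
x_test = torch.randn(1, 3, 256, 256).repeat(1, K, 1, 1)
pred = torch.stack(model(x_test)).mean(dim=0)
```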
Related papers
- Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter [32.64004722423187]
We show how to improve the robustness of RGB-skeleton action recognition models.
We propose the Attention-based Modality Reweighter (AMR).
Our AMR is plug-and-play, allowing easy integration with multimodal models.
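Only the module name is given above, so the following is a hedged sketch of what an attention-based modality reweighter could look like: a small gating network scores the pooled RGB and skeleton features and rescales each modality before fusion. Layer sizes, names, and the fusion choice are assumptions, not the paper's design.

```python
# Hedged sketch of an attention-based modality reweighting idea.
# The module name, feature sizes and gating design are assumptions.
import torch
import torch.nn as nn

class ModalityReweighter(nn.Module):
    def __init__(self, dim: int = 256, num_modalities: int = 2):
        super().__init__()
        # A small gating network produces one attention weight per modality.
        self.gate = nn.Sequential(
            nn.Linear(dim * num_modalities, num_modalities),
            nn.Softmax(dim=-1),
        )

    def forward(self, rgb_feat: torch.Tensor, skel_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, skel_feat: (batch, dim) pooled features from each branch.
        weights = self.gate(torch.cat([rgb_feat, skel_feat], dim=-1))  # (batch, 2)
        # Reweight each modality and fuse by summation; a "plug-and-play" insertion
        # point would be just before the existing fusion/classification layer.
        return weights[:, 0:1] * rgb_feat + weights[:, 1:2] * skel_feat

reweighter = ModalityReweighter()
fused = reweighter(torch.randn(8, 256), torch.randn(8, 256))
```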
arXiv Detail & Related papers (2024-07-29T13:15:51Z) - RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching)
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z) - Perceiver-based CDF Modeling for Time Series Forecasting [25.26713741799865]
We propose a new architecture, called perceiver-CDF, for modeling cumulative distribution functions (CDF) of time series data.
Our approach combines the perceiver architecture with a copula-based attention mechanism tailored for multimodal time series prediction.
Experiments on the unimodal and multimodal benchmarks consistently demonstrate a 20% improvement over state-of-the-art methods.
arXiv Detail & Related papers (2023-10-03T01:13:17Z) - Probabilistic MIMO U-Net: Efficient and Accurate Uncertainty Estimation
for Pixel-wise Regression [1.4528189330418977]
Uncertainty estimation in machine learning is paramount for enhancing the reliability and interpretability of predictive models.
We present an adaptation of the Multiple-Input Multiple-Output (MIMO) framework for pixel-wise regression tasks.
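As in the main paper, the mechanism can be sketched generically: repeating the input across the MIMO subnetworks at test time yields several pixel-wise predictions whose mean serves as the point estimate and whose spread serves as the uncertainty. The snippet below illustrates only that aggregation step, under the assumption of a model exposing k output maps; it is not the authors' architecture.

```python
# Hedged sketch: aggregating the k outputs of a MIMO pixel-wise regressor
# into a mean prediction and a per-pixel uncertainty (disagreement) map.
import torch

def mimo_predict(model, image: torch.Tensor, k: int = 2):
    """image: (1, 3, H, W); model maps (1, 3*k, H, W) -> list of k (1, 1, H, W) maps."""
    stacked = image.repeat(1, k, 1, 1)          # same image fed to every subnetwork
    preds = torch.stack(model(stacked), dim=0)  # (k, 1, 1, H, W)
    mean = preds.mean(dim=0)                    # point estimate
    uncertainty = preds.var(dim=0)              # disagreement across subnetworks
    return mean, uncertainty
```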
arXiv Detail & Related papers (2023-08-14T22:08:28Z) - Mutual Information Regularization for Weakly-supervised RGB-D Salient
Object Detection [33.210575826086654]
We present a weakly-supervised RGB-D salient object detection model.
We focus on effective multimodal representation learning via inter-modal mutual information regularization.
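The summary does not spell out the estimator, so the sketch below uses a contrastive (InfoNCE-style) bound between paired RGB and depth features as one common way to implement an inter-modal mutual-information term; the paper's actual regularizer may differ.

```python
# Hedged sketch: a contrastive (InfoNCE-style) inter-modal term that encourages
# paired RGB and depth features to share information. This is only one standard
# way to implement such a regularizer, not necessarily the paper's formulation.
import torch
import torch.nn.functional as F

def infonce_mi_term(rgb_feat: torch.Tensor, depth_feat: torch.Tensor, tau: float = 0.1):
    # rgb_feat, depth_feat: (batch, dim) features of the same images from each modality.
    rgb = F.normalize(rgb_feat, dim=-1)
    depth = F.normalize(depth_feat, dim=-1)
    logits = rgb @ depth.t() / tau                        # (batch, batch) similarities
    targets = torch.arange(rgb.size(0), device=rgb.device)
    # Matching RGB/depth pairs are positives, all other pairs are negatives.
    return F.cross_entropy(logits, targets)
```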
arXiv Detail & Related papers (2023-06-06T12:36:57Z) - IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via
Sequence Modeling [3.867363075280544]
Multimodal knowledge graph link prediction aims to improve the accuracy and efficiency of link prediction tasks for multimodal data.
A new model is developed, namely Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling (IMKGA-SM).
The model achieves much better performance than SOTA baselines on multimodal link prediction datasets of different sizes.
arXiv Detail & Related papers (2023-01-06T10:08:11Z) - RoMA: Robust Model Adaptation for Offline Model-based Optimization [115.02677045518692]
We consider the problem of searching for an input that maximizes a black-box objective function, given a static dataset of input-output queries.
A popular approach to solving this problem is maintaining a proxy model that approximates the true objective function.
Here, the main challenge is how to avoid adversarially optimized inputs during the search.
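For context, the naive proxy-based search described above can be sketched in a few lines: fit a proxy to the static (x, y) data, then run gradient ascent on the input. Without a robust-adaptation step such as RoMA's, this search tends to drift into regions where the proxy is over-optimistic; the code shows only the baseline, not the RoMA method.

```python
# Hedged sketch of naive offline model-based optimization with a learned proxy.
# RoMA's robust-adaptation step is NOT shown; this only illustrates the baseline
# whose failure mode (adversarially optimized inputs) the paper addresses.
import torch
import torch.nn as nn

def optimize_input(proxy: nn.Module, x_init: torch.Tensor,
                   steps: int = 100, lr: float = 0.01) -> torch.Tensor:
    # proxy: a regressor already fit to the static dataset (e.g., by MSE regression).
    x = x_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Gradient ascent on the proxy's predicted objective value.
        loss = -proxy(x).mean()
        loss.backward()
        opt.step()
    return x.detach()  # may exploit proxy errors far from the training data
```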
arXiv Detail & Related papers (2021-10-27T05:37:12Z) - SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition [49.42625022146008]
We present the advantages of applying SRU++ in ASR tasks by comparing with Conformer across multiple ASR benchmarks.
Specifically, our analysis shows that SRU++ can surpass Conformer on long-form speech input by a large margin.
arXiv Detail & Related papers (2021-10-11T19:23:50Z) - When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
To achieve better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
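Both modules are only named above, so the sketch below is a loose, assumption-heavy interpretation: a DQInit-style layer derives decoder queries from pooled input features instead of learned constants, and a QAMem-style table keeps one learned memory slot per query. The dimensions and the 98-query landmark count are guesses.

```python
# Hedged sketch of the two ideas named above; layer sizes and wiring are guesses.
import torch
import torch.nn as nn

class DynamicQueryInit(nn.Module):
    """DQInit-style idea: derive decoder queries from the input feature map."""
    def __init__(self, feat_dim: int = 256, num_queries: int = 98):
        super().__init__()
        self.proj = nn.Linear(feat_dim, num_queries * feat_dim)
        self.num_queries, self.feat_dim = num_queries, feat_dim

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        # feat_map: (batch, feat_dim, H, W) -> globally pooled -> per-image queries.
        pooled = feat_map.flatten(2).mean(dim=-1)
        return self.proj(pooled).view(-1, self.num_queries, self.feat_dim)

class QueryAwareMemory(nn.Module):
    """QAMem-style idea: a separate learned memory value per query."""
    def __init__(self, feat_dim: int = 256, num_queries: int = 98):
        super().__init__()
        self.memory = nn.Parameter(torch.zeros(num_queries, feat_dim))

    def forward(self, queries: torch.Tensor) -> torch.Tensor:
        # queries: (batch, num_queries, feat_dim); memory broadcasts over the batch.
        return queries + self.memory
```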
arXiv Detail & Related papers (2021-05-27T13:51:42Z) - FairMOT: On the Fairness of Detection and Re-Identification in Multiple
Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach termed as FairMOT based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
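As a rough illustration of explicitly rewarding diverse ensemble predictions, the sketch below adds a simple pairwise-disagreement penalty to the usual per-member loss; the paper's actual adversarial information-bottleneck objective is more involved, so treat this as a toy stand-in.

```python
# Hedged toy stand-in for a diversity-encouraging ensemble objective.
# The paper uses an adversarial information-bottleneck loss; here diversity is
# approximated by penalizing agreement between member predictions.
import torch
import torch.nn.functional as F

def ensemble_loss(member_logits: list[torch.Tensor], targets: torch.Tensor,
                  diversity_weight: float = 0.1) -> torch.Tensor:
    # Standard supervised loss for every ensemble member.
    task_loss = sum(F.cross_entropy(logits, targets) for logits in member_logits)
    # Diversity term: discourage pairs of members from producing identical outputs.
    probs = [F.softmax(logits, dim=-1) for logits in member_logits]
    agreement = sum(
        (probs[i] * probs[j]).sum(dim=-1).mean()
        for i in range(len(probs)) for j in range(i + 1, len(probs))
    )
    return task_loss + diversity_weight * agreement
```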
arXiv Detail & Related papers (2020-03-10T03:10:41Z)