Multimodal Object Detection via Probabilistic a priori Information Integration
- URL: http://arxiv.org/abs/2405.15596v1
- Date: Fri, 24 May 2024 14:28:06 GMT
- Title: Multimodal Object Detection via Probabilistic a priori Information Integration
- Authors: Hafsa El Hafyani, Bastien Pasdeloup, Camille Yver, Pierre Romenteau,
- Abstract summary: Multimodal object detection has shown promise in remote sensing.
In this paper, we investigate multimodal object detection where only one modality contains the target object.
We propose to resolve the alignment problem by converting the contextual binary information into probability maps.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal object detection has shown promise in remote sensing. However, multimodal data frequently suffer from low quality: the modalities lack strict cell-to-cell alignment, leading to mismatches between them. In this paper, we investigate multimodal object detection where only one modality contains the target object and the others provide crucial contextual information. We propose to resolve the alignment problem by converting the contextual binary information into probability maps. We then propose an early fusion architecture that we validate with extensive experiments on the DOTA dataset.
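The abstract does not spell out how binary context is turned into probability maps, so the sketch below is only one plausible reading: spread each positive cell of a binary mask into a soft confidence that decays with distance, which tolerates the cell-to-cell misalignment the paper describes. The function name `binary_to_probability_map` and the Chebyshev-distance decay are illustrative assumptions, not the paper's actual method.

```python
def binary_to_probability_map(mask, decay=0.5):
    """Convert a binary context mask (list of lists of 0/1) into a soft
    probability map. Each cell's probability decays linearly with its
    Chebyshev distance to the nearest positive cell, so slightly
    misaligned context still contributes a nonzero prior.

    Hypothetical stand-in for the paper's conversion step.
    """
    h, w = len(mask), len(mask[0])
    positives = [(i, j) for i, row in enumerate(mask)
                 for j, v in enumerate(row) if v]
    out = [[0.0] * w for _ in range(h)]
    if not positives:
        return out  # no context: uniform zero prior
    for i in range(h):
        for j in range(w):
            # Chebyshev distance to the nearest positive cell
            d = min(max(abs(i - p), abs(j - q)) for p, q in positives)
            out[i][j] = max(0.0, 1.0 - decay * d)
    return out
```

With `decay=0.5`, a lone positive cell yields probability 1.0 at its own location and 0.5 in its eight neighbors, giving an early-fusion branch a smooth prior instead of a hard mask.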
Related papers
- Multimodal Fusion on Low-quality Data: A Comprehensive Survey [110.22752954128738]
This paper surveys the common challenges and recent advances of multimodal fusion in the wild.
We identify four main challenges that are faced by multimodal fusion on low-quality data.
This new taxonomy will enable researchers to understand the state of the field and identify several potential directions.
arXiv Detail & Related papers (2024-04-27T07:22:28Z) - PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion fuses information of RGB images and LiDAR point clouds at the point of interest (abbreviated as PoI)
Our approach prevents information loss caused by view transformation and eliminates the computation-intensive global attention.
Remarkably, our PoIFusion achieves 74.9% NDS and 73.4% mAP, setting a state-of-the-art record on the nuScenes multi-modal 3D object detection benchmark.
arXiv Detail & Related papers (2024-03-14T09:28:12Z) - DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion [82.2425759608975]
Infrared-visible object detection aims to achieve robust object detection around the clock by fusing the complementary information of infrared and visible images.
We propose a Dynamic Adaptive Multispectral Detection Transformer (DAMSDet) to address these two challenges.
Experiments on four public datasets demonstrate significant improvements compared to other state-of-the-art methods.
arXiv Detail & Related papers (2024-03-01T07:03:27Z) - Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4)
DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content.
We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-09-25T15:05:46Z) - Multimodal Object Detection in Remote Sensing [2.8698937226234795]
We present a comparison of methods for multimodal object detection in remote sensing.
We survey available multimodal datasets suitable for evaluation, and discuss future directions.
arXiv Detail & Related papers (2023-07-13T12:37:14Z) - Read, Look or Listen? What's Needed for Solving a Multimodal Dataset [7.0430001782867]
We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it.
We apply our approach to TVQA, a video question-answering dataset, and discover that most questions can be answered using a single modality, without a substantial bias towards any specific modality.
We analyze the MERLOT Reserve, finding that it struggles with image-based questions compared to text and audio, but also with auditory speaker identification.
arXiv Detail & Related papers (2023-07-06T08:02:45Z) - Informative Data Selection with Uncertainty for Multi-modal Object Detection [25.602915381482468]
We propose a universal uncertainty-aware multi-modal fusion model.
Our model reduces the randomness in fusion and generates reliable output.
Our fusion model is proven to resist severe noise interference like Gaussian, motion blur, and frost, with only slight degradation.
arXiv Detail & Related papers (2023-04-23T16:36:13Z) - Weakly Aligned Feature Fusion for Multimodal Object Detection [52.15436349488198]
Multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned.
This problem makes it difficult to fuse multimodal features and complicates convolutional neural network (CNN) training.
In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem.
arXiv Detail & Related papers (2022-04-21T02:35:23Z) - Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z)
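The Bayesian-fusion entry above describes a non-learned late-fusion rule that merges bounding-box detections from RGB and thermal streams. The sketch below illustrates the general idea under stated assumptions: detections are matched greedily by IoU, and matched confidences are combined with a naive-Bayes rule under a uniform prior and conditional independence. The function names (`iou`, `bayes_fuse`, `late_fusion`) and the specific matching scheme are hypothetical, not the paper's exact algorithm.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def bayes_fuse(p1, p2):
    """Naive-Bayes combination of two detector confidences, assuming
    conditional independence and a uniform prior: a stand-in for the
    paper's non-learned fusion rule."""
    num = p1 * p2
    return num / (num + (1.0 - p1) * (1.0 - p2))

def late_fusion(dets_rgb, dets_thermal, iou_thr=0.5):
    """Greedily match detections (box, score) across modalities by IoU.
    Matched pairs get a fused score; unmatched detections keep their
    single-modality score."""
    fused, used = [], set()
    for box_r, s_r in dets_rgb:
        best, best_iou = None, iou_thr
        for k, (box_t, _) in enumerate(dets_thermal):
            o = iou(box_r, box_t)
            if k not in used and o >= best_iou:
                best, best_iou = k, o
        if best is not None:
            used.add(best)
            fused.append((box_r, bayes_fuse(s_r, dets_thermal[best][1])))
        else:
            fused.append((box_r, s_r))
    fused += [(b, s) for k, (b, s) in enumerate(dets_thermal)
              if k not in used]
    return fused
```

Note how agreement amplifies confidence: two independent 0.8 detections fuse to about 0.94, while a detection seen in only one modality passes through unchanged, which is what makes the rule useful on unaligned data.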
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.