Multimodal Object Detection via Probabilistic a priori Information Integration
- URL: http://arxiv.org/abs/2405.15596v1
- Date: Fri, 24 May 2024 14:28:06 GMT
- Title: Multimodal Object Detection via Probabilistic a priori Information Integration
- Authors: Hafsa El Hafyani, Bastien Pasdeloup, Camille Yver, Pierre Romenteau
- Abstract summary: Multimodal object detection has shown promise in remote sensing.
In this paper, we investigate multimodal object detection where only one modality contains the target object.
We propose to resolve the alignment problem by converting the contextual binary information into probability maps.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal object detection has shown promise in remote sensing. However, multimodal data frequently suffer from low quality: the modalities may lack strict cell-to-cell alignment, leading to mismatches between them. In this paper, we investigate multimodal object detection where only one modality contains the target object and the others provide crucial contextual information. We propose to resolve the alignment problem by converting the contextual binary information into probability maps. We then propose an early fusion architecture that we validate with extensive experiments on the DOTA dataset.
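The abstract does not detail how the binary context layers become probability maps; below is a minimal sketch of one plausible reading, in which the binary mask is diffused with a Gaussian kernel so that small misalignments no longer produce hard cell-to-cell mismatches, and the result is stacked as an extra input channel for early fusion. The function names and the sigma value are illustrative, not from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def binary_mask_to_probability_map(mask: np.ndarray, sigma: float = 8.0) -> np.ndarray:
    """Diffuse a {0, 1} context mask into a soft probability map.

    Smoothing spreads the contextual evidence spatially, so a modest
    misalignment between modalities no longer causes a hard
    cell-to-cell mismatch.
    """
    prob = gaussian_filter(mask.astype(np.float32), sigma=sigma)
    return prob / (prob.max() + 1e-8)  # rescale to [0, 1]

def early_fusion_input(rgb: np.ndarray, context_mask: np.ndarray) -> np.ndarray:
    """Stack the probability map as a fourth input channel (early fusion)."""
    prob = binary_mask_to_probability_map(context_mask)
    return np.concatenate([rgb, prob[..., None]], axis=-1)  # (H, W, 4)

# Example: a 512x512 RGB tile plus a binary context layer (e.g., a road mask).
rgb = np.random.rand(512, 512, 3).astype(np.float32)
mask = np.zeros((512, 512), dtype=np.uint8)
mask[200:220, :] = 1
fused = early_fusion_input(rgb, mask)  # fed to the detector backbone
```

A distance transform would serve the same purpose; the point is that soft spatial evidence tolerates shift where a binary mask does not.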
Related papers
- Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach [10.376378437321437]
We propose a Multimedia Misinformation Detection framework for detecting misinformation from video content by leveraging cross-modal entity consistency.
Our results demonstrate that MultiMD outperforms state-of-the-art baseline models.
arXiv Detail & Related papers (2024-08-16T16:14:36Z)
- Multimodal Fusion on Low-quality Data: A Comprehensive Survey [110.22752954128738]
This paper surveys the common challenges and recent advances of multimodal fusion in the wild.
We identify four main challenges that are faced by multimodal fusion on low-quality data.
This new taxonomy will enable researchers to understand the state of the field and identify several potential directions.
arXiv Detail & Related papers (2024-04-27T07:22:28Z)
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework that fuses information from RGB images and LiDAR point clouds at points of interest (PoIs).
Our approach maintains the view of each modality and obtains multi-modal features by computation-friendly projection and interpolation.
We conducted extensive experiments on the nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
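As a rough illustration of fusion at points of interest, the sketch below projects 3D PoIs into the image plane, bilinearly samples per-point RGB features, and combines them with per-point LiDAR features. PoIFusion's actual fusion is learned and dynamic; the pinhole projection, the intrinsics, and the plain summation here are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def sample_image_features(feat: torch.Tensor, pois: torch.Tensor,
                          K: torch.Tensor) -> torch.Tensor:
    """Project points of interest (PoIs) into the image plane and
    bilinearly sample per-point features from an RGB feature map.

    feat: (1, C, H, W) image feature map
    pois: (N, 3) PoIs in camera coordinates (z > 0)
    K:    (3, 3) camera intrinsics
    """
    uvw = (K @ pois.T).T                                     # homogeneous pixels
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)            # perspective divide
    _, _, H, W = feat.shape
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    grid = grid.view(1, -1, 1, 2)                            # (1, N, 1, 2)
    sampled = F.grid_sample(feat, grid, align_corners=True)  # (1, C, N, 1)
    return sampled[0, :, :, 0].T                             # (N, C)

# Toy fusion at PoIs: add sampled image features to per-point LiDAR features.
N, C = 128, 64
lidar_feat = torch.randn(N, C)
img_feat = torch.randn(1, C, 48, 160)
K = torch.tensor([[500.0, 0.0, 80.0], [0.0, 500.0, 24.0], [0.0, 0.0, 1.0]])
pois = torch.rand(N, 3) * torch.tensor([2.0, 1.0, 10.0]) + torch.tensor([-1.0, -0.5, 5.0])
fused = lidar_feat + sample_image_features(img_feat, pois, K)
```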
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4).
DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content.
We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-09-25T15:05:46Z)
- Multimodal Object Detection in Remote Sensing [2.8698937226234795]
We present a comparison of methods for multimodal object detection in remote sensing.
We survey available multimodal datasets suitable for evaluation, and discuss future directions.
arXiv Detail & Related papers (2023-07-13T12:37:14Z)
- Read, Look or Listen? What's Needed for Solving a Multimodal Dataset [7.0430001782867]
We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it.
We apply our approach to TVQA, a video question-answering dataset, and discover that most questions can be answered using a single modality, without a substantial bias towards any specific modality.
We analyze MERLOT Reserve, finding that it struggles with image-based questions compared to text and audio, as well as with auditory speaker identification.
arXiv Detail & Related papers (2023-07-06T08:02:45Z)
- Informative Data Selection with Uncertainty for Multi-modal Object Detection [25.602915381482468]
We propose a universal uncertainty-aware multi-modal fusion model.
Our model reduces the randomness in fusion and generates reliable output.
Our fusion model is shown to resist severe noise interference such as Gaussian noise, motion blur, and frost, with only slight degradation.
arXiv Detail & Related papers (2023-04-23T16:36:13Z)
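The summary above does not specify the fusion rule; a generic uncertainty-aware baseline is precision weighting, where each modality's prediction counts in proportion to its inverse predicted variance. The paper's model is learned, so treat the following only as the underlying intuition.

```python
import torch

def precision_weighted_fusion(means: list, log_vars: list) -> torch.Tensor:
    """Fuse per-modality predictions by inverse-variance (precision) weighting.

    A modality that reports high uncertainty (large predicted variance)
    contributes less to the fused output.
    """
    precisions = torch.stack([(-lv).exp() for lv in log_vars])  # 1 / sigma^2
    weights = precisions / precisions.sum(dim=0, keepdim=True)
    return (weights * torch.stack(means)).sum(dim=0)

# Two modalities regressing the same 4-dim box target:
rgb_mean, rgb_log_var = torch.randn(4), torch.zeros(4)                 # confident
thermal_mean, thermal_log_var = torch.randn(4), torch.full((4,), 2.0)  # noisy
fused = precision_weighted_fusion([rgb_mean, thermal_mean],
                                  [rgb_log_var, thermal_log_var])
```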
- Weakly Aligned Feature Fusion for Multimodal Object Detection [52.15436349488198]
Multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned.
This problem makes it difficult to fuse multimodal features and complicates convolutional neural network (CNN) training.
In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem.
arXiv Detail & Related papers (2022-04-21T02:35:23Z)
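AR-CNN's core idea is to predict the shift between weakly aligned image pairs and compensate for it before fusion. Below is a drastically simplified stand-in that translates a whole feature map by a predicted integer offset; AR-CNN itself predicts and applies region-wise shifts with learned components.

```python
import torch

def align_by_shift(feat: torch.Tensor, dx: int, dy: int) -> torch.Tensor:
    """Coarsely align a sensed-modality feature map to the reference
    modality by rolling it with a predicted integer (dx, dy) shift.
    """
    return torch.roll(feat, shifts=(dy, dx), dims=(2, 3))  # feat is (B, C, H, W)

thermal_feat = torch.randn(1, 256, 64, 80)
aligned = align_by_shift(thermal_feat, dx=3, dy=-1)  # then fuse with RGB features
```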
- Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z)
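For intuition about this non-learned late fusion: under a conditional-independence assumption, two detectors' class posteriors for a matched box combine as p(y | x_rgb, x_thermal) ∝ p(y | x_rgb) p(y | x_thermal) / p(y). The sketch below implements that rule; the paper's full derivation and its handling of bounding-box coordinates are richer.

```python
import numpy as np

def bayesian_score_fusion(p_rgb: np.ndarray, p_thermal: np.ndarray,
                          prior: np.ndarray) -> np.ndarray:
    """Late-fuse per-class scores from two modalities for one matched box,
    assuming the modalities are conditionally independent given the class."""
    fused = p_rgb * p_thermal / prior
    return fused / fused.sum()  # renormalize to a distribution

# RGB and thermal detectors scoring the same box over three classes:
p_rgb = np.array([0.6, 0.3, 0.1])
p_thermal = np.array([0.7, 0.2, 0.1])
prior = np.full(3, 1.0 / 3.0)
print(bayesian_score_fusion(p_rgb, p_thermal, prior))  # sharper than either input
```

When both detectors agree, the fused score is more confident than either input, which is the intended behavior of probabilistic ensembling.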