Advances and Challenges in Multimodal Remote Sensing Image Registration
- URL: http://arxiv.org/abs/2302.00912v2
- Date: Sun, 5 Feb 2023 07:59:59 GMT
- Title: Advances and Challenges in Multimodal Remote Sensing Image Registration
- Authors: Bai Zhu, Liang Zhou, Simiao Pu, Jianwei Fan, Yuanxin Ye
- Abstract summary: multimodal remote sensing image registration is a crucial step for integrating the complementary information among multimodal data and making comprehensive observations and analysis of the Earths surface.
We will present our own contributions to the field of multimodal image registration, summarize the advantages and limitations of existing multimodal image registration methods, and then discuss the remaining challenges.
- Score: 4.212031520823376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the past few decades, with the rapid development of global aerospace and
aerial remote sensing technology, the types of sensors have evolved from the
traditional monomodal sensors (e.g., optical sensors) to the new generation of
multimodal sensors [e.g., multispectral, hyperspectral, light detection and
ranging (LiDAR) and synthetic aperture radar (SAR) sensors]. These advanced
devices can dynamically provide various and abundant multimodal remote sensing
images with different spatial, temporal, and spectral resolutions according to
different application requirements. Since then, it is of great scientific
significance to carry out the research of multimodal remote sensing image
registration, which is a crucial step for integrating the complementary
information among multimodal data and making comprehensive observations and
analysis of the Earths surface. In this work, we will present our own
contributions to the field of multimodal image registration, summarize the
advantages and limitations of existing multimodal image registration methods,
and then discuss the remaining challenges and make a forward-looking prospect
for the future development of the field.
Related papers
- Motion Capture from Inertial and Vision Sensors [60.5190090684795]
MINIONS is a large-scale Motion capture dataset collected from INertial and visION Sensors.
We conduct experiments on multi-modal motion capture using a monocular camera and very few IMUs.
arXiv Detail & Related papers (2024-07-23T09:41:10Z) - Bridging Remote Sensors with Multisensor Geospatial Foundation Models [15.289711240431107]
msGFM is a multisensor geospatial foundation model that unifies data from four key sensor modalities.
For data originating from identical geolocations, our model employs an innovative cross-sensor pretraining approach.
msGFM has demonstrated enhanced proficiency in a range of both single-sensor and multisensor downstream tasks.
arXiv Detail & Related papers (2024-04-01T17:30:56Z) - LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for
Place Recognition [11.206532393178385]
We present a novel neural network named LCPR for robust multimodal place recognition.
Our method can effectively utilize multi-view camera and LiDAR data to improve the place recognition performance.
arXiv Detail & Related papers (2023-11-06T15:39:48Z) - Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images [1.662438436885552]
Multi-modal fusion has been determined to enhance the accuracy by fusing data from multiple modalities.
We propose a novel multi-modal fusion strategy for mapping relationships between different channels at the early stage.
By addressing fusion in the early stage, as opposed to mid or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-10-21T00:56:11Z) - HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object
Detection [0.0]
We propose HRFuser, a modular architecture for multi-modal 2D object detection.
It fuses multiple sensors in a multi-resolution fashion and scales to an arbitrary number of input modalities.
We demonstrate via experiments on nuScenes and the adverse conditions DENSE datasets that our model effectively leverages complementary features from additional modalities.
arXiv Detail & Related papers (2022-06-30T09:40:05Z) - Learning Online Multi-Sensor Depth Fusion [100.84519175539378]
SenFuNet is a depth fusion approach that learns sensor-specific noise and outlier statistics.
We conduct experiments with various sensor combinations on the real-world CoRBS and Scene3D datasets.
arXiv Detail & Related papers (2022-04-07T10:45:32Z) - Target-aware Dual Adversarial Learning and a Multi-scenario
Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commons underlying the two modalities and fuse upon the common space either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z) - Multimodal Image Synthesis and Editing: The Generative AI Era [131.9569600472503]
multimodal image synthesis and editing has become a hot research topic in recent years.
We comprehensively contextualize the advance of the recent multimodal image synthesis and editing.
We describe benchmark datasets and evaluation metrics as well as corresponding experimental results.
arXiv Detail & Related papers (2021-12-27T10:00:16Z) - Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust
Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet)
CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement.
Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
arXiv Detail & Related papers (2021-11-30T04:30:10Z) - Inertial Sensor Data To Image Encoding For Human Action Recognition [0.0]
Convolutional Neural Networks (CNNs) are successful deep learning models in the field of computer vision.
In this paper, we use 4 types of spatial domain methods for transforming inertial sensor data to activity images.
For creating a multimodal fusion framework, we made each type of activity images multimodal by convolving with two spatial domain filters.
arXiv Detail & Related papers (2021-05-28T01:22:52Z) - Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision
Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos)
The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.