Inferring 3D change detection from bitemporal optical images
- URL: http://arxiv.org/abs/2205.15903v1
- Date: Tue, 31 May 2022 15:53:33 GMT
- Title: Inferring 3D change detection from bitemporal optical images
- Authors: Valerio Marsocci, Virginia Coletta, Roberta Ravanelli, Simone
Scardapane, Mattia Crespi
- Abstract summary: We propose two novel networks able to solve the 2D and 3D CD tasks simultaneously.
The aim of this work is to lay the foundations for the development of DL algorithms able to automatically infer an elevation (3D) CD map.
The code and the 3DCD dataset are available at https://sites.google.com/uniroma1.it/3dchangedetection/home-page.
- Score: 6.050310428775564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Change detection is one of the most active research areas in Remote Sensing
(RS). Most recently developed change detection methods are based on deep
learning (DL) algorithms. These algorithms generally focus on generating
two-dimensional (2D) change maps: they identify planimetric changes in land
use/land cover (LULC), but neither consider nor return any information on the
corresponding elevation changes. Our work goes one step further, proposing two
novel networks able to solve the 2D and 3D CD tasks simultaneously, together
with the 3DCD dataset, a novel, freely available dataset designed precisely
for this multitask setting. In particular, the aim of this work is to lay the
foundations for the development of DL algorithms able to automatically infer
an elevation (3D) CD map, together with a standard 2D CD map, starting only
from a pair of bitemporal optical images. The proposed architectures consist
of a transformer-based network, the MultiTask Bitemporal Images Transformer
(MTBIT), and a deep convolutional network, the Siamese ResUNet (SUNet). MTBIT
is built on a semantic tokenizer, while SUNet combines skip connections and
residual layers in a siamese encoder to learn rich features capable of solving
the proposed task efficiently. These models can thus obtain 3D CD maps from
two optical images taken at different time instants, without relying directly
on elevation data at inference time. Encouraging results obtained on the novel
3DCD dataset are shown. The code and the 3DCD dataset are available at
https://sites.google.com/uniroma1.it/3dchangedetection/home-page.
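To make the multitask design concrete, here is a minimal PyTorch sketch of a SUNet-like siamese model. This is a hypothetical simplification, not the authors' exact architecture: the channel widths, the absolute-difference fusion, and the two-head layout are our assumptions. A weight-shared residual encoder processes both images, bitemporal features are fused by differencing, and separate heads output the 2D CD map (binary logits) and the 3D CD map (regressed elevation change).

```python
# Minimal sketch of a SUNet-like siamese change-detection model.
# Hypothetical simplification: channel widths, the |a - b| fusion, and the
# two prediction heads are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 3x3 convolutions with a 1x1 residual projection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class SiameseCD(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = ResBlock(3, 32), ResBlock(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec = ResBlock(64 + 32, 32)     # decoder fed by a skip connection
        self.head_2d = nn.Conv2d(32, 1, 1)   # logits for the binary 2D CD map
        self.head_3d = nn.Conv2d(32, 1, 1)   # regressed elevation change (3D CD)

    def encode(self, x):
        f1 = self.enc1(x)                    # full-resolution features
        return f1, self.enc2(self.pool(f1))  # plus downsampled features

    def forward(self, img_t1, img_t2):
        a1, a2 = self.encode(img_t1)         # shared weights: siamese encoder
        b1, b2 = self.encode(img_t2)
        fused = torch.cat([self.up(torch.abs(a2 - b2)), torch.abs(a1 - b1)], dim=1)
        d = self.dec(fused)
        return self.head_2d(d), self.head_3d(d)

t1, t2 = torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256)
cd_2d, cd_3d = SiameseCD()(t1, t2)
print(cd_2d.shape, cd_3d.shape)              # both torch.Size([1, 1, 256, 256])
```

The property the abstract stresses is visible here: the elevation-change head depends only on image features, so no elevation data is required at inference time.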
Related papers
- 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement [2.2122801766964795]
We present 3DGS-CD, the first 3D Gaussian Splatting (3DGS)-based method for detecting physical object rearrangements in 3D scenes.
Our approach estimates 3D object-level changes by comparing two sets of unaligned images taken at different times.
Our method can accurately identify changes in cluttered environments using sparse post-change images (as few as one) in as little as 18 seconds.
arXiv Detail & Related papers (2024-11-06T07:08:41Z)
- ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images [19.02348585677397]
Open-vocabulary 3D object detection (OV-3Det) aims to generalize beyond the limited number of base categories labeled during the training phase.
The biggest bottleneck is the scarcity of annotated 3D data, whereas 2D image datasets are abundant and richly annotated.
We propose a novel framework, ImOV3D, which leverages pseudo-multimodal representations containing both images and point clouds (PC) to close the modality gap.
arXiv Detail & Related papers (2024-10-31T15:02:05Z)
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- Spatiotemporal Modeling Encounters 3D Medical Image Analysis: Slice-Shift UNet with Multi-View Fusion [0.0]
We propose a new 2D-based model dubbed Slice SHift UNet, which encodes three-dimensional features at the complexity of a 2D CNN.
More precisely, multi-view features are collaboratively learned by performing 2D convolutions along the three planes of a volume.
The effectiveness of our approach is validated on the Multi-Modality Abdominal Multi-Organ Segmentation (AMOS) and Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) datasets.
arXiv Detail & Related papers (2023-07-24T14:53:23Z)
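The tri-planar trick in the Slice-Shift UNet summary above is easy to sketch. The snippet below is a hypothetical illustration (the TriPlanarConv name and the averaging fusion are our assumptions, not the paper's exact design): a 2D convolution runs along each of the three orthogonal planes of a volume by folding the slicing axis into the batch dimension, keeping the cost at the level of a 2D CNN.

```python
# Hypothetical sketch: 2D convolutions along the three planes of a 3D volume.
import torch
import torch.nn as nn

class TriPlanarConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # one 2D convolution per plane (axial, coronal, sagittal)
        self.convs = nn.ModuleList(nn.Conv2d(in_ch, out_ch, 3, padding=1) for _ in range(3))

    def forward(self, x):  # x: (B, C, D, H, W)
        views = [
            x.permute(0, 2, 1, 3, 4),  # axial: fold D into batch, convolve (H, W)
            x.permute(0, 3, 1, 2, 4),  # coronal: fold H, convolve (D, W)
            x.permute(0, 4, 1, 2, 3),  # sagittal: fold W, convolve (D, H)
        ]
        outs = []
        for conv, v in zip(self.convs, views):
            b, s, c, h, w = v.shape
            outs.append(conv(v.reshape(b * s, c, h, w)).reshape(b, s, -1, h, w))
        # undo the permutations and fuse the three views by averaging
        axial = outs[0].permute(0, 2, 1, 3, 4)
        coronal = outs[1].permute(0, 2, 3, 1, 4)
        sagittal = outs[2].permute(0, 2, 3, 4, 1)
        return (axial + coronal + sagittal) / 3.0

vol = torch.randn(1, 1, 32, 64, 64)    # (B, C, D, H, W) medical volume
print(TriPlanarConv(1, 8)(vol).shape)  # torch.Size([1, 8, 32, 64, 64])
```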
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch in two ways.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
arXiv Detail & Related papers (2023-01-22T08:26:58Z)
- Bridged Transformer for Vision and Point Cloud 3D Object Detection [92.86856146086316]
Bridged Transformer (BrT) is an end-to-end architecture for 3D object detection.
BrT learns to identify 3D and 2D object bounding boxes from both points and image patches.
We experimentally show that BrT surpasses state-of-the-art methods on SUN RGB-D and ScanNetV2 datasets.
arXiv Detail & Related papers (2022-10-04T05:44:22Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
- End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [62.34374949726333]
Pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras.
PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs.
We introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end.
arXiv Detail & Related papers (2020-04-07T02:18:38Z)
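The depth-to-point-cloud conversion at the core of Pseudo-LiDAR is the standard pinhole back-projection: a pixel (u, v) with depth z maps to x = (u - cx) z / fx and y = (v - cy) z / fy. A small NumPy sketch follows (the helper name and the intrinsic values are illustrative, not taken from the paper; the CoR modules above concern making the full pipeline trainable through this representation change).

```python
# Illustrative pinhole back-projection from a depth map to a point cloud.
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel coordinates
    x = (u - cx) * depth / fx   # inverse of u = fx * x / z + cx
    y = (v - cy) * depth / fy   # inverse of v = fy * y / z + cy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

depth = np.random.uniform(2.0, 50.0, size=(4, 6))  # toy depth map in meters
pc = depth_to_point_cloud(depth, fx=721.5, fy=721.5, cx=3.0, cy=2.0)
print(pc.shape)  # (24, 3)
```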
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.