Multimodal deep learning for mapping forest dominant height by fusing
GEDI with earth observation data
- URL: http://arxiv.org/abs/2311.11777v1
- Date: Mon, 20 Nov 2023 14:02:50 GMT
- Title: Multimodal deep learning for mapping forest dominant height by fusing
GEDI with earth observation data
- Authors: Man Chen, Wenquan Dong, Hao Yu, Iain Woodhouse, Casey M. Ryan, Haoyu
Liu, Selena Georgiou, Edward T.A. Mitchard
- Abstract summary: We propose a novel deep learning framework termed the multi-modal attention remote sensing network (MARSNet) to estimate forest dominant height.
MARSNet comprises separate encoders for each remote sensing data modality to extract multi-scale features, and a shared decoder to fuse the features and estimate height.
Our research demonstrates the effectiveness of a multimodal deep learning approach fusing GEDI with SAR and passive optical imagery for enhancing the accuracy of high resolution dominant height estimation.
- Score: 5.309673841813994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of multisource remote sensing data and deep learning models
offers new possibilities for accurately mapping high spatial resolution forest
height. We found that GEDI relative height (RH) metrics exhibited a strong
correlation with the mean of the top 10 highest trees (dominant height)
measured in situ at the corresponding footprint locations. Consequently, we
proposed a novel deep learning framework termed the multi-modal attention
remote sensing network (MARSNet) to estimate forest dominant height by
extrapolating dominant height derived from GEDI, using Sentinel-1 data, ALOS-2
PALSAR-2 data, Sentinel-2 optical data and ancillary data. MARSNet comprises
separate encoders for each remote sensing data modality to extract multi-scale
features, and a shared decoder to fuse the features and estimate height. Using
an individual encoder for each remote sensing modality avoids interference
across modalities and extracts distinct representations. To focus on the
effective information in each dataset, we reduced the prevalent spatial and
band redundancies by incorporating extended spatial and band reconstruction
convolution modules in the encoders. MARSNet achieved
commendable performance in estimating dominant height, with an R² of 0.62 and
an RMSE of 2.82 m, outperforming the widely used random forest approach, which
attained an R² of 0.55 and an RMSE of 3.05 m. Finally, we applied the trained
MARSNet model to generate wall-to-wall maps at 10 m resolution for Jilin,
China. Through independent validation using field measurements, MARSNet
demonstrated an R² of 0.58 and an RMSE of 3.76 m, compared to 0.41 and 4.37 m for
the random forest baseline. Our research demonstrates the effectiveness of a
multimodal deep learning approach fusing GEDI with SAR and passive optical
imagery for enhancing the accuracy of high-resolution dominant height
estimation.
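To make the fusion pattern concrete, below is a minimal PyTorch sketch, not the authors' released MARSNet code, of the multi-encoder / shared-decoder design the abstract describes. The band counts, feature widths, and the simple residual block standing in for the extended spatial and band reconstruction convolutions are all assumptions for illustration; plain concatenation stands in for the paper's attention-based fusion, and the R² and RMSE helpers match the metrics reported above.

```python
# Illustrative sketch only (assumed shapes and module names),
# not the authors' MARSNet implementation.
import torch
import torch.nn as nn

class RedundancyReducingBlock(nn.Module):
    """Stand-in for the extended spatial and band reconstruction convolutions:
    a depthwise conv (spatial) then a pointwise conv (band), with a residual."""
    def __init__(self, ch):
        super().__init__()
        self.spatial = nn.Conv2d(ch, ch, kernel_size=3, padding=1, groups=ch)
        self.band = nn.Conv2d(ch, ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.band(self.spatial(x)))

class ModalityEncoder(nn.Module):
    """Separate encoder per modality, so representations do not interfere."""
    def __init__(self, in_ch, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            RedundancyReducingBlock(feat_ch),
            RedundancyReducingBlock(feat_ch),
        )

    def forward(self, x):
        return self.net(x)

class SharedDecoder(nn.Module):
    """Fuses the per-modality features and regresses height per pixel."""
    def __init__(self, feat_ch, n_modalities):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_ch * n_modalities, feat_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 1, kernel_size=1),  # dominant height (m)
        )

    def forward(self, feats):
        return self.net(torch.cat(feats, dim=1))

class MultiModalHeightNet(nn.Module):
    # Assumed band counts: Sentinel-1 (VV/VH), ALOS-2 PALSAR-2 (HH/HV),
    # Sentinel-2 (10 bands), ancillary layers.
    def __init__(self, in_channels=(2, 2, 10, 3), feat_ch=64):
        super().__init__()
        self.encoders = nn.ModuleList(
            [ModalityEncoder(c, feat_ch) for c in in_channels]
        )
        self.decoder = SharedDecoder(feat_ch, len(in_channels))

    def forward(self, inputs):  # inputs: one (B, C_i, H, W) tensor per modality
        return self.decoder([enc(x) for enc, x in zip(self.encoders, inputs)])

def rmse(pred, target):
    """Root-mean-square error in metres, as reported in the abstract."""
    return torch.sqrt(torch.mean((pred - target) ** 2))

def r2(pred, target):
    """Coefficient of determination R^2 against GEDI-derived heights."""
    ss_res = torch.sum((target - pred) ** 2)
    ss_tot = torch.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

The point of the sketch is structural: each modality keeps its own encoder so features stay modality-specific, and a single shared decoder performs the fusion and the per-pixel height regression.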
Related papers
- Inferring Height from Earth Embeddings: First insights using Google AlphaEarth [0.0]
This study investigates whether the geospatial and multimodal features encoded in Earth Embeddings can effectively guide deep learning (DL) regression models for regional surface height mapping. We focused on AlphaEarth Embeddings at 10 m spatial resolution and evaluated their capability to support height inference using a high-quality Digital Surface Model (DSM) as reference. Both architectures achieved strong training performance (both with $R^2 = 0.97$), confirming that the embeddings encode informative and decodable height-related signals.
arXiv Detail & Related papers (2026-02-19T10:52:50Z) - SERA-H: Beyond Native Sentinel Spatial Limits for High-Resolution Canopy Height Mapping [3.8902217877872034]
High-resolution mapping of canopy height is essential for forest management and biodiversity monitoring. We present SERA-H, an end-to-end model combining a super-resolution module and temporal attention encoding. Our model generates 2.5 m resolution height maps from freely available Sentinel-1 and Sentinel-2 time series data.
arXiv Detail & Related papers (2025-12-19T23:23:14Z) - Super-Resolved Canopy Height Mapping from Sentinel-2 Time Series Using LiDAR HD Reference Data across Metropolitan France [0.9351726364879229]
We introduce THREASURE-Net, a novel end-to-end framework for Tree Height Regression And Super-Resolution. The model is trained on Sentinel-2 time series using reference height metrics derived from LiDAR HD data. We evaluate three model variants, producing tree-height predictions at 2.5 m, 5 m, and 10 m resolution.
arXiv Detail & Related papers (2025-12-12T12:49:16Z) - TransBridge: Boost 3D Object Detection by Scene-Level Completion with Transformer Decoder [66.22997415145467]
This paper presents a joint completion and detection framework that improves the detection feature in sparse areas. Specifically, we propose TransBridge, a novel transformer-based up-sampling block that fuses the features from the detection and completion networks. The results show that our framework consistently improves end-to-end 3D object detection, with mean average precision (mAP) gains ranging from 0.7 to 1.5 across multiple methods.
arXiv Detail & Related papers (2025-12-12T00:08:03Z) - Generative MIMO Beam Map Construction for Location Recovery and Beam Tracking [67.65578956523403]
This paper proposes a generative framework to recover location labels directly from sparse channel state information (CSI) measurements. Instead of directly storing raw CSI, we learn a compact low-dimensional radio map embedding and leverage a generative model to reconstruct the high-dimensional CSI. Numerical experiments demonstrate that the proposed model can improve localization accuracy by over 30% and achieve a 20% capacity gain in non-line-of-sight (NLOS) scenarios.
arXiv Detail & Related papers (2025-11-21T07:25:49Z) - Lightweight RGB-D Salient Object Detection from a Speed-Accuracy Tradeoff Perspective [54.91271106816616]
Current RGB-D methods usually leverage large-scale backbones to improve accuracy but sacrifice efficiency. We propose a Speed-Accuracy Tradeoff Network (SATNet) for lightweight RGB-D SOD from three fundamental perspectives. Concerning depth quality, we introduce the Depth Anything Model to generate high-quality depth maps. For modality fusion, we propose a Decoupled Attention Module (DAM) to explore the consistency within and between modalities. For feature representation, we develop a Dual Information Representation Module (DIRM) with a bi-directional inverted framework.
arXiv Detail & Related papers (2025-05-07T19:37:20Z) - OPAL: Visibility-aware LiDAR-to-OpenStreetMap Place Recognition via Adaptive Radial Fusion [33.87605068407066]
OPAL is a novel network for LiDAR place recognition that leverages OpenStreetMap (OSM) as a lightweight and up-to-date prior.
Our key innovation lies in bridging the domain disparity between sparse LiDAR scans and structured OSM data through two carefully designed components.
Experiments on the KITTI and KITTI-360 datasets demonstrate OPAL's superiority, achieving 15.98% higher recall at the 1 m threshold for top-1 retrieved matches.
arXiv Detail & Related papers (2025-04-27T14:39:26Z) - Real-IAD D3: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection [53.2590751089607]
Real-IAD D3 is a high-precision multimodal dataset that incorporates an additional pseudo3D modality generated through photometric stereo.
We introduce an effective approach that integrates RGB, point cloud, and pseudo-3D depth information to leverage the complementary strengths of each modality.
Our experiments highlight the importance of these modalities in boosting detection robustness and overall IAD performance.
arXiv Detail & Related papers (2025-04-19T08:05:47Z) - Dual Mutual Learning Network with Global-local Awareness for RGB-D Salient Object Detection [10.353412441955436]
We propose the GL-DMNet, a novel dual mutual learning network with global-local awareness.
We present a position mutual fusion module and a channel mutual fusion module to exploit the interdependencies among different modalities.
Our proposed GL-DMNet performs better than 24 RGB-D SOD methods, achieving an average improvement of 3%.
arXiv Detail & Related papers (2025-01-03T05:37:54Z) - LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing [25.016421338677816]
Current methods often process only two types of data, missing out on the rich information that additional modalities can provide.
We propose a novel Lightweight Multimodal data Fusion Network (LMFNet).
LMFNet accommodates various data types simultaneously, including RGB, NirRG, and DSM, through a weight-sharing, multi-branch vision transformer.
arXiv Detail & Related papers (2024-04-21T13:29:42Z) - SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection [79.23689506129733]
We establish a new benchmark dataset and an open-source method for large-scale SAR object detection.
Our dataset, SARDet-100K, is a result of intense surveying, collecting, and standardizing 10 existing SAR detection datasets.
To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created.
arXiv Detail & Related papers (2024-03-11T09:20:40Z) - NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth
Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, has exhibited appealing performance on the ScanNetV2 and ARKitScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z) - Geo2SigMap: High-Fidelity RF Signal Mapping Using Geographic Databases [10.278799374600919]
Geo2SigMap is an ML-based framework for efficient and high-fidelity RF signal mapping using geographic databases.
We develop an automated framework that seamlessly integrates three open-source tools: OpenStreetMap, Blender, and Sionna.
Our results show that Geo2SigMap achieves an average root-mean-square-error (RMSE) of 6.04 dB for predicting the reference signal received power (RSRP) at the UE.
arXiv Detail & Related papers (2023-12-21T21:26:09Z) - Vision Transformers, a new approach for high-resolution and large-scale
mapping of canopy heights [50.52704854147297]
We present a new vision transformer (ViT) model optimized with a classification (discrete) and a continuous loss function.
This model achieves better accuracy than previously used convolution-based approaches (ConvNets) optimized with only a continuous loss function.
arXiv Detail & Related papers (2023-04-22T22:39:03Z) - High-resolution canopy height map in the Landes forest (France) based on
GEDI, Sentinel-1, and Sentinel-2 data with a deep learning approach [0.044381279572631216]
We develop a deep learning model based on multi-stream remote sensing measurements to create a high-resolution canopy height map.
The model outputs allow us to generate a 10 m resolution canopy height map of the whole "Landes de Gascogne" forest area for 2020.
For all validation datasets in coniferous forests, our model showed better metrics than previous canopy height models available in the same region.
arXiv Detail & Related papers (2022-12-20T14:14:37Z) - Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in
Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed as Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves the state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z) - Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z) - DDU-Net: Dual-Decoder-U-Net for Road Extraction Using High-Resolution
Remote Sensing Images [19.07341794770722]
An enhanced deep neural network model termed Dual-Decoder-U-Net (DDU-Net) is proposed in this paper.
The proposed model outperforms the state-of-the-art DenseUNet, DeepLabv3+ and D-LinkNet by 6.5%, 3.3%, and 2.1% in the mean Intersection over Union (mIoU) and by 4%, 4.8%, and 3.1% in the F1 score, respectively.
arXiv Detail & Related papers (2022-01-18T05:27:49Z) - An Attention-Fused Network for Semantic Segmentation of
Very-High-Resolution Remote Sensing Imagery [26.362854938949923]
We propose a novel convolutional neural network architecture, named attention-fused network (AFNet)
We achieve state-of-the-art performance with an overall accuracy of 91.7% and a mean F1 score of 90.96% on the ISPRS Vaihingen 2D dataset and the ISPRS Potsdam 2D dataset.
arXiv Detail & Related papers (2021-05-10T06:23:27Z) - OmniSLAM: Omnidirectional Localization and Dense Mapping for
Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.