GeoFormer: A Swin Transformer-Based Framework for Scene-Level Building Height and Footprint Estimation from Sentinel Imagery
- URL: http://arxiv.org/abs/2602.09932v1
- Date: Tue, 10 Feb 2026 16:04:53 GMT
- Title: GeoFormer: A Swin Transformer-Based Framework for Scene-Level Building Height and Footprint Estimation from Sentinel Imagery
- Authors: Han Jinzhen, JinByeong Lee, JiSung Kim, MinKyung Cho, DaHee Kim, HongSik Yun
- Abstract summary: GeoFormer estimates building height and footprint on a 100 m grid using only Sentinel-1/2 imagery and open DEM data. Evaluated over 54 diverse cities, GeoFormer achieves a BH RMSE of 3.19 m and a BF RMSE of 0.05, improving 7.5% and 15.3% over the strongest CNN baseline.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate three-dimensional urban data are critical for climate modelling, disaster risk assessment, and urban planning, yet remain scarce due to reliance on proprietary sensors or poor cross-city generalisation. We propose GeoFormer, an open-source Swin Transformer framework that jointly estimates building height (BH) and footprint (BF) on a 100 m grid using only Sentinel-1/2 imagery and open DEM data. A geo-blocked splitting strategy ensures strict spatial independence between training and test sets. Evaluated over 54 diverse cities, GeoFormer achieves a BH RMSE of 3.19 m and a BF RMSE of 0.05, improving 7.5% and 15.3% over the strongest CNN baseline, while maintaining under 3.5 m BH RMSE in cross-continent transfer. Ablation studies confirm that DEM is indispensable for height estimation and that optical reflectance dominates over SAR, though multi-source fusion yields the best overall accuracy. All code, weights, and global products are publicly released.
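The geo-blocked splitting strategy mentioned in the abstract can be sketched as follows: grid cells are first snapped to coarse geographic blocks, and whole blocks (never individual cells) are assigned to train or test, so spatially adjacent cells cannot leak across the split. The block size and hash-based assignment below are illustrative assumptions, not the paper's exact procedure.

```python
import hashlib

def geo_block(lon: float, lat: float, block_deg: float = 0.5) -> tuple:
    """Snap a cell centre to a coarse geographic block (~50 km at the equator)."""
    return (int(lon // block_deg), int(lat // block_deg))

def geo_blocked_split(cells, test_frac: float = 0.2, block_deg: float = 0.5):
    """Split (lon, lat) cells into train/test so that no block spans both sets.

    A deterministic hash of the block index decides the assignment, so the
    split is reproducible and every cell of a block lands on the same side.
    """
    train, test = [], []
    for lon, lat in cells:
        block = geo_block(lon, lat, block_deg)
        digest = hashlib.sha256(repr(block).encode()).digest()
        frac = digest[0] / 255.0  # pseudo-random in [0, 1], fixed per block
        (test if frac < test_frac else train).append((lon, lat))
    return train, test
```

Because the hash is a function of the block index alone, re-running the split on an extended cell list keeps every previously assigned block on the same side.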
Related papers
- Geodiffussr: Generative Terrain Texturing with Elevation Fidelity
We introduce Geodiffussr, a flow-matching pipeline that synthesizes text-guided texture maps. The core mechanism is multi-scale content aggregation (MCA): DEM features are injected into UNet blocks at multiple resolutions to enforce global-to-local elevation consistency. To train and evaluate Geodiffussr, we assemble a globally distributed, biome- and climate-stratified corpus of triplets pairing SRTM-derived DEMs with Sentinel-2 imagery and vision-grounded natural-appearance captions.
arXiv Detail & Related papers (2025-11-28T09:52:44Z)
- Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales
Vision-language models (VLMs) have advanced rapidly, yet their capacity for image-grounded geolocation in open-world conditions has not been comprehensively evaluated. We present EarthWhere, a comprehensive benchmark for VLM image geolocation that evaluates visual recognition, step-by-step reasoning, and evidence use.
arXiv Detail & Related papers (2025-10-13T01:12:21Z)
- Baltimore Atlas: FreqWeaver Adapter for Semi-supervised Ultra-high Spatial Resolution Land Cover Classification
Land cover classification identifies land cover types in sub-meter remote sensing imagery. Most existing methods focus on 1 m imagery and rely heavily on large-scale annotations. We introduce Baltimore Atlas, a generalizable land cover classification framework that reduces reliance on large-scale training data.
arXiv Detail & Related papers (2025-06-18T15:41:29Z)
- Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework
We introduce a comprehensive geolocation framework with three key components: GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric. We demonstrate that GeoCoT significantly boosts geolocation accuracy by up to 25% while enhancing interpretability.
arXiv Detail & Related papers (2025-02-19T14:21:25Z)
- OpenEarthMap-SAR: A Benchmark Synthetic Aperture Radar Dataset for Global High-Resolution Land Cover Mapping
We introduce OpenEarthMap-SAR, a benchmark SAR dataset for global high-resolution land cover mapping. OpenEarthMap-SAR consists of 1.5 million segments of 5033 aerial and satellite images of size 1024×1024 pixels, covering 35 regions from Japan, France, and the USA. We evaluate the performance of state-of-the-art methods for semantic segmentation and present challenging problem settings suitable for further technical development.
arXiv Detail & Related papers (2025-01-18T22:30:27Z)
- Unified Deep Learning Model for Global Prediction of Aboveground Biomass, Canopy Height and Cover from High-Resolution, Multi-Sensor Satellite Imagery
We present a new methodology that uses multi-sensor, multi-spectral imagery at 10 m resolution and a deep learning model that unifies the prediction of aboveground biomass density (AGBD), canopy height (CH), and canopy cover (CC). The model is trained on millions of globally sampled GEDI-L2/L4 measurements. We validate the capability of our model by deploying it over the entire globe for the year 2023 as well as annually from 2016 to 2023 over selected areas.
arXiv Detail & Related papers (2024-08-20T23:15:41Z)
- A global product of fine-scale urban building height based on spaceborne lidar
We provide an up-to-date global product of urban building heights on a fine grid of 150 m, circa 2020.
The building-height estimation method based on GEDI data was effective, with a Pearson's r of 0.78 and an RMSE of 3.67 m.
This work will boost future urban studies across many fields, including the climate, environmental, ecological, and social sciences.
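The Pearson's r and RMSE figures quoted for the building-height samples above are standard agreement metrics. A minimal self-contained computation (with illustrative numbers, not the paper's actual samples) looks like this:

```python
import math

def rmse(pred, obs):
    """Root-mean-square error between predicted and observed values (e.g. metres)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))

def pearson_r(pred, obs):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)

# Toy building heights (m): predictions vs. reference measurements.
pred = [10.0, 12.0, 8.0, 15.0]
obs = [11.0, 12.0, 9.0, 14.0]
```

Note that RMSE penalizes large errors quadratically, while Pearson's r measures linear agreement independent of any constant bias, which is why papers typically report both.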
arXiv Detail & Related papers (2023-10-22T16:51:15Z)
- Semi-supervised Learning from Street-View Images and OpenStreetMap for Automatic Building Height Estimation
We propose a semi-supervised learning (SSL) method for automatically estimating building height from Mapillary SVI and OpenStreetMap data.
The proposed method yields a clear performance boost in estimating building heights, with a Mean Absolute Error (MAE) of around 2.1 m.
The preliminary result is promising and motivates our future work on scaling up the proposed method using low-cost VGI data.
arXiv Detail & Related papers (2023-07-05T18:16:30Z)
- Country-wide Retrieval of Forest Structure From Optical and SAR Satellite Imagery With Bayesian Deep Learning
We propose a Bayesian deep learning approach to densely estimate forest structure variables at country-scale with 10-meter resolution.
Our method jointly transforms Sentinel-2 optical images and Sentinel-1 synthetic aperture radar images into maps of five different forest structure variables.
We train and test our model on reference data from 41 airborne laser scanning missions across Norway.
arXiv Detail & Related papers (2021-11-25T16:21:28Z)
- A Semantic Segmentation Network for Urban-Scale Building Footprint Extraction Using RGB Satellite Imagery
Urban areas consume over two-thirds of the world's energy and account for more than 70 percent of global CO2 emissions.
We propose a modified DeeplabV3+ module with a Dilated ResNet backbone to generate masks of building footprints from only three-channel RGB satellite imagery.
We achieve state-of-the-art performance across three standard benchmarks and demonstrate that our method is agnostic to the scale, resolution, and urban density of satellite imagery.
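Building-footprint segmentation benchmarks of this kind are commonly scored with pixel-wise intersection-over-union between the predicted and reference masks; the sketch below shows that generic metric (the abstract does not specify the exact evaluation protocol):

```python
def mask_iou(pred_mask, true_mask):
    """Intersection-over-Union of two binary footprint masks (nested 0/1 lists)."""
    inter = union = 0
    for prow, trow in zip(pred_mask, true_mask):
        for p, t in zip(prow, trow):
            inter += p & t  # pixel counted when both masks mark a building
            union += p | t  # pixel counted when either mask marks a building
    return inter / union if union else 1.0
```

IoU is preferred over plain pixel accuracy here because buildings typically cover a small fraction of a scene, so a model predicting "no building" everywhere would still score high accuracy but near-zero IoU.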
arXiv Detail & Related papers (2021-04-02T22:32:04Z)
- Searching Collaborative Agents for Multi-plane Localization in 3D Ultrasound
3D ultrasound (US) is widely used due to its rich diagnostic information, portability and low cost.
Standard plane (SP) localization in US volume not only improves efficiency and reduces user-dependence, but also boosts 3D US interpretation.
We propose a novel Multi-Agent Reinforcement Learning framework to localize multiple uterine SPs in 3D US simultaneously.
arXiv Detail & Related papers (2020-07-30T07:23:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.