HeatPrompt: Zero-Shot Vision-Language Modeling of Urban Heat Demand from Satellite Images
- URL: http://arxiv.org/abs/2602.20066v1
- Date: Mon, 23 Feb 2026 17:22:54 GMT
- Title: HeatPrompt: Zero-Shot Vision-Language Modeling of Urban Heat Demand from Satellite Images
- Authors: Kundan Thota, Xuanhao Mu, Thorsten Schlachter, Veit Hagenmeyer
- Abstract summary: HeatPrompt is a zero-shot vision-language energy modeling framework. It estimates annual heat demand using semantic features extracted from satellite images, basic Geographic Information System (GIS) and building-level features.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate heat-demand maps play a crucial role in decarbonizing space heating, yet most municipalities lack detailed building-level data needed to calculate them. We introduce HeatPrompt, a zero-shot vision-language energy modeling framework that estimates annual heat demand using semantic features extracted from satellite images, basic Geographic Information System (GIS), and building-level features. We feed pretrained Large Vision Language Models (VLMs) with a domain-specific prompt to act as an energy planner and extract the visual attributes such as roof age, building density, etc., from the RGB satellite image that correspond to the thermal load. A Multi-Layer Perceptron (MLP) regressor trained on these captions shows an $R^2$ uplift of 93.7% and shrinks the mean absolute error (MAE) by 30% compared to the baseline model. Qualitative analysis shows that high-impact tokens align with high-demand zones, offering lightweight support for heat planning in data-scarce regions.
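The abstract reports results as an $R^2$ uplift and an MAE reduction relative to a baseline. The following is a minimal sketch of how such figures could be computed, assuming that "uplift" means relative change against the baseline score (the abstract does not define it). The helper names and the toy demand values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def relative_change(new, old):
    """Signed relative change of `new` with respect to `old` (assumed definition)."""
    return (new - old) / abs(old)


# Toy annual heat-demand values, invented for illustration only.
y_true = [100.0, 150.0, 200.0, 250.0]
y_base = [140.0, 140.0, 210.0, 210.0]   # hypothetical baseline predictions
y_model = [110.0, 145.0, 205.0, 240.0]  # hypothetical caption-based predictions

r2_uplift = relative_change(r2_score(y_true, y_model), r2_score(y_true, y_base))
mae_change = relative_change(mae(y_true, y_model), mae(y_true, y_base))

print(f"R^2 uplift: {r2_uplift:+.1%}")
print(f"MAE change: {mae_change:+.1%}")
```

A negative MAE change corresponds to a reduction in error; under this assumed definition, the reported 30% shrinkage would appear as a change of -30%.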
Related papers
- ThermEval: A Structured Benchmark for Evaluation of Vision-Language Models on Thermal Imagery [11.547362584832769]
Vision language models (VLMs) achieve strong performance on RGB imagery, but they do not generalize to thermal images. Thermal sensing plays a critical role in settings where visible light fails, including nighttime surveillance, search and rescue, autonomous driving, and medical screening. We introduce ThermEval-B, a benchmark to assess the foundational primitives required for thermal vision language understanding.
arXiv Detail & Related papers (2026-02-16T18:16:19Z)
- Hot Hém: Sài Gòn Giũa Cái Nóng Hông Còng Bàng -- Saigon in Unequal Heat [0.0]
Hot Hém is a GeoAI workflow that estimates pedestrian heat exposure in Hồ Chí Minh City (HCMC), Việt Nam, colloquially known as Sài Gòn. This spatial data science pipeline combines Google Street View (GSV) imagery, semantic image segmentation, and remote sensing. Two XGBoost models are trained to predict land surface temperature (LST) using a GSV training dataset in selected administrative wards, known as phường, and are deployed in a patchwork manner across all OSMnx-derived pedestrian network nodes to enable heat-aware routing.
arXiv Detail & Related papers (2025-12-10T05:10:09Z)
- DescribeEarth: Describe Anything for Remote Sensing Images [56.04533626223295]
We propose Geo-DLC, a novel task of object-level fine-grained image captioning for remote sensing. To support this task, we construct DE-Dataset, a large-scale dataset with detailed descriptions of object attributes, relationships, and contexts. We also present DescribeEarth, a Multi-modal Large Language Model architecture explicitly designed for Geo-DLC.
arXiv Detail & Related papers (2025-09-30T01:53:34Z)
- RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model [59.37279559684668]
We introduce RS-vHeat, an efficient multi-modal remote sensing foundation model. Specifically, RS-vHeat applies the Heat Conduction Operator (HCO) with a complexity of $O(N^{1.5})$ and a global receptive field. Compared to attention-based remote sensing foundation models, we reduce memory usage by 84% and FLOPs by 24%, and improve throughput by 2.7 times.
arXiv Detail & Related papers (2024-11-27T01:43:38Z)
- ClimateIQA: A New Dataset and Benchmark to Advance Vision-Language Models in Meteorology Anomalies Analysis [32.940471253248965]
We introduce Sparse Position and Outline Tracking (SPOT), a novel algorithm designed to process irregularly shaped regions in visual data. SPOT identifies and localizes irregularly shaped regions by extracting their spatial coordinates, enabling structured representations of irregular shapes. Building on SPOT, we construct ClimateIQA, a novel meteorological visual question answering dataset. ClimateIQA enhances VLM training by incorporating spatial cues, geographic metadata, and reanalysis data, improving model accuracy in interpreting and describing extreme weather features.
arXiv Detail & Related papers (2024-06-14T08:46:44Z)
- Building Vision Models upon Heat Conduction [66.1594989193046]
This study introduces the Heat Conduction Operator (HCO) built upon the physical heat conduction principle. HCO conceptualizes image patches as heat sources and models their correlations through adaptive thermal energy diffusion. vHeat achieves up to a 3x throughput, 80% less GPU memory allocation, and 35% fewer computational FLOPs compared to the Swin-Transformer.
arXiv Detail & Related papers (2024-05-26T12:58:04Z)
- Semantics from Space: Satellite-Guided Thermal Semantic Segmentation Annotation for Aerial Field Robots [8.265009823753982]
We present a new method to automatically generate semantic segmentation annotations for thermal imagery captured from an aerial vehicle.
This new capability overcomes the challenge of developing thermal semantic perception algorithms for field robots.
Our approach can produce highly-precise semantic segmentation labels using low-resolution satellite land cover data for little-to-no cost.
arXiv Detail & Related papers (2024-03-21T00:59:35Z)
- Semantic segmentation of longitudinal thermal images for identification of hot and cool spots in urban areas [1.124958340749622]
This work presents the analysis of semantically segmented, longitudinally, and spatially rich thermal images collected at the neighborhood scale to identify hot and cool spots in urban areas.
A subset of the thermal image dataset was used to train state-of-the-art deep learning models to segment various urban features.
arXiv Detail & Related papers (2023-10-06T13:41:39Z)
- Generating Physically-Consistent Satellite Imagery for Climate Visualizations [53.61991820941501]
We train a generative adversarial network to create synthetic satellite imagery of future flooding and reforestation events.
A pure deep learning-based model can generate flood visualizations but hallucinates floods at locations that were not susceptible to flooding.
We publish our code and dataset for segmentation guided image-to-image translation in Earth observation.
arXiv Detail & Related papers (2021-04-10T15:00:15Z)
- A Large-Scale, Time-Synchronized Visible and Thermal Face Dataset [62.193924313292875]
We present the DEVCOM Army Research Laboratory Visible-Thermal Face dataset (ARL-VTF).
With over 500,000 images from 395 subjects, the ARL-VTF dataset represents, to the best of our knowledge, the largest collection of paired visible and thermal face images to date.
This paper presents benchmark results and analysis on thermal face landmark detection and thermal-to-visible face verification by evaluating state-of-the-art models on the ARL-VTF dataset.
arXiv Detail & Related papers (2021-01-07T17:17:12Z)
- A Transfer Learning approach to Heatmap Regression for Action Unit intensity estimation [50.261472059743845]
Action Units (AUs) are geometrically-based atomic facial muscle movements.
We propose a novel AU modelling problem that consists of jointly estimating their localisation and intensity.
A Heatmap models whether an AU occurs or not at a given spatial location.
arXiv Detail & Related papers (2020-04-14T16:51:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.