Climate Knowledge in Large Language Models
- URL: http://arxiv.org/abs/2510.08043v1
- Date: Thu, 09 Oct 2025 10:25:36 GMT
- Title: Climate Knowledge in Large Language Models
- Authors: Ivan Kuznetsov, Jacopo Grassi, Dmitrii Pantiukhin, Boris Shapkin, Thomas Jung, Nikolay Koldunov
- Abstract summary: This study investigates the capacity of large language models to recall climate normals without external retrieval. We construct a global grid of queries at 1° resolution land points, providing coordinates and location descriptors, and validate responses against ERA5 reanalysis. Results show that LLMs encode non-trivial climate structure, capturing latitudinal and topographic patterns, with root-mean-square errors of 3-6 °C and biases of ±1 °C. We find that including geographic context reduces errors by 27% on average, with larger models being most sensitive to location descriptors.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly deployed for climate-related applications, where understanding internal climatological knowledge is crucial for reliability and misinformation risk assessment. Despite growing adoption, the capacity of LLMs to recall climate normals from parametric knowledge remains largely uncharacterized. We investigate the capacity of contemporary LLMs to recall climate normals without external retrieval, focusing on a prototypical query: mean July 2-m air temperature 1991-2020 at specified locations. We construct a global grid of queries at 1° resolution land points, providing coordinates and location descriptors, and validate responses against ERA5 reanalysis. Results show that LLMs encode non-trivial climate structure, capturing latitudinal and topographic patterns, with root-mean-square errors of 3-6 °C and biases of ±1 °C. However, spatially coherent errors remain, particularly in mountains and high latitudes. Performance degrades sharply above 1500 m, where RMSE reaches 5-13 °C compared to 2-4 °C at lower elevations. We find that including geographic context (country, city, region) reduces errors by 27% on average, with larger models being most sensitive to location descriptors. While models capture the global mean magnitude of observed warming between 1950-1974 and 2000-2024, they fail to reproduce spatial patterns of temperature change, which directly relate to assessing climate change. This limitation highlights that while LLMs may capture present-day climate distributions, they struggle to represent the regional and local expression of long-term shifts in temperature essential for understanding climate dynamics. Our evaluation framework provides a reproducible benchmark for quantifying parametric climate knowledge in LLMs and complements existing climate communication assessments.
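The bias and RMSE figures quoted in the abstract, and the split at 1500 m, can be illustrated with a minimal sketch. It assumes model answers and ERA5 reference values are already paired per grid point; the function names, the unweighted averaging, and the four sample values are hypothetical and are not taken from the paper, which may aggregate differently (for example with area weighting).

```python
import numpy as np


def bias_and_rmse(predicted, reference):
    """Return (mean bias, RMSE) of predicted minus reference, in degrees C."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    return float(diff.mean()), float(np.sqrt((diff ** 2).mean()))


def scores_by_elevation(predicted, reference, elevation_m, threshold_m=1500.0):
    """Score all points, then points at or below and above the elevation threshold."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    high = np.asarray(elevation_m, dtype=float) > threshold_m
    return {
        "all": bias_and_rmse(predicted, reference),
        "low": bias_and_rmse(predicted[~high], reference[~high]),
        "high": bias_and_rmse(predicted[high], reference[high]),
    }


# Made-up values for four grid points (not results from the paper).
llm_july_t2m = [25.0, 18.0, 10.0, 2.0]    # hypothetical model answers, degrees C
era5_july_t2m = [24.0, 19.0, 6.0, 6.0]    # hypothetical reference values, degrees C
elevation_m = [100.0, 400.0, 2500.0, 3200.0]

scores = scores_by_elevation(llm_july_t2m, era5_july_t2m, elevation_m)
for band, (bias, rmse) in scores.items():
    print(f"{band}: bias={bias:+.2f} degC, RMSE={rmse:.2f} degC")
```

In this toy sample the errors cancel, so the bias is zero in every band while the RMSE is four times larger above the threshold. That is why a small bias and a sizeable RMSE can coexist, and why the two metrics are reported together.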
Related papers
- Deep Learning-Driven Downscaling for Climate Risk Assessment of Projected Temperature Extremes in the Nordic Region [3.7957889222222208]
Rapid changes and increasing climatic variability across the Köppen-Geiger regions of northern Europe generate significant needs for adaptation.
This work presents an integrative downscaling framework that incorporates Vision Transformer (ViT), Convolutional Long Short-Term Memory (ConvLSTM), and Spatiotemporal Transformer with Attention and Imbalance-Aware Network (GeoStaNet) models.
arXiv Detail & Related papers (2025-11-05T17:08:32Z) - ClimateBench-M: A Multi-Modal Climate Data Benchmark with a Simple Generative Method [61.76389719956301]
We contribute a multi-modal climate benchmark, i.e., ClimateBench-M, which aligns time series climate data from ERA5, extreme weather events data from NOAA, and satellite image data from NASA.
Under each data modality, we also propose a simple but strong generative method that could produce competitive performance in weather forecasting, thunderstorm alerts, and crop segmentation tasks.
arXiv Detail & Related papers (2025-04-10T02:22:23Z) - Towards Kriging-informed Conditional Diffusion for Regional Sea-Level Data Downscaling [3.8178633709015446]
Given coarser-resolution projections from global climate models or satellite data, the downscaling problem aims to estimate finer-resolution regional climate data.
This problem is societally crucial for effective adaptation, mitigation, and resilience against significant risks from climate change.
We propose a novel Kriging-informed Conditional Diffusion Probabilistic Model (Ki-CDPM) to capture spatial variability while preserving fine-scale features.
arXiv Detail & Related papers (2024-10-21T04:24:10Z) - MambaDS: Near-Surface Meteorological Field Downscaling with Topography Constrained Selective State Space Modeling [68.69647625472464]
Downscaling, a crucial task in meteorological forecasting, enables the reconstruction of high-resolution meteorological states for target regions.
Previous downscaling methods lacked tailored designs for meteorology and encountered structural limitations.
We propose a novel model called MambaDS, which enhances the utilization of multivariable correlations and topography information.
arXiv Detail & Related papers (2024-08-20T13:45:49Z) - ClimateIQA: A New Dataset and Benchmark to Advance Vision-Language Models in Meteorology Anomalies Analysis [32.940471253248965]
We introduce Sparse Position and Outline Tracking (SPOT), a novel algorithm designed to process irregularly shaped regions in visual data.
SPOT identifies and localizes irregularly shaped regions by extracting their spatial coordinates, enabling structured representations of irregular shapes.
Building on SPOT, we construct ClimateIQA, a novel meteorological visual question answering dataset.
ClimateIQA enhances VLM training by incorporating spatial cues, geographic metadata, and reanalysis data, improving model accuracy in interpreting and describing extreme weather features.
arXiv Detail & Related papers (2024-06-14T08:46:44Z) - Distortions in Judged Spatial Relations in Large Language Models [45.875801135769585]
GPT-4 exhibited superior performance with 55 percent accuracy, followed by GPT-3.5 at 47 percent, and Llama-2 at 45 percent.
The models identified the nearest cardinal direction in most cases, reflecting their associative learning mechanism.
arXiv Detail & Related papers (2024-01-08T20:08:04Z) - ClimateX: Do LLMs Accurately Assess Human Expert Confidence in Climate Statements? [0.0]
We introduce the Expert Confidence in Climate Statements (ClimateX) dataset, a novel, curated, expert-labeled dataset consisting of 8094 climate statements.
Using this dataset, we show that recent Large Language Models (LLMs) can classify human expert confidence in climate-related statements.
Overall, models exhibit consistent and significant over-confidence on low and medium confidence statements.
arXiv Detail & Related papers (2023-11-28T10:26:57Z) - Transferring climate change physical knowledge [13.529445977186635]
We show that Machine Learning can be used to optimally leverage and merge the knowledge gained from global temperature maps simulated by Earth system models.
We reach an uncertainty reduction of more than 50% with respect to state-of-the-art approaches.
arXiv Detail & Related papers (2023-09-26T09:24:53Z) - Multi-variable Hard Physical Constraints for Climate Model Downscaling [17.402215838651557]
Global Climate Models (GCMs) are the primary tool to simulate climate evolution and assess the impacts of climate change.
They often operate at a coarse spatial resolution that limits their accuracy in reproducing local-scale phenomena.
This study investigates the scope of this problem and, through an application on temperature, lays the foundation for a framework introducing multi-variable hard constraints.
arXiv Detail & Related papers (2023-08-02T11:42:02Z) - Multi-scale Digital Twin: Developing a fast and physics-informed surrogate model for groundwater contamination with uncertain climate models [53.44486283038738]
Climate change exacerbates the long-term soil management problem of groundwater contamination.
We develop a physics-informed machine learning surrogate model using a U-Net enhanced Fourier Neural Operator (U-FNO).
In parallel, we develop a convolutional autoencoder combined with climate data to reduce the dimensionality of climatic region similarities across the United States.
arXiv Detail & Related papers (2022-11-20T06:46:35Z) - Spatiotemporal modeling of European paleoclimate using doubly sparse Gaussian processes [61.31361524229248]
We build on recent scalable spatiotemporal GPs to reduce the computational burden.
We successfully employ such a doubly sparse GP to construct a probabilistic model of paleoclimate.
arXiv Detail & Related papers (2022-11-15T14:15:04Z) - A Multi-Scale Deep Learning Framework for Projecting Weather Extremes [3.3598755777055374]
Weather extremes are a major societal and economic hazard, claiming thousands of lives and causing billions of dollars in damage every year.
General circulation models (GCMs), which are currently the primary tool for climate projections, cannot characterize weather extremes accurately.
We present a multi-resolution deep-learning framework that corrects a GCM's biases by matching low-order and tail statistics of its output with observations at coarse scales.
We use the proposed framework to generate statistically realistic realizations of the climate over Western Europe from a simple GCM corrected using observational atmospheric reanalysis.
arXiv Detail & Related papers (2022-10-21T17:47:05Z) - Dynamical Landscape and Multistability of a Climate Model [64.467612647225]
We find a third intermediate stable state in one of the two climate models we consider.
The combination of our approaches allows us to identify how the negative feedback of ocean heat transport and entropy production drastically change the topography of Earth's climate.
arXiv Detail & Related papers (2020-10-20T15:31:38Z) - A generative adversarial network approach to (ensemble) weather prediction [91.3755431537592]
We use a conditional deep convolutional generative adversarial network to predict the geopotential height of the 500 hPa pressure level, the two-meter temperature and the total precipitation for the next 24 hours over Europe.
The proposed models are trained on 4 years of ERA5 reanalysis data from 2015-2018 with the goal of predicting the associated meteorological fields in 2019.
arXiv Detail & Related papers (2020-06-13T20:53:17Z)