AgentCaster: Reasoning-Guided Tornado Forecasting
- URL: http://arxiv.org/abs/2510.03349v1
- Date: Thu, 02 Oct 2025 17:57:16 GMT
- Title: AgentCaster: Reasoning-Guided Tornado Forecasting
- Authors: Michael Chen,
- Abstract summary: AgentCaster is a framework to evaluate Large Language Models (LLMs) on complex, real-world tasks. We assess model performance over a 40-day period featuring diverse historical data, spanning several major tornado outbreaks and including over 500 tornado reports. Human experts significantly outperform state-of-the-art models, which demonstrate a strong tendency to hallucinate and overpredict risk intensity, struggle with precise geographic placement, and exhibit poor spatiotemporal reasoning in complex, dynamically evolving systems.
- Score: 2.8271273825420606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a growing need to evaluate Large Language Models (LLMs) on complex, high-impact, real-world tasks to assess their true readiness as reasoning agents. To address this gap, we introduce AgentCaster, a contamination-free framework employing multimodal LLMs end-to-end for the challenging, long-horizon task of tornado forecasting. Within AgentCaster, models interpret heterogeneous spatiotemporal data from a high-resolution convection-allowing forecast archive. We assess model performance over a 40-day period featuring diverse historical data, spanning several major tornado outbreaks and including over 500 tornado reports. Each day, models query interactively from a pool of 3,625 forecast maps and 40,125 forecast soundings for a forecast horizon of 12-36 hours. Probabilistic tornado-risk polygon predictions are verified against ground truths derived from geometric comparisons across disjoint risk bands in projected coordinate space. To quantify accuracy, we propose domain-specific TornadoBench and TornadoHallucination metrics, with TornadoBench highly challenging for both LLMs and domain expert human forecasters. Notably, human experts significantly outperform state-of-the-art models, which demonstrate a strong tendency to hallucinate and overpredict risk intensity, struggle with precise geographic placement, and exhibit poor spatiotemporal reasoning in complex, dynamically evolving systems. AgentCaster aims to advance research on improving LLM agents for challenging reasoning tasks in critical domains.
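The abstract's verification step (geometric comparison across disjoint risk bands) can be illustrated with a minimal sketch. This is not the paper's TornadoBench metric, whose exact definition is not given on this page. It assumes risk areas are rasterized onto a hypothetical grid of cells instead of polygons in projected coordinate space, that risk areas are nested by probability threshold, and that each band is scored by intersection-over-union. The names `rect`, `to_disjoint_bands`, and `band_iou` are illustrative only.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]  # hypothetical (column, row) index on a projected grid


def rect(x0: int, y0: int, x1: int, y1: int) -> Set[Cell]:
    """Cells of an axis-aligned rectangle covering [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def to_disjoint_bands(nested: Dict[float, Set[Cell]]) -> Dict[float, Set[Cell]]:
    """Turn nested risk areas into disjoint bands.

    Each cell is assigned only to the highest threshold that covers it,
    so no cell is counted in more than one band.
    """
    bands: Dict[float, Set[Cell]] = {}
    covered: Set[Cell] = set()
    for level in sorted(nested, reverse=True):
        bands[level] = nested[level] - covered
        covered |= nested[level]
    return bands


def band_iou(pred: Dict[float, Set[Cell]],
             truth: Dict[float, Set[Cell]]) -> Dict[float, float]:
    """Intersection-over-union per band; bands empty on both sides are skipped."""
    scores: Dict[float, float] = {}
    for level in sorted(set(pred) | set(truth)):
        p = pred.get(level, set())
        t = truth.get(level, set())
        union = p | t
        if union:
            scores[level] = len(p & t) / len(union)
    return scores


# Illustrative data only: a 2% area with a smaller 5% area nested inside it.
pred_nested = {0.02: rect(0, 0, 4, 4), 0.05: rect(1, 1, 3, 3)}
truth_nested = {0.02: rect(0, 0, 4, 4), 0.05: rect(2, 2, 4, 4)}

scores = band_iou(to_disjoint_bands(pred_nested), to_disjoint_bands(truth_nested))
mean_iou = sum(scores.values()) / len(scores)
print(scores, mean_iou)
```

In this toy case the outer 2% areas match exactly, yet the misplaced 5% core lowers the score in both bands. That is the kind of geographic-placement error the abstract reports models struggling with.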
Related papers
- Agentic Spatio-Temporal Grounding via Collaborative Reasoning [80.83158605034465]
Spatio-Temporal Video Grounding (STVG) aims to retrieve the spatio-temporal tube of a target object or person in a video given a text query. We propose the Agentic Spatio-Temporal Grounder (ASTG) framework for the task of STVG towards an open-world and training-free scenario. Specifically, two specialized agents, SRA (Spatial Reasoning Agent) and TRA (Temporal Reasoning Agent), are constructed by leveraging modern Multimodal Large Language Models (MLLMs). Experiments on popular benchmarks demonstrate the superiority of the proposed approach, where it outperforms existing weakly-supervised and zero-shot approaches by a margin.
arXiv Detail & Related papers (2026-02-10T10:16:27Z) - Forecasting Fails: Unveiling Evasion Attacks in Weather Prediction Models [60.728124907335]
This work introduces Weather Adaptive Adversarial Perturbation Optimization (WAAPO), a novel framework for generating targeted adversarial perturbations. WAAPO achieves this by incorporating constraints for channel sparsity, spatial localization, and smoothness, ensuring that perturbations remain physically realistic and imperceptible. Our experiments highlight critical vulnerabilities in AI-driven forecasting models, where small perturbations to initial conditions can result in significant deviations.
arXiv Detail & Related papers (2025-12-09T17:20:56Z) - Agentic AI Framework for Cloudburst Prediction and Coordinated Response [0.8697317909540486]
The paper outlines an agentic artificial intelligence system to study atmospheric water-cycle intelligence. The framework uses autonomous but cooperative agents that reason, sense, and act throughout the entire event lifecycle. It provides a platform of scalable adaptive and learning-based climate resilience.
arXiv Detail & Related papers (2025-11-27T21:33:03Z) - SimCast: Enhancing Precipitation Nowcasting with Short-to-Long Term Knowledge Distillation [15.244330283621247]
Accurate nowcasting is of utmost importance for addressing various societal needs, including disaster management, agriculture, transportation, and energy optimization. We propose SimCast, a novel training pipeline featuring a short-to-long term knowledge distillation technique coupled with a weighted MSE loss to prioritize heavy rainfall regions. As SimCast generates deterministic predictions, we further integrate it into a diffusion-based framework named CasCast, leveraging the strengths from probabilistic models to overcome limitations such as blurriness and distribution shift in deterministic outputs.
arXiv Detail & Related papers (2025-10-09T08:49:16Z) - Inferring Thunderstorm Occurrence from Vertical Profiles of Convection-Permitting Simulations: Physical Insights from a Physical Deep Learning Model [0.0]
Thunderstorms have significant social and economic impacts due to heavy precipitation, hail, lightning, and strong winds. We develop SALAMA 1D, a deep neural network which directly infers the probability of thunderstorm occurrence from vertical profiles of ten atmospheric variables.
arXiv Detail & Related papers (2024-09-30T08:40:28Z) - Generating Fine-Grained Causality in Climate Time Series Data for Forecasting and Anomaly Detection [67.40407388422514]
We design a conceptual fine-grained causal model named TBN Granger Causality.
Second, we propose an end-to-end deep generative model called TacSas, which discovers TBN Granger Causality in a generative manner.
We test TacSas on climate benchmark ERA5 for climate forecasting and the extreme weather benchmark of NOAA for extreme weather alerts.
arXiv Detail & Related papers (2024-08-08T06:47:21Z) - A Novel Hybrid Approach for Tornado Prediction in the United States: Kalman-Convolutional BiLSTM with Multi-Head Attention [9.51657235413336]
Tornadoes are among the most intense atmospheric vortex phenomena and pose significant challenges for detection and forecasting.
Conventional methods, which heavily depend on ground-based observations and radar data, are limited by issues such as decreased accuracy over greater distances and a high rate of false positives.
This study utilizes the Seamless Hybrid Scan Reflectivity dataset from the Multi-Radar Multi-Sensor (MRMS) system to enhance accuracy.
A novel hybrid model, the Kalman-Convolutional BiLSTM with Multi-Head Attention, is introduced to improve dynamic state estimation and capture both spatial and temporal dependencies within the data.
arXiv Detail & Related papers (2024-08-05T18:11:23Z) - WeatherQA: Can Multimodal Language Models Reason about Severe Weather? [45.43764278625153]
Severe convective weather events, such as hail, tornadoes, and thunderstorms, often occur quickly yet cause significant damage, costing billions of dollars every year.
This highlights the importance of forecasting severe weather threats hours in advance to better prepare meteorologists and residents in at-risk areas.
We introduce WeatherQA, the first multimodal dataset designed for machines to reason about complex combinations of weather parameters and predict severe weather in real-world scenarios.
arXiv Detail & Related papers (2024-06-17T05:23:18Z) - Lightning-Fast Convective Outlooks: Predicting Severe Convective Environments with Global AI-based Weather Models [0.08271752505511926]
Severe convective storms are among the most dangerous weather phenomena and accurate forecasts mitigate their impacts.
A recently released suite of AI-based weather models produces medium-range forecasts within seconds.
We assess the forecast skill of three top-performing AI-models for convective parameters against reanalysis and ECMWF's operational numerical weather prediction model IFS.
arXiv Detail & Related papers (2024-06-13T07:46:03Z) - Learning Robust Precipitation Forecaster by Temporal Frame Interpolation [65.5045412005064]
We develop a robust precipitation forecasting model that demonstrates resilience against spatial-temporal discrepancies.
Our approach has led to significant improvements in forecasting precision, culminating in our model securing 1st place in the transfer learning leaderboard of the Weather4cast'23 competition.
arXiv Detail & Related papers (2023-11-30T08:22:08Z) - Residual Corrective Diffusion Modeling for Km-scale Atmospheric Downscaling [58.456404022536425]
State of the art for physical hazard prediction from weather and climate requires expensive km-scale numerical simulations driven by coarser resolution global inputs.
Here, a generative diffusion architecture is explored for downscaling such global inputs to km-scale, as a cost-effective machine learning alternative.
The model is trained to predict 2km data from a regional weather model over Taiwan, conditioned on a 25km global reanalysis.
arXiv Detail & Related papers (2023-09-24T19:57:22Z) - Long-term drought prediction using deep neural networks based on geospatial weather data [75.38539438000072]
High-quality drought forecasting up to a year in advance is critical for agriculture planning and insurance.
We tackle drought data by introducing a systematic end-to-end approach.
Key findings are the exceptional performance of a Transformer model, EarthFormer, in making accurate short-term (up to six months) forecasts.
arXiv Detail & Related papers (2023-09-12T13:28:06Z) - An evaluation of deep learning models for predicting water depth evolution in urban floods [59.31940764426359]
We compare different deep learning models for prediction of water depth at high spatial resolution.
Deep learning models are trained to reproduce the data simulated by the CADDIES cellular-automata flood model.
Our results show that the deep learning models generally yield lower errors than the other methods.
arXiv Detail & Related papers (2023-02-20T16:08:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.