Connecting the Dots: A Machine Learning Ready Dataset for Ionospheric Forecasting Models
- URL: http://arxiv.org/abs/2511.15743v1
- Date: Tue, 18 Nov 2025 20:13:25 GMT
- Title: Connecting the Dots: A Machine Learning Ready Dataset for Ionospheric Forecasting Models
- Authors: Linnea M. Wolniewicz, Halil S. Kelebek, Simone Mestici, Michael D. Vergalla, Giacomo Acciarini, Bala Poduval, Olga Verkhoglyadova, Madhulika Guhathakurta, Thomas E. Berger, Atılım Güneş Baydin, Frank Soboczenski,
- Abstract summary: We present a curated, open-access dataset that integrates diverse ionospheric and heliospheric measurements into a coherent, machine learning-ready structure. Our workflow integrates a large selection of data sources comprising Solar Dynamics Observatory data, solar irradiance indices (F10.7), solar wind parameters (velocity and interplanetary magnetic field), geomagnetic activity indices (Kp, AE, SYM-H), and NASA JPL's Global Ionospheric Maps of Total Electron Content (GIM-TEC).
- Score: 0.27300286905606946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Operational forecasting of the ionosphere remains a critical space weather challenge due to sparse observations, complex coupling across geospatial layers, and a growing need for timely, accurate predictions that support Global Navigation Satellite System (GNSS), communications, aviation safety, and satellite operations. As part of the 2025 NASA Heliolab, we present a curated, open-access dataset that integrates diverse ionospheric and heliospheric measurements into a coherent, machine learning-ready structure, designed specifically to support next-generation forecasting models and address gaps in current operational frameworks. Our workflow integrates a large selection of data sources comprising Solar Dynamics Observatory data, solar irradiance indices (F10.7), solar wind parameters (velocity and interplanetary magnetic field), geomagnetic activity indices (Kp, AE, SYM-H), and NASA JPL's Global Ionospheric Maps of Total Electron Content (GIM-TEC). We also incorporate geospatially sparse data, such as TEC derived from the World-Wide GNSS Receiver Network and crowdsourced Android smartphone measurements. This novel heterogeneous dataset is temporally and spatially aligned into a single, modular data structure that supports both physical and data-driven modeling. Leveraging this dataset, we train and benchmark several spatiotemporal machine learning architectures for forecasting vertical TEC under both quiet and geomagnetically active conditions. This work presents an extensive dataset and modeling pipeline that enables exploration of not only ionospheric dynamics but also broader Sun-Earth interactions, supporting both scientific inquiry and operational forecasting efforts.
Related papers
- DAWP: A framework for global observation forecasting via Data Assimilation and Weather Prediction in satellite observation space [60.729377189859]
We propose our DAWP framework to enable AIWPs to operate in a complete observation space. The AIDA module applies a masked multi-modality autoencoder for assimilating irregular satellite observation tokens. We show that AIDA significantly improves the rollout and efficiency of AIWP and holds promising potential to be applied in global precipitation forecasting.
arXiv Detail & Related papers (2025-10-13T03:13:35Z)
- Forecasting the Ionosphere from Sparse GNSS Data with Temporal-Fusion Transformers [0.28112829609955153]
Total Electron Content (TEC) is a key ionospheric parameter. TEC is derived from observations, but its reliable forecasting is limited by the sparse nature of global measurements. We present a machine learning framework for ionospheric TEC forecasting that leverages Temporal Fusion Transformers (TFT) to predict sparse ionosphere data.
arXiv Detail & Related papers (2025-08-30T23:08:19Z)
- SuryaBench: Benchmark Dataset for Advancing Machine Learning in Heliophysics and Space Weather Prediction [2.288747975391298]
This paper introduces a high-resolution, machine learning-ready heliophysics dataset derived from NASA's Solar Dynamics Observatory (SDO). The dataset includes processed imagery from the Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI). To ensure suitability for ML tasks, the data has been preprocessed, including correction of spacecraft roll angles, orbital adjustments, exposure normalization, and degradation compensation.
arXiv Detail & Related papers (2025-08-18T00:05:01Z)
- Conceptual framework for the application of deep neural networks to surface composition reconstruction from Mercury's exospheric data [77.40388962445168]
This study explores the feasibility of deriving Mercury's regolith elemental composition from in-situ measurements of its neutral exosphere using deep neural networks (DNNs). We present a supervised feed-forward DNN architecture that predicts the chemical elements of the surface regolith below. It serves as an estimator for the surface-exosphere interaction and the processes leading to exosphere formation.
arXiv Detail & Related papers (2025-05-16T09:52:45Z)
- OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence [51.0456395687016]
Multimodal large language models (LLMs) have opened new frontiers in artificial intelligence. We propose an MLLM (OmniGeo) tailored to geospatial applications. By combining the strengths of natural language understanding and spatial reasoning, our model enhances the ability of instruction following and the accuracy of GeoAI systems.
arXiv Detail & Related papers (2025-03-20T16:45:48Z)
- A Machine Learning-Ready Data Processing Tool for Near Real-Time Forecasting [0.0]
This paper presents the development of a Machine Learning (ML)-ready data processing tool for Near Real-Time (NRT) space weather forecasting. By merging data from diverse NRT sources, the tool addresses key gaps in current space weather prediction capabilities. The tool processes and structures the data for machine learning models, focusing on time-series forecasting and event detection for extreme solar events.
arXiv Detail & Related papers (2025-02-12T16:35:46Z)
- EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
arXiv Detail & Related papers (2025-01-14T13:42:22Z)
- A Foundation Model for the Solar Dynamics Observatory [2.63089646549647]
SDO-FM is a foundation model using data from NASA's Solar Dynamics Observatory (SDO) spacecraft.
This paper marks the release of our pretrained models and embedding datasets, available to the community on Hugging Face and sdofm.org.
arXiv Detail & Related papers (2024-10-03T14:36:32Z)
- Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method [66.80344502790231]
We extend meteorological downscaling to arbitrary scattered station scales and establish a new benchmark and dataset.
Inspired by data assimilation techniques, we integrate observational data into the downscaling process, providing multi-scale observational priors.
Our proposed method outperforms other specially designed baseline models on multiple surface variables.
arXiv Detail & Related papers (2024-01-22T14:02:56Z)
- Federated Prompt Learning for Weather Foundation Models on Devices [37.88417074427373]
On-device intelligence for weather forecasting uses local deep learning models to analyze weather patterns without centralized cloud computing.
This paper proposes Federated Prompt Learning for Weather Foundation Models on Devices (FedPoD).
FedPoD enables devices to obtain highly customized models while maintaining communication efficiency.
arXiv Detail & Related papers (2023-05-23T16:59:20Z)
- Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration [55.41644538483948]
This paper introduces a multimodal dataset from the harsh and unstructured underground environment with aerosol particles.
It contains synchronized raw data measurements from all onboard sensors in Robot Operating System (ROS) format.
The focus of this paper is not only to capture both temporal and spatial data diversities but also to present the impact of harsh conditions on captured data.
arXiv Detail & Related papers (2023-04-27T20:21:18Z)
- Earthformer: Exploring Space-Time Transformers for Earth System Forecasting [27.60569643222878]
We propose Earthformer, a space-time Transformer for Earth system forecasting.
The Transformer is based on a generic, flexible and efficient space-time attention block, named Cuboid Attention.
Experiments on two real-world benchmarks, precipitation nowcasting and El Niño/Southern Oscillation (ENSO) forecasting, show that Earthformer achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-07-12T20:52:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.