SuryaBench: Benchmark Dataset for Advancing Machine Learning in Heliophysics and Space Weather Prediction
- URL: http://arxiv.org/abs/2508.14107v1
- Date: Mon, 18 Aug 2025 00:05:01 GMT
- Title: SuryaBench: Benchmark Dataset for Advancing Machine Learning in Heliophysics and Space Weather Prediction
- Authors: Sujit Roy, Dinesha V. Hegde, Johannes Schmude, Amy Lin, Vishal Gaur, Rohit Lal, Kshitiz Mandal, Talwinder Singh, Andrés Muñoz-Jaramillo, Kang Yang, Chetraj Pandey, Jinsu Hong, Berkay Aydin, Ryan McGranaghan, Spiridon Kasapis, Vishal Upendran, Shah Bahauddin, Daniel da Silva, Marcus Freitag, Iksha Gurung, Nikolai Pogorelov, Campbell Watson, Manil Maskey, Juan Bernabe-Moreno, Rahul Ramachandran
- Abstract summary: This paper introduces a high resolution, machine learning-ready heliophysics dataset derived from NASA's Solar Dynamics Observatory (SDO). The dataset includes processed imagery from the Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI). To ensure suitability for ML tasks, the data has been preprocessed, including correction of spacecraft roll angles, orbital adjustments, exposure normalization, and degradation compensation.
- Score: 2.288747975391298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a high resolution, machine learning-ready heliophysics dataset derived from NASA's Solar Dynamics Observatory (SDO), specifically designed to advance machine learning (ML) applications in solar physics and space weather forecasting. The dataset includes processed imagery from the Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI), spanning a solar cycle from May 2010 to July 2024. To ensure suitability for ML tasks, the data has been preprocessed, including correction of spacecraft roll angles, orbital adjustments, exposure normalization, and degradation compensation. We also provide auxiliary application benchmark datasets complementing the core SDO dataset. These provide benchmark applications for central heliophysics and space weather tasks such as active region segmentation, active region emergence forecasting, coronal field extrapolation, solar flare prediction, solar EUV spectra prediction, and solar wind speed estimation. By establishing a unified, standardized data collection, this dataset aims to facilitate benchmarking, enhance reproducibility, and accelerate the development of AI-driven models for critical space weather prediction tasks, bridging gaps between solar physics, machine learning, and operational forecasting.
Related papers
- SolARED: Solar Active Region Emergence Dataset for Machine Learning Aided Predictions [33.14810385896251]
Solar Active Region Emergence dataset (SolARED) is derived from full-disk maps of the Doppler velocity, magnetic field, and continuum intensity, obtained by the Helioseismic and Magnetic Imager (HMI) onboard the Solar Dynamics Observatory (SDO). SolARED includes time series of remapped, tracked, and binned data that characterize the evolution of acoustic power of solar oscillations, unsigned magnetic flux, and intensity for 50 large ARs before, during, and after their emergence on the solar surface, as well as surrounding areas observed on the solar disc between 2010 and 2023.
arXiv Detail & Related papers (2026-01-19T15:25:18Z)
- Connecting the Dots: A Machine Learning Ready Dataset for Ionospheric Forecasting Models [0.27300286905606946]
We present a curated, open-access dataset that integrates diverse ionospheric and heliospheric measurements into a coherent, machine learning-ready structure. Our workflow integrates a large selection of data sources comprising Solar Dynamics Observatory data, solar irradiance indices (F10.7), solar wind parameters (velocity and interplanetary magnetic field), geomagnetic activity indices (Kp, AE, SYM-H), and NASA JPL's Global Ionospheric Maps of Total Electron Content (GIM-TEC).
arXiv Detail & Related papers (2025-11-18T20:13:25Z)
- DAWP: A framework for global observation forecasting via Data Assimilation and Weather Prediction in satellite observation space [60.729377189859]
We propose our DAWP framework to enable AIWPs to operate in a complete observation space. The AIDA module applies a mask multi-modality autoencoder for assimilating irregular satellite observation tokens. We show that AIDA significantly improves the roll out and efficiency of AIWP and holds promising potential to be applied in global precipitation forecasting.
arXiv Detail & Related papers (2025-10-13T03:13:35Z)
- Ultra-short-term solar power forecasting by deep learning and data reconstruction [60.200987006598524]
We propose a deep-learning based ultra-short-term solar power prediction with data reconstruction. We employ deep-learning models to capture long- and short-term dependencies towards the target prediction period.
arXiv Detail & Related papers (2025-09-21T14:22:35Z)
- Surya: Foundation Model for Heliophysics [3.5997539202699724]
We introduce Surya, a 366M parameter foundation model for heliophysics designed to learn general-purpose solar representations. We show its ability to forecast solar dynamics and flare events, while downstream fine-tuning with parameter-efficient Low-Rank Adaptation (LoRA) shows strong performance. Its novel architecture and performance suggest that the model is able to learn the underlying physics behind solar evolution.
arXiv Detail & Related papers (2025-08-18T05:44:25Z)
- Conceptual framework for the application of deep neural networks to surface composition reconstruction from Mercury's exospheric data [77.40388962445168]
This study explores the feasibility of deriving Mercury's regolith elemental composition from in-situ measurements of its neutral exosphere using deep neural networks (DNNs). We present a supervised feed-forward DNN architecture that predicts the chemical elements of the surface regolith below. It serves as an estimator for the surface-exosphere interaction and the processes leading to exosphere formation.
arXiv Detail & Related papers (2025-05-16T09:52:45Z)
- CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer [47.65152457550307]
We propose the geometric-inspired Circular Transformer (CirT) to model the cyclic characteristic of the graticule. Experiments on the Earth Reanalysis 5 (ERA5) reanalysis dataset demonstrate our model yields a significant improvement over the advanced data-driven models.
arXiv Detail & Related papers (2025-02-27T04:26:23Z)
- Solar synthetic imaging: Introducing denoising diffusion probabilistic models on SDO/AIA data [0.0]
This study proposes using generative deep learning models, specifically a Denoising Diffusion Probabilistic Model (DDPM), to create synthetic images of solar phenomena.
By employing a dataset from the AIA instrument aboard the SDO spacecraft, we aim to address the data scarcity issue.
The DDPM's performance is evaluated using cluster metrics, Frechet Inception Distance (FID), and F1-score, showcasing promising results in generating realistic solar imagery.
arXiv Detail & Related papers (2024-04-03T08:18:45Z)
- Forecasting SEP Events During Solar Cycles 23 and 24 Using Interpretable Machine Learning [38.321248253111776]
We employ a suite of machine learning strategies to evaluate the predictive potential of a new data product for a forecast of post-solar flare SEP events.
Despite the augmented volume of data, the prediction accuracy reaches 0.7 ± 0.1, which aligns with but does not exceed these published benchmarks.
arXiv Detail & Related papers (2024-03-04T23:12:17Z)
- Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method [66.80344502790231]
We extend meteorological downscaling to arbitrary scattered station scales and establish a new benchmark and dataset.
Inspired by data assimilation techniques, we integrate observational data into the downscaling process, providing multi-scale observational priors.
Our proposed method outperforms other specially designed baseline models on multiple surface variables.
arXiv Detail & Related papers (2024-01-22T14:02:56Z)
- Improving day-ahead Solar Irradiance Time Series Forecasting by Leveraging Spatio-Temporal Context [46.72071291175356]
Solar power harbors immense potential in mitigating climate change by substantially reducing CO$_2$ emissions.
However, the inherent variability of solar irradiance poses a significant challenge for seamlessly integrating solar power into the electrical grid.
In this paper, we put forth a deep learning architecture designed to harness spatio-temporal context using satellite data.
arXiv Detail & Related papers (2023-06-01T19:54:39Z)
- Solar Active Region Magnetogram Image Dataset for Studies of Space Weather [0.0]
The dataset incorporates data from three sources and provides SDO Helioseismic and Magnetic Imager (HMI) magnetograms of solar active regions.
This dataset will be useful for image analysis or solar physics research related to magnetic structure, its evolution over time, and its relation to solar flares.
This dataset is a minimally processed, user dataset of consistently sized images of solar active regions that can serve as a benchmark dataset for solar flare prediction research.
arXiv Detail & Related papers (2023-05-16T14:44:24Z)
- A Comparative Study on Generative Models for High Resolution Solar Observation Imaging [59.372588316558826]
This work investigates capabilities of current state-of-the-art generative models to accurately capture the data distribution behind observed solar activity states.
Using distributed training on supercomputers, we are able to train generative models for up to 1024x1024 resolution that produce high quality samples indistinguishable to human experts.
arXiv Detail & Related papers (2023-04-14T14:40:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.