How to Train Your Flare Prediction Model: Revisiting Robust Sampling of
Rare Events
- URL: http://arxiv.org/abs/2103.07542v1
- Date: Fri, 12 Mar 2021 21:37:08 GMT
- Title: How to Train Your Flare Prediction Model: Revisiting Robust Sampling of
Rare Events
- Authors: Azim Ahmadzadeh, Berkay Aydin, Manolis K. Georgoulis, Dustin J.
Kempton, Sushant S. Mahajan, and Rafal A. Angryk
- Abstract summary: We present a case study of solar flare forecasting by means of metadata feature time series, by treating it as a prominent class-imbalance and temporally coherent problem.
We showcase the general concept of temporal coherence triggered by the demand of continuity in time series forecasting and show that lack of proper understanding of this effect may spuriously enhance models' performance.
We revisit the main remedies for these challenges and present several experiments to illustrate the exact impact that each of these remedies may have on performance.
- Score: 0.9851812512860351
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a case study of solar flare forecasting by means of metadata
feature time series, by treating it as a prominent class-imbalance and
temporally coherent problem. Taking full advantage of pre-flare time series in
solar active regions is made possible via the Space Weather Analytics for Solar
Flares (SWAN-SF) benchmark dataset; a partitioned collection of multivariate
time series of active region properties comprising 4075 regions and spanning
over 9 years of the Solar Dynamics Observatory (SDO) period of operations. We
showcase the general concept of temporal coherence triggered by the demand of
continuity in time series forecasting and show that lack of proper
understanding of this effect may spuriously enhance models' performance. We
further address another well-known challenge in rare event prediction, namely,
the class-imbalance issue. The SWAN-SF is an appropriate dataset for this, with
a 60:1 imbalance ratio for GOES M- and X-class flares and an 800:1 ratio for X-class
flares against flare-quiet instances. We revisit the main remedies for these
challenges and present several experiments to illustrate the exact impact that
each of these remedies may have on performance. Moreover, we acknowledge that
some basic data manipulation tasks such as data normalization and cross
validation may also impact the performance -- we discuss these problems as
well. In this framework we also review the primary advantages and disadvantages
of using true skill statistic and Heidke skill score, as two widely used
performance verification metrics for the flare forecasting task. In conclusion,
we show and advocate for the benefits of time series vs. point-in-time
forecasting, provided that the above challenges are measurably and
quantitatively addressed.
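The two verification metrics named in the abstract can be sketched from a binary confusion matrix. This is a minimal illustration, not the authors' code: the abstract does not say which form of the Heidke skill score is used, so the common two-class form (often called HSS2 in the flare forecasting literature) is assumed here, and the confusion-matrix counts are made-up numbers chosen only to mimic a roughly 60:1 imbalance ratio.

```python
def tss(tp, fn, fp, tn):
    """True skill statistic: hit rate minus false-alarm rate."""
    return tp / (tp + fn) - fp / (fp + tn)


def hss(tp, fn, fp, tn):
    """Heidke skill score, two-class form (assumed variant, see lead-in)."""
    numerator = 2 * (tp * tn - fn * fp)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


# Illustrative counts only (not results from the paper).
# Both cases share the same hit rate (0.8) and false-alarm rate (0.1);
# only the ratio of flare-quiet to flaring instances differs.
imbalanced = dict(tp=80, fn=20, fp=600, tn=5400)  # 100 flaring vs 6000 quiet
balanced = dict(tp=80, fn=20, fp=10, tn=90)       # 100 flaring vs 100 quiet

for name, counts in [("imbalanced", imbalanced), ("balanced", balanced)]:
    print(f"{name}: TSS={tss(**counts):.3f}  HSS={hss(**counts):.3f}")
```

With these counts TSS is 0.7 in both cases, while HSS falls from 0.7 in the balanced case to about 0.18 in the imbalanced one. This sensitivity to the class ratio is the kind of trade-off to weigh when comparing the two scores across differently sampled test sets.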
Related papers
- Towards Hybrid Embedded Feature Selection and Classification Approach with Slim-TSF [0.0]
This study aims to uncover hidden relationships and the evolutionary characteristics of solar flares and their source regions.
Preliminary findings indicate a notable improvement, with an average increase of 5% in both the True Skill Statistic (TSS) and Heidke Skill Score (HSS).
(2024-09-06T18:12:05Z)
- Generating Fine-Grained Causality in Climate Time Series Data for Forecasting and Anomaly Detection [67.40407388422514]
We design a conceptual fine-grained causal model named TBN Granger Causality.
Second, we propose an end-to-end deep generative model called TacSas, which discovers TBN Granger Causality in a generative manner.
We test TacSas on climate benchmark ERA5 for climate forecasting and the extreme weather benchmark of NOAA for extreme weather alerts.
(2024-08-08T06:47:21Z)
- Learning Graph Structures and Uncertainty for Accurate and Calibrated Time-series Forecasting [65.40983982856056]
We introduce STOIC, which leverages correlations between time-series to learn underlying structure between time-series and to provide well-calibrated and accurate forecasts.
Over a wide range of benchmark datasets STOIC provides 16% more accurate and better-calibrated forecasts.
(2024-07-02T20:14:32Z)
- Enhancing reliability in prediction intervals using point forecasters: Heteroscedastic Quantile Regression and Width-Adaptive Conformal Inference [0.0]
We argue that, when evaluating a set of intervals, traditional measures alone are insufficient.
The intervals must vary in length, with this variation directly linked to the difficulty of the prediction.
We propose the Heteroscedastic Quantile Regression (HQR) model and the Width-Adaptive Conformal Inference (WACI) method.
(2024-06-21T06:51:13Z)
- How far are today's time-series models from real-world weather forecasting applications? [22.68937280154092]
WEATHER-5K is a comprehensive collection of observational weather data that better reflects real-world scenarios.
It enables a better training of models and a more accurate assessment of the real-world forecasting capabilities of TSF models.
We provide researchers with a clear assessment of the gap between academic TSF models and real-world weather forecasting applications.
(2024-06-20T15:18:52Z)
- Active Region-based Flare Forecasting with Sliding Window Multivariate Time Series Forest Classifiers [0.0]
We bridge the gap between complex, less understandable black-box models used for high-dimensional data and the exploration of relevant sub-intervals.
Our findings demonstrate that our sliding-window time series forest classifier performs effectively in solar flare prediction.
(2024-02-05T19:34:12Z)
- Attention-Based Ensemble Pooling for Time Series Forecasting [55.2480439325792]
We propose a method for pooling that performs a weighted average over candidate model forecasts.
We test this method on two time-series forecasting problems: multi-step forecasting of the dynamics of the non-stationary Lorenz 63 equation, and one-step forecasting of the weekly incident deaths due to COVID-19.
(2023-10-24T22:59:56Z)
- Performative Time-Series Forecasting [71.18553214204978]
We formalize performative time-series forecasting (PeTS) from a machine-learning perspective.
We propose a novel approach, Feature Performative-Shifting (FPS), which leverages the concept of delayed response to anticipate distribution shifts.
We conduct comprehensive experiments using multiple time-series models on COVID-19 and traffic forecasting tasks.
(2023-10-09T18:34:29Z)
- Understanding the Impact of Competing Events on Heterogeneous Treatment Effect Estimation from Time-to-Event Data [92.51773744318119]
We study the problem of inferring heterogeneous treatment effects (HTEs) from time-to-event data in the presence of competing events.
We take an outcome modeling approach to estimating HTEs, and consider how and when existing prediction models for time-to-event data can be used as plug-in estimators for potential outcomes.
We theoretically analyze and empirically illustrate when and how these challenges play a role when using generic machine learning prediction models for the estimation of HTEs.
(2023-02-23T14:28:55Z)
- Improving Solar Flare Prediction by Time Series Outlier Detection [1.0131895986034316]
outliers on the reliability and those models' performance.
We employ Isolation Forest to detect the outliers among the weaker flare instances.
We achieve a 279% increase in True Skill Statistic and 68% increase in Heidke Skill Score.
(2022-06-14T22:54:39Z)
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
(2022-02-07T21:37:29Z)