Applying ranking techniques for estimating influence of Earth variables
on temperature forecast error
- URL: http://arxiv.org/abs/2403.07966v1
- Date: Tue, 12 Mar 2024 12:59:00 GMT
- Authors: M. Julia Flores, Melissa Ruiz-Vásquez, Ana Bastos, René Orth
- Abstract summary: This paper describes how to analyze the influence of Earth system variables on temperature forecast errors.
The main contribution is a framework that shows how to convert correlations into rankings and combine them into an aggregate ranking.
We have carried out experiments on five chosen locations to analyze the behavior of this ranking-based methodology.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper describes how to analyze the influence of Earth system variables
on temperature forecast errors. The initial framework for obtaining the data
builds on previous research work, which resulted in a very interesting
discovery. However, that study considered only the individual correlations of
the variables with the error. This work reuses its main ideas and introduces
three novelties: (1) applying a data science approach to a few representative
locations; (2) taking advantage of the rankings created by Spearman correlation
and enriching them with other metrics to obtain a more robust ranking of the
variables; (3) evaluating the methodology by learning random forest regression
models under the distinct experimental variations. The main contribution
is a framework that shows how to convert correlations into rankings and
combine them into an aggregate ranking. We have carried out experiments on five
chosen locations to analyze the behavior of this ranking-based methodology. The
results show that performance depends on the location and season, as expected,
and that this selection technique works well with Random Forest models and can
also improve simpler regression models such as Bayesian Ridge. This work also
contributes an extensive analysis of the results. We conclude that selection
based on the top-k ranked variables seems promising for this real problem, and
that it could also be applied in other domains.
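The idea of converting correlations into rankings and combining them into an aggregate ranking can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the abstract does not name the additional metrics or the aggregation rule, so the sketch assumes Pearson correlation as the second metric and a mean-rank (Borda-style) aggregation. The variable names and data values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def ranking_from_scores(scores):
    """Map {variable: score} to {variable: rank}; rank 1 = largest |score|."""
    names = list(scores)
    ranks = average_ranks([-abs(scores[n]) for n in names])
    return dict(zip(names, ranks))


def aggregate_rankings(rankings):
    """Assumed aggregation rule: order variables by their mean rank."""
    names = list(rankings[0])
    mean_rank = {n: mean(r[n] for r in rankings) for n in names}
    return sorted(names, key=lambda n: mean_rank[n])


# Hypothetical data: forecast error and three candidate Earth variables.
error = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
variables = {
    "soil_moisture": [6, 5, 4, 3, 2, 1],
    "cloud_cover": [1, 3, 2, 5, 4, 6],
    "snow_depth": [3, 1, 6, 2, 5, 4],
}

# One ranking per metric, then a single aggregate ranking.
rankings = [
    ranking_from_scores({n: metric(error, v) for n, v in variables.items()})
    for metric in (spearman, pearson)
]
aggregate = aggregate_rankings(rankings)
selected = aggregate[:2]  # top-k selection with k = 2
print(selected)
```

The variables in `selected` would then be the inputs to a regression model such as a random forest or Bayesian ridge regressor. Ranking by absolute correlation is also an assumption here: it treats strong negative and strong positive associations with the error as equally influential.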
Related papers
- Causal Representation Learning in Temporal Data via Single-Parent Decoding (2024-10-09T15:57:50Z)
Scientific research often seeks to understand the causal structure underlying high-level variables in a system.
Scientists typically collect low-level measurements, such as geographically distributed temperature readings.
We propose a differentiable method, Causal Discovery with Single-parent Decoding, that simultaneously learns the underlying latents and a causal graph over them.
- A Sparsity Principle for Partially Observable Causal Representation Learning (2024-03-13T08:40:49Z)
Causal representation learning aims at identifying high-level causal variables from perceptual data.
We focus on learning from unpaired observations from a dataset with an instance-dependent partial observability pattern.
We propose two methods for estimating the underlying causal variables by enforcing sparsity in the inferred representation.
- A Notion of Feature Importance by Decorrelation and Detection of Trends by Random Forest Regression (2023-03-02T11:01:49Z)
We introduce a novel notion of feature importance based on the well-studied Gram-Schmidt decorrelation method.
We propose two estimators for identifying trends in the data using random forest regression.
- Model Optimization in Imbalanced Regression (2022-06-20T20:23:56Z)
Imbalanced domain learning aims to produce accurate models in predicting instances that, though underrepresented, are of utmost importance for the domain.
One of the main reasons for this is the lack of loss functions capable of focusing on minimizing the errors of extreme (rare) values.
Recently, an evaluation metric was introduced: Squared Error Relevance Area (SERA).
This metric places greater emphasis on the errors committed at extreme values while also accounting for the performance in the overall target variable domain.
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties (2021-12-15T18:56:39Z)
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model (2021-10-09T13:56:48Z)
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
- Instance-Level Relative Saliency Ranking with Graph Reasoning (2021-07-08T13:10:42Z)
We present a novel unified model to segment salient instances and infer relative saliency rank order.
A novel loss function is also proposed to effectively train the saliency ranking branch.
Experimental results demonstrate that our proposed model is more effective than previous methods.
- Flexible Model Aggregation for Quantile Regression (2021-02-26T23:21:16Z)
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions.
We investigate methods for aggregating any number of conditional quantile models.
All of the models we consider in this paper can be fit using modern deep learning toolkits.
- Achieving Reliable Causal Inference with Data-Mined Variables: A Random Forest Approach to the Measurement Error Problem (2020-12-19T21:48:23Z)
A common empirical strategy involves the application of predictive modeling techniques to 'mine' variables of interest from available data.
Recent work highlights that, because the predictions from machine learning models are inevitably imperfect, econometric analyses based on the predicted variables are likely to suffer from bias due to measurement error.
We propose a novel approach to mitigate these biases, leveraging the ensemble learning technique known as the random forest.
- A Feature Importance Analysis for Soft-Sensing-Based Predictions in a Chemical Sulphonation Process (2020-09-25T11:20:06Z)
We use a soft-sensing approach, that is, predicting a variable of interest based on other process variables, instead of directly sensing the variable of interest.
The aim of this study was to explore and detect which variables are the most relevant for predicting product quality, and to what degree of precision.
- Stable Prediction via Leveraging Seed Variable (2020-06-09T06:56:31Z)
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction.
We propose an algorithm based on conditional independence tests that separates causal variables using a seed variable as a prior and adopts them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
This list is automatically generated from the titles and abstracts of the papers on this site.