Predictability Analysis of Regression Problems via Conditional Entropy Estimations
- URL: http://arxiv.org/abs/2406.03824v1
- Date: Thu, 6 Jun 2024 07:59:19 GMT
- Title: Predictability Analysis of Regression Problems via Conditional Entropy Estimations
- Authors: Yu-Hsueh Fang, Chia-Yen Lee,
- Abstract summary: Conditional entropy estimators are developed to assess predictability in regression problems.
Experiments on synthesized and real-world datasets demonstrate the robustness and utility of these estimators.
- Score: 1.8913544072080544
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of machine learning, regression problems are pivotal due to their ability to predict continuous outcomes. Traditional error metrics like mean squared error, mean absolute error, and coefficient of determination measure model accuracy. The model accuracy is the consequence of the selected model and the features, which blurs the analysis of contribution. Predictability, in the other hand, focus on the predictable level of a target variable given a set of features. This study introduces conditional entropy estimators to assess predictability in regression problems, bridging this gap. We enhance and develop reliable conditional entropy estimators, particularly the KNIFE-P estimator and LMC-P estimator, which offer under- and over-estimation, providing a practical framework for predictability analysis. Extensive experiments on synthesized and real-world datasets demonstrate the robustness and utility of these estimators. Additionally, we extend the analysis to the coefficient of determination \(R^2 \), enhancing the interpretability of predictability. The results highlight the effectiveness of KNIFE-P and LMC-P in capturing the achievable performance and limitations of feature sets, providing valuable tools in the development of regression models. These indicators offer a robust framework for assessing the predictability for regression problems.
Related papers
- Average Causal Effect Estimation in DAGs with Hidden Variables: Extensions of Back-Door and Front-Door Criteria [3.0232957374216953]
We develop one-step corrected plug-in and targeted minimum loss-based estimators of causal effects for a class of directed aparametric graphs (DAGs) with hidden variables.
We leverage machine learning to minimize modeling assumptions while ensuring key statistical properties such as linear primality, double robustness, efficiency, and staying within the bounds of the target parameter space.
arXiv Detail & Related papers (2024-09-06T01:07:29Z) - Errors-in-variables Fr\'echet Regression with Low-rank Covariate
Approximation [2.1756081703276]
Fr'echet regression has emerged as a promising approach for regression analysis involving non-Euclidean response variables.
Our proposed framework combines the concepts of global Fr'echet regression and principal component regression, aiming to improve the efficiency and accuracy of the regression estimator.
arXiv Detail & Related papers (2023-05-16T08:37:54Z) - The Implicit Delta Method [61.36121543728134]
In this paper, we propose an alternative, the implicit delta method, which works by infinitesimally regularizing the training loss of uncertainty.
We show that the change in the evaluation due to regularization is consistent for the variance of the evaluation estimator, even when the infinitesimal change is approximated by a finite difference.
arXiv Detail & Related papers (2022-11-11T19:34:17Z) - Uncertainty estimation of pedestrian future trajectory using Bayesian
approximation [137.00426219455116]
Under dynamic traffic scenarios, planning based on deterministic predictions is not trustworthy.
The authors propose to quantify uncertainty during forecasting using approximation which deterministic approaches fail to capture.
The effect of dropout weights and long-term prediction on future state uncertainty has been studied.
arXiv Detail & Related papers (2022-05-04T04:23:38Z) - Forecast Evaluation in Large Cross-Sections of Realized Volatility [0.0]
We evaluate the predictive accuracy of the model based on the augmented cross-section when forecasting Realized volatility.
We study the sensitivity of forecasts to the model specification by incorporating a measurement error correction as well as cross-sectional jump component measures.
arXiv Detail & Related papers (2021-12-09T13:19:09Z) - Causality and Generalizability: Identifiability and Learning Methods [0.0]
This thesis contributes to the research areas concerning the estimation of causal effects, causal structure learning, and distributionally robust prediction methods.
We present novel and consistent linear and non-linear causal effects estimators in instrumental variable settings that employ data-dependent mean squared prediction error regularization.
We propose a general framework for distributional robustness with respect to intervention-induced distributions.
arXiv Detail & Related papers (2021-10-04T13:12:11Z) - When in Doubt: Neural Non-Parametric Uncertainty Quantification for
Epidemic Forecasting [70.54920804222031]
Most existing forecasting models disregard uncertainty quantification, resulting in mis-calibrated predictions.
Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations.
We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP.
arXiv Detail & Related papers (2021-06-07T18:31:47Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Support estimation in high-dimensional heteroscedastic mean regression [2.28438857884398]
We consider a linear mean regression model with random design and potentially heteroscedastic, heavy-tailed errors.
We use a strictly convex, smooth variant of the Huber loss function with tuning parameter depending on the parameters of the problem.
For the resulting estimator we show sign-consistency and optimal rates of convergence in the $ell_infty$ norm.
arXiv Detail & Related papers (2020-11-03T09:46:31Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Machine learning for causal inference: on the use of cross-fit
estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE)
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.