High-dimensional prediction for count response via sparse exponential weights
- URL: http://arxiv.org/abs/2410.15381v1
- Date: Sun, 20 Oct 2024 12:45:42 GMT
- Title: High-dimensional prediction for count response via sparse exponential weights
- Authors: The Tien Mai
- Abstract summary: This paper introduces a novel probabilistic machine learning framework for high-dimensional count data prediction.
A key contribution is a novel risk measure tailored to count data prediction, with theoretical guarantees for prediction risk using PAC-Bayesian bounds.
Our results include non-asymptotic oracle inequalities, demonstrating rate-optimal prediction error without prior knowledge of sparsity.
- Score: 0.0
- Abstract: Count data is prevalent in various fields like ecology, medical research, and genomics. In high-dimensional settings, where the number of features exceeds the sample size, feature selection becomes essential. While frequentist methods like Lasso have advanced in handling high-dimensional count data, Bayesian approaches remain under-explored with no theoretical results on prediction performance. This paper introduces a novel probabilistic machine learning framework for high-dimensional count data prediction. We propose a pseudo-Bayesian method that integrates a scaled Student prior to promote sparsity and uses an exponential weight aggregation procedure. A key contribution is a novel risk measure tailored to count data prediction, with theoretical guarantees for prediction risk using PAC-Bayesian bounds. Our results include non-asymptotic oracle inequalities, demonstrating rate-optimal prediction error without prior knowledge of sparsity. We implement this approach efficiently using the Langevin Monte Carlo method. Simulations and a real data application highlight the strong performance of our method compared to the Lasso in various settings.
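The abstract names the ingredients (a pseudo-posterior built from exponential weights, a scaled Student prior, and Langevin Monte Carlo) without giving formulas. The following is only a minimal sketch of how such a sampler might look, not the paper's implementation. It assumes a Poisson log-link negative log-likelihood as the loss, a scaled Student prior with density proportional to the product over j of (tau^2 + beta_j^2)^(-2), and an unadjusted Langevin update. The function names, the temperature `lam`, the scale `tau`, and the step size are all hypothetical choices made for illustration.

```python
import numpy as np

def grad_log_pseudo_posterior(beta, X, y, lam, tau):
    """Gradient of lam * (average Poisson log-likelihood) + log prior.

    Assumed loss: Poisson negative log-likelihood with log link.
    Assumed prior: density proportional to prod_j (tau^2 + beta_j^2)^(-2).
    """
    eta = np.clip(X @ beta, -20.0, 20.0)   # clip so exp() stays finite
    mu = np.exp(eta)                       # predicted mean counts
    grad_loglik = X.T @ (y - mu) / len(y)  # gradient of the average log-likelihood
    grad_logprior = -4.0 * beta / (tau**2 + beta**2)
    return lam * grad_loglik + grad_logprior

def langevin_mean(X, y, lam=None, tau=0.1, step=1e-4,
                  n_iter=2000, burn_in=500, seed=0):
    """Approximate the pseudo-posterior mean with unadjusted Langevin updates."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    lam = float(n) if lam is None else lam  # assumed default temperature
    beta = np.zeros(p)
    running_sum = np.zeros(p)
    for t in range(n_iter):
        drift = grad_log_pseudo_posterior(beta, X, y, lam, tau)
        noise = rng.standard_normal(p)
        beta = beta + step * drift + np.sqrt(2.0 * step) * noise
        if t >= burn_in:
            running_sum += beta             # average the post-burn-in iterates
    return running_sum / (n_iter - burn_in)

def predict_mean_count(X, beta):
    """Predicted mean count under the assumed log link."""
    return np.exp(np.clip(X @ beta, -20.0, 20.0))
```

A call such as `beta_hat = langevin_mean(X, y)` followed by `predict_mean_count(X_new, beta_hat)` would give predicted mean counts. In practice the step size and temperature would need tuning, and a Metropolis-adjusted variant could be used to correct the discretization bias of the unadjusted update.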
Related papers
- A sparse PAC-Bayesian approach for high-dimensional quantile prediction [0.0]
This paper presents a novel probabilistic machine learning approach for high-dimensional quantile prediction.
It uses a pseudo-Bayesian framework with a scaled Student-t prior and Langevin Monte Carlo for efficient computation.
Its effectiveness is validated through simulations and real-world data, where it performs competitively against established frequentist and Bayesian techniques.
arXiv Detail & Related papers (2024-09-03T08:01:01Z)
- Ranking and Combining Latent Structured Predictive Scores without Labeled Data [2.5064967708371553]
This paper introduces a novel structured unsupervised ensemble learning model (SUEL).
It exploits the dependency between a set of predictors with continuous predictive scores, ranks the predictors without labeled data, and combines them into a weighted ensemble score.
The efficacy of the proposed methods is rigorously assessed through both simulation studies and real-world application of risk genes discovery.
arXiv Detail & Related papers (2024-08-14T20:14:42Z)
- Joint Prediction Regions for time-series models [0.0]
It is an easy task to compute Joint Prediction regions (JPR) when the data is IID.
This project aims to implement Wolf and Wunderli's method for constructing JPRs and compare it with other methods.
arXiv Detail & Related papers (2024-05-14T02:38:49Z)
- UTOPIA: Universally Trainable Optimal Prediction Intervals Aggregation [9.387706860375461]
We introduce a novel strategy called Universally Trainable Optimal Predictive Intervals Aggregation (UTOPIA).
This technique excels in efficiently aggregating multiple prediction intervals while maintaining a small average width of the prediction band and ensuring coverage.
It is validated through its application to synthetic data and two real-world datasets in finance and macroeconomics.
arXiv Detail & Related papers (2023-06-28T20:38:37Z)
- Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
arXiv Detail & Related papers (2023-04-17T10:59:57Z)
- Prediction-Powered Inference [68.97619568620709]
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system.
The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients.
Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning.
arXiv Detail & Related papers (2023-01-23T18:59:28Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made by using plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximately valid population coverage and near-optimal efficiency within class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z)
- Gaussian Process Boosting [13.162429430481982]
We introduce a novel way to combine boosting with Gaussian process and mixed effects models.
We obtain increased prediction accuracy compared to existing approaches on simulated and real-world data sets.
arXiv Detail & Related papers (2020-04-06T13:19:54Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
- A Robust Functional EM Algorithm for Incomplete Panel Count Data [66.07942227228014]
We propose a functional EM algorithm to estimate the counting process mean function under a missing completely at random (MCAR) assumption.
The proposed algorithm wraps several popular panel count inference methods, seamlessly deals with incomplete counts and is robust to misspecification of the Poisson process assumption.
We illustrate the utility of the proposed algorithm through numerical experiments and an analysis of smoking cessation data.
arXiv Detail & Related papers (2020-03-02T20:04:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.