Distribution-free inference for LightGBM and GLM with Tweedie loss
- URL: http://arxiv.org/abs/2507.06921v1
- Date: Wed, 09 Jul 2025 14:58:54 GMT
- Title: Distribution-free inference for LightGBM and GLM with Tweedie loss
- Authors: Alokesh Manna, Aditya Vikram Sett, Dipak K. Dey, Yuwen Gu, Elizabeth D. Schifano, Jichao He
- Abstract summary: Conformal predictive inference has arisen as a popular distribution-free approach for quantifying predictive uncertainty. In this work, we propose new non-conformity measures for GLMs and GBMs with GLM-type loss.
- Score: 2.638878351659023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prediction uncertainty quantification has become a key research topic in recent years for scientific and business problems. In the insurance industry (\cite{parodi2023pricing}), assessing the range of possible claim costs for individual drivers improves premium pricing accuracy. It also enables insurers to manage risk more effectively by accounting for uncertainty in accident likelihood and severity. In the presence of covariates, a variety of regression-type models are often used for modeling insurance claims, ranging from relatively simple generalized linear models (GLMs) to regularized GLMs and gradient boosting models (GBMs). Conformal predictive inference has arisen as a popular distribution-free approach for quantifying predictive uncertainty under relatively weak assumptions of exchangeability, and has been well studied in the classic linear regression setting. In this work, we propose new non-conformity measures for GLMs and GBMs with GLM-type loss. Using regularized Tweedie GLM regression and LightGBM with Tweedie loss, we demonstrate conformal prediction performance with these non-conformity measures on insurance claims data. Our simulation results favor the use of locally weighted Pearson residuals for LightGBM over the other methods considered, as the resulting intervals maintained nominal coverage with the smallest average width.
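To illustrate the general recipe, the sketch below assembles a split-conformal prediction interval around a LightGBM model trained with Tweedie loss, using an absolute Pearson-type residual (the raw residual rescaled by the local Tweedie variance) as the non-conformity score. This is a minimal sketch under simplifying assumptions: the function names, the fixed variance power of 1.5, and the multiplicative rescaling of the interval width are illustrative choices, not the paper's exact locally weighted construction.

```python
import numpy as np
import lightgbm as lgb

def tweedie_variance(mu, power=1.5):
    """Tweedie variance function V(mu) = mu^power (dispersion absorbed into the score)."""
    return np.power(np.maximum(mu, 1e-12), power)

def split_conformal_tweedie(X_train, y_train, X_calib, y_calib, X_new,
                            alpha=0.1, power=1.5):
    """Split-conformal intervals for LightGBM with Tweedie loss, using
    absolute Pearson residuals as the non-conformity score (illustrative)."""
    # Fit the point predictor on the proper training split.
    model = lgb.LGBMRegressor(objective="tweedie", tweedie_variance_power=power)
    model.fit(X_train, y_train)

    # Non-conformity scores on the calibration split:
    # r_i = |y_i - mu_i| / sqrt(V(mu_i)), a Pearson-type residual.
    mu_calib = model.predict(X_calib)
    scores = np.abs(y_calib - mu_calib) / np.sqrt(tweedie_variance(mu_calib, power))

    # Finite-sample-corrected (1 - alpha) quantile of the calibration scores.
    n = len(scores)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q_hat = np.quantile(scores, q_level, method="higher")

    # Rescale by the local variance at each new point to get adaptive widths.
    mu_new = model.predict(X_new)
    half_width = q_hat * np.sqrt(tweedie_variance(mu_new, power))
    return mu_new - half_width, mu_new + half_width
```

The same calibration step can in principle be reused with a regularized Tweedie GLM (for example, scikit-learn's TweedieRegressor) by swapping out the fitted model, since the conformal quantile only requires point predictions on the calibration split.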
Related papers
- Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction [0.0]
We propose a model-agnostic uncertainty quantification method that integrates dynamic threshold calibration and cross-modal consistency verification. We show that the framework achieves stable performance across varying calibration-to-test split ratios, underscoring its robustness for real-world deployment in healthcare, autonomous systems, and other safety-sensitive domains. This work bridges the gap between theoretical reliability and practical applicability in multi-modal AI systems, offering a scalable solution for hallucination detection and uncertainty-aware decision-making.
arXiv Detail & Related papers (2025-04-24T15:39:46Z) - Evaluation of uncertainty estimations for Gaussian process regression based machine learning interatomic potentials [0.0]
Uncertainty estimations for machine learning interatomic potentials (MLIPs) are crucial for quantifying model error. We evaluate uncertainty estimations of GPR-based MLIPs, including the predictive GPR standard deviation and ensemble-based uncertainties.
arXiv Detail & Related papers (2024-10-27T10:06:09Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Uncertainty in Language Models: Assessment through Rank-Calibration [65.10149293133846]
Language Models (LMs) have shown promising performance in natural language generation.
It is crucial to correctly quantify their uncertainty in responding to given inputs.
We develop a novel and practical framework, termed $Rank$-$Calibration$, to assess uncertainty and confidence measures for LMs.
arXiv Detail & Related papers (2024-04-04T02:31:05Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), which can be applied to either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron [121.10338065441417]
We analyze a Perceptron-type algorithm called GLM-tron and provide its dimension-free risk upper bounds for high-dimensional ReLU regression.
Our results suggest that GLM-tron might be preferable to SGD for high-dimensional ReLU regression.
arXiv Detail & Related papers (2023-03-03T23:02:23Z) - Beyond Ridge Regression for Distribution-Free Data [8.523307608620094]
The predictive normalized maximum likelihood (pNML) has been proposed as the min-max regret solution for the distribution-free setting, where no distributional assumptions are made on the data.
It has been suggested to use NML with "luckiness": a prior-like function is applied to the hypothesis class, which reduces its effective size.
The associated pNML with luckiness (LpNML) prediction deviates from the ridge regression empirical risk minimizer (Ridge ERM).
Our LpNML reduces the Ridge ERM error by up to 20% for the PMLB sets.
arXiv Detail & Related papers (2022-06-17T13:16:46Z) - Selective Regression Under Fairness Criteria [30.672082160544996]
In some cases, the performance of a minority group can decrease while we reduce the coverage.
We show that such an unwanted behavior can be avoided if we can construct features satisfying the sufficiency criterion.
arXiv Detail & Related papers (2021-10-28T19:05:12Z) - Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - Autocalibration and Tweedie-dominance for Insurance Pricing with Machine Learning [0.0]
It is shown that minimizing deviance involves a trade-off between the integral of weighted differences of lower partial moments and the bias measured on a specific scale.
This new method to correct for bias adds an extra local GLM step to the analysis.
The convex order appears to be the natural tool to compare competing models.
arXiv Detail & Related papers (2021-03-05T12:40:30Z) - Semi-Supervised Learning with Variational Bayesian Inference and Maximum Uncertainty Regularization [62.21716612888669]
We propose two generic methods for improving semi-supervised learning (SSL).
The first integrates weight perturbation (WP) into existing "consistency regularization" (CR) based methods.
The second method proposes a novel consistency loss called "maximum uncertainty regularization" (MUR).
arXiv Detail & Related papers (2020-12-03T09:49:35Z) - Differentially Private Bayesian Inference for Generalized Linear Models [21.942149563564968]
Generalized linear models (GLMs) such as logistic regression are among the most widely used tools in the data analyst's repertoire.
We introduce a generic noise-aware DP Bayesian inference method for a GLM, given a noisy sum of summary statistics.
We experimentally demonstrate that the posteriors obtained from our model, while adhering to strong privacy guarantees, are close to the non-private posteriors.
arXiv Detail & Related papers (2020-11-01T10:38:22Z)