On the Generalization and Robustness in Conditional Value-at-Risk
- URL: http://arxiv.org/abs/2602.18053v1
- Date: Fri, 20 Feb 2026 08:10:11 GMT
- Title: On the Generalization and Robustness in Conditional Value-at-Risk
- Authors: Dinesh Karthik Mulumudi, Piyushi Manupriya, Gholamali Aminian, Anant Raj,
- Abstract summary: We develop a learning-theoretic analysis of Conditional Value-at-Risk (CVaR)-based empirical risk minimization under heavy-tailed and contaminated data. We establish sharp, high-probability generalization and excess risk bounds under minimal moment assumptions. We show that CVaR decisions themselves can be intrinsically unstable under heavy tails.
- Score: 12.253712889424584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conditional Value-at-Risk (CVaR) is a widely used risk-sensitive objective for learning under rare but high-impact losses, yet its statistical behavior under heavy-tailed data remains poorly understood. Unlike expectation-based risk, CVaR depends on an endogenous, data-dependent quantile, which couples tail averaging with threshold estimation and fundamentally alters both generalization and robustness properties. In this work, we develop a learning-theoretic analysis of CVaR-based empirical risk minimization under heavy-tailed and contaminated data. We establish sharp, high-probability generalization and excess risk bounds under minimal moment assumptions, covering fixed hypotheses, finite and infinite classes, and extending to $β$-mixing dependent data; we further show that these rates are minimax optimal. To capture the intrinsic quantile sensitivity of CVaR, we derive a uniform Bahadur-Kiefer type expansion that isolates a threshold-driven error term absent in mean-risk ERM and essential in heavy-tailed regimes. We complement these results with robustness guarantees by proposing a truncated median-of-means CVaR estimator that achieves optimal rates under adversarial contamination. Finally, we show that CVaR decisions themselves can be intrinsically unstable under heavy tails, establishing a fundamental limitation on decision robustness even when the population optimum is well separated. Together, our results provide a principled characterization of when CVaR learning generalizes and is robust, and when instability is unavoidable due to tail scarcity.
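To make the objects in the abstract concrete, here is a minimal Python sketch of a plain empirical CVaR estimator alongside a truncated median-of-means variant. This is a generic illustration of the two ingredients the paper combines (truncation plus median-of-means), not the paper's exact estimator; the block sizes, clipping threshold, and seed are illustrative choices.

```python
import numpy as np

def empirical_cvar(losses, alpha):
    """Plain empirical CVaR_alpha: average of the worst (1 - alpha) fraction of losses."""
    losses = np.sort(np.asarray(losses, dtype=float))
    n = len(losses)
    k = max(1, int(np.ceil((1 - alpha) * n)))  # number of tail samples
    return float(losses[-k:].mean())

def mom_cvar(losses, alpha, n_blocks=8, clip=None):
    """Median-of-means CVaR sketch: optionally truncate losses at `clip`,
    split the sample into blocks, estimate CVaR on each block, and return
    the median of the block estimates (robust to a few corrupted samples)."""
    losses = np.asarray(losses, dtype=float)
    if clip is not None:
        losses = np.minimum(losses, clip)
    rng = np.random.default_rng(0)  # fixed seed for a reproducible split
    perm = rng.permutation(len(losses))
    blocks = np.array_split(losses[perm], n_blocks)
    return float(np.median([empirical_cvar(b, alpha) for b in blocks]))
```

On a sample of a hundred unit losses plus a single huge contaminated value, the plain estimator is dragged toward the outlier, while the truncated median-of-means estimate stays near the bulk of the data.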
Related papers
- A Researcher's Guide to Empirical Risk Minimization [3.891921282474929]
This guide provides a reference for high-probability regret bounds in empirical risk minimization. We begin with intuition and general proof strategies, then state broadly applicable guarantees under high-level conditions.
arXiv Detail & Related papers (2026-02-25T02:26:23Z) - RAPID: Risk of Attribute Prediction-Induced Disclosure in Synthetic Microdata [0.0]
We introduce a disclosure risk measure that directly quantifies inferential vulnerability under a realistic attack model. An adversary trains a predictive model solely on the released synthetic data and applies it to real individuals' quasi-identifiers. For continuous sensitive attributes, we propose a baseline-normalized confidence score that measures how much more confident the attacker is about the true class than would be expected from class prevalence alone.
arXiv Detail & Related papers (2026-02-09T22:03:11Z) - Statistical Robustness of Interval CVaR Based Regression Models under Perturbation and Contamination [1.578201299411112]
We address robust nonlinear regression based on the so-called interval conditional value-at-risk (In-CVaR). We rigorously quantify robustness under contamination, with a unified study of the distributional breakdown point for a broad class of regression models. We show that the In-CVaR based estimator is qualitatively robust in terms of the Prokhorov metric if and only if the largest portion of losses is trimmed.
arXiv Detail & Related papers (2026-01-16T16:41:57Z) - On Design of Representative Distributionally Robust Formulations for Evaluation of Tail Risk Measures [0.0]
Conditional Value-at-Risk (CVaR) is a risk measure widely used to quantify the impact of extreme losses. To combat this sensitivity, Distributionally Robust Optimization (DRO) evaluates the worst-case CVaR over a set of plausible data distributions. This paper leverages extreme value theory to arrive at a DRO formulation that yields representative worst-case CVaR evaluations.
arXiv Detail & Related papers (2025-06-19T11:40:02Z) - Geometric Median Matching for Robust k-Subset Selection from Noisy Data [75.86423267723728]
We propose a novel k-subset selection strategy that leverages the Geometric Median -- a robust estimator with an optimal breakdown point of 1/2. Our method iteratively selects a k-subset such that the mean of the subset approximates the GM of the (potentially) noisy dataset, ensuring robustness even under arbitrary corruption.
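The geometric median underlying this selection strategy can itself be computed with Weiszfeld's classic fixed-point iteration. The sketch below illustrates only that robust estimator, not the paper's subset-selection procedure; the iteration count and tolerance are illustrative.

```python
import numpy as np

def geometric_median(X, n_iter=100, eps=1e-8):
    """Weiszfeld's algorithm: iteratively re-weight points by inverse
    distance to the current estimate, converging to the geometric median
    (the point minimizing the sum of Euclidean distances to the data)."""
    X = np.asarray(X, dtype=float)
    y = X.mean(axis=0)  # initialize at the (non-robust) mean
    for _ in range(n_iter):
        d = np.linalg.norm(X - y, axis=1)
        d = np.maximum(d, eps)  # avoid division by zero at data points
        w = 1.0 / d
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            return y_new
        y = y_new
    return y
```

With nine points at the origin and one gross outlier, the mean is pulled far from the bulk while the geometric median stays essentially at the origin, reflecting the 1/2 breakdown point cited above.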
arXiv Detail & Related papers (2025-04-01T09:22:05Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We analyze the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross-validation (GCV) estimator fails to correctly predict the out-of-sample risk. We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z) - Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
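The core idea of importance-sampling weighted ERM can be sketched very compactly: reweight each observed loss by the inverse probability that its sample was collected, then pick the hypothesis with the smallest weighted risk. This is a generic inverse-propensity-weighting illustration, not the paper's specific algorithm or its regret analysis; the function names are hypothetical.

```python
import numpy as np

def iw_risk(losses, propensities):
    """Importance-weighted empirical risk: each loss is reweighted by the
    inverse probability that its sample was collected, correcting the bias
    introduced by an adaptive collection policy."""
    losses = np.asarray(losses, dtype=float)
    p = np.asarray(propensities, dtype=float)
    return float(np.mean(losses / p))

def iw_erm_select(hypothesis_losses, propensities):
    """Select the hypothesis minimizing the importance-weighted risk.
    `hypothesis_losses[h][i]` is the loss of hypothesis h on sample i."""
    risks = [iw_risk(l, propensities) for l in hypothesis_losses]
    return int(np.argmin(risks)), risks
```

With uniform data collection all propensities are equal and this reduces to ordinary ERM; the weights matter precisely when the collection policy over- or under-samples certain regions.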
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - Bias-Corrected Peaks-Over-Threshold Estimation of the CVaR [2.552459629685159]
Conditional value-at-risk (CVaR) is a useful risk measure in fields such as machine learning, finance, insurance, energy, etc.
When measuring very extreme risk, the commonly used CVaR estimation method of sample averaging does not work well.
To mitigate this problem, the CVaR can be estimated by extrapolating above a lower threshold than the VaR.
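As a rough illustration of extrapolating beyond the sample, here is a bare-bones peaks-over-threshold CVaR sketch assuming a Pareto-type tail: a Hill estimate of the tail index from the top-k order statistics, a tail quantile extrapolated from the threshold, and the Pareto identity CVaR = VaR * alpha / (alpha - 1). This is the standard uncorrected EVT recipe, not the paper's bias-corrected estimator.

```python
import numpy as np

def pot_cvar(sample, p, k):
    """Peaks-over-threshold CVaR sketch under a Pareto-tail assumption.
    Uses the Hill estimator on the top-k order statistics, then
    extrapolates VaR_p and converts to CVaR via the Pareto identity."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    u = x[n - k - 1]                        # threshold: (k+1)-th largest value
    tail = x[n - k:]                        # k exceedances above u
    alpha = k / np.sum(np.log(tail / u))    # Hill tail-index estimate
    if alpha <= 1:
        raise ValueError("tail too heavy: CVaR undefined for alpha <= 1")
    var_p = u * (k / (n * (1 - p))) ** (1.0 / alpha)
    return float(var_p * alpha / (alpha - 1))
```

On data with an exact Pareto(alpha = 3) tail, the true 99% CVaR is (0.01)^(-1/3) * 1.5, roughly 6.96, and the sketch recovers a value close to it; the bias correction discussed in the paper refines exactly this kind of plug-in estimate.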
arXiv Detail & Related papers (2021-03-08T20:29:06Z) - Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z) - GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective offline estimation of stationary values can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.