Generalization and Robustness of the Tilted Empirical Risk
- URL: http://arxiv.org/abs/2409.19431v3
- Date: Sat, 07 Jun 2025 17:10:54 GMT
- Title: Generalization and Robustness of the Tilted Empirical Risk
- Authors: Gholamali Aminian, Amir R. Asadi, Tian Li, Ahmad Beirami, Gesine Reinert, Samuel N. Cohen
- Abstract summary: The generalization error (risk) of a supervised statistical learning algorithm quantifies its prediction ability on previously unseen data. Inspired by exponential tilting, \citet{li2020tilted} proposed the tilted empirical risk (TER) as a non-linear risk metric for machine learning applications.
- Score: 17.48212403081267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The generalization error (risk) of a supervised statistical learning algorithm quantifies its prediction ability on previously unseen data. Inspired by exponential tilting, \citet{li2020tilted} proposed the {\it tilted empirical risk} (TER) as a non-linear risk metric for machine learning applications such as classification and regression problems. In this work, we examine the generalization error of the tilted empirical risk in the robustness regime under \textit{negative tilt}. Our first contribution is to provide uniform and information-theoretic bounds on the {\it tilted generalization error}, defined as the difference between the population risk and the tilted empirical risk, under negative tilt for unbounded loss functions with a bounded $(1+\epsilon)$-th moment of the loss for some $\epsilon\in(0,1]$, with a convergence rate of $O(n^{-\epsilon/(1+\epsilon)})$, where $n$ is the number of training samples, revealing a novel application for TER under no distribution shift. Secondly, we study the robustness of the tilted empirical risk with respect to noisy outliers at training time and provide theoretical guarantees under distribution shift for the tilted empirical risk. We empirically corroborate our findings in simple experimental setups where we evaluate our bounds to select the value of the tilt in a data-driven manner.
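For concreteness, the TER of \citet{li2020tilted} replaces the empirical average of the per-sample losses $\ell_1,\dots,\ell_n$ with the log-partition average $\frac{1}{t}\log\big(\frac{1}{n}\sum_{i=1}^n e^{t\,\ell_i}\big)$: as $t \to 0$ it recovers the ordinary empirical risk, while a negative tilt $t<0$ damps the influence of large losses, which is the robustness regime studied here. The NumPy/SciPy sketch below (written for this summary, not taken from the paper's code; the loss values are synthetic) evaluates the TER via a numerically stable log-sum-exp and illustrates how increasingly negative tilts suppress a single outlier loss.

```python
import numpy as np
from scipy.special import logsumexp

def tilted_empirical_risk(losses, t):
    """Tilted empirical risk (TER): (1/t) * log( (1/n) * sum_i exp(t * loss_i) ).

    For t < 0 (negative tilt) large losses are down-weighted; as t -> 0 the
    ordinary empirical (mean) risk is recovered.
    """
    losses = np.asarray(losses, dtype=float)
    n = losses.shape[0]
    if abs(t) < 1e-12:  # t -> 0 limit is the plain average
        return float(losses.mean())
    # logsumexp keeps exp(t * loss) numerically stable for large |t| or large losses
    return float((logsumexp(t * losses) - np.log(n)) / t)

# Synthetic illustration: 99 inlier losses plus one large outlier loss.
rng = np.random.default_rng(0)
losses = np.concatenate([rng.uniform(0.0, 1.0, size=99), [50.0]])
print(f"plain empirical risk: {losses.mean():.4f}")
for t in (-0.1, -1.0, -5.0):
    print(f"t = {t:+.1f}: TER = {tilted_empirical_risk(losses, t):.4f}")
```

As $t$ becomes more negative, the TER moves from the outlier-inflated mean toward the typical inlier loss (in the limit $t\to-\infty$ it approaches the minimum loss), which is consistent with the data-driven tilt selection discussed in the abstract.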
Related papers
- Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences [56.23412698865433]
We focus on causal inferences on a target experiment with unlabeled factual outcomes, retrieved by a predictive model fine-tuned on a labeled similar experiment.
First, we show that factual outcome estimation via Empirical Risk Minimization (ERM) may fail to yield valid causal inferences on the target population.
We propose Deconfounded Empirical Risk Minimization (DERM), a new simple learning procedure minimizing the risk over a fictitious target population.
arXiv Detail & Related papers (2025-02-10T10:52:17Z) - Data-driven decision-making under uncertainty with entropic risk measure [5.407319151576265]
The entropic risk measure is widely used in high-stakes decision making to account for tail risks associated with an uncertain loss.
To debias the empirical entropic risk estimator, we propose a strongly consistent bootstrapping procedure (a generic, illustrative bootstrap bias-correction sketch appears after this related-papers list).
We show that cross validation methods can result in significantly higher out-of-sample risk for the insurer if the bias in validation performance is not corrected for.
arXiv Detail & Related papers (2024-09-30T04:02:52Z) - Error Bounds of Supervised Classification from Information-Theoretic Perspective [0.0]
We explore bounds on the expected risk when using deep neural networks for supervised classification from an information theoretic perspective.
We introduce model risk and fitting error, which are derived from further decomposing the empirical risk.
arXiv Detail & Related papers (2024-06-07T01:07:35Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Non-Asymptotic Bounds for Adversarial Excess Risk under Misspecified
Models [9.65010022854885]
We show that adversarial risk is equivalent to the risk induced by a distributional adversarial attack under certain smoothness conditions.
To evaluate the generalization performance of the adversarial estimator, we study the adversarial excess risk.
arXiv Detail & Related papers (2023-09-02T00:51:19Z) - Domain Generalization without Excess Empirical Risk [83.26052467843725]
A common approach is designing a data-driven surrogate penalty to capture generalization and minimize the empirical risk jointly with the penalty.
We argue that a significant failure mode of this recipe is an excess risk due to an erroneous penalty or hardness in joint optimization.
We present an approach that eliminates this problem. Instead of jointly minimizing empirical risk with the penalty, we minimize the penalty under the constraint of optimality of the empirical risk.
arXiv Detail & Related papers (2023-08-30T08:46:46Z) - Mean-field Analysis of Generalization Errors [1.1344265020822928]
We consider the KL-regularized empirical risk minimization problem and establish generic conditions under which the generalization error convergence rate, when training on a sample of size $n$, is $\mathcal{O}(1/n)$.
In the context of supervised learning with a one-hidden layer neural network in the mean-field regime, these conditions are reflected in suitable integrability and regularity assumptions on the loss and activation functions.
arXiv Detail & Related papers (2023-06-20T15:49:09Z) - A Generalized Unbiased Risk Estimator for Learning with Augmented
Classes [70.20752731393938]
Given unlabeled data, an unbiased risk estimator (URE) can be derived, which can be minimized for LAC with theoretical guarantees.
We propose a generalized URE that can be equipped with arbitrary loss functions while maintaining the theoretical guarantees.
arXiv Detail & Related papers (2023-06-12T06:52:04Z) - Prediction Risk and Estimation Risk of the Ridgeless Least Squares Estimator under General Assumptions on Regression Errors [10.857775300638831]
We explore prediction risk as well as estimation risk under more general regression error assumptions.
Our findings suggest that the benefits of overparameterization can extend to time series, panel, and grouped data.
arXiv Detail & Related papers (2023-05-22T10:04:20Z) - Mitigating multiple descents: A model-agnostic framework for risk
monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation.
We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z) - Stability and Risk Bounds of Iterative Hard Thresholding [41.082982732100696]
We introduce a novel sparse generalization theory for IHT under the notion of algorithmic stability.
We show that IHT with sparsity level $k$ enjoys an $\tilde{\mathcal{O}}(n^{-1/2}\sqrt{\log(n)\log(p)})$ rate of convergence in sparse excess risk.
Preliminary numerical evidence is provided to confirm our theoretical predictions.
arXiv Detail & Related papers (2022-03-17T16:12:56Z) - Non-Asymptotic Guarantees for Robust Statistical Learning under
$(1+\varepsilon)$-th Moment Assumption [0.716879432974126]
This paper proposes a log-truncated M-estimator for a large family of statistical regressions.
We show the superiority of log-truncated estimations over standard estimations.
arXiv Detail & Related papers (2022-01-10T06:22:30Z) - The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer
Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z) - Understanding the Under-Coverage Bias in Uncertainty Estimation [58.03725169462616]
Quantile regression tends to under-cover relative to the desired coverage level in practice.
We prove that quantile regression suffers from an inherent under-coverage bias.
Our theory reveals that this under-coverage bias stems from a certain high-dimensional parameter estimation error.
arXiv Detail & Related papers (2021-06-10T06:11:55Z) - Risk Minimization from Adaptively Collected Data: Guarantees for
Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - RATT: Leveraging Unlabeled Data to Guarantee Generalization [96.08979093738024]
We introduce a method that leverages unlabeled data to produce generalization bounds.
We prove that our bound is valid for 0-1 empirical risk minimization.
This work provides practitioners with an option for certifying the generalization of deep nets even when unseen labeled data is unavailable.
arXiv Detail & Related papers (2021-05-01T17:05:29Z) - Learning with Gradient Descent and Weakly Convex Losses [14.145079120746614]
We study the learning performance of gradient descent when the empirical risk is weakly convex.
In the case of a two layer neural network, we demonstrate that the empirical risk can satisfy a notion of local weak convexity.
arXiv Detail & Related papers (2021-01-13T09:58:06Z) - Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z) - Understanding and Mitigating the Tradeoff Between Robustness and
Accuracy [88.51943635427709]
Adversarial training augments the training set with perturbations to improve the robust error.
We show that the standard error could increase even when the augmented perturbations have noiseless observations from the optimal linear predictor.
arXiv Detail & Related papers (2020-02-25T08:03:01Z)
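The entropic risk measure appearing in one of the related papers above (Data-driven decision-making under uncertainty with entropic risk measure) is closely related to the tilted risk: for a tilt $\theta>0$ it equals $\frac{1}{\theta}\log \mathbb{E}[e^{\theta \ell}]$, and by Jensen's inequality its naive plug-in estimator underestimates the population value on average. The sketch below is a generic bootstrap bias-correction illustration assuming nothing beyond that definition; it is not the strongly consistent procedure proposed in that paper, and the lognormal loss distribution and tilt value are arbitrary choices for the demonstration.

```python
import numpy as np
from scipy.special import logsumexp

def empirical_entropic_risk(losses, theta):
    """Plug-in estimate of the entropic risk (1/theta) * log E[exp(theta * loss)], theta > 0."""
    n = losses.shape[0]
    return float((logsumexp(theta * losses) - np.log(n)) / theta)

def bootstrap_debiased_entropic_risk(losses, theta, n_boot=2000, seed=0):
    """Generic bootstrap bias correction (an illustrative stand-in, not the
    paper's procedure): estimate the plug-in estimator's bias from resamples
    and subtract it from the plug-in value."""
    rng = np.random.default_rng(seed)
    plug_in = empirical_entropic_risk(losses, theta)
    boot = np.array([
        empirical_entropic_risk(
            rng.choice(losses, size=losses.shape[0], replace=True), theta)
        for _ in range(n_boot)
    ])
    bias_estimate = boot.mean() - plug_in  # typically negative for this estimator
    return plug_in - bias_estimate

rng = np.random.default_rng(1)
losses = rng.lognormal(mean=0.0, sigma=1.0, size=200)
print("plug-in estimate  :", empirical_entropic_risk(losses, theta=0.5))
print("bootstrap-debiased:", bootstrap_debiased_entropic_risk(losses, theta=0.5))
```

Because the plug-in estimator is biased downward, the estimated bias is typically negative and the corrected value sits above the plug-in estimate; validation scores built on the uncorrected estimator inherit the same optimism, which relates to the cross-validation bias noted in that paper's summary.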