Empirical Risk Minimization for Losses without Variance
- URL: http://arxiv.org/abs/2309.03818v1
- Date: Thu, 7 Sep 2023 16:14:00 GMT
- Title: Empirical Risk Minimization for Losses without Variance
- Authors: Guanhua Fang, Ping Li, Gennady Samorodnitsky
- Abstract summary: This paper considers an empirical risk minimization problem under heavy-tailed settings, where the data does not have finite variance but only a $p$-th moment with $p \in (1,2)$.
Instead of using an estimation procedure based on truncated observed data, we choose the optimizer by minimizing the risk value.
These risk values can be robustly estimated using Catoni's method (Catoni, 2012).
- Score: 26.30435936379624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper considers an empirical risk minimization problem under
heavy-tailed settings, where the data does not have finite variance but only a
$p$-th moment with $p \in (1,2)$. Instead of using an estimation procedure
based on truncated observed data, we choose the optimizer by minimizing the
risk value. These risk values can be robustly estimated using Catoni's
remarkable method (Catoni, 2012). Thanks to the structure of Catoni-type
influence functions, we are able to establish excess risk upper bounds via
generalized generic chaining methods. Moreover, we take computational issues
into consideration. In particular, we theoretically investigate two types of
optimization methods: a robust gradient descent algorithm and empirical
risk-based methods. In an extensive numerical study, we find that the optimizer
based on empirical risks via Catoni-style estimation indeed performs better
than the other baselines, indicating that estimation based directly on
truncated data may lead to unsatisfactory results.
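To make the robust estimation step concrete, here is a minimal sketch of Catoni's M-estimator of the mean, the tool the abstract says is used to estimate risk values. It replaces the empirical average with the root of a bounded ("soft-truncated") influence function, which is what keeps the estimate stable when the data only has a $p$-th moment. This is an illustrative sketch, not the paper's exact procedure; in particular the scale parameter `alpha` is a hypothetical fixed choice, whereas Catoni (2012) prescribes how to tune it from the sample size and moment assumptions.

```python
import math

def catoni_psi(x):
    # Catoni's (2012) narrowest influence function:
    # psi(x) = log(1 + x + x^2/2) for x >= 0, and psi(x) = -psi(-x) for x < 0.
    # It grows only logarithmically, so a single heavy-tailed sample
    # cannot dominate the estimating equation.
    if x >= 0:
        return math.log1p(x + 0.5 * x * x)
    return -math.log1p(-x + 0.5 * x * x)

def catoni_mean(xs, alpha=0.1, tol=1e-9):
    # Find theta solving sum_i psi(alpha * (x_i - theta)) = 0.
    # The left-hand side is decreasing in theta, so bisection on
    # [min(xs), max(xs)] converges to the unique root.
    lo, hi = min(xs), max(xs)

    def f(theta):
        return sum(catoni_psi(alpha * (x - theta)) for x in xs)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

On symmetric data the estimator agrees with the sample mean, while a single extreme observation moves it far less than it moves the empirical average, illustrating why a risk estimated this way is more reliable under heavy tails.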
Related papers
- Data-driven decision-making under uncertainty with entropic risk measure [5.407319151576265]
The entropic risk measure is widely used in high-stakes decision making to account for tail risks associated with an uncertain loss.
To debias the empirical entropic risk estimator, we propose a strongly consistent bootstrapping procedure.
We show that cross validation methods can result in significantly higher out-of-sample risk for the insurer if the bias in validation performance is not corrected for.
arXiv Detail & Related papers (2024-09-30T04:02:52Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization [63.93275508300137]
We introduce a novel risk-aware Counterfactual Learning To Rank method with theoretical guarantees for safe deployment.
Our experimental results demonstrate the efficacy of our proposed method, which is effective at avoiding initial periods of bad performance when little data is available.
arXiv Detail & Related papers (2023-04-26T15:54:23Z)
- Optimizing the Performative Risk under Weak Convexity Assumptions [0.0]
In performative prediction, a predictive model impacts the distribution that generates future data.
Prior work has identified a pair of general conditions on the loss and on the mapping from model parameters to distributions that together imply convexity of the performative risk.
In this paper, we relax these assumptions without sacrificing the amenability of the performative risk minimization problem to iterative optimization methods.
arXiv Detail & Related papers (2022-09-02T01:07:09Z)
- Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing [10.69955834942979]
We consider risk-aware bandit optimization with applications in smart order routing (SOR).
Driven by the variance-minimizing globally-optimal (G-optimal) design, we propose the novel instance-independent Risk-Aware Explore-then-Commit (RISE) algorithm and the instance-dependent Risk-Aware Successive Elimination (RISE++) algorithm.
arXiv Detail & Related papers (2022-08-04T00:21:10Z)
- Mitigating multiple descents: A model-agnostic framework for risk monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation.
We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
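The weighted-ERM idea summarized above can be sketched in a few lines: each adaptively collected sample is reweighted by the inverse of the probability with which the adaptive policy collected it, so that the weighted empirical risk is unbiased for the target average loss. This is a minimal illustration of standard importance-sampling reweighting, not the cited paper's exact algorithm or guarantees; the propensities are assumed known and positive.

```python
def is_weighted_erm_loss(losses, propensities):
    # Importance-sampling weighted empirical risk: loss l_i was collected
    # with probability p_i under the adaptive policy, so weighting it by
    # 1 / p_i makes the average an unbiased estimate of the target risk.
    if any(p <= 0 for p in propensities):
        raise ValueError("propensities must be strictly positive")
    return sum(l / p for l, p in zip(losses, propensities)) / len(losses)
```

For example, a sample collected half as often as under the target distribution counts twice as much in the weighted average.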
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
arXiv Detail & Related papers (2021-01-19T18:55:29Z)
- Dimensionality reduction, regularization, and generalization in overparameterized regressions [8.615625517708324]
We show that PCA-OLS, also known as principal component regression, can be avoided with a dimensionality reduction.
We show that dimensionality reduction improves robustness while OLS is arbitrarily susceptible to adversarial attacks.
We find that methods in which the projection depends on the training data can outperform methods where the projections are chosen independently of the training data.
arXiv Detail & Related papers (2020-11-23T15:38:50Z)
- Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and the number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z)
- Asymptotic normality of robust risk minimizers [2.0432586732993374]
This paper investigates properties of algorithms that can be viewed as robust analogues of classical empirical risk minimization.
We show that for a wide class of parametric problems, minimizers of the appropriately defined robust proxy of risk converge to the minimizers of the true risk at the same rate.
arXiv Detail & Related papers (2020-04-05T22:03:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.