Related papers: Robust variance-regularized risk minimization with concomitant scaling

Robust variance-regularized risk minimization with concomitant scaling

URL: http://arxiv.org/abs/2301.11584v2
Date: Fri, 9 Feb 2024 02:59:33 GMT
Title: Robust variance-regularized risk minimization with concomitant scaling
Authors: Matthew J. Holland
Abstract summary: Under losses which are potentially heavy-tailed, we consider the task of minimizing sums of the loss mean and standard deviation without trying to accurately estimate the variance. By modifying a technique for variance-free robust mean estimation, we derive a simple learning procedure which can be easily combined with standard gradient-based solvers to be used in traditional machine learning.
Score: 8.666172545138275
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Under losses which are potentially heavy-tailed, we consider the task of minimizing sums of the loss mean and standard deviation, without trying to accurately estimate the variance. By modifying a technique for variance-free robust mean estimation to fit our problem setting, we derive a simple learning procedure which can be easily combined with standard gradient-based solvers to be used in traditional machine learning workflows. Empirically, we verify that our proposed approach, despite its simplicity, performs as well or better than even the best-performing candidates derived from alternative criteria such as CVaR or DRO risks on a variety of datasets.

Related papers

Making Robust Generalizers Less Rigid with Soft Ascent-Descent [6.527016551650139]
Methods applying sharpness-aware minimization have proven successful for image classification tasks, but only using deep neural networks in a scenario where the most difficult points are also the least common. In this work, we show how such a strategy can dramatically break down under more diverse models, and as a more robust alternative, instead of typical sharpness we propose and evaluate a training criterion which penalizes poor loss concentration.
arXiv Detail & Related papers (2024-08-07T08:23:42Z)
Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values. We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
Empirical Risk Minimization for Losses without Variance [26.30435936379624]
This paper considers an empirical risk problem under heavy-tailed settings, where data does not have finite variance, but only has $p$-th moment with $p in (1,2)$. Instead of using estimation procedure based on truncated observed data, we choose the minimization by minimizing the risk value. Those risk values can be robustly estimated via using the remarkable Catoni's method (Catoni, 2012).
arXiv Detail & Related papers (2023-09-07T16:14:00Z)
On the Variance, Admissibility, and Stability of Empirical Risk Minimization [80.26309576810844]
Empirical Risk Minimization (ERM) with squared loss may attain minimax suboptimal error rates. We show that under mild assumptions, the suboptimality of ERM must be due to large bias rather than variance. We also show that our estimates imply stability of ERM, complementing the main result of Caponnetto and Rakhlin (2006) for non-Donsker classes.
arXiv Detail & Related papers (2023-05-29T15:25:48Z)
A Model-Based Method for Minimizing CVaR and Beyond [7.751691910877239]
We develop a variant of the prox-linear method for minimizing the Conditional Value-at-Risk (CVaR) objective. CVaR is a risk measure focused on minimizing worst-case performance, defined as the average of the top quantile of the losses. In machine learning, such a risk measure is useful to train more robust models.
arXiv Detail & Related papers (2023-05-27T15:38:53Z)
Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios. We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data. We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class. For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels. We show that the quality of gradient estimation matters more in risk minimization. We propose a novel surrogate complementary loss(SCL) framework that trades zero bias with reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
Tilted Empirical Risk Minimization [26.87656095874882]
We show that it is possible to flexibly tune the impact of individual losses through a straightforward extension to empirical risk minimization. We show that TERM can increase or decrease the influence of outliers, respectively, to enable fairness or robustness. It can also enable entirely new applications, such as simultaneously addressing outliers and promoting fairness.
arXiv Detail & Related papers (2020-07-02T14:49:48Z)
Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task. We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
A Graduated Filter Method for Large Scale Robust Estimation [32.08441889054456]
We introduce a novel solver for robust estimation that possesses a strong ability to escape poor local minima. Our algorithm is built upon the graduated-of-the-art methods to solve problems having many poor local minima.
arXiv Detail & Related papers (2020-03-20T02:51:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.