Geometry-Calibrated DRO: Combating Over-Pessimism with Free Energy Implications
- URL: http://arxiv.org/abs/2311.05054v1
- Date: Wed, 8 Nov 2023 23:33:39 GMT
- Title: Geometry-Calibrated DRO: Combating Over-Pessimism with Free Energy Implications
- Authors: Jiashuo Liu, Jiayun Wu, Tianyu Wang, Hao Zou, Bo Li, Peng Cui
- Abstract summary: Machine learning algorithms minimizing average risk are susceptible to distributional shifts.
Distributionally robust optimization (DRO) addresses this issue by optimizing the worst-case risk within an uncertainty set.
However, DRO suffers from over-pessimism, leading to low-confidence predictions, poor parameter estimation, and poor generalization.
In this work, we conduct a theoretical analysis of a probable root cause of over-pessimism: excessive focus on noisy samples.
- Score: 31.3535638804615
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Machine learning algorithms minimizing average risk are susceptible to
distributional shifts. Distributionally Robust Optimization (DRO) addresses
this issue by optimizing the worst-case risk within an uncertainty set.
However, DRO suffers from over-pessimism, leading to low-confidence
predictions, poor parameter estimations as well as poor generalization. In this
work, we conduct a theoretical analysis of a probable root cause of
over-pessimism: excessive focus on noisy samples. To alleviate the impact of
noise, we incorporate data geometry into calibration terms in DRO, resulting in
our novel Geometry-Calibrated DRO (GCDRO) for regression. We establish the
connection between our risk objective and the Helmholtz free energy in
statistical physics, and this free-energy-based risk can extend to standard DRO
methods. Leveraging gradient flow in Wasserstein space, we develop an
approximate minimax optimization algorithm with a bounded error ratio and
elucidate how our approach mitigates noisy sample effects. Comprehensive
experiments confirm GCDRO's superiority over conventional DRO methods.
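As a hedged illustration of the free-energy connection: for KL-type uncertainty sets, the (soft) worst-case risk takes a log-sum-exp form that coincides with the Helmholtz free energy at inverse temperature beta. The NumPy sketch below is illustrative only (the function name and sample losses are made up) and is not the paper's GCDRO objective, which additionally incorporates geometry-based calibration terms; it shows how beta interpolates between the average risk (ERM) and the worst-case risk.

```python
import numpy as np

def free_energy_risk(losses, beta):
    """Soft worst-case risk F_beta = (1/beta) * log(mean(exp(beta * losses))).

    As beta -> 0 this tends to the average loss (ERM); as beta -> inf it
    tends to the maximum loss (hard worst-case DRO). Computed with a
    max-shift for numerical stability.
    """
    losses = np.asarray(losses, dtype=float)
    m = losses.max()
    return m + np.log(np.mean(np.exp(beta * (losses - m)))) / beta

losses = np.array([0.1, 0.2, 0.3, 5.0])      # one noisy/outlier sample
print(free_energy_risk(losses, beta=1e-6))   # -> average loss, approx 1.4
print(free_energy_risk(losses, beta=1000.0)) # -> maximum loss, approx 5.0
```

A finite beta down-weights (but does not fully chase) the noisy worst sample, which is one intuition for why a free-energy risk is less pessimistic than the hard worst case.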
Related papers
- A Primal-Dual Algorithm for Faster Distributionally Robust Optimization [12.311794669976047]
We present Drago, a primal-dual algorithm that achieves a state-of-the-art linear convergence rate on strongly convex-strongly concave DRO problems.
We support our theoretical results with numerical benchmarks in classification and regression.
arXiv Detail & Related papers (2024-03-16T02:06:14Z)
- Outlier-Robust Wasserstein DRO [19.355450629316486]
Distributionally robust optimization (DRO) is an effective approach for data-driven decision-making in the presence of uncertainty.
We propose a novel outlier-robust WDRO framework for decision-making under both geometric (Wasserstein) perturbations and non-geometric (TV) contamination.
We prove a strong duality result that enables tractable convex reformulations and efficient computation of our outlier-robust WDRO problem.
arXiv Detail & Related papers (2023-11-09T18:32:00Z)
- Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
- Efficient Stochastic Approximation of Minimax Excess Risk Optimization [36.68685001551774]
We develop efficient approximation approaches which directly target MERO.
We demonstrate that the bias caused by the estimation error of the minimal risk is under control.
We also investigate a practical scenario where the quantity of samples drawn from each distribution may differ, and propose an approach that delivers distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-05-31T02:21:11Z)
- Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization [89.92932924515324]
Robust generalization aims to tackle the most challenging data distributions, which are rare in the training set and contain severe noise.
Common solutions such as distributionally robust optimization (DRO) focus on the worst-case empirical risk to ensure low training error.
We propose SharpDRO by penalizing the sharpness of the worst-case distribution, which measures the loss changes around the neighbor of learning parameters.
We show that SharpDRO exhibits a strong generalization ability against severe corruptions and exceeds well-known baseline methods with large performance gains.
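A minimal sketch of the sharpness idea, under a SAM-style one-step approximation (the `sharpness` helper and toy quadratic losses below are hypothetical, not SharpDRO's exact penalty): worst-case sharpness is estimated by ascending one step of radius rho along the gradient and measuring the loss increase.

```python
import numpy as np

def sharpness(loss_fn, grad_fn, theta, rho=0.05):
    """One-step estimate of max_{||e|| <= rho} L(theta + e) - L(theta),
    taking the ascent direction along the gradient (SAM-style).
    SharpDRO applies such a penalty on the worst-case distribution
    rather than on the full data, as sketched here.
    """
    g = grad_fn(theta)
    e = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation
    return loss_fn(theta + e) - loss_fn(theta)

# Toy quadratic loss L(theta) = 0.5 * c * ||theta||^2 with curvature c.
def make_quadratic(c):
    return (lambda th: 0.5 * c * th @ th), (lambda th: c * th)

loss_flat, grad_flat = make_quadratic(1.0)      # flat loss landscape
loss_sharp, grad_sharp = make_quadratic(100.0)  # sharp loss landscape
theta = np.array([0.1, 0.1])
print(sharpness(loss_flat, grad_flat, theta)
      < sharpness(loss_sharp, grad_sharp, theta))  # prints True
```

Penalizing this quantity steers training toward flat regions of the worst-case loss, which is the generalization mechanism the paper's abstract describes.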
arXiv Detail & Related papers (2023-03-23T07:58:48Z)
- Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
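That formula can be sketched directly: form C = E[g gᵀ] from per-sample gradients and take its symmetric matrix square root. The helper below is an illustrative NumPy sketch of the stated quantity, not the authors' code.

```python
import numpy as np

def optimal_sgld_noise_cov(grads):
    """Square root of the expected gradient covariance.

    grads: (n, d) array of per-sample gradients. Returns C^{1/2} where
    C = E[g g^T], computed via eigendecomposition (C is symmetric PSD,
    so C^{1/2} = V diag(sqrt(w)) V^T).
    """
    grads = np.asarray(grads, dtype=float)
    C = grads.T @ grads / grads.shape[0]  # empirical E[g g^T]
    w, V = np.linalg.eigh(C)
    w = np.clip(w, 0.0, None)             # guard tiny negative eigenvalues
    return V @ np.diag(np.sqrt(w)) @ V.T
```

Injecting Gaussian noise with this covariance makes the SGLD noise anisotropic, aligned with the gradient covariance structure, which is what the bound optimization above prescribes.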
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
- DORO: Distributional and Outlier Robust Optimization [98.44757325531631]
We propose DORO, a framework for Distributional and Outlier Robust Optimization.
At the core of this approach is a refined risk function which prevents DRO from overfitting to potential outliers.
We theoretically prove the effectiveness of the proposed method, and empirically show that DORO improves the performance and stability of DRO with experiments on large modern datasets.
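A minimal sketch of the refined-risk idea, assuming a trim-then-DRO recipe with CVaR as the base DRO risk (the function name and default parameters are illustrative, not DORO's exact estimator): drop the eps fraction of largest losses as presumed outliers, then take the worst-case (CVaR) risk over what remains.

```python
import numpy as np

def trimmed_cvar_risk(losses, eps=0.1, alpha=0.3):
    """Outlier-trimmed CVaR risk in the spirit of DORO.

    1. Sort per-sample losses and discard the eps fraction of largest
       losses (treated as potential outliers).
    2. Average the top alpha fraction of the remaining losses, i.e. the
       CVaR / worst-case DRO risk over the trimmed sample.
    """
    losses = np.sort(np.asarray(losses, dtype=float))
    keep = losses[: int(np.ceil(len(losses) * (1 - eps)))]
    tail = keep[-max(1, int(np.ceil(len(keep) * alpha))):]
    return tail.mean()
```

The trimming step is what prevents the DRO risk from being dominated by a few corrupted samples, matching the stability claim above.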
arXiv Detail & Related papers (2021-06-11T02:59:54Z)
- Principled learning method for Wasserstein distributionally robust optimization with local perturbations [21.611525306059985]
Wasserstein distributionally robust optimization (WDRO) attempts to learn a model that minimizes the local worst-case risk in the vicinity of the empirical data distribution.
We propose a minimizer based on a novel approximation theorem and provide the corresponding risk consistency results.
Our results show that the proposed method achieves significantly higher accuracy than baseline models on noisy datasets.
arXiv Detail & Related papers (2020-06-05T09:32:37Z)
- Robust Reinforcement Learning with Wasserstein Constraint [49.86490922809473]
We show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm.
The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
arXiv Detail & Related papers (2020-06-01T13:48:59Z)
- Mixed Strategies for Robust Optimization of Unknown Objectives [93.8672371143881]
We consider robust optimization problems, where the goal is to optimize an unknown objective function against the worst-case realization of an uncertain parameter.
We design a novel sample-efficient algorithm GP-MRO, which sequentially learns about the unknown objective from noisy point evaluations.
GP-MRO seeks to discover a robust and randomized mixed strategy, that maximizes the worst-case expected objective value.
arXiv Detail & Related papers (2020-02-28T09:28:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.