Fast Distributionally Robust Learning with Variance Reduced Min-Max
Optimization
- URL: http://arxiv.org/abs/2104.13326v1
- Date: Tue, 27 Apr 2021 16:56:09 GMT
- Title: Fast Distributionally Robust Learning with Variance Reduced Min-Max
Optimization
- Authors: Yaodong Yu, Tianyi Lin, Eric Mazumdar, Michael I. Jordan
- Abstract summary: Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of stochastic gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable stochastic extra-gradient algorithms.
- Score: 85.84019017587477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributionally robust supervised learning (DRSL) is emerging as a key
paradigm for building reliable machine learning systems for real-world
applications -- reflecting the need for classifiers and predictive models that
are robust to the distribution shifts that arise from phenomena such as
selection bias or nonstationarity. Existing algorithms for solving Wasserstein
DRSL -- one of the most popular DRSL frameworks based around robustness to
perturbations in the Wasserstein distance -- involve solving complex
subproblems or fail to make use of stochastic gradients, limiting their use in
large-scale machine learning problems. We revisit Wasserstein DRSL through the
lens of min-max optimization and derive scalable and efficiently implementable
stochastic extra-gradient algorithms which provably achieve faster convergence
rates than existing approaches. We demonstrate their effectiveness on synthetic
and real data when compared to existing DRSL approaches. Key to our results is
the use of variance reduction and random reshuffling to accelerate stochastic
min-max optimization, the analysis of which may be of independent interest.
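The extra-gradient idea at the heart of these algorithms can be illustrated on a toy saddle-point problem. The sketch below is a minimal deterministic version; the paper's algorithms additionally use stochastic gradients, variance reduction, and random reshuffling, none of which are shown here, and the bilinear objective and step size are illustrative choices rather than details from the paper.

```python
# Extra-gradient on the bilinear saddle problem f(x, y) = x * y,
# whose unique saddle point is (0, 0). Plain gradient descent-ascent
# cycles or diverges here; the extra-gradient correction converges.

def grad(x, y):
    # (df/dx, df/dy) for f(x, y) = x * y
    return y, x

def extra_gradient(x, y, eta=0.1, steps=500):
    for _ in range(steps):
        gx, gy = grad(x, y)
        # exploratory half step (descend in x, ascend in y)
        xm, ym = x - eta * gx, y + eta * gy
        # correction step using gradients evaluated at the midpoint
        gx, gy = grad(xm, ym)
        x, y = x - eta * gx, y + eta * gy
    return x, y

x, y = extra_gradient(1.0, 1.0)
print(abs(x), abs(y))  # both contract toward the saddle point (0, 0)
```

On this bilinear problem, plain gradient descent-ascent spirals outward, while the midpoint correction contracts the iterates toward the saddle point, which is the qualitative behavior the extra-gradient method is designed for.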
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth probability distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based updates through continuous relaxation, combined with Quasi-Quantum Annealing (QQA).
Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
arXiv Detail & Related papers (2024-09-02T12:55:27Z) - Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z) - Single-Trajectory Distributionally Robust Reinforcement Learning [21.955807398493334]
We propose Distributionally Robust RL (DRRL) to enhance performance across a range of environments.
Existing DRRL algorithms are either model-based or fail to learn from a single sample trajectory.
We design the first fully model-free DRRL algorithm, called distributionally robust Q-learning with single trajectory (DRQ).
arXiv Detail & Related papers (2023-01-27T14:08:09Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Sinkhorn Distributionally Robust Optimization [15.194516549163245]
We derive convex programming dual reformulation for general nominal distributions, transport costs, and loss functions.
Compared with Wasserstein DRO, our proposed approach offers enhanced computational tractability for a broader class of loss functions.
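Sinkhorn DRO builds on entropy-regularized optimal transport, which is computed by the classical Sinkhorn fixed-point iterations. A minimal sketch of those iterations on a two-point example follows; the distributions, cost matrix, and regularization strength are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Entropy-regularized optimal transport via Sinkhorn iterations:
# alternately rescale rows and columns of K = exp(-C / reg) so the
# transport plan's marginals match the given distributions a and b.

def sinkhorn(a, b, C, reg=0.5, iters=200):
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)   # rescale to match column marginals
        u = a / (K @ v)     # rescale to match row marginals
    return u[:, None] * K * v[None, :]  # transport plan

a = np.array([0.5, 0.5])
b = np.array([0.25, 0.75])
C = np.array([[0.0, 1.0], [1.0, 0.0]])  # illustrative cost matrix
P = sinkhorn(a, b, C)
print(P.sum(axis=1), P.sum(axis=0))  # marginals match a and b
```

The entropic regularizer is what makes this a pair of simple matrix-scaling updates, which is the source of the computational tractability the entry refers to.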
arXiv Detail & Related papers (2021-09-24T12:40:48Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
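The momentum mechanism can be sketched as a moving-average gradient estimator driving the parameter update. The toy stochastic quadratic below is an illustrative stand-in: the paper's AUPRC surrogate loss, estimator details, and adaptive variants are not reproduced, and all constants are assumptions.

```python
import numpy as np

# Momentum-style moving-average gradient estimator on a toy
# stochastic quadratic ||w||^2 / 2 with noisy gradients. The moving
# average damps gradient noise, the mechanism credited for the
# improved iteration complexity.

rng = np.random.default_rng(0)
w = np.array([5.0, -3.0])       # iterate; the optimum is the origin
d = np.zeros_like(w)            # moving-average gradient estimate
beta, eta = 0.1, 0.05           # momentum weight and step size

for _ in range(2000):
    g = w + rng.normal(scale=0.5, size=2)   # noisy gradient sample
    d = (1 - beta) * d + beta * g           # momentum moving average
    w = w - eta * d                         # parameter update

print(np.linalg.norm(w))  # small: the iterate hovers near the optimum
```

The key design choice is that `d` averages gradients across iterations, so the variance of the direction actually used shrinks relative to a single stochastic gradient.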
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.