Efficient Algorithms for Empirical Group Distributional Robust
Optimization and Beyond
- URL: http://arxiv.org/abs/2403.03562v1
- Date: Wed, 6 Mar 2024 09:14:24 GMT
- Title: Efficient Algorithms for Empirical Group Distributional Robust
Optimization and Beyond
- Authors: Dingzhi Yu, Yunuo Cai, Wei Jiang, Lijun Zhang
- Abstract summary: We formulate empirical GDRO as a $\textit{two-level}$ finite-sum convex-concave minimax optimization problem.
We compute the snapshot and mirror snapshot point by a one-index-shifted weighted average, which distinguishes our approach from the naive ergodic average.
Remarkably, our approach outperforms the state-of-the-art method by a factor of $\sqrt{m}$.
- Score: 15.664414751701718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the empirical counterpart of group distributionally robust
optimization (GDRO), which aims to minimize the maximal empirical risk across
$m$ distinct groups. We formulate empirical GDRO as a $\textit{two-level}$
finite-sum convex-concave minimax optimization problem and develop a stochastic
variance reduced mirror prox algorithm. Unlike existing methods, we construct
the stochastic gradient via a per-group sampling technique and perform variance
reduction for all groups, which fully exploits the $\textit{two-level}$
finite-sum structure of empirical GDRO. Furthermore, we compute the snapshot
and mirror snapshot point by a one-index-shifted weighted average, which
distinguishes our approach from the naive ergodic average. Our algorithm also supports
non-constant learning rates, unlike the existing literature. We
establish convergence guarantees both in expectation and with high probability,
demonstrating a complexity of
$\mathcal{O}\left(\frac{m\sqrt{\bar{n}\ln{m}}}{\varepsilon}\right)$, where
$\bar n$ is the average number of samples among $m$ groups. Remarkably, our
approach outperforms the state-of-the-art method by a factor of $\sqrt{m}$.
We further extend our methodology to the empirical minimax
excess risk optimization (MERO) problem and derive the corresponding
expectation and high-probability bounds. The complexity of our
empirical MERO algorithm matches that of empirical GDRO at
$\mathcal{O}\left(\frac{m\sqrt{\bar{n}\ln{m}}}{\varepsilon}\right)$,
significantly surpassing the bounds of existing methods.
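To make the two-level structure concrete, below is a minimal numpy sketch of the empirical GDRO saddle-point problem $\min_x \max_{q \in \Delta_m} \sum_{i=1}^m q_i R_i(x)$, solved by plain stochastic gradient descent-ascent with the per-group sampling the abstract describes (one sample from every group per round). The data, loss, and step sizes are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-level finite-sum setup: m groups, group i holding n_i samples.
# (Illustrative least-squares data; the formulation is agnostic to the loss.)
m, d = 5, 10
n = rng.integers(50, 150, size=m)                     # n_i per group
X = [rng.normal(size=(n_i, d)) for n_i in n]
w_true = rng.normal(size=d)
y = [Xi @ w_true + rng.normal(size=Xi.shape[0]) for Xi in X]

def group_risk(x, i):
    """Empirical risk R_i(x) of group i (mean squared loss)."""
    r = X[i] @ x - y[i]
    return 0.5 * np.mean(r ** 2)

x = np.zeros(d)                                       # primal variable
q = np.full(m, 1.0 / m)                               # dual weights on the simplex
eta_x, eta_q = 1e-2, 1e-2                             # constant steps for simplicity
for t in range(2000):
    # Per-group sampling: draw ONE sample from EVERY group each round.
    idx = [rng.integers(n_i) for n_i in n]
    # Primal step: unbiased estimate of grad_x sum_i q_i R_i(x).
    g = sum(q[i] * (X[i][idx[i]] @ x - y[i][idx[i]]) * X[i][idx[i]] for i in range(m))
    x -= eta_x * g
    # Dual step: entropic mirror ascent on the simplex, driven by
    # single-sample estimates of each group's risk.
    risk_est = np.array([0.5 * (X[i][idx[i]] @ x - y[i][idx[i]]) ** 2 for i in range(m)])
    q *= np.exp(eta_q * risk_est)
    q /= q.sum()

print("per-group risks:", [round(group_risk(x, i), 3) for i in range(m)])
print("max-group risk :", round(max(group_risk(x, i) for i in range(m)), 3))
```

The paper's algorithm replaces these plain gradients with variance-reduced ones built from periodically refreshed snapshot and mirror snapshot points (combined via the one-index-shifted weighted average) and takes mirror-prox extrapolation steps; that is what yields the $\mathcal{O}\left(\frac{m\sqrt{\bar{n}\ln{m}}}{\varepsilon}\right)$ complexity.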
Related papers
- Achieving the Asymptotically Optimal Sample Complexity of Offline Reinforcement Learning: A DRO-Based Approach [36.88301225561535]
Offline reinforcement learning aims to learn from pre-collected datasets without active exploration.
Existing approaches adopt a pessimistic stance towards uncertainty by penalizing rewards of under-explored state-action pairs to estimate value functions conservatively.
We show that a distributionally robust optimization (DRO)-based approach can also address these challenges and is asymptotically minimax optimal.
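As a toy illustration of DRO-based pessimism, the sketch below computes a worst-case expected next-state value over a total-variation ball around an empirical transition estimate; the uncertainty set and radius are illustrative assumptions, not this paper's construction.

```python
import numpy as np

def worst_case_mean(p, v, delta):
    """min over q with TV(q, p) <= delta of E_q[v]: greedily shift probability
    mass delta from the highest-value states onto the lowest-value state."""
    q = np.asarray(p, dtype=float).copy()
    lo = np.argmin(v)
    budget = delta
    for i in np.argsort(v)[::-1]:                 # highest values first
        if i == lo:
            continue
        move = min(q[i], budget)
        q[i] -= move
        q[lo] += move
        budget -= move
        if budget <= 0:
            break
    return q @ v

# Pessimistic one-step value under an estimated transition distribution.
p_hat = np.array([0.5, 0.3, 0.2])                 # empirical next-state probs
v = np.array([1.0, 0.0, 2.0])                     # current value estimates
print(p_hat @ v, worst_case_mean(p_hat, v, delta=0.1))   # 0.9 vs. 0.7
```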
arXiv Detail & Related papers (2023-05-22T17:50:18Z)
- Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner.
We also provide experiments on synthetic 2D distributions, using the $k$-means++ implementation from sklearn as a black box, that validate our theoretical results.
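A generic version of such a black-box experiment (synthetic 2D blobs with sklearn's $k$-means++ as the approximation routine; not the authors' replicability wrapper) might look like:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated Gaussian blobs in 2D.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])

# k-means++ initialization treated as a black-box approximation routine;
# a fixed random_state makes repeated runs on the same sample identical.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(np.round(km.cluster_centers_, 2))
```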
arXiv Detail & Related papers (2023-02-20T23:29:43Z)
- Stochastic Approximation Approaches to Group Distributionally Robust Optimization [96.26317627118912]
We study group distributionally robust optimization (GDRO) via stochastic approximation.
Online learning techniques reduce the number of samples required in each round from $m$ to $1$ while keeping the same sample complexity.
A novel formulation of weighted GDRO allows us to derive distribution-dependent convergence rates.
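A hedged sketch of the one-sample-per-round idea: draw a single group from an exploration-smoothed version of the current weights and importance-weight the estimates to keep them unbiased. The loss, data, and mixing constant are assumptions for illustration, not this paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 5, 10
n = rng.integers(50, 150, size=m)                 # same synthetic setup as above
X = [rng.normal(size=(n_i, d)) for n_i in n]
w_true = rng.normal(size=d)
y = [Xi @ w_true + rng.normal(size=Xi.shape[0]) for Xi in X]

x = np.zeros(d)
q = np.full(m, 1.0 / m)
eta_x, eta_q, gamma = 1e-2, 1e-2, 0.1
for t in range(5000):
    p = (1 - gamma) * q + gamma / m               # exploration mixing bounds 1/p_i
    i = rng.choice(m, p=p)                        # ONE group per round ...
    j = rng.integers(n[i])                        # ... and one sample from it
    r = X[i][j] @ x - y[i][j]
    # Primal: E[(q_i/p_i) grad R_i] = sum_k q_k grad R_k, so this is unbiased.
    x -= eta_x * (q[i] / p[i]) * r * X[i][j]
    # Dual: one-hot, importance-weighted risk estimate, then entropic ascent.
    g = np.zeros(m)
    g[i] = 0.5 * r ** 2 / p[i]
    q *= np.exp(eta_q * g)
    q /= q.sum()
```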
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
- Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $\Omega(|V|^2/\epsilon)$-additive error for an input dataset.
We propose a private $(1+o(1))$-approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z)
- Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to alleviate the complexity of coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter.
arXiv Detail & Related papers (2022-07-18T12:03:26Z)
- Robust Sparse Mean Estimation via Sum of Squares [42.526664955704746]
We study the problem of high-dimensional sparse mean estimation in the presence of an $\epsilon$-fraction of adversarial outliers.
Our algorithms follow the Sum-of-Squares based proofs-to-algorithms approach.
arXiv Detail & Related papers (2022-06-07T16:49:54Z)
- Towards Tight Bounds on the Sample Complexity of Average-reward MDPs [39.01663172393174]
We find an optimal policy of an infinite-horizon average-reward Markov decision process given access to a generative model.
We provide an algorithm that solves the problem using $\widetilde{O}(t_{\mathrm{mix}}\,\epsilon^{-3})$ (oblivious) samples per state-action pair.
arXiv Detail & Related papers (2021-06-13T17:18:11Z)
- An Online Riemannian PCA for Stochastic Canonical Correlation Analysis [37.8212762083567]
We present an efficient algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices.
While the paper primarily focuses on the formulation and technical analysis of its properties, our experiments show that the empirical behavior on common datasets is quite promising.
arXiv Detail & Related papers (2021-06-08T23:38:29Z)
- On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used stochastic estimator based on moving averages (SEMA) for non-convex optimization problems.
Building on the SEMA estimator, we present state-of-the-art results for a range of such problems.
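The moving-average idea itself is compact; here is a minimal sketch (generic least-squares objective and constants assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 1000, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def stoch_grad(x):
    """Noisy gradient from one random sample of a least-squares objective."""
    j = rng.integers(n)
    return (A[j] @ x - b[j]) * A[j]

x = np.zeros(d)
z = np.zeros(d)            # moving-average estimate of the full gradient
beta, eta = 0.1, 1e-2
for t in range(5000):
    # Exponential moving average smooths the per-sample noise, so the
    # direction actually used for the step has reduced variance.
    z = (1 - beta) * z + beta * stoch_grad(x)
    x -= eta * z
print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```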
arXiv Detail & Related papers (2021-04-30T08:50:24Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
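A bare-bones PCA-then-threshold clustering on a symmetric two-component mixture (illustrative data and a plain SVD step; the paper's algorithm and its regime conditions are more refined):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 400, 50
theta = np.zeros(d)
theta[:5] = 2.0                                   # sparse mean separation
labels = rng.integers(0, 2, size=n)
signs = np.where(labels == 1, 1.0, -1.0)
X = signs[:, None] * theta + rng.normal(size=(n, d))

# PCA step: the top right singular vector of X aligns with theta, so
# projecting onto it and thresholding at zero separates the two clusters.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pred = (X @ Vt[0] > 0).astype(int)

# Misclustering rate, up to the global label swap.
err = min(np.mean(pred != labels), np.mean(pred == labels))
print("misclustering rate:", err)
```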
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.