Analysis of Kernel Mirror Prox for Measure Optimization
- URL: http://arxiv.org/abs/2403.00147v1
- Date: Thu, 29 Feb 2024 21:55:17 GMT
- Title: Analysis of Kernel Mirror Prox for Measure Optimization
- Authors: Pavel Dvurechensky and Jia-Jie Zhu
- Abstract summary: We study in a unified framework a class of functional saddle-point optimization problems, which we term the Mixed Functional Nash Equilibrium (MFNE).
We model the saddle-point optimization dynamics as an interacting Fisher-Rao-RKHS gradient flow.
We provide a unified convergence analysis of KMP in an infinite-dimensional setting for this class of MFNE problems.
- Score: 4.6080589718744305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By choosing a suitable function space as the dual to the non-negative measure
cone, we study in a unified framework a class of functional saddle-point
optimization problems, which we term the Mixed Functional Nash Equilibrium
(MFNE), that underlies several existing machine learning algorithms, such as
implicit generative models, distributionally robust optimization (DRO), and
Wasserstein barycenters. We model the saddle-point optimization dynamics as an
interacting Fisher-Rao-RKHS gradient flow when the function space is chosen as
a reproducing kernel Hilbert space (RKHS). As a discrete time counterpart, we
propose a primal-dual kernel mirror prox (KMP) algorithm, which uses a dual
step in the RKHS, and a primal entropic mirror prox step. We then provide a
unified convergence analysis of KMP in an infinite-dimensional setting for this
class of MFNE problems, which establishes a convergence rate of $O(1/N)$ in the
deterministic case and $O(1/\sqrt{N})$ in the stochastic case, where $N$ is the
iteration counter. As a case study, we apply our analysis to DRO, providing
algorithmic guarantees for DRO robustness and convergence.
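To make the primal-dual structure above concrete, here is a minimal, finite-dimensional sketch of a KMP-style extragradient loop: the dual step is a functional-gradient ascent step in an RKHS spanned by a Gaussian kernel on a fixed grid, and the primal step is an entropic (multiplicative) mirror update on the probability simplex. This is only an illustration under strong simplifying assumptions (measures restricted to a fixed grid of atoms, a simple MMD-matching saddle objective, hand-picked step sizes and regularization), not the paper's infinite-dimensional algorithm; every function and parameter name below is ours.

```python
import numpy as np

# Illustrative sketch only, NOT the paper's algorithm: a toy, finite-grid
# KMP-style mirror prox for the saddle problem
#   min_p max_f  <f, mu_p - mu_q>_H - (lam/2) ||f||_H^2,
# where p lives on a fixed grid and f = sum_j alpha_j k(., x_j).

def gaussian_gram(X, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on a 1-D grid X (illustrative choice)."""
    d2 = (X[:, None] - X[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def kmp_toy(X, q, lam=0.1, eta=0.5, tau=0.5, n_iter=200):
    m = len(X)
    K = gaussian_gram(X)
    p = np.full(m, 1.0 / m)   # primal iterate: probability vector on the grid
    alpha = np.zeros(m)       # dual iterate: coefficients of f in the RKHS

    def primal_step(p_base, alpha_eval):
        # entropic mirror step on the simplex; grad_p of the objective is K @ alpha
        logits = np.log(p_base) - eta * (K @ alpha_eval)
        w = np.exp(logits - logits.max())
        return w / w.sum()

    def dual_step(alpha_base, p_eval):
        # RKHS (functional) gradient ascent: grad_f = mu_p - mu_q - lam * f
        return alpha_base + tau * (p_eval - q - lam * alpha_base)

    for _ in range(n_iter):
        # mirror prox = extragradient: a leading half-step, then the real step
        p_half = primal_step(p, alpha)
        a_half = dual_step(alpha, p)
        p = primal_step(p, a_half)
        alpha = dual_step(alpha, p_half)
    return p, alpha

if __name__ == "__main__":
    X = np.linspace(-3.0, 3.0, 50)
    target = np.exp(-0.5 * (X - 1.0) ** 2)
    q = target / target.sum()          # target measure on the grid
    p, _ = kmp_toy(X, q)
    print("TV distance to target:", 0.5 * np.abs(p - q).sum())
```

In the paper's setting the primal measure is not restricted to a finite grid and the analysis also covers stochastic gradient oracles; this sketch does not attempt either.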
Related papers
- A Primal-Dual Algorithm for Faster Distributionally Robust Optimization [12.311794669976047]
We present Drago, a primal-dual algorithm that achieves a state-of-the-art linear convergence rate on strongly convex-strongly concave DRO problems.
We support our theoretical results with numerical benchmarks in classification and regression.
arXiv Detail & Related papers (2024-03-16T02:06:14Z) - ALEXR: An Optimal Single-Loop Algorithm for Convex Finite-Sum Coupled Compositional Stochastic Optimization [53.14532968909759]
We introduce an efficient single-loop primal-dual block-coordinate algorithm, dubbed ALEXR.
We establish the convergence rates of ALEXR in both convex and strongly convex cases under smoothness and non-smoothness conditions.
We present lower complexity bounds to demonstrate that the convergence rates of ALEXR are optimal among first-order block-coordinate algorithms for the considered class of cFCCO problems.
arXiv Detail & Related papers (2023-12-04T19:00:07Z) - Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can enjoy the same optimality in the $H$ dependence as model-based algorithms.
Our algorithm features a key novel design: the reference value functions are updated as a pair of optimistic and pessimistic value functions.
arXiv Detail & Related papers (2023-08-17T08:34:58Z) - Gradient-Free Methods for Deterministic and Stochastic Nonsmooth
Nonconvex Optimization [94.19177623349947]
Nonsmooth nonconvex optimization problems emerge in machine learning and business decision making.
Two core challenges impede the development of efficient methods with finite-time convergence guarantees.
Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results.
arXiv Detail & Related papers (2022-09-12T06:53:24Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Stochastic Optimization of Areas Under Precision-Recall Curves with
Provable Convergence [66.83161885378192]
Area under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems.
We propose a technical method to optimize AUPRC for deep learning.
arXiv Detail & Related papers (2021-04-18T06:22:21Z) - Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order
Coordinate Method [3.860616339202303]
This paper investigates the distributed nonconvex optimization problem of minimizing a global cost function formed by the summation of $n$ local cost functions.
Each agent applies a stochastic zeroth-order (ZO) coordinate method to solve the problem.
arXiv Detail & Related papers (2021-03-24T03:07:46Z) - Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z) - Sparse Representations of Positive Functions via First and Second-Order
Pseudo-Mirror Descent [15.340540198612823]
We consider expected risk minimization problems when the range of the estimator is required to be nonnegative.
We develop first and second-order variants of stochastic approximation mirror descent employing pseudo-gradients.
Experiments demonstrate favorable performance on inhomogeneous Poisson process intensity estimation in practice.
arXiv Detail & Related papers (2020-11-13T21:54:28Z) - Kernel Distributionally Robust Optimization [17.909696462645023]
We propose kernel distributionally robust optimization (Kernel DRO) using insights from robust optimization theory and functional analysis.
Our method uses reproducing kernel Hilbert spaces (RKHS) to construct a wide range of convex ambiguity sets, which can be generalized to sets based on integral probability metrics and finite-order moment bounds.
Using universal RKHSs, the theorem applies to a broad class of loss functions, lifting common limitations such as polynomial losses and knowledge of the Lipschitz constant (an illustrative MMD-ball sketch follows this entry).
arXiv Detail & Related papers (2020-06-12T07:46:59Z)
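As a concrete hint of what an RKHS-based ambiguity set looks like, the snippet below builds the simplest such object: a maximum mean discrepancy (MMD) ball around an empirical distribution, with a membership test for candidate samples. The Gaussian kernel, the radius `eps`, and all function names are illustrative choices of ours, not the paper's construction or API.

```python
import numpy as np

# Minimal sketch (our own, illustrative): an MMD-ball ambiguity set around
# the empirical distribution of observed data, with a membership test.

def gaussian_kernel(X, Y, bandwidth=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd(X, Y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the MMD between empirical samples X and Y."""
    kxx = gaussian_kernel(X, X, bandwidth).mean()
    kyy = gaussian_kernel(Y, Y, bandwidth).mean()
    kxy = gaussian_kernel(X, Y, bandwidth).mean()
    return float(np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0)))

def in_ambiguity_set(candidate, data, eps=0.2, bandwidth=1.0):
    """True if `candidate` lies in the MMD ball of radius eps around `data`."""
    return mmd(candidate, data, bandwidth) <= eps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(0.0, 1.0, size=(200, 1))     # observed samples
    nearby = rng.normal(0.1, 1.0, size=(200, 1))   # small distribution shift
    far = rng.normal(3.0, 1.0, size=(200, 1))      # large distribution shift
    print(in_ambiguity_set(nearby, data))  # likely True: shift keeps MMD below eps
    print(in_ambiguity_set(far, data))     # likely False: MMD far exceeds eps
```

The paper works with a wide range of such RKHS-induced convex sets and a general duality result; this snippet only illustrates the membership test for the simplest MMD ball.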
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.