Debiasing Conditional Stochastic Optimization
- URL: http://arxiv.org/abs/2304.10613v3
- Date: Sun, 3 Dec 2023 16:22:02 GMT
- Title: Debiasing Conditional Stochastic Optimization
- Authors: Lie He and Shiva Prasad Kasiviswanathan
- Abstract summary: We study the conditional stochastic optimization (CSO) problem, which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, etc.
We develop new algorithms for the finite-sum variant of the CSO problem that significantly improve upon existing results.
We believe that our technique has the potential to be a useful tool for addressing similar challenges in other optimization problems.
- Score: 15.901623717313493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the conditional stochastic optimization (CSO) problem
which covers a variety of applications including portfolio selection,
reinforcement learning, robust learning, causal inference, etc. The
sample-averaged gradient of the CSO objective is biased due to its nested
structure, and therefore requires a high sample complexity for convergence. We
introduce a general stochastic extrapolation technique that effectively reduces
the bias. We show that for nonconvex smooth objectives, combining this
extrapolation with variance reduction techniques can achieve a significantly
better sample complexity than the existing bounds. Additionally, we develop new
algorithms for the finite-sum variant of the CSO problem that also
significantly improve upon existing results. Finally, we believe that our
debiasing technique has the potential to be a useful tool for addressing
similar challenges in other stochastic optimization problems.
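To make the bias issue concrete: in the standard CSO formulation the objective has the nested form $F(x)=\mathbb{E}_{\xi}\big[f_{\xi}\big(\mathbb{E}_{\eta\mid\xi}[g_{\eta}(x,\xi)]\big)\big]$, and replacing the inner expectation by an $m$-sample average yields a gradient estimator whose bias typically scales like $O(1/m)$ because $f_{\xi}$ is nonlinear. The sketch below is a toy illustration of that bias and of a Richardson-style extrapolation that cancels the leading bias term; it is not the paper's algorithm, and the problem instance, function names, and constants are hypothetical.

```python
# Toy illustration of the bias of the plug-in (sample-averaged) CSO gradient
# and of a Richardson-style extrapolation that removes the leading O(1/m) term.
# Hypothetical instance: F(x) = E_xi[ f_xi( E_{eta|xi}[ g_eta(x, xi) ] ) ] with
# xi ~ N(0,1), eta|xi ~ N(xi,1), g_eta(x, xi) = x*eta, f_xi(u) = u^2,
# so F(x) = x^2 and the true gradient is 2x.
import numpy as np

rng = np.random.default_rng(0)


def plugin_grad(x, m, n_outer=200_000):
    """Plug-in gradient: the inner expectation is replaced by an m-sample
    average; its expectation is 2x*(1 + 1/m), i.e. biased by 2x/m."""
    xi = rng.standard_normal(n_outer)
    # m-sample average of eta | xi, i.e. xi + Normal(0, 1/m)
    eta_bar = xi + rng.standard_normal((n_outer, m)).mean(axis=1)
    # d/dx f_xi(x * eta_bar) = 2 * x * eta_bar^2, averaged over outer samples
    return np.mean(2.0 * x * eta_bar**2)


def extrapolated_grad(x, m, n_outer=200_000):
    """Extrapolated estimator 2*G_{2m} - G_m: the O(1/m) bias terms of the
    two plug-in estimators cancel (exactly, on this quadratic instance)."""
    return 2.0 * plugin_grad(x, 2 * m, n_outer) - plugin_grad(x, m, n_outer)


if __name__ == "__main__":
    x, m = 1.0, 5
    print("true gradient      :", 2.0 * x)                   # 2.0
    print("plug-in (m=5)      :", plugin_grad(x, m))         # approx 2x*(1 + 1/5) = 2.4
    print("extrapolated (m=5) :", extrapolated_grad(x, m))   # approx 2.0
```

On this quadratic instance the extrapolated estimator is exactly unbiased; for general smooth $f_{\xi}$ one should only expect the leading-order bias term to be cancelled, at the cost of roughly three times as many inner samples per gradient query.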
Related papers
- Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based updates through continuous relaxation, combined with Quasi-Quantum Annealing (QQA).
Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
arXiv Detail & Related papers (2024-09-02T12:55:27Z) - Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis, we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes conditional stochastic optimization algorithms for the federated learning setting.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - A Homogenization Approach for Gradient-Dominated Stochastic Optimization [6.1144486886258065]
We propose a stochastic homogeneous second-order descent method (SHSODM) for functions enjoying gradient dominance.
Our findings show that SHSODM matches the best-known sample complexity achieved by other second-order methods for gradient-dominated optimization.
arXiv Detail & Related papers (2023-08-21T11:03:04Z) - Faster Stochastic Variance Reduction Methods for Compositional MiniMax
Optimization [50.10952609321302]
Compositional minimax optimization is a pivotal challenge across various machine learning domains.
Current methods of compositional minimax optimization are plagued by sub-optimal complexities or heavy reliance on sizable batch sizes.
This paper introduces a novel method, called Nested STOchastic Recursive Momentum (NSTORM), which can achieve the optimal sample complexity of $O(\kappa^3/\epsilon^3)$.
arXiv Detail & Related papers (2023-08-18T14:57:21Z) - Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with an undesirable dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Stochastic Compositional Gradient Descent under Compositional
Constraints [13.170519806372075]
We study constrained optimization problems where the objective and constraint functions are convex and expressed as compositions of functions.
The problem arises in fair classification/regression and in the design of queuing systems.
We prove that the proposed algorithm is guaranteed to find the optimal and feasible solution almost surely.
arXiv Detail & Related papers (2020-12-17T05:38:37Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization of the extragradient method which converges to a stationary point.
The algorithm applies not only to Euclidean spaces, but also to general $\ell_p$-normed finite-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - The Strength of Nesterov's Extrapolation in the Individual Convergence
of Nonsmooth Optimization [0.0]
We prove that Nesterov's extrapolation has the strength to make the individual convergence of gradient descent methods optimal for nonsmooth problems.
We give an extension of the derived algorithms to solve regularized learning tasks with nonsmooth losses in stochastic settings.
Our method is applicable as an efficient tool for solving large-scale $\ell_1$-regularized hinge-loss learning problems.
arXiv Detail & Related papers (2020-06-08T03:35:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.