Nested Stochastic Gradient Descent for (Generalized) Sinkhorn Distance-Regularized Distributionally Robust Optimization
- URL: http://arxiv.org/abs/2503.22923v1
- Date: Sat, 29 Mar 2025 01:01:02 GMT
- Title: Nested Stochastic Gradient Descent for (Generalized) Sinkhorn Distance-Regularized Distributionally Robust Optimization
- Authors: Yufeng Yang, Yi Zhou, Zhaosong Lu
- Abstract summary: Distributionally robust optimization (DRO) is a powerful technique to train robust models against data distribution shift. This paper aims to solve regularized nonconvex DRO problems, where the uncertainty set is modeled by a so-called generalized Sinkhorn distance.
- Score: 4.989068568135242
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distributionally robust optimization (DRO) is a powerful technique to train robust models against data distribution shift. This paper aims to solve regularized nonconvex DRO problems, where the uncertainty set is modeled by a so-called generalized Sinkhorn distance and the loss function is nonconvex and possibly unbounded. Such a distance allows one to model uncertainty of distributions with different probability supports and divergence functions. For this class of regularized DRO problems, we derive a novel dual formulation taking the form of nested stochastic programming, where the dual variable depends on the data sample. To solve the dual problem, we provide theoretical evidence to design a nested stochastic gradient descent (SGD) algorithm, which leverages stochastic approximation to estimate the nested stochastic gradients. We study the convergence rate of nested SGD and establish polynomial iteration and sample complexities that are independent of the data size and parameter dimension, indicating its potential for solving large-scale DRO problems. We conduct numerical experiments to demonstrate the efficiency and robustness of the proposed algorithm.
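To make the nested structure concrete, here is a minimal, illustrative PyTorch sketch of a nested SGD loop for an entropy-regularized (Sinkhorn-style) DRO objective on a toy robust regression task. It is not the authors' algorithm: the toy data, the Gaussian sampling kernel around each data point, the quadratic transport cost, and all hyperparameters (eta, sigma, m_inner, learning rate) are assumptions, and the inner expectation is estimated with a naive plug-in Monte Carlo log-mean-exp, whereas the paper's analysis is precisely concerned with controlling the bias of such nested estimates.

```python
# Illustrative sketch only: nested SGD on an entropy-regularized DRO-style objective
#   min_theta  E_{x ~ P0} [ eta * log E_{z ~ N(x, sigma^2 I)} exp( (loss_theta(z) - c(x,z)) / eta ) ],
# with the inner expectation approximated by a small Monte Carlo sample per data point.
import math
import torch

torch.manual_seed(0)

# Toy data: linear regression (assumed for illustration).
n, d = 512, 10
X = torch.randn(n, d)
w_true = torch.randn(d)
y = X @ w_true + 0.1 * torch.randn(n)

theta = torch.zeros(d, requires_grad=True)
eta = 0.5          # entropic regularization weight (assumed)
sigma = 0.3        # std of the sampling kernel around each data point (assumed)
m_inner = 16       # inner Monte Carlo sample size per data point
opt = torch.optim.SGD([theta], lr=0.05)

def loss_fn(z, y_batch):
    # Per-sample squared loss at perturbed inputs z; returns shape (batch, m_inner).
    pred = z @ theta
    return (pred - y_batch.unsqueeze(1)) ** 2

for it in range(200):
    idx = torch.randint(0, n, (64,))
    xb, yb = X[idx], y[idx]
    # Inner sampling: z ~ N(x, sigma^2 I); transport cost c(x, z) = ||x - z||^2 / 2.
    z = xb.unsqueeze(1) + sigma * torch.randn(64, m_inner, d)
    cost = 0.5 * ((z - xb.unsqueeze(1)) ** 2).sum(-1)
    # Nested objective: outer average of eta * log-mean-exp over the inner samples
    # (a biased plug-in estimator; the paper studies how to handle this bias).
    inner = torch.logsumexp((loss_fn(z, yb) - cost) / eta, dim=1) - math.log(m_inner)
    obj = (eta * inner).mean()
    opt.zero_grad()
    obj.backward()
    opt.step()
```

The structural point the sketch illustrates is that every outer stochastic gradient step requires an inner sample per data point to approximate a data-dependent inner expectation, which is one way to read the abstract's "nested stochastic programming" formulation.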
Related papers
- Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces [6.992239210938067]
We consider a minimax problem motivated by distributionally robust optimization (DRO) when the worst-case distribution is continuous.
Recent research has explored learning the worst-case distribution using neural network-based generative networks.
This paper addresses this theoretical challenge by presenting an iterative algorithm to solve such a minimax problem.
arXiv Detail & Related papers (2024-12-29T19:31:23Z) - Total Uncertainty Quantification in Inverse PDE Solutions Obtained with Reduced-Order Deep Learning Surrogate Models [50.90868087591973]
We propose an approximate Bayesian method for quantifying the total uncertainty in inverse PDE solutions obtained with machine learning surrogate models.
We test the proposed framework by comparing it with the iterative ensemble smoother and deep ensembling methods for a non-linear diffusion equation.
arXiv Detail & Related papers (2024-08-20T19:06:02Z) - On the Trajectory Regularity of ODE-based Diffusion Sampling [79.17334230868693]
Diffusion-based generative models use differential equations to establish a smooth connection between a complex data distribution and a tractable prior distribution.
In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models.
arXiv Detail & Related papers (2024-05-18T15:59:41Z) - Gaussian Mixture Solvers for Diffusion Models [84.83349474361204]
We introduce a novel class of SDE-based solvers called GMS for diffusion models.
Our solver outperforms numerous SDE-based solvers in terms of sample quality in image generation and stroke-based synthesis.
arXiv Detail & Related papers (2023-11-02T02:05:38Z) - Flow-based Distributionally Robust Optimization [23.232731771848883]
We present a framework, called $\texttt{FlowDRO}$, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets.
We aim to find continuous worst-case distribution (also called the Least Favorable Distribution, LFD) and sample from it.
We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution perturbation differential privacy.
arXiv Detail & Related papers (2023-10-30T03:53:31Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - Score-based Diffusion Models in Function Space [137.70916238028306]
Diffusion models have recently emerged as a powerful framework for generative modeling. This work introduces a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space. We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - Stochastic Constrained DRO with a Complexity Independent of Sample Size [38.56406595022129]
We propose and analyze algorithms that apply to both nonconvex and convex losses for solving the Kullback-Leibler divergence constrained DRO problem.
We establish a nearly optimal complexity for finding an $\epsilon$-stationary solution for nonconvex losses and an optimal batch complexity for finding an optimal solution for broad applications.
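For context, the duality underlying this line of work is the standard dual reformulation of KL-constrained DRO (stated here from general knowledge, not taken from this abstract), which reduces the worst-case expected loss to a one-dimensional minimization over a dual variable $\lambda$:

```latex
\sup_{P:\ \mathrm{KL}(P \,\|\, P_0) \le \rho} \mathbb{E}_{P}\big[\ell(\theta;\xi)\big]
  \;=\; \inf_{\lambda > 0}\ \Big\{ \lambda \rho \;+\; \lambda \log \mathbb{E}_{P_0}\!\big[ e^{\ell(\theta;\xi)/\lambda} \big] \Big\}.
```

The log-moment-generating-function term on the right is itself a nonlinear function of an expectation, which is why stochastic algorithms for this class of problems must cope with compositional (nested) estimators.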
arXiv Detail & Related papers (2022-10-11T19:11:19Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Sinkhorn Distributionally Robust Optimization [18.46110328123008]
Sinkhorn distance is a variant of Wasserstein distance based on entropic regularization. We derive a convex programming dual reformulation for general nominal distributions, transport costs, and loss functions.
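For reference, one common definition of the entropic-regularized (Sinkhorn) distance between distributions $\mu$ and $\nu$ with transport cost $c$ and regularization parameter $\epsilon > 0$ is sketched below; conventions differ across papers (e.g., the reference measure used in the entropy term), so this is illustrative rather than the exact definition used in the cited works:

```latex
W_{\epsilon}(\mu,\nu) \;=\; \inf_{\pi \in \Pi(\mu,\nu)}
  \int c(x,z)\, \mathrm{d}\pi(x,z) \;+\; \epsilon\, H\big(\pi \,\big\|\, \mu \otimes \nu\big),
```

where $\Pi(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$ and $H(\cdot \,\|\, \cdot)$ is the relative entropy.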
arXiv Detail & Related papers (2021-09-24T12:40:48Z) - Distributionally Robust Optimization with Markovian Data [8.126833795693699]
We study a stochastic program where the probability distribution of the uncertain problem parameters is unknown.
We propose a data-driven distributionally robust optimization model to estimate the problem's objective function and optimal solution.
arXiv Detail & Related papers (2021-06-12T10:59:02Z) - Bayesian multiscale deep generative model for the solution of high-dimensional inverse problems [0.0]
A novel multiscale Bayesian inference approach is introduced based on deep probabilistic generative models.
The method allows high-dimensional parameter estimation while exhibiting stability, efficiency and accuracy.
arXiv Detail & Related papers (2021-02-04T11:47:21Z)