Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach
- URL: http://arxiv.org/abs/2601.11016v1
- Date: Fri, 16 Jan 2026 06:18:22 GMT
- Title: Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach
- Authors: Fenglin Zhang, Jie Wang
- Abstract summary: We introduce a framework for contextual distributionally robust optimization (DRO). We first introduce the causal Sinkhorn discrepancy (CSD), an entropy-regularized causal Wasserstein distance. We then derive a contextual DRO model with a CSD-based ambiguity set, termed Causal Sinkhorn DRO (Causal-SDRO). We propose the Soft Regression Forest (SRF) decision rule, which approximates optimal policies within arbitrary measurable function spaces.
- Score: 2.8445258546547625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a framework for contextual distributionally robust optimization (DRO) that considers the causal and continuous structure of the underlying distribution by developing interpretable and tractable decision rules that prescribe decisions using covariates. We first introduce the causal Sinkhorn discrepancy (CSD), an entropy-regularized causal Wasserstein distance that encourages continuous transport plans while preserving the causal consistency. We then formulate a contextual DRO model with a CSD-based ambiguity set, termed Causal Sinkhorn DRO (Causal-SDRO), and derive its strong dual reformulation where the worst-case distribution is characterized as a mixture of Gibbs distributions. To solve the corresponding infinite-dimensional policy optimization, we propose the Soft Regression Forest (SRF) decision rule, which approximates optimal policies within arbitrary measurable function spaces. The SRF preserves the interpretability of classical decision trees while being fully parametric, differentiable, and Lipschitz smooth, enabling intrinsic interpretation from both global and local perspectives. To solve the Causal-SDRO with parametric decision rules, we develop an efficient stochastic compositional gradient algorithm that converges to an $\varepsilon$-stationary point at a rate of $O(\varepsilon^{-4})$, matching the convergence rate of standard stochastic gradient descent. Finally, we validate our method through numerical experiments on synthetic and real-world datasets, demonstrating its superior performance and interpretability.
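The CSD builds on the Sinkhorn discrepancy, i.e. entropy-regularized optimal transport. As a minimal illustration of that underlying machinery (not the paper's method: the causal constraint and the contextual component of CSD are omitted, and the function name is illustrative), the standard Sinkhorn iterations compute the regularized transport cost between two discrete distributions:

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    a, b : source/target probability weights (1-D arrays summing to 1)
    C    : pairwise transport cost matrix, shape (len(a), len(b))
    eps  : entropic regularization strength
    Returns <P, C>, the transport cost of the regularized plan P.
    """
    K = np.exp(-C / eps)               # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)              # scale columns to match marginal b
        u = a / (K @ v)                # scale rows to match marginal a
    P = u[:, None] * K * v[None, :]    # regularized transport plan
    return np.sum(P * C)

# Toy example: two uniform discrete distributions on a line,
# the target shifted by 0.2 (squared-distance cost).
x = np.linspace(0.0, 1.0, 5)
y = x + 0.2
C = (x[:, None] - y[None, :]) ** 2
a = b = np.full(5, 0.2)
cost = sinkhorn_cost(a, b, C, eps=0.05)
```

As eps tends to zero the plan concentrates on the unregularized optimal coupling (here the monotone matching with cost 0.04); larger eps yields a smoother, more diffuse plan, which is the continuity-encouraging effect the CSD exploits.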
Related papers
- Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound expressed through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - On the Plasticity and Stability for Post-Training Large Language Models [54.757672540381236]
We identify the root cause as a conflict between plasticity and stability gradients. We propose Probabilistic Conflict Resolution (PCR), a framework that models gradients as random variables. PCR significantly smooths the training trajectory and achieves superior performance on various reasoning tasks.
arXiv Detail & Related papers (2026-02-06T07:31:26Z) - Iterative Sampling Methods for Sinkhorn Distributionally Robust Optimization [2.345146665577353]
Distributionally robust optimization (DRO) has emerged as a powerful paradigm for reliable decision-making under uncertainty. This paper focuses on DRO with ambiguity sets defined via the Sinkhorn discrepancy, an entropy-regularized Wasserstein distance. We study Sinkhorn DRO from the primal perspective, reformulating it as a bilevel program with several infinite-dimensional lower-level subproblems over the probability space.
arXiv Detail & Related papers (2025-12-14T04:42:51Z) - Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations [57.179679246370114]
We identify the distribution of random perturbations that minimizes the estimator's variance as the perturbation stepsize tends to zero. Our findings reveal that such desired perturbations can align directionally with the true gradient, instead of maintaining a fixed length.
arXiv Detail & Related papers (2025-10-22T19:06:39Z) - Stochastic Optimization with Optimal Importance Sampling [49.484190237840714]
We propose an iterative algorithm that jointly updates the decision and the importance sampling (IS) distribution without requiring time-scale separation between the two. Our method achieves the lowest possible variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
arXiv Detail & Related papers (2025-04-04T16:10:18Z) - Nested Stochastic Algorithm for Generalized Sinkhorn distance-Regularized Distributionally Robust Optimization [4.989068568135242]
Distributionally robust optimization (DRO) is a powerful technique for robustifying models against data distribution shifts. This paper aims to solve regularized nonconvex DRO problems, where the uncertainty is modeled by a so-called generalized Sinkhorn distance.
arXiv Detail & Related papers (2025-03-29T01:01:02Z) - Enhancing Distributional Robustness in Principal Component Analysis by Wasserstein Distances [7.695578200868269]
We consider the distributionally robust optimization (DRO) model of principal component analysis (PCA) to account for uncertainty in the underlying probability distribution. The resulting formulation leads to a nonsmooth constrained min-max optimization problem, where the ambiguity set captures the distributional uncertainty by the type-$2$ Wasserstein distance. This explicit characterization equivalently reformulates the original DRO model into a minimization problem on the Stiefel manifold with intricate nonsmooth terms.
arXiv Detail & Related papers (2025-03-04T11:00:08Z) - Contextual Optimization under Covariate Shift: A Robust Approach by Intersecting Wasserstein Balls [18.047245099229325]
We propose a novel approach called Intersection Wasserstein-balls DRO (IW-DRO). IW-DRO integrates multiple estimation methods into the distributionally robust optimization framework. We show that IW-DRO achieves superior performance compared to single Wasserstein-ball DRO models.
arXiv Detail & Related papers (2024-06-04T15:46:41Z) - Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems [62.83783246648714]
We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints.
The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z) - Distributionally Robust Bayesian Optimization [121.71766171427433]
We present a novel distributionally robust Bayesian optimization algorithm (DRBO) for zeroth-order, noisy optimization.
Our algorithm provably obtains sub-linear robust regret in various settings.
We demonstrate the robust performance of our method on both synthetic and real-world benchmarks.
arXiv Detail & Related papers (2020-02-20T22:04:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences arising from its use.