A Geometry-Aware Efficient Algorithm for Compositional Entropic Risk Minimization
- URL: http://arxiv.org/abs/2602.02877v1
- Date: Mon, 02 Feb 2026 22:33:49 GMT
- Title: A Geometry-Aware Efficient Algorithm for Compositional Entropic Risk Minimization
- Authors: Xiyuan Wei, Linli Zhou, Bokun Wang, Chih-Jen Lin, Tianbao Yang,
- Abstract summary: We study a family of problems termed $textbfcompositional entropic risk minimization.<n>Each data's loss is formulated as a Log-Expectation-Exponential (Log-E-Exp) function.<n>We propose a geometry-aware algorithm, termed $textbfSCENT$, for the dual formulation of entropic risk minimization.
- Score: 42.094833715284416
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper studies optimization for a family of problems termed $\textbf{compositional entropic risk minimization}$, in which each data's loss is formulated as a Log-Expectation-Exponential (Log-E-Exp) function. The Log-E-Exp formulation serves as an abstraction of the Log-Sum-Exponential (LogSumExp) function when the explicit summation inside the logarithm is taken over a gigantic number of items and is therefore expensive to evaluate. While entropic risk objectives of this form arise in many machine learning problems, existing optimization algorithms suffer from several fundamental limitations including non-convergence, numerical instability, and slow convergence rates. To address these limitations, we propose a geometry-aware stochastic algorithm, termed $\textbf{SCENT}$, for the dual formulation of entropic risk minimization cast as a min--min optimization problem. The key to our design is a $\textbf{stochastic proximal mirror descent (SPMD)}$ update for the dual variable, equipped with a Bregman divergence induced by a negative exponential function that faithfully captures the geometry of the objective. Our main contributions are threefold: (i) we establish an $O(1/\sqrt{T})$ convergence rate of the proposed SCENT algorithm for convex problems; (ii) we theoretically characterize the advantages of SPMD over standard SGD update for optimizing the dual variable; and (iii) we demonstrate the empirical effectiveness of SCENT on extreme classification, partial AUC maximization, contrastive learning and distributionally robust optimization, where it consistently outperforms existing baselines.
Related papers
- Natural Hypergradient Descent: Algorithm Design, Convergence Analysis, and Parallel Implementation [5.754044493040163]
Natural Hypergradient Descent (NHGD) is a new method for solving bilevel optimization problems.<n>Our main theoretical contribution establishes high-probability error bounds and sample complexity guarantees for NHGD.<n> Empirical evaluations on representative bilevel learning tasks demonstrate the practical advantages of NHGD.
arXiv Detail & Related papers (2026-02-11T14:31:33Z) - Stochastic Momentum Methods for Non-smooth Non-Convex Finite-Sum Coupled Compositional Optimization [68.22688819802622]
We propose a new state-of-the-art complexity of $O(/epsilon)$ for finding an (nearly) $'level KKT solution.<n>By applying our hinge-of-the-art complexity of $O(/epsilon)$ for finding an (nearly) $'level KKT solution, we achieve a new state-of-the-art complexity of $O(/epsilon)$ for finding an (nearly) $'level KKT solution.
arXiv Detail & Related papers (2025-06-03T06:31:59Z) - Enhancing Distributional Robustness in Principal Component Analysis by Wasserstein Distances [7.695578200868269]
We consider the distributionally robust optimization (DRO) model of principal component analysis (PCA) to account for uncertainty in the underlying probability distribution.<n>The resulting formulation leads to a nonsmooth constrained min-max optimization problem, where the ambiguity set captures the distributional uncertainty by the type-$2$ Wasserstein distance.<n>This explicit characterization equivalently reformulates the original DRO model into a minimization problem on the Stiefel manifold with intricate nonsmooth terms.
arXiv Detail & Related papers (2025-03-04T11:00:08Z) - Alternating Minimization Schemes for Computing Rate-Distortion-Perception Functions with $f$-Divergence Perception Constraints [9.788112471288057]
We study the computation of the rate-distortion-perception function (RDPF) for discrete memoryless sources.<n>We characterize optimal parametric solutions for the convex programming problem.<n>We show, by deriving necessary and sufficient conditions, that both schemes guarantee a globally optimal solution.
arXiv Detail & Related papers (2024-08-27T12:50:12Z) - Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions [26.543628010637036]
We introduce a novel adaptive reduction method that achieves an optimal convergence rate of $mathcalO(log T)$ for non- functions.
We also extend the proposed technique to obtain the same optimal rate of $mathcalO(log T)$ for compositional optimization.
arXiv Detail & Related papers (2024-06-04T04:39:51Z) - Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation [53.17668583030862]
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation.
We propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP)
We show that LOOP achieves a sublinear $tildemathcalO(mathrmpoly(d, mathrmsp(V*)) sqrtTbeta )$ regret, where $d$ and $beta$ correspond to AGEC and log-covering number of the hypothesis class respectively
arXiv Detail & Related papers (2024-04-19T06:24:22Z) - FastPart: Over-Parameterized Stochastic Gradient Descent for Sparse optimisation on Measures [3.377298662011438]
This paper presents a novel algorithm that leverages Gradient Descent strategies in conjunction with Random Features to augment the scalability of Conic Particle Gradient Descent (CPGD)<n>We provide rigorous mathematical proofs demonstrating the following key findings: $mathrm(i)$ The total variation norms of the solution measures along the descent trajectory remain bounded, ensuring stability and preventing undesirable divergence; $mathrm(ii)$ We establish a global convergence guarantee with a convergence rate of $O(log(K)/sqrtK)$ over $K$, showcasing the efficiency and effectiveness of
arXiv Detail & Related papers (2023-12-10T20:41:43Z) - Near-Optimal Non-Convex Stochastic Optimization under Generalized
Smoothness [21.865728815935665]
Two recent works established the $O(epsilon-3)$ sample complexity to obtain an $O(epsilon)$-stationary point.
However, both require a large batch size on the order of $mathrmploy(epsilon-1)$, which is not only computationally burdensome but also unsuitable for streaming applications.
In this work, we solve the prior two problems simultaneously by revisiting a simple variant of the STORM algorithm.
arXiv Detail & Related papers (2023-02-13T00:22:28Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of constrained decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the lower bound of $mathcal O (1/epsilon)$ for all complexity optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration of $O (1/epsilon4)$ for finding an $epsilon$stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O (1/epsilon4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Convergence of Meta-Learning with Task-Specific Adaptation over Partial
Parameters [152.03852111442114]
Although model-agnostic metalearning (MAML) is a very successful algorithm meta-learning practice, it can have high computational complexity.
Our paper shows that such complexity can significantly affect the overall convergence performance of ANIL.
arXiv Detail & Related papers (2020-06-16T19:57:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.