GPU-friendly and Linearly Convergent First-order Methods for Certifying Optimal $k$-sparse GLMs
- URL: http://arxiv.org/abs/2603.01306v1
- Date: Sun, 01 Mar 2026 22:26:09 GMT
- Title: GPU-friendly and Linearly Convergent First-order Methods for Certifying Optimal $k$-sparse GLMs
- Authors: Jiachang Liu, Andrea Lodi, Soroosh Shafiee
- Abstract summary: Branch-and-Bound (BnB) frameworks can certify optimality using perspective relaxations. Existing methods for solving these relaxations are computationally intensive, limiting their scalability. We develop a unified proximal framework that is both linearly convergent and computationally efficient.
- Score: 7.079949618914198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the problem of certifying optimality for sparse generalized linear models (GLMs), where sparsity is enforced through a cardinality constraint. While Branch-and-Bound (BnB) frameworks can certify optimality using perspective relaxations, existing methods for solving these relaxations are computationally intensive, limiting their scalability. To address this challenge, we reformulate the relaxations as composite optimization problems and develop a unified proximal framework that is both linearly convergent and computationally efficient. Under specific geometric regularity conditions, our analysis links primal quadratic growth to dual quadratic decay, yielding error bounds that make the Fenchel duality gap a sharp proxy for progress towards the solution set. This leads to a duality gap-based restart scheme that upgrades a broad class of sublinear proximal methods to provably linearly convergent methods, and applies beyond the sparse GLM setting. For the implicit perspective regularizer, we further derive specialized routines to evaluate the regularizer and its proximal operator exactly in log-linear time, avoiding costly generic conic solvers. The resulting iterations are dominated by matrix--vector multiplications, which enables GPU acceleration. Experiments on synthetic and real-world datasets show orders-of-magnitude faster dual-bound computations and substantially improved BnB scalability on large instances.
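To make the restart scheme concrete, below is a minimal Python sketch of a duality-gap-based restart wrapper around a FISTA inner loop. The interface (`f_grad`, `prox`, `duality_gap`) and the gap-halving rule are illustrative assumptions standing in for the paper's composite reformulation, not its exact algorithm.

```python
import numpy as np

def gap_restarted_apg(f_grad, prox, duality_gap, x0, L,
                      shrink=0.5, tol=1e-8, max_iters=10_000):
    """Duality-gap-based restart wrapper around FISTA (illustrative sketch).

    f_grad(x):      gradient of the smooth part          (assumed interface)
    prox(z, step):  proximal operator of the regularizer (assumed interface)
    duality_gap(x): Fenchel duality gap at x             (assumed interface)
    L:              Lipschitz constant of the smooth gradient
    """
    x, y, t = x0.copy(), x0.copy(), 1.0
    target = duality_gap(x)
    for _ in range(max_iters):
        # Standard accelerated proximal gradient (FISTA) inner step.
        x_next = prox(y - f_grad(y) / L, 1.0 / L)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next

        gap = duality_gap(x)
        if gap <= tol:
            break
        if gap <= shrink * target:
            # Sufficient gap decay: restart momentum and tighten the target.
            # Under quadratic growth/decay this yields linear convergence.
            y, t, target = x.copy(), 1.0, gap
    return x
```

Each inner step needs only a gradient (matrix-vector products for GLMs) and one prox evaluation, which is what makes the scheme GPU-friendly.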
Related papers
- Parallel Diffusion Solver via Residual Dirichlet Policy Optimization [88.7827307535107]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often face significant image-quality degradation under a low sampling budget. We propose the Ensemble Parallel Direction solver (dubbed EPD-EPr), a novel ODE solver that mitigates these errors by incorporating multiple parallel gradient evaluations in each step.
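As a loose illustration of the parallel-evaluation idea (the intermediate time fractions `taus` and combination `weights` below are hypothetical learned parameters, not the paper's actual EPD parameterization):

```python
def parallel_direction_step(f, x, t, t_next, taus, weights):
    """One ODE-solver step that combines several derivative evaluations.
    Sketch only: `taus` and `weights` are assumed learned coefficients."""
    h = t_next - t
    f0 = f(x, t)  # base derivative from the denoising network
    # The intermediate evaluations are mutually independent, so they can
    # be batched and executed in parallel on a GPU.
    dirs = [f(x + tau * h * f0, t + tau * h) for tau in taus]
    return x + h * sum(w * d for w, d in zip(weights, dirs))
```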
arXiv Detail & Related papers (2025-12-28T05:48:55Z) - Optimal Transportation and Alignment Between Gaussian Measures [80.4634530260329]
Optimal transport (OT) and Gromov-Wasserstein (GW) alignment provide interpretable geometric frameworks for comparing datasets. Because these frameworks are computationally expensive, large-scale applications often rely on closed-form solutions for Gaussian distributions under quadratic cost. This work provides a comprehensive treatment of Gaussian quadratic-cost OT and inner-product GW (IGW) alignment, closing several gaps in the literature to broaden applicability.
arXiv Detail & Related papers (2025-12-03T09:01:48Z) - Bridging Constraints and Stochasticity: A Fully First-Order Method for Stochastic Bilevel Optimization with Linear Constraints [3.567855687957749]
This work provides the first finite-time convergence guarantees for linearly constrained bilevel optimization using only first-order methods. We address the challenge of simultaneously handling linear constraints, stochastic noise, and finite-time analysis in bilevel optimization.
arXiv Detail & Related papers (2025-11-13T00:59:20Z) - Scalable First-order Method for Certifying Optimal k-Sparse GLMs [9.613635592922174]
We propose a first-order proximal gradient algorithm to solve the perspective relaxation of the problem within a BnB framework. We show that our approach significantly accelerates dual-bound computations and is highly effective in providing optimality certificates for large-scale problems.
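Writing the relaxation generically as $\min_\beta f(\beta) + g(\beta)$ (smooth GLM loss $f$ plus perspective regularizer $g$; this split is generic notation, not the paper's exact formulation), the proximal gradient iteration referenced above is the textbook update:

```latex
% Proximal gradient step with step size eta <= 1/L (L = Lipschitz const. of grad f):
\beta^{t+1} = \operatorname{prox}_{\eta g}\bigl(\beta^{t} - \eta\,\nabla f(\beta^{t})\bigr),
\quad\text{where}\quad
\operatorname{prox}_{\eta g}(z) = \arg\min_{\beta}\ \tfrac{1}{2}\lVert \beta - z \rVert_2^2 + \eta\, g(\beta).
```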
arXiv Detail & Related papers (2025-02-13T17:14:18Z) - $\ell_0$-Regularized Quadratic Surface Support Vector Machines [0.0]
Kernel-free quadratic surface support vector machines (QSVMs) have recently gained traction due to their flexibility in modeling nonlinear decision boundaries without relying on kernel functions. We propose a sparse variant of the QSVM by enforcing a cardinality constraint on the model parameters. We validate our approach on several real-world datasets, demonstrating its ability to reduce overfitting while maintaining strong classification performance.
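Inside such solvers, a cardinality constraint $\|w\|_0 \le k$ is typically enforced via the Euclidean projection onto the $k$-sparse set, i.e. hard thresholding. A minimal Python sketch of that standard projection (illustrative; not the paper's exact mixed-integer machinery):

```python
import numpy as np

def project_k_sparse(w, k):
    """Euclidean projection onto {w : ||w||_0 <= k}: keep the k entries
    of largest magnitude and zero the rest (standard hard thresholding)."""
    out = np.zeros_like(w)
    keep = np.argpartition(np.abs(w), -k)[-k:]  # indices of the k largest |w_i|
    out[keep] = w[keep]
    return out
```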
arXiv Detail & Related papers (2025-01-20T04:26:34Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
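A generic Python sketch of the interpolation mechanism (the base optimizer `inner_step` and the loop structure are our assumptions; the paper's contribution is the nonexpansive-operator analysis, not this particular loop):

```python
import numpy as np

def interpolated_training(theta0, inner_step, alpha=0.5, k=5, rounds=100):
    """Lookahead-style linear interpolation (illustrative sketch)."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(rounds):
        fast = theta.copy()
        for _ in range(k):
            fast = inner_step(fast)  # k steps of the (possibly unstable) base optimizer
        # Pull the slow iterate part-way toward the fast one; averaging with
        # the previous point damps oscillations on nonmonotone landscapes.
        theta = (1.0 - alpha) * theta + alpha * fast
    return theta
```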
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
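For orientation, the classical augmented Lagrangian for equality constraints $c(x) = 0$ is sketched below in Python; the paper studies an exact variant with additional terms, so this is the textbook form only:

```python
import numpy as np

def augmented_lagrangian(f, c, x, lam, rho):
    """Textbook augmented Lagrangian for c(x) = 0:
    L_rho(x, lam) = f(x) + lam^T c(x) + (rho / 2) * ||c(x)||^2."""
    r = np.asarray(c(x))          # constraint residual
    return f(x) + lam @ r + 0.5 * rho * (r @ r)
```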
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual variable regularizer, and Nesterov's accelerated gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - A Stochastic Composite Augmented Lagrangian Method For Reinforcement Learning [9.204659134755795]
We consider the linear programming (LP) formulation for deep reinforcement learning.
The augmented Lagrangian method suffers from the double-sampling obstacle when solving this LP.
A deep parameterized augmented Lagrangian method is proposed to address this obstacle.
arXiv Detail & Related papers (2021-05-20T13:08:06Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear optimization problem.
We solve for one variable by considering its coupling relationship with the other, leading to synchronous gradient descent updates.
Our algorithm is applied to problems in which one variable is subject to a sparsity constraint.
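As a toy instance of the coupled update, consider synchronous gradient steps on the rank-one bilinear objective $f(u, v) = \tfrac{1}{2}\lVert A - uv^\top\rVert_F^2$ (a generic sketch under our own toy objective; CoGD's projection and sparsity machinery are omitted):

```python
import numpy as np

def synchronous_bilinear_descent(A, u0, v0, lr=1e-2, iters=500):
    """Update both coupled factors of f(u, v) = 0.5 * ||A - u v^T||_F^2
    simultaneously (synchronous gradient descent; illustrative only)."""
    u, v = u0.astype(float).copy(), v0.astype(float).copy()
    for _ in range(iters):
        R = np.outer(u, v) - A           # residual u v^T - A
        gu, gv = R @ v, R.T @ u          # coupled gradients df/du, df/dv
        u, v = u - lr * gu, v - lr * gv  # joint (synchronous) step
    return u, v
```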
arXiv Detail & Related papers (2020-06-16T13:41:54Z)