Covariance-Aware Transformers for Quadratic Programming and Decision Making
- URL: http://arxiv.org/abs/2602.14506v1
- Date: Mon, 16 Feb 2026 06:39:24 GMT
- Title: Covariance-Aware Transformers for Quadratic Programming and Decision Making
- Authors: Kutay Tire, Yufan Zhang, Ege Onur Taga, Samet Oymak
- Abstract summary: We show that the linear attention mechanism can provably solve unconstrained QPs by tokenizing the matrix variables. We introduce Time2Decide: a generic method that enhances a time series foundation model. We find that Time2Decide uniformly outperforms the base TSFM model for the classical portfolio optimization problem.
- Score: 25.76921238593292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore the use of transformers for solving quadratic programs and how this capability benefits decision-making problems that involve covariance matrices. We first show that the linear attention mechanism can provably solve unconstrained QPs by tokenizing the matrix variables (e.g.~$A$ of the objective $\frac{1}{2}x^\top Ax+b^\top x$) row-by-row and emulating gradient descent iterations. Furthermore, by incorporating MLPs, a transformer block can solve (i) $\ell_1$-penalized QPs by emulating iterative soft-thresholding and (ii) $\ell_1$-constrained QPs when equipped with an additional feedback loop. Our theory motivates us to introduce Time2Decide: a generic method that enhances a time series foundation model (TSFM) by explicitly feeding the covariance matrix between the variates. We empirically find that Time2Decide uniformly outperforms the base TSFM model for the classical portfolio optimization problem that admits an $\ell_1$-constrained QP formulation. Remarkably, Time2Decide also outperforms the classical "Predict-then-Optimize (PtO)" procedure, where we first forecast the returns and then explicitly solve a constrained QP, in suitable settings. Our results demonstrate that transformers benefit from explicit use of second-order statistics, and this can enable them to effectively solve complex decision-making problems, like portfolio construction, in one forward pass.
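The abstract states that a transformer block with MLPs can solve $\ell_1$-penalized QPs by emulating iterative soft-thresholding (ISTA). As a point of reference for what is being emulated, here is a minimal NumPy sketch of ISTA applied to $\min_x \frac{1}{2}x^\top Ax + b^\top x + \lambda\lVert x\rVert_1$; the problem instance and step count are illustrative, not from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding operator: prox of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista_qp(A, b, lam, steps=500):
    """Solve min_x 0.5 x^T A x + b^T x + lam * ||x||_1 by ISTA."""
    eta = 1.0 / np.linalg.eigvalsh(A).max()  # step size 1/L from largest eigenvalue
    x = np.zeros(len(b))
    for _ in range(steps):
        grad = A @ x + b                      # gradient of the smooth quadratic part
        x = soft_threshold(x - eta * grad, eta * lam)
    return x

# Toy positive-definite instance (illustrative values)
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 0.1 * np.eye(4)
b = rng.standard_normal(4)
x_sparse = ista_qp(A, b, lam=0.5)
```

With `lam=0` the iteration reduces to plain gradient descent on the quadratic, the unconstrained-QP case the paper shows linear attention can emulate; with a large `lam` the soft-thresholding drives the solution to zero.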
Related papers
- Adaptive Matrix Online Learning through Smoothing with Guarantees for Nonsmooth Nonconvex Optimization [54.723834588133165]
We study adaptive matrix online learning (AML), online linear optimization with matrix variables, a setting where the geometry renders designing data-dependent and efficient adaptive algorithms challenging. We instantiate this framework with two efficient methods that avoid projections. We show both methods admit closed-form updates and match one-sided Shampoo's regret up to a constant factor, while significantly reducing computational cost.
arXiv Detail & Related papers (2026-02-09T03:09:47Z) - Orthogonal Finetuning Made Scalable [92.34573849209238]
Orthogonal finetuning (OFT) offers highly parameter-efficient adaptation while preventing catastrophic forgetting, but its high runtime and memory demands limit practical deployment. We identify the core computational bottleneck in OFT as its weight-centric implementation, which relies on costly matrix-matrix multiplications with cubic complexity. We propose OFTv2, an input-centric reformulation that instead uses matrix-vector multiplications (i.e., matrix-free computation), reducing the computational cost to quadratic. These modifications allow OFTv2 to achieve up to 10x faster training and 3x lower GPU memory usage without
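The weight-centric vs. input-centric distinction in the OFTv2 summary comes down to associativity: materializing a rotated weight `R @ W` costs $O(d^3)$, while applying the same rotation to the layer output `R @ (W @ x)` uses only matrix-vector products at $O(d^2)$ per input. The sketch below illustrates that equivalence generically; the random rotation `R` is a stand-in, not OFTv2's actual parameterization.

```python
import numpy as np

d = 256
rng = np.random.default_rng(1)
W = rng.standard_normal((d, d))          # frozen weight matrix
R, _ = np.linalg.qr(rng.standard_normal((d, d)))  # hypothetical orthogonal factor
x = rng.standard_normal(d)

# Weight-centric: materialize the rotated weight first (O(d^3) matrix-matrix product)
y_weight_centric = (R @ W) @ x

# Input-centric / matrix-free: two matrix-vector products (O(d^2) per input)
y_input_centric = R @ (W @ x)

assert np.allclose(y_weight_centric, y_input_centric)
```

Both paths compute the same output; only the order of association, and hence the asymptotic cost, differs.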
arXiv Detail & Related papers (2025-06-24T17:59:49Z) - Toward TransfORmers: Revolutionizing the Solution of Mixed Integer Programs with Transformers [3.107843027522116]
We introduce an innovative deep learning framework that employs a transformer model to address the challenges of mixed-integer programs.
Our approach is the first to utilize transformers to predict the binary variables of a mixed-integer programming (MIP) problem.
We present an efficient algorithm in which CLSP solutions are learned through a transformer neural network.
arXiv Detail & Related papers (2024-02-20T21:13:38Z) - Transformers as Support Vector Machines [54.642793677472724]
We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem.
We characterize the implicit bias of 1-layer transformers optimized with gradient descent.
We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
arXiv Detail & Related papers (2023-08-31T17:57:50Z) - SCQPTH: an efficient differentiable splitting method for convex quadratic programming [1.3597551064547502]
SCQPTH is a differentiable first-order splitting method for convex quadratic programs.
The SCQPTH software is made available as an open-source python package.
For large-scale QPs, SCQPTH can provide a $1\times$ to $10\times$ improvement in computational efficiency.
arXiv Detail & Related papers (2023-08-16T09:06:54Z) - Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost [53.746169882193456]
Recent works have proposed various sparse attention modules to overcome the quadratic cost of self-attention.
We propose a model that resolves both problems by endowing each attention head with a mixed-membership Block Model.
Our model outperforms previous efficient variants as well as the original Transformer with full attention.
arXiv Detail & Related papers (2022-10-27T15:30:52Z) - Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained nonlinear optimization problems. We apply a Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
arXiv Detail & Related papers (2022-05-27T00:34:03Z) - Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes [13.908826484332282]
We study an online primal-dual actor-critic method to solve a discounted cost constrained Markov decision process problem.
This paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem.
arXiv Detail & Related papers (2021-10-21T18:05:40Z) - Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation [107.06364966905821]
We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting.
Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of OFU-LQ.
We show that an $\epsilon$-optimistic controller can be computed efficiently by solving at most $O\big(\log(1/\epsilon)\big)$ Riccati equations.
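The Riccati equations referenced above are the standard route from an LQR model to a controller. As background, here is a minimal sketch of solving one discrete algebraic Riccati equation and extracting the optimal gain with SciPy; the system matrices are a hypothetical toy instance, not from the paper.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical 2-state, 1-input LQR instance (illustrative values)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)            # state cost
R = np.array([[1.0]])    # input cost

# Solve the discrete algebraic Riccati equation for the cost-to-go matrix P
P = solve_discrete_are(A, B, Q, R)

# Optimal feedback gain: u = -K x
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Residual of the DARE should be ~0 at the solution
resid = A.T @ P @ A - P - A.T @ P @ B @ K + Q
```

An optimistic method in the sense of the paper would solve a small sequence of such equations over perturbed models rather than a single one.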
arXiv Detail & Related papers (2020-07-13T16:30:47Z) - A conditional one-output likelihood formulation for multitask Gaussian processes [0.0]
Multitask Gaussian processes (MTGPs) are the Gaussian process framework's solution for multioutput regression problems.
Here we introduce a novel approach that simplifies multitask learning.
We show that it is computationally competitive with state-of-the-art options.
arXiv Detail & Related papers (2020-06-05T14:59:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.