Related papers: Neural Sum-of-Squares: Certifying the Nonnegativity of Polynomials with Transformers

Neural Sum-of-Squares: Certifying the Nonnegativity of Polynomials with Transformers

URL: http://arxiv.org/abs/2510.13444v1
Date: Wed, 15 Oct 2025 11:42:38 GMT
Title: Neural Sum-of-Squares: Certifying the Nonnegativity of Polynomials with Transformers
Authors: Nico Pelleriti, Christoph Spiegel, Shiwei Liu, David Martínez-Rubio, Max Zimmer, Sebastian Pokutta,
Abstract summary: Certifying nonnegativity of quadratics is a well-known NP-hard problem.<n>A sufficient condition for nonnegativity is the Sum of 200 Squares (SOS) property.<n>We introduce the first learning Transformer-training algorithm to certify SOS criterion.
Score: 33.99209000786153
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Certifying nonnegativity of polynomials is a well-known NP-hard problem with direct applications spanning non-convex optimization, control, robotics, and beyond. A sufficient condition for nonnegativity is the Sum of Squares (SOS) property, i.e., it can be written as a sum of squares of other polynomials. In practice, however, certifying the SOS criterion remains computationally expensive and often involves solving a Semidefinite Program (SDP), whose dimensionality grows quadratically in the size of the monomial basis of the SOS expression; hence, various methods to reduce the size of the monomial basis have been proposed. In this work, we introduce the first learning-augmented algorithm to certify the SOS criterion. To this end, we train a Transformer model that predicts an almost-minimal monomial basis for a given polynomial, thereby drastically reducing the size of the corresponding SDP. Our overall methodology comprises three key components: efficient training dataset generation of over 100 million SOS polynomials, design and training of the corresponding Transformer architecture, and a systematic fallback mechanism to ensure correct termination, which we analyze theoretically. We validate our approach on over 200 benchmark datasets, achieving speedups of over $100\times$ compared to state-of-the-art solvers and enabling the solution of instances where competing approaches fail. Our findings provide novel insights towards transforming the practical scalability of SOS programming.

Related papers

Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We propose a new family of algorithms that uses the linear minimization oracle (LMO) to adapt to the geometry of the problem.<n>We demonstrate significant speedups on nanoGPT training using our algorithm, Scion, without any reliance on Adam.
arXiv Detail & Related papers (2025-02-11T13:10:34Z)
A practical, fast method for solving sum-of-squares problems for very large polynomials [10.645318208507213]
Sum of squares (SOS) optimization is a powerful technique for solving problems where the positivity of as must be enforced. Our goal is to devise an approach that can handle larger, more complex problems than is currently possible.
arXiv Detail & Related papers (2024-10-21T12:47:42Z)
A Sample Efficient Alternating Minimization-based Algorithm For Robust Phase Retrieval [56.67706781191521]
In this work, we present a robust phase retrieval problem where the task is to recover an unknown signal. Our proposed oracle avoids the need for computationally spectral descent, using a simple gradient step and outliers.
arXiv Detail & Related papers (2024-09-07T06:37:23Z)
Sparse resultant based minimal solvers in computer vision and their connection with the action matrix [17.31412310131552]
We show that for some camera geometry problems our extra-based method leads to smaller and more stable solvers than the state-of-the-art Grobner basis-based solvers. It provides a competitive alternative to popularner basis-based methods for minimal problems in computer vision.
arXiv Detail & Related papers (2023-01-16T14:25:19Z)
Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms. The proposed method, equipped with an average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
Message Passing Neural PDE Solvers [60.77761603258397]
We build a neural message passing solver, replacing allally designed components in the graph with backprop-optimized neural function approximators. We show that neural message passing solvers representationally contain some classical methods, such as finite differences, finite volumes, and WENO schemes. We validate our method on various fluid-like flow problems, demonstrating fast, stable, and accurate performance across different domain topologies, equation parameters, discretizations, etc., in 1D and 2D.
arXiv Detail & Related papers (2022-02-07T17:47:46Z)
Deep Learning Approximation of Diffeomorphisms via Linear-Control Systems [91.3755431537592]
We consider a control system of the form $dot x = sum_i=1lF_i(x)u_i$, with linear dependence in the controls. We use the corresponding flow to approximate the action of a diffeomorphism on a compact ensemble of points.
arXiv Detail & Related papers (2021-10-24T08:57:46Z)
Recursive Causal Structure Learning in the Presence of Latent Variables and Selection Bias [27.06618125828978]
We consider the problem of learning the causal MAG of a system from observational data in the presence of latent variables and selection bias. We propose a novel computationally efficient constraint-based method that is sound and complete. We provide experimental results to compare the proposed approach with the state of the art on both synthetic and real-world structures.
arXiv Detail & Related papers (2021-10-22T19:49:59Z)
A block-sparse Tensor Train Format for sample-efficient high-dimensional Polynomial Regression [0.0]
Low-rank tensors are an established framework for high-dimensionals problems. We propose to extend this framework by including the concept of block-sparsity. This allows us to adapt the ansatz space to align better with known sample results.
arXiv Detail & Related papers (2021-04-29T10:57:53Z)
Reduction of the Number of Variables in Parametric Constrained Least-Squares Problems [0.20305676256390928]
This paper proposes techniques for reducing the number of involved optimization variables. We show the good performance of the proposed techniques in numerical tests and in a linearized MPC problem of a nonlinear benchmark process.
arXiv Detail & Related papers (2020-12-18T18:26:40Z)
Computational Barriers to Estimation from Low-Degree Polynomials [81.67886161671379]
We study the power of low-degrees for the task of detecting the presence of hidden structures. For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any degree. As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems.
arXiv Detail & Related papers (2020-08-05T17:52:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.