Related papers: Matrix-Free Least Squares Solvers: Values, Gradients, and What to Do With Them

URL: http://arxiv.org/abs/2510.19634v1
Date: Wed, 22 Oct 2025 14:31:51 GMT
Title: Matrix-Free Least Squares Solvers: Values, Gradients, and What to Do With Them
Authors: Hrittik Roy, Søren Hauberg, Nicholas Krämer
Abstract summary: This paper argues that the method of least squares has significant unfulfilled potential in modern machine learning. To release its potential, we derive custom gradients that transform the solver into a differentiable operator, like a neural network layer.
Score: 17.808832664329426
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper argues that the method of least squares has significant unfulfilled potential in modern machine learning, far beyond merely being a tool for fitting linear models. To release its potential, we derive custom gradients that transform the solver into a differentiable operator, like a neural network layer, enabling many diverse applications. Empirically, we demonstrate: (i) scalability by enforcing weight sparsity on a 50 million parameter model; (ii) imposing conservativeness constraints in score-based generative models; and (iii) hyperparameter tuning of Gaussian processes based on predictive performance. By doing this, our work represents the next iteration in developing differentiable linear-algebra tools and making them widely accessible to machine learning practitioners.
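
To make the idea of a differentiable, matrix-free least-squares solver concrete, the following is a minimal sketch and not the paper's implementation. It assumes the linear operator is available only through two callbacks, here given the hypothetical names `matvec` (v -> A v) and `rmatvec` (u -> A^T u), solves the normal equations with conjugate gradients, and differentiates the solution with respect to the right-hand side b only. The helper names `cg`, `lstsq`, and `lstsq_vjp_b` are illustrative assumptions. For x*(b) = argmin_x ||A x - b||^2 and an upstream gradient g = dL/dx*, the gradient with respect to b is A (A^T A)^{-1} g, which reuses the same solver.

```python
def cg(apply_op, rhs, iters=50, tol=1e-12):
    """Conjugate gradients for a symmetric positive-definite operator."""
    x = [0.0] * len(rhs)
    r = list(rhs)                      # residual for the zero initial guess
    p = list(r)                        # first search direction
    rs = sum(ri * ri for ri in r)      # squared residual norm
    for _ in range(iters):
        if rs < tol:
            break
        q = apply_op(p)
        alpha = rs / sum(pi * qi for pi, qi in zip(p, q))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def lstsq(matvec, rmatvec, b):
    """Solve min_x ||A x - b||^2 using only products with A and A^T."""
    return cg(lambda v: rmatvec(matvec(v)), rmatvec(b))


def lstsq_vjp_b(matvec, rmatvec, g):
    """Pull the gradient g = dL/dx* back to b: returns A (A^T A)^{-1} g."""
    w = cg(lambda v: rmatvec(matvec(v)), g)
    return matvec(w)


# Toy operator (assumption for illustration): fit a line at t = 0, 1, 2.
# The solver never reads A directly, only the two callbacks below.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]


def matvec(v):
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]


def rmatvec(u):
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]


b = [1.0, 2.0, 2.0]
x_star = lstsq(matvec, rmatvec, b)                 # intercept and slope
grad_b = lstsq_vjp_b(matvec, rmatvec, [1.0, 1.0])  # gradient of sum(x*) w.r.t. b
```

In an automatic-differentiation framework, the second function would typically be registered as the custom backward rule of the first, so the solver behaves like a layer. Gradients with respect to parameters of the operator itself are omitted from this sketch.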

Related papers

Online learning to accelerate nonlinear PDE solvers: applied to multiphase porous media flow [0.0]
We propose a novel type of nonlinear solver acceleration for systems of nonlinear partial differential equations (PDEs) that is based on online/adaptive learning. The proposed method relies on four pillars: (i) dimensionless numbers as input parameters for the machine learning model, (ii) a simplified (two-dimensional) numerical model for the offline training, (iii) dynamic control of a nonlinear solver tuning parameter (numerical relaxation), and (iv) online learning for real-time improvement of the machine learning model.
arXiv Detail & Related papers (2025-04-25T15:15:45Z)
DimINO: Dimension-Informed Neural Operator Learning [41.37905663176428]
DimINO is a framework inspired by dimensional analysis. It can be seamlessly integrated into existing neural operator architectures. It achieves up to 76.3% performance gain on PDE datasets.
arXiv Detail & Related papers (2024-10-08T10:48:50Z)
Gradient Estimation and Variance Reduction in Stochastic and Deterministic Models [0.0]
This dissertation considers unconstrained, nonlinear optimization problems. We focus on the gradient itself, the key quantity that enables the solution of such problems. We present a new framework for calculating the gradient of problems involving both deterministic and stochastic elements.
arXiv Detail & Related papers (2024-05-14T14:41:58Z)
Scaling and renormalization in high-dimensional regression [72.59731158970894]
We present a unifying perspective on recent results on ridge regression. We use the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning. Our results extend and provide a unifying perspective on earlier models of scaling laws.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
Multi-GPU Approach for Training of Graph ML Models on large CFD Meshes [0.0]
Mesh-based numerical solvers are an important part in many design tool chains. Machine Learning based surrogate models are fast in predicting approximate solutions but often lack accuracy. This paper scales a state-of-the-art surrogate model from the domain of graph-based machine learning to industry-relevant mesh sizes.
arXiv Detail & Related papers (2023-07-25T15:49:25Z)
MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE). In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity. Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
Differentiable Spline Approximations [48.10988598845873]
Differentiable programming has significantly enhanced the scope of machine learning. Standard differentiable programming methods (such as autodiff) typically require that the machine learning models be differentiable. We show that leveraging this redesigned Jacobian in the form of a differentiable "layer" in predictive models leads to improved performance in diverse applications.
arXiv Detail & Related papers (2021-10-04T16:04:46Z)
Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models. Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely. Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks. This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
Non-intrusive Nonlinear Model Reduction via Machine Learning Approximations to Low-dimensional Operators [0.0]
We propose a method that enables traditionally intrusive reduced-order models to be accurately approximated in a non-intrusive manner. The approach approximates the low-dimensional operators associated with projection-based reduced-order models (ROMs) using modern machine-learning regression techniques. In addition to enabling nonintrusivity, we demonstrate that the approach also leads to very low computational complexity, achieving up to $1000\times$ reduction in run time.
arXiv Detail & Related papers (2021-06-17T17:04:42Z)
Conditional Generative Modeling via Learning the Latent Space [54.620761775441046]
We propose a novel framework for conditional generation in multimodal spaces. It uses latent variables to model generalizable learning patterns. At inference, the latent variables are optimized to find optimal solutions corresponding to multiple output modes.
arXiv Detail & Related papers (2020-10-07T03:11:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.