torch-sla: Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch
- URL: http://arxiv.org/abs/2601.13994v1
- Date: Tue, 20 Jan 2026 14:06:01 GMT
- Title: torch-sla: Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch
- Authors: Mingyuan Chi,
- Abstract summary: We present torch-sla, an open-source PyTorch library that enables GPU-accelerated, scalable, and differentiable sparse linear algebra. torch-sla supports multiple backends (SciPy, cuDSS, PyTorch-native) and seamlessly integrates with PyTorch autograd for end-to-end differentiable simulations.
- Score: 0.2960141730774496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Industrial scientific computing predominantly uses sparse matrices to represent unstructured data: finite element meshes, graphs, point clouds. We present torch-sla, an open-source PyTorch library that enables GPU-accelerated, scalable, and differentiable sparse linear algebra. The library addresses three fundamental challenges: (1) GPU acceleration for sparse linear solves, nonlinear solves (Newton, Picard, Anderson), and eigenvalue computation; (2) multi-GPU scaling via domain decomposition with halo exchange, reaching a 400 million DOF linear solve on 3 GPUs; and (3) adjoint-based differentiation achieving $\mathcal{O}(1)$ computational graph nodes (for autograd) and $\mathcal{O}(\text{nnz})$ memory, independent of solver iterations. torch-sla supports multiple backends (SciPy, cuDSS, PyTorch-native) and seamlessly integrates with PyTorch autograd for end-to-end differentiable simulations. Code is available at https://github.com/walkerchi/torch-sla.
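The adjoint idea in point (3) of the abstract can be sketched in plain PyTorch. The block below is an illustration only and is not the torch-sla API: the class name `AdjointSparseSolve` and the helper `demo` are hypothetical, and a dense `torch.linalg.solve` stands in for a real sparse backend. For $x = A^{-1} b$ and a loss $L$, the backward pass solves one adjoint system $A^\top \lambda = \partial L / \partial x$, then sets $\partial L / \partial b = \lambda$ and $\partial L / \partial A_{ij} = -\lambda_i x_j$ on the stored nonzero entries only. Autograd sees a single node, and only the COO entries and the solution are saved, regardless of what the solver does internally.

```python
import torch


class AdjointSparseSolve(torch.autograd.Function):
    """Solve A x = b with A given in COO form (values, rows, cols).

    Hypothetical illustration: a dense solve stands in for a sparse backend.
    """

    @staticmethod
    def _assemble(values, rows, cols, n):
        # Build a dense matrix from COO triplets (stand-in for a sparse solver input).
        A = torch.zeros(n, n, dtype=values.dtype, device=values.device)
        A.index_put_((rows, cols), values, accumulate=True)
        return A

    @staticmethod
    def forward(ctx, values, rows, cols, b):
        A = AdjointSparseSolve._assemble(values, rows, cols, b.shape[0])
        x = torch.linalg.solve(A, b)
        # Save only the nonzeros and the solution, not any solver internals.
        ctx.save_for_backward(values, rows, cols, x)
        return x

    @staticmethod
    def backward(ctx, grad_x):
        values, rows, cols, x = ctx.saved_tensors
        A = AdjointSparseSolve._assemble(values, rows, cols, x.shape[0])
        # Adjoint system: A^T lam = dL/dx.
        lam = torch.linalg.solve(A.T, grad_x)
        # dL/dA_ij = -lam_i * x_j, evaluated on the sparsity pattern only.
        grad_values = -lam[rows] * x[cols]
        # dL/db = lam; the index tensors receive no gradient.
        return grad_values, None, None, lam


def demo():
    # Small upper-triangular system with a nonzero diagonal (invertible).
    rows = torch.tensor([0, 1, 2, 3, 0, 1, 2])
    cols = torch.tensor([0, 1, 2, 3, 1, 2, 3])
    values = torch.tensor([4.0, 5.0, 6.0, 7.0, 1.0, 2.0, 3.0],
                          dtype=torch.float64, requires_grad=True)
    b = torch.ones(4, dtype=torch.float64, requires_grad=True)
    x = AdjointSparseSolve.apply(values, rows, cols, b)
    x.sum().backward()
    return x.detach(), values.grad, b.grad
```

Calling `demo()` returns the solution together with the gradients with respect to the nonzero values and the right-hand side. In a real setting the two dense solves would be replaced by calls to a sparse direct or iterative solver, which is where the memory and graph-size savings described in the abstract would come from.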
Related papers
- scipy.spatial.transform: Differentiable Framework-Agnostic 3D Transformations in Python [49.015684860172975]
Three-dimensional rigid-body transforms, i.e. rotations and translations, are central to modern differentiable machine learning pipelines. SciPy's spatial.transform module is a rigorously tested Python implementation. We present a complete overhaul of SciPy's spatial.transform functionality that makes it compatible with any array library implementing the Python array API.
arXiv Detail & Related papers (2025-11-22T18:52:34Z) - Scalable GPU-Accelerated Euler Characteristic Curves: Optimization and Differentiable Learning for PyTorch [0.0]
We present optimized GPU kernels for the Euler Characteristic Curve (ECC) achieving 16-2000× speedups over prior GPU implementations on synthetic grids. We introduce a differentiable PyTorch layer enabling end-to-end learning.
arXiv Detail & Related papers (2025-10-23T06:59:07Z) - iSpLib: A Library for Accelerating Graph Neural Networks using Auto-tuned Sparse Operations [1.3030767447016454]
iSpLib is a PyTorch-based C++ library equipped with auto-tuned sparse operations.
We demonstrate that iSpLib obtains up to 27x overall training speedup compared to the equivalent PyTorch 2.1.0 and PyTorch Geometric 2.4.0 implementations on the CPU.
arXiv Detail & Related papers (2024-03-21T21:56:44Z) - Average-Case Complexity of Tensor Decomposition for Low-Degree Polynomials [93.59919600451487]
"Statistical-computational gaps" occur in many statistical inference tasks.
We consider a model for random order-3 decomposition where one component is slightly larger in norm than the rest.
We show that tensor entries can accurately estimate the largest component when $r \ll n^{3/2}$ but fail to do so when $r \gg n^{3/2}$.
arXiv Detail & Related papers (2022-11-10T00:40:37Z) - tntorch: Tensor Network Learning with PyTorch [26.544996974928583]
tntorch is a tensor learning framework that supports multiple decompositions.
It implements differentiable tensor algebra, rank truncation, cross-approximation, batch processing, comprehensive tensor arithmetics, and more.
arXiv Detail & Related papers (2022-06-22T14:19:15Z) - Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex.
We proposed a method to solve problems master to agents' communication and local calls.
The proposed method is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ method.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization [59.65871549878937]
We prove that the practical single loop accelerated gradient tracking needs $O\left(\left(\frac{\gamma}{1-\sigma_\gamma}\right)^2 \sqrt{\frac{L}{\epsilon}}\right)$. Our convergence rates improve significantly over the ones of $O\left(\frac{1}{\epsilon^{5/7}}\right)$ and $O\left(\left(\frac{L}{\mu}\right)^{5/7} \frac{1}{(1-\sigma)^{1.5}} \log\frac{1}{\epsilon}\right)$.
arXiv Detail & Related papers (2021-04-06T15:34:14Z) - Learning Sparse Graph Laplacian with K Eigenvector Prior via Iterative GLASSO and Projection [58.5350491065936]
We consider a structural assumption on the graph Laplacian matrix $L$.
The first $K$ eigenvectors of $L$ are pre-selected, e.g., based on domain-specific criteria.
We design an efficient hybrid graphical lasso/projection algorithm to compute the most suitable graph Laplacian matrix $L^* \in H_u^+$ given $\bar{C}$.
arXiv Detail & Related papers (2020-10-25T18:12:50Z) - Hybrid Models for Learning to Branch [81.93868699246214]
We propose a new hybrid architecture for efficient branching on CPU machines.
The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching.
arXiv Detail & Related papers (2020-06-26T21:03:45Z) - Spectral Learning on Matrices and Tensors [74.88243719463053]
We show that tensor decomposition can pick up latent effects that are missed by matrix methods.
We also outline computational techniques to design efficient tensor decomposition methods.
arXiv Detail & Related papers (2020-04-16T22:53:00Z) - Kernel Operations on the GPU, with Autodiff, without Memory Overflows [5.669790037378094]
The KeOps library provides fast and memory-efficient GPU support for tensors whose entries are given by a mathematical formula.
KeOps alleviates the major bottleneck of tensor-centric libraries for kernel and geometric applications: memory consumption.
KeOps combines optimized C++/CUDA schemes with binders for high-level languages: Python (NumPy and PyTorch), Matlab, and R.
arXiv Detail & Related papers (2020-03-27T08:54:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.