Related papers: torch-sla: Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch

torch-sla: Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch

URL: http://arxiv.org/abs/2601.13994v1
Date: Tue, 20 Jan 2026 14:06:01 GMT
Title: torch-sla: Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch
Authors: Mingyuan Chi,
Abstract summary: We present torchsla, an open-source PyTorch library that enables GPU-accelerated, scalable, and differentiable sparse linear algebra.<n>torchsla supports multiple backends (SciPy, cuDSS, PyTorch-native) and seamlessly integrates with PyTorch autograd for end-to-end differentiable simulations.
Score: 0.2960141730774496
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Industrial scientific computing predominantly uses sparse matrices to represent unstructured data -- finite element meshes, graphs, point clouds. We present \torchsla{}, an open-source PyTorch library that enables GPU-accelerated, scalable, and differentiable sparse linear algebra. The library addresses three fundamental challenges: (1) GPU acceleration for sparse linear solves, nonlinear solves (Newton, Picard, Anderson), and eigenvalue computation; (2) Multi-GPU scaling via domain decomposition with halo exchange, reaching \textbf{400 million DOF linear solve on 3 GPUs}; and (3) Adjoint-based differentiation} achieving $\mathcal{O}(1)$ computational graph nodes (for autograd) and $\mathcal{O}(\text{nnz})$ memory -- independent of solver iterations. \torchsla{} supports multiple backends (SciPy, cuDSS, PyTorch-native) and seamlessly integrates with PyTorch autograd for end-to-end differentiable simulations. Code is available at https://github.com/walkerchi/torch-sla.

Related papers

scipy.spatial.transform: Differentiable Framework-Agnostic 3D Transformations in Python [49.015684860172975]
Three-dimensional rigid-body transforms, i.e. rotations and translations, are central to modern differentiable machine learning pipelines.<n> SciPy's spatial$.$diff module is a rigorously tested Python implementation.<n>We present a complete overhaul of SciPy's spatial$.$transform functionality that makes it compatible with any array library implementing the Python array API.
arXiv Detail & Related papers (2025-11-22T18:52:34Z)
Scalable GPU-Accelerated Euler Characteristic Curves: Optimization and Differentiable Learning for PyTorch [0.0]
We present optimized GPU kernels for the Euler Characteristic Curve (ECC) achieving 16-2000"O speedups over prior GPU implementations on synthetic grids.<n>We introduce a differentiable PyTorch layer enabling end-to-end learning.
arXiv Detail & Related papers (2025-10-23T06:59:07Z)
iSpLib: A Library for Accelerating Graph Neural Networks using Auto-tuned Sparse Operations [1.3030767447016454]
iSpLib is a PyTorch-based C++ library equipped with auto-tuned sparse operations. We demonstrate that iSpLib obtains up to 27x overall training speedup compared to the equivalent PyTorch 2.1.0 and PyTorch Geometric 2.4.0 implementations on the CPU.
arXiv Detail & Related papers (2024-03-21T21:56:44Z)
Average-Case Complexity of Tensor Decomposition for Low-Degree Polynomials [93.59919600451487]
"Statistical-computational gaps" occur in many statistical inference tasks. We consider a model for random order-3 decomposition where one component is slightly larger in norm than the rest. We show that tensor entries can accurately estimate the largest component when $ll n3/2$ but fail to do so when $rgg n3/2$.
arXiv Detail & Related papers (2022-11-10T00:40:37Z)
tntorch: Tensor Network Learning with PyTorch [26.544996974928583]
tntorch is a tensor learning framework that supports multiple decompositions. It implements differentiable tensor algebra, rank truncation, cross-approximation, batch processing, comprehensive tensor arithmetics, and more.
arXiv Detail & Related papers (2022-06-22T14:19:15Z)
Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
We structured convex optimization problems with additive objective $r:=p + q$, where $r$ is $mu$-strong convex similarity. We proposed a method to solve problems master to agents' communication and local calls. The proposed method is much sharper than the $mathcalO(sqrtL_q/mu)$ method.
arXiv Detail & Related papers (2022-05-30T14:28:02Z)
Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization [59.65871549878937]
We prove that the practical single loop accelerated gradient tracking needs $O(fracgamma1-sigma_gamma)2sqrtfracLepsilon)$.<n>Our convergence rates improve significantly over the ones of $O(frac1epsilon5/7)$ and $O(fracLmu)5/7frac1 (1-sigma)1.5logfrac1epsilon)$.
arXiv Detail & Related papers (2021-04-06T15:34:14Z)
Learning Sparse Graph Laplacian with K Eigenvector Prior via Iterative GLASSO and Projection [58.5350491065936]
We consider a structural assumption on the graph Laplacian matrix $L$. The first $K$ eigenvectors of $L$ are pre-selected, e.g., based on domain-specific criteria. We design an efficient hybrid graphical lasso/projection algorithm to compute the most suitable graph Laplacian matrix $L* in H_u+$ given $barC$.
arXiv Detail & Related papers (2020-10-25T18:12:50Z)
Hybrid Models for Learning to Branch [81.93868699246214]
We propose a new hybrid architecture for efficient branching on CPU machines. The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching.
arXiv Detail & Related papers (2020-06-26T21:03:45Z)
Spectral Learning on Matrices and Tensors [74.88243719463053]
We show that tensor decomposition can pick up latent effects that are missed by matrix methods. We also outline computational techniques to design efficient tensor decomposition methods.
arXiv Detail & Related papers (2020-04-16T22:53:00Z)
Kernel Operations on the GPU, with Autodiff, without Memory Overflows [5.669790037378094]
The KeOps library provides a fast and memory-efficient GPU support for tensors whose entries are given by a mathematical formula. KeOps alleviates the major bottleneck of tensor-centric libraries for kernel and geometric applications: memory consumption. KeOps combines optimized C++/CUDA schemes with binders for high-level languages: Python (Numpy and PyTorch), Matlab and R.
arXiv Detail & Related papers (2020-03-27T08:54:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.