Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation
- URL: http://arxiv.org/abs/2501.17737v2
- Date: Wed, 11 Jun 2025 14:56:28 GMT
- Title: Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation
- Authors: Adrian Hill, Guillaume Dalle
- Abstract summary: Jacobian and Hessian matrices have many potential use cases in Machine Learning (ML). This paper presents advances in sparsity detection, previously the performance bottleneck of Automatic Sparse Differentiation (ASD). We show significant speed-ups of up to three orders of magnitude on real-world problems from scientific ML, graph neural networks and optimization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: From implicit differentiation to probabilistic modeling, Jacobian and Hessian matrices have many potential use cases in Machine Learning (ML), but they are viewed as computationally prohibitive. Fortunately, these matrices often exhibit sparsity, which can be leveraged to speed up the process of Automatic Differentiation (AD). This paper presents advances in sparsity detection, previously the performance bottleneck of Automatic Sparse Differentiation (ASD). Our implementation of sparsity detection is based on operator overloading, able to detect both local and global sparsity patterns, and supports flexible index set representations. It is fully automatic and requires no modification of user code, making it compatible with existing ML codebases. Most importantly, it is highly performant, unlocking Jacobians and Hessians at scales where they were considered too expensive to compute. On real-world problems from scientific ML, graph neural networks and optimization, we show significant speed-ups of up to three orders of magnitude. Notably, using our sparsity detection system, ASD outperforms standard AD for one-off computations, without amortization of either sparsity detection or matrix coloring.
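To give a concrete sense of the operator-overloading approach described in the abstract, the sketch below illustrates global Jacobian sparsity detection via index-set propagation: each tracer carries the set of input indices it depends on, and every overloaded operation merges those sets. This is a minimal Python illustration only, not the paper's implementation; the Tracer class and jacobian_sparsity helper are names made up for this example.

```python
# Minimal sketch of global Jacobian sparsity detection via operator overloading.
# Each Tracer records which input indices its value depends on; arithmetic
# unions the dependency sets (a conservative over-approximation).
class Tracer:
    def __init__(self, deps):
        self.deps = frozenset(deps)  # indices of inputs this value depends on

    def _merge(self, other):
        other_deps = other.deps if isinstance(other, Tracer) else frozenset()
        return Tracer(self.deps | other_deps)

    __add__ = __radd__ = __sub__ = __rsub__ = _merge
    __mul__ = __rmul__ = __truediv__ = __rtruediv__ = __pow__ = _merge


def jacobian_sparsity(f, n_inputs):
    """Boolean sparsity pattern of the Jacobian of f: R^n -> R^m."""
    x = [Tracer({i}) for i in range(n_inputs)]
    return [[j in y.deps for j in range(n_inputs)] for y in f(x)]


# Example: a stencil-like function yields a banded Jacobian pattern.
def f(x):
    return [x[i - 1] + x[i] * x[i + 1] for i in range(1, len(x) - 1)]


for row in jacobian_sparsity(f, 5):
    print(["X" if b else "." for b in row])
```

Once such a pattern is known, it can be passed to a matrix coloring routine so that many Jacobian columns (or rows) are recovered from a single AD sweep, which is the amortization the abstract refers to.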
Related papers
- Periodic Online Testing for Sparse Systolic Tensor Arrays [0.0]
Modern Machine Learning (ML) applications often benefit from structured sparsity, a technique that efficiently reduces model complexity and simplifies handling of sparse data in hardware. This paper introduces an online error-checking technique capable of detecting and locating permanent faults within sparse systolic tensor arrays before vectors begin.
arXiv Detail & Related papers (2025-04-25T18:10:45Z) - Sparser Training for On-Device Recommendation Systems [50.74019319100728]
We propose SparseRec, a lightweight embedding method based on Dynamic Sparse Training (DST).
It avoids dense gradients during backpropagation by sampling a subset of important vectors.
arXiv Detail & Related papers (2024-11-19T03:48:48Z) - Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication [0.8363939984237685]
Sparse matrix-matrix multiplication (SpGEMM) is a critical operation in scientific computing, graph analytics, and deep learning.
Traditional hardware accelerators are tailored for specific sparsity patterns with fixed dataflow schemes.
This paper presents a machine learning based approach for adaptively selecting the most appropriate dataflow scheme for SpGEMM tasks.
arXiv Detail & Related papers (2024-06-14T16:36:35Z) - Optimizing Automatic Differentiation with Deep Reinforcement Learning [0.9353041869660692]
We present a novel method to optimize the number of necessary multiplications for Jacobian computation by leveraging deep reinforcement learning (RL). We show that this method achieves up to 33% improvements over state-of-the-art methods on several relevant tasks taken from diverse domains.
arXiv Detail & Related papers (2024-06-07T15:44:33Z) - Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely "hidden transfer", which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - Masked Matrix Multiplication for Emergent Sparsity [1.4786952412297807]
Transformer models exhibit emergent sparsity in which computations perform selective sparse access to dense data.
We build a vectorized and parallel matrix-multiplication system A × B = C that eliminates unnecessary computations.
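As a toy illustration of that idea (not the cited paper's vectorized, parallel system), a tiled matrix multiply can consult a per-tile view of A and skip tiles that are entirely zero; the tile size and threshold below are arbitrary choices for the example.

```python
import numpy as np


def masked_matmul(A, B, tile=64, threshold=0.0):
    """Tiled A @ B that skips tiles of A whose entries are all (near-)zero.
    Single-threaded sketch of computation skipping under emergent sparsity."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    for i0 in range(0, m, tile):
        for k0 in range(0, k, tile):
            A_tile = A[i0:i0 + tile, k0:k0 + tile]
            if np.max(np.abs(A_tile)) <= threshold:
                continue  # tile contributes nothing, skip its inner products
            C[i0:i0 + tile, :] += A_tile @ B[k0:k0 + tile, :]
    return C


# Sparse-ish A: most tiles are zero, so most of the work is skipped.
rng = np.random.default_rng(0)
A = np.zeros((256, 256))
A[:64, :64] = rng.normal(size=(64, 64))
B = rng.normal(size=(256, 128))
assert np.allclose(masked_matmul(A, B), A @ B)
```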
arXiv Detail & Related papers (2024-02-21T20:36:08Z) - Automatic Task Parallelization of Dataflow Graphs in ML/DL models [0.0]
We present a Linear Clustering approach to exploit inherent parallel paths in ML dataflow graphs.
We generate readable and executable parallel PyTorch+Python code from input ML models in ONNX format.
Preliminary results on several ML graphs demonstrate up to 1.9× speedup over serial execution.
arXiv Detail & Related papers (2023-08-22T04:54:30Z) - Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z) - Unified Functional Hashing in Automatic Machine Learning [58.77232199682271]
We show that large efficiency gains can be obtained by employing a fast unified functional hash.
Our hash is "functional" in that it identifies equivalent candidates even if they were represented or coded differently.
We show dramatic improvements on multiple AutoML domains, including neural architecture search and algorithm discovery.
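The gist of such a functional hash can be sketched by hashing a candidate's outputs on a fixed probe set rather than its source code, so differently written but equivalent candidates collide; the probe inputs and rounding below are illustrative choices, not taken from the cited paper.

```python
import hashlib
import numpy as np


def functional_hash(candidate, probe_inputs, decimals=6):
    """Hash a candidate by what it computes, not how it is written."""
    outputs = np.round([candidate(x) for x in probe_inputs], decimals)
    return hashlib.sha256(outputs.tobytes()).hexdigest()


probes = np.linspace(-1.0, 1.0, 32)
f1 = lambda x: 2.0 * x + x        # written one way
f2 = lambda x: 3.0 * x            # written another way, same function
f3 = lambda x: 3.0 * x + 1e-3     # genuinely different

assert functional_hash(f1, probes) == functional_hash(f2, probes)
assert functional_hash(f1, probes) != functional_hash(f3, probes)
```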
arXiv Detail & Related papers (2023-02-10T18:50:37Z) - Learning Decorrelated Representations Efficiently Using Fast Fourier Transform [3.932322649674071]
We propose a relaxed decorrelating regularizer that can be computed in O(n d log d) time by Fast Fourier Transform.
The proposed regularizer exhibits accuracy comparable to that of existing regularizers in downstream tasks, while its training requires less memory and is faster for large d.
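To see how an FFT can bring such a penalty down to O(n d log d): instead of penalizing every entry of the d × d covariance matrix, one can penalize off-diagonal covariance mass aggregated along circular shifts of the feature dimension, which the FFT computes from per-sample autocorrelations. The snippet below illustrates that general idea only; it is not the regularizer proposed in the cited paper.

```python
import numpy as np


def shift_aggregated_decorrelation(Z):
    """Penalize covariance mass aggregated over circular shifts of the
    feature axis. Cost is O(n d log d) via the FFT, versus O(n d^2) for
    the full covariance. Illustrative relaxation only."""
    n, d = Z.shape
    Zc = Z - Z.mean(axis=0)                       # center each feature
    spec = np.fft.rfft(Zc, axis=1)                # per-sample FFT along features
    # Circular autocorrelation of each sample, summed over the batch:
    # autocorr[k] = sum_i sum_j Zc[i, j] * Zc[i, (j + k) % d]
    autocorr = np.fft.irfft(np.abs(spec) ** 2, n=d, axis=1).sum(axis=0)
    return np.sum(autocorr[1:] ** 2) / (n * d)    # ignore shift 0 (variances)


rng = np.random.default_rng(0)
Z = rng.normal(size=(512, 128))
print(shift_aggregated_decorrelation(Z))
```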
arXiv Detail & Related papers (2023-01-04T12:38:08Z) - Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z) - Optimized Sparse Matrix Operations for Reverse Mode Automatic Differentiation [3.72826300260966]
We present an implementation of a CSR-based sparse matrix wrapper for PyTorch with acceleration for basic matrix operations, as well as automatic differentiability.
We also present several applications of the resulting sparse kernels to optimization problems, demonstrating ease of implementation and performance measurements versus their dense counterparts.
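For context, recent PyTorch releases ship a built-in CSR layout; the snippet below only illustrates the CSR format and a plain sparse-times-dense product with torch.sparse_csr_tensor, not the wrapper or accelerated kernels the cited paper contributes, and assumes a reasonably recent PyTorch version.

```python
import torch

# A 3x4 matrix with 4 non-zeros in CSR (compressed sparse row) form:
# row pointers, column indices, and values.
crow_indices = torch.tensor([0, 2, 3, 4])   # row i spans values[crow[i]:crow[i+1]]
col_indices = torch.tensor([0, 3, 1, 2])
values = torch.tensor([1.0, 2.0, 3.0, 4.0])
A = torch.sparse_csr_tensor(crow_indices, col_indices, values, size=(3, 4))

x = torch.randn(4, 5)
y = A @ x                                   # sparse CSR times dense matmul
assert torch.allclose(y, A.to_dense() @ x)
```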
arXiv Detail & Related papers (2022-12-10T00:46:51Z) - A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z) - Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
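For reference, the textbook (unshifted) QR iteration below shows the basic idea behind QR-based eigendecomposition of a symmetric matrix; it has none of the batching, shifts, or vision-specific optimizations of the cited method and is only a baseline sketch.

```python
import numpy as np


def qr_eigvals(A, iters=200):
    """Unshifted QR iteration: repeatedly factor A = QR and re-multiply as RQ;
    for a symmetric matrix the iterates approach a diagonal matrix whose
    entries are the eigenvalues."""
    A = np.array(A, dtype=float)
    for _ in range(iters):
        Q, R = np.linalg.qr(A)
        A = R @ Q
    return np.sort(np.diag(A))


rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))
S = Q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]) @ Q.T   # known spectrum 1..6
assert np.allclose(qr_eigvals(S), np.arange(1.0, 7.0), atol=1e-6)
```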
arXiv Detail & Related papers (2022-07-09T09:14:12Z) - Highly Parallel Autoregressive Entity Linking with Discriminative Correction [51.947280241185]
We propose a very efficient approach that parallelizes autoregressive linking across all potential mentions.
Our model is >70 times faster and more accurate than the previous generative method.
arXiv Detail & Related papers (2021-09-08T17:28:26Z) - Self Normalizing Flows [65.73510214694987]
We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$.
We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts.
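Concretely, for a single linear flow layer z = W x the expensive term in the exact gradient is the inverse transpose of W (from the log-determinant), an O(D^3) operation; the sketch below paraphrases the abstract's idea of a learned approximate inverse R kept close to W^{-1} by a cheap reconstruction penalty, and is not the paper's exact update rule.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 256
W = np.eye(D) + 0.01 * rng.normal(size=(D, D))   # weights of a linear flow layer
R = np.eye(D)                                     # learned approximate inverse, R ~ W^{-1}

# Exact gradient of log|det W| w.r.t. W is inv(W).T -- an O(D^3) inversion.
exact_term = np.linalg.inv(W).T

# Sketched alternative: use the learned R.T instead, and keep R close to
# W^{-1} via the reconstruction penalty 0.5 * ||R @ (W @ x) - x||^2, whose
# gradient only needs O(D^2) outer/matrix-vector products.
x = rng.normal(size=D)
z = W @ x
recon_err = R @ z - x
grad_R = np.outer(recon_err, z)                   # d/dR of 0.5 * ||R z - x||^2
R -= 0.5 * grad_R                                 # one illustrative update step
approx_term = R.T

print(np.linalg.norm(approx_term - exact_term) / np.linalg.norm(exact_term))
```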
arXiv Detail & Related papers (2020-11-14T09:51:51Z) - Tensor Relational Algebra for Machine Learning System Design [7.764107702934616]
We present an alternative implementation abstraction called the tensor relational algebra (TRA).
TRA is a set-based algebra based on the relational algebra.
Our empirical study shows that the optimized TRA-based back-end can significantly outperform alternatives for running ML in distributed clusters.
arXiv Detail & Related papers (2020-09-01T15:51:24Z) - Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.