Related papers: pathsig: A GPU-Accelerated Library for Truncated and Projected Path Signatures

URL: http://arxiv.org/abs/2602.24066v1
Date: Fri, 27 Feb 2026 14:56:06 GMT
Title: pathsig: A GPU-Accelerated Library for Truncated and Projected Path Signatures
Authors: Tobias Nygaard
Abstract summary: This paper introduces pathsig, a PyTorch-native library that computes path signatures directly in the word basis. By using kernels to update signature coefficients in parallel over prefix-closed word sets, pathsig achieves high GPU throughput and near-minimal peak memory.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Path signatures provide a rich representation of sequential data, with strong theoretical guarantees and good performance in a variety of machine-learning tasks. While signatures have progressed from fixed feature extractors to trainable components of machine-learning models, existing libraries often lack the required scalability for large-scale, gradient-based learning. To address this gap, this paper introduces pathsig, a PyTorch-native library that computes path signatures directly in the word basis. By using CUDA kernels to update signature coefficients in parallel over prefix-closed word sets, pathsig achieves high GPU throughput and near-minimal peak memory. Compared with other libraries, pathsig achieves 10-30x speedups for computation of truncated signatures and up to 4-10x speedups in training that requires backpropagation through the signature. Beyond regular truncation, pathsig supports projections of the (infinite-dimensional) signature onto user-specified sets of words and anisotropic truncation motivated by inhomogeneous path regularity, enabling more compact representations that can reduce dimensionality, redundancy, and computational cost.
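To make the phrase "in the word basis over prefix-closed word sets" concrete, here is a minimal NumPy sketch of the underlying mathematics. It is not the pathsig API and not the paper's CUDA kernel: the function names `truncated_words` and `signature` are hypothetical, the path is assumed piecewise linear, and everything runs sequentially on the CPU. It applies Chen's identity one segment at a time; the update for a word w only reads coefficients of prefixes of w, which is why a prefix-closed word set is sufficient.

```python
from itertools import product

import numpy as np


def truncated_words(dim, depth):
    """All words over {0, ..., dim-1} of length 1..depth (a prefix-closed set)."""
    return [w for n in range(1, depth + 1) for w in product(range(dim), repeat=n)]


def signature(path, words):
    """Signature coefficients of a piecewise-linear path on a prefix-closed word set.

    Hypothetical illustration only; words are tuples of channel indices.
    """
    path = np.asarray(path, dtype=float)
    sig = {(): 1.0}
    sig.update({tuple(w): 0.0 for w in words})
    # Prefix-closure: dropping the last letter of any word must stay in the set.
    for w in list(sig):
        if w and w[:-1] not in sig:
            raise ValueError(f"word set is not prefix-closed: missing {w[:-1]}")

    for delta in np.diff(path, axis=0):
        new = {}
        for w in sig:
            total = 0.0
            suffix_term = 1.0  # coefficient of the empty suffix in exp(delta)
            # Chen's identity: sum over splits w = w[:k] + w[k:], where the
            # suffix coefficient of a linear segment is prod(delta) / len!.
            for k in range(len(w), -1, -1):
                m = len(w) - k
                if m > 0:
                    suffix_term *= delta[w[k]] / m
                total += sig[w[:k]] * suffix_term
            new[w] = total
        sig = new
    return sig


# Usage: an L-shaped path in 2D, first along x, then along y.
corner = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
full = signature(corner, truncated_words(2, 2))
# Projection onto a user-chosen prefix-closed subset of words.
projected = signature(corner, [(0,), (0, 1)])
```

In this sketch each word's update is independent of the other words at the same step, which is the property that makes a parallel update over words plausible. The projected call computes only the requested coefficients rather than the full truncated signature.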

Related papers

S$^3$-Attention: Attention-Aligned Endogenous Retrieval for Memory-Bounded Long-Context Inference [11.779449360037518]
We present S3-Attention, a memory-first inference-time framework that treats long-context processing as attention-aligned endogenous retrieval. S3-Attention decodes transient key and query projections into top-k sparse feature identifiers using lightweight sparse autoencoders. It constructs a CPU-based inverted index mapping features to token positions or spans during a single streaming scan.
arXiv Detail & Related papers (2026-01-25T05:25:22Z)
Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers [36.26426380985327]
Diffusion Transformers (DiTs) set the state of the art in visual generation, yet their quadratic self-attention cost limits scaling to long token sequences. Recent Top-K sparse attention approaches reduce the computation of DiTs by compressing tokens into block-wise representation. We introduce Log-linear Sparse Attention (LLSA), a trainable sparse attention mechanism for extremely long token sequences.
arXiv Detail & Related papers (2025-12-18T14:53:12Z)
Hierarchical Token Prepending: Enhancing Information Flow in Decoder-based LLM Embeddings [52.49524240846879]
We propose Hierarchical Token Prepending to mitigate attention-level compression and readout-level over-squashing. HTP partitions the input into blocks and prepends block-level summary tokens to subsequent blocks, creating pathways for backward information flow. As a simple, architecture-agnostic method, HTP enhances both zero-shot and finetuned models, offering a scalable route to superior long-document embeddings.
arXiv Detail & Related papers (2025-11-18T19:37:40Z)
dParallel: Learnable Parallel Decoding for dLLMs [77.24184219948337]
Diffusion large language models (dLLMs) offer parallel token prediction and lower inference latency. Existing open-source models still require nearly token-length decoding steps to ensure performance. We introduce dParallel, a simple and effective method that unlocks the inherent parallelism of dLLMs for fast sampling.
arXiv Detail & Related papers (2025-09-30T16:32:52Z)
pySigLib -- Fast Signature-Based Computations on CPU and GPU [9.126976857662084]
We present pySigLib, a high-performance Python library offering optimised implementations of signatures and signature kernels on CPU and GPU. We introduce a novel differentiation scheme for signature kernels that delivers accurate gradients at a fraction of the runtime of existing libraries.
arXiv Detail & Related papers (2025-09-12T18:00:14Z)
Re-Densification Meets Cross-Scale Propagation: Real-Time Neural Compression of LiDAR Point Clouds [83.39320394656855]
LiDAR point clouds are fundamental to various applications, yet high-precision scans incur substantial storage and transmission overhead. Existing methods typically convert unordered points into hierarchical octree or voxel structures for dense-to-sparse predictive coding. Our framework comprises two lightweight modules. First, the Geometry Re-Densification Module re-densifies encoded sparse geometry, extracts features at denser scale, and then re-sparsifies the features for predictive coding.
arXiv Detail & Related papers (2025-08-28T06:36:10Z)
Keras Sig: Efficient Path Signature Computation on GPU in Keras 3 [0.0]
Keras Sig is a high-performance Pythonic library designed to compute path signatures for deep learning applications. Built entirely in Keras 3, Keras Sig leverages seamless integration with the most widely used deep learning backends such as PyTorch, JAX and GPU.
arXiv Detail & Related papers (2025-01-14T22:00:01Z)
A User's Guide to $\texttt{KSig}$: GPU-Accelerated Computation of the Signature Kernel [12.111848705677138]
The signature kernel is a positive definite kernel for sequential and temporal data. In this chapter, we give a short introduction to $\texttt{KSig}$, a $\texttt{Scikit-Learn}$ compatible Python package that implements various GPU-accelerated algorithms for computing signature kernels.
arXiv Detail & Related papers (2025-01-13T09:11:13Z)
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling [24.04649159686283]
Speculative decoding is an approach to accelerate inference through a guess-and-verify paradigm. Token Recycling stores candidate tokens in an adjacency matrix and employs a breadth-first-search algorithm. It significantly outperforms existing train-free methods by 30% and even a widely recognized training method by 25%.
arXiv Detail & Related papers (2024-08-16T12:20:56Z)
Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference [23.633481089469836]
Auto-regressive decoding of Large Language Models (LLMs) results in significant overheads in their hardware performance. We propose a novel parallel prompt decoding that requires only $0.0002$% trainable parameters, enabling efficient training on a single A100-40GB GPU in just 16 hours. Our approach demonstrates up to 2.49$\times$ speedup and maintains a minimal memory overhead of just $0.0004$%.
arXiv Detail & Related papers (2024-05-28T22:19:30Z)
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass. In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
ZippyPoint: Fast Interest Point Detection, Description, and Matching through Mixed Precision Discretization [71.91942002659795]
We investigate and adapt network quantization techniques to accelerate inference and enable its use on compute limited platforms. ZippyPoint, our efficient quantized network with binary descriptors, improves the network runtime speed, the descriptor matching speed, and the 3D model size. These improvements come at a minor performance degradation as evaluated on the tasks of homography estimation, visual localization, and map-free visual relocalization.
arXiv Detail & Related papers (2022-03-07T18:59:03Z)
High-performance symbolic-numerics via multiple dispatch [52.77024349608834]
Symbolics.jl is an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs. We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system. We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers.
arXiv Detail & Related papers (2021-05-09T14:22:43Z)
