Scalable Machine Learning Algorithms using Path Signatures
- URL: http://arxiv.org/abs/2506.17634v2
- Date: Tue, 24 Jun 2025 20:58:09 GMT
- Title: Scalable Machine Learning Algorithms using Path Signatures
- Authors: Csaba Tóth
- Abstract summary: This thesis investigates how to harness the expressive power of path signatures within scalable machine learning pipelines. It introduces a suite of models that combine theoretical robustness with computational efficiency, bridging rough path theory with probabilistic modelling, deep learning, and kernel methods.
- Score: 4.441866681085518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The interface between stochastic analysis and machine learning is a rapidly evolving field, with path signatures - iterated integrals that provide faithful, hierarchical representations of paths - offering a principled and universal feature map for sequential and structured data. Rooted in rough path theory, path signatures are invariant to reparameterization and well-suited for modelling evolving dynamics, long-range dependencies, and irregular sampling - common challenges in real-world time series and graph data. This thesis investigates how to harness the expressive power of path signatures within scalable machine learning pipelines. It introduces a suite of models that combine theoretical robustness with computational efficiency, bridging rough path theory with probabilistic modelling, deep learning, and kernel methods. Key contributions include: Gaussian processes with signature kernel-based covariance functions for uncertainty-aware time series modelling; the Seq2Tens framework, which employs low-rank tensor structure in the weight space for scalable deep modelling of long-range dependencies; and graph-based models where expected signatures over graphs induce hypo-elliptic diffusion processes, offering expressive yet tractable alternatives to standard graph neural networks. Further developments include Random Fourier Signature Features, a scalable kernel approximation with theoretical guarantees, and Recurrent Sparse Spectrum Signature Gaussian Processes, which combine Gaussian processes, signature kernels, and random features with a principled forgetting mechanism for multi-horizon time series forecasting with adaptive context length. We hope this thesis serves as both a methodological toolkit and a conceptual bridge, and provides a useful reference for the current state of the art in scalable, signature-based learning for sequential and structured data.
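For orientation, the iterated integrals mentioned above are, for a path $X : [0,T] \to \mathbb{R}^d$ and a word of indices $(i_1, \dots, i_k)$:

```latex
S(X)^{(i_1,\dots,i_k)}_{[0,T]}
  = \int_{0 < t_1 < \cdots < t_k < T}
    \mathrm{d}X^{i_1}_{t_1} \cdots \mathrm{d}X^{i_k}_{t_k},
\qquad k = 1, 2, \dots
```

Depth 1 recovers the total increment $X_T - X_0$; depth 2 adds order-of-events information such as the Lévy area; and the full collection over all words is invariant under reparameterizations $t \mapsto \phi(t)$, which is the invariance the abstract appeals to.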
Related papers
- Communities in the Kuramoto Model: Dynamics and Detection via Path Signatures [1.024113475677323]
We propose a mathematical framework that encodes geometric and temporal properties of continuous paths to detect communities in coupled oscillator dynamics. Path signatures provide a reparametrization-invariant characterization of dynamical data. Building on this, we propose a novel signature-based community detection algorithm that achieves exact recovery of structural communities from observed time series.
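As a minimal, self-contained sketch of that invariance (illustration only, not the paper's detection algorithm): the depth-2 signature of a piecewise-linear path can be accumulated segment by segment via Chen's identity, and time-warping the samples of the same curve leaves it numerically unchanged.

```python
import numpy as np

def truncated_signature(path):
    """Depth-1 and depth-2 signature terms of a piecewise-linear path (N, d)."""
    increments = np.diff(path, axis=0)
    level1 = increments.sum(axis=0)             # total increment X_T - X_0
    d = path.shape[1]
    level2 = np.zeros((d, d))
    running = np.zeros(d)
    for delta in increments:                    # Chen's identity, one segment at a time
        level2 += np.outer(running, delta)      # cross term: past increment (x) new one
        level2 += 0.5 * np.outer(delta, delta)  # within-segment iterated integral
        running += delta
    return level1, level2

# Same curve under two different time parameterizations.
t = np.linspace(0.0, 1.0, 1000)
curve = lambda s: np.stack([s, np.sin(2 * np.pi * s)], axis=1)
s1a, s2a = truncated_signature(curve(t))
s1b, s2b = truncated_signature(curve(t ** 3))   # monotone time warp
assert np.allclose(s1a, s1b) and np.allclose(s2a, s2b, atol=1e-3)
```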
arXiv Detail & Related papers (2025-03-21T21:41:48Z) - Neural Descriptors: Self-Supervised Learning of Robust Local Surface Descriptors Using Polynomial Patches [11.450257794341052]
We present a self-supervised learning approach for extracting geometric features from 3D surfaces. Our method combines synthetic data generation with a neural architecture designed to learn sampling-invariant features.
arXiv Detail & Related papers (2025-03-05T21:15:14Z) - Scalable Signature Kernel Computations for Long Time Series via Local Neumann Series Expansions [46.685771141109306]
The signature kernel is a recent state-of-the-art tool for analyzing high-dimensional sequential data. We present a novel method for efficiently computing the signature kernel of long, high-dimensional time series, achieving substantial performance improvements over existing approaches.
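For context, here is a minimal baseline (a sketch, not the Neumann-series method of the paper): the signature kernel $k(s,t)$ of two paths solves the Goursat PDE $\partial_s \partial_t k = \langle \dot{x}_s, \dot{y}_t \rangle k$ with $k(0,\cdot) = k(\cdot,0) = 1$, and a first-order finite-difference scheme solves it in $O(mn)$ time; this quadratic cost is what makes long time series expensive and what local expansions aim to beat.

```python
import numpy as np

def signature_kernel(x, y):
    """First-order finite-difference solve of the Goursat PDE
    d^2 k / (ds dt) = <dx_s, dy_t> k,  k(0, .) = k(., 0) = 1,
    for piecewise-linear paths x: (m, d) and y: (n, d)."""
    dx, dy = np.diff(x, axis=0), np.diff(y, axis=0)
    inner = dx @ dy.T                    # <dx_i, dy_j> for every pair of increments
    k = np.ones((inner.shape[0] + 1, inner.shape[1] + 1))
    for i in range(inner.shape[0]):
        for j in range(inner.shape[1]):
            k[i + 1, j + 1] = k[i + 1, j] + k[i, j + 1] + k[i, j] * (inner[i, j] - 1.0)
    return k[-1, -1]
```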
arXiv Detail & Related papers (2025-02-27T18:59:20Z) - Flow-based generative models as iterative algorithms in probability space [18.701755188870823]
Flow-based generative models offer exact likelihood estimation, efficient sampling, and deterministic transformations. This tutorial presents an intuitive mathematical framework for flow-based generative models. We aim to equip researchers and practitioners with the necessary tools to effectively apply flow-based generative models in signal processing and machine learning.
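The exact-likelihood property comes from the change-of-variables formula: if an invertible map $f$ sends data $x$ to a base sample $z = f(x)$ with tractable density $p_Z$, then

```latex
\log p_X(x) \;=\; \log p_Z\!\bigl(f(x)\bigr)
            \;+\; \log \left| \det \frac{\partial f(x)}{\partial x} \right|
```

so the design problem is choosing transformations whose Jacobian determinant stays cheap; continuous-time flows instead integrate the divergence of the velocity field along the trajectory.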
arXiv Detail & Related papers (2025-02-19T03:09:18Z) - Inference-Time Alignment in Diffusion Models with Reward-Guided Generation: Tutorial and Review [59.856222854472605]
This tutorial provides an in-depth guide on inference-time guidance and alignment methods for optimizing downstream reward functions in diffusion models. Practical applications in fields such as biology often require sample generation that maximizes specific metrics. We discuss (1) fine-tuning methods combined with inference-time techniques, (2) inference-time algorithms based on search methods such as Monte Carlo tree search, and (3) connections between inference-time algorithms in language models and diffusion models.
arXiv Detail & Related papers (2025-01-16T17:37:35Z) - Structure Learning in Gaussian Graphical Models from Glauber Dynamics [6.982878344925993]
We present the first algorithm for Gaussian model selection when data are sampled according to the Glauber dynamics. We provide computational and statistical guarantees on the proposed algorithm's structure-learning performance.
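For orientation (a standard fact rather than a result of the paper): Glauber dynamics for a zero-mean Gaussian with precision matrix $\Theta$ repeatedly picks a coordinate $i$ and resamples it from its conditional,

```latex
X_i \mid x_{-i} \;\sim\; \mathcal{N}\!\left(
  -\frac{1}{\Theta_{ii}} \sum_{j \neq i} \Theta_{ij}\, x_j,\;
  \frac{1}{\Theta_{ii}} \right)
```

so successive samples are correlated in time, which is what separates this setting from the usual i.i.d. analysis of graphical model selection.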
arXiv Detail & Related papers (2024-12-24T18:49:13Z) - On the Trajectory Regularity of ODE-based Diffusion Sampling [79.17334230868693]
Diffusion-based generative models use differential equations to establish a smooth connection between a complex data distribution and a tractable prior distribution.
In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models.
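Concretely, the sampling trajectories in question follow the probability-flow ODE of score-based diffusion, which shares its marginals $p_t$ with the forward noising process:

```latex
\frac{\mathrm{d}x}{\mathrm{d}t}
  \;=\; f(x, t) \;-\; \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x)
```

so sampling amounts to numerically integrating a deterministic trajectory from the prior back toward the data distribution, which is why the regularity of those trajectories matters for fast samplers.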
arXiv Detail & Related papers (2024-05-18T15:59:41Z) - Sparse Training of Discrete Diffusion Models for Graph Generation [45.103518022696996]
We introduce SparseDiff, a novel diffusion model based on the observation that almost all large graphs are sparse.
By selecting a subset of edges, SparseDiff effectively leverages sparse graph representations both during the noising process and within the denoising network.
Our model demonstrates state-of-the-art performance across multiple metrics on both small and large datasets.
arXiv Detail & Related papers (2023-11-03T16:50:26Z) - Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations [57.15855198512551]
We propose a novel score-based generative model for graphs with a continuous-time framework.
We show that our method is able to generate molecules that lie close to the training distribution yet do not violate the chemical valency rule.
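As a reminder of the generic machinery behind such models (the paper's contribution is a system of SDEs tailored to graphs, which this generic form does not capture): a forward SDE noises the data, and its time reversal, driven by the learned score $\nabla_x \log p_t(x)$, generates:

```latex
\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t
\qquad\longleftrightarrow\qquad
\mathrm{d}x = \bigl[f(x, t) - g(t)^2\, \nabla_x \log p_t(x)\bigr]\,\mathrm{d}t
            + g(t)\,\mathrm{d}\bar{W}_t
```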
arXiv Detail & Related papers (2022-02-05T08:21:04Z) - Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective on graph contrastive learning methods, showing that random augmentations lead to stochastic encoders.
Our proposed method represents each node by a distribution in the latent space in contrast to existing techniques which embed each node to a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
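A minimal sketch of what a distributional embedding can look like (an assumed Gaussian parameterization for illustration, not necessarily the paper's exact choice): each node $v$ is represented by a posterior sampled with the reparameterization trick,

```latex
z_v \;=\; \mu_\theta(v) \;+\; \sigma_\theta(v) \odot \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)
```

so downstream losses see samples rather than deterministic vectors, with the learned $\sigma_\theta$ carrying the uncertainty induced by random augmentations.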
arXiv Detail & Related papers (2021-12-15T01:45:32Z) - Learned Factor Graphs for Inference from Stationary Time Sequences [107.63351413549992]
We propose a framework that combines model-based algorithms and data-driven ML tools for stationary time sequences.
Neural networks are developed to separately learn specific components of a factor graph describing the distribution of the time sequence.
We present an inference algorithm based on learned stationary factor graphs, which learns to implement the sum-product scheme from labeled data.
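For reference, the sum-product updates that such a learned factor graph implements are the standard message-passing equations over variables $x$ and factors $f$:

```latex
\mu_{x \to f}(x) \;=\; \prod_{f' \in N(x) \setminus \{f\}} \mu_{f' \to x}(x),
\qquad
\mu_{f \to x}(x) \;=\; \sum_{\mathbf{x}_{N(f) \setminus \{x\}}}
  f\!\bigl(\mathbf{x}_{N(f)}\bigr)
  \prod_{x' \in N(f) \setminus \{x\}} \mu_{x' \to f}(x')
```

with the learned variant replacing the factor evaluations $f(\cdot)$ by neural network outputs while keeping the message schedule fixed.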
arXiv Detail & Related papers (2020-06-05T07:06:19Z) - Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximation framework for such non-trivial ERGs that results in dyadic-independence (i.e., edge-independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
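For context, an exponential random graph assigns probability through sufficient statistics $s(G)$ (edge counts, triangle counts, and the like):

```latex
P_\theta(G) \;=\; \frac{\exp\!\bigl(\theta^{\top} s(G)\bigr)}{Z(\theta)},
\qquad
Z(\theta) \;=\; \sum_{G'} \exp\!\bigl(\theta^{\top} s(G')\bigr)
```

and the normalizer $Z(\theta)$, a sum over all graphs, is what makes fitting non-trivial ERGs hard; dyadic-independence approximations make the distribution factor over edges so that $Z(\theta)$ becomes tractable.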
arXiv Detail & Related papers (2020-02-14T11:42:16Z)