Related papers: SigMA: Path Signatures and Multi-head Attention for Learning Parameters in fBm-driven SDEs

SigMA: Path Signatures and Multi-head Attention for Learning Parameters in fBm-driven SDEs

URL: http://arxiv.org/abs/2512.15088v1
Date: Wed, 17 Dec 2025 05:09:18 GMT
Title: SigMA: Path Signatures and Multi-head Attention for Learning Parameters in fBm-driven SDEs
Authors: Xianglin Wu, Chiheb Ben Hammouda, Cornelis W. Oosterlee,
Abstract summary: SigMA learns model parameters from synthetically generated paths of fBm-driven differential equations.<n>It consistently outperforms CNN, LSTM, vanilla Transformer, and Deep Signature baselines in accuracy, robustness, and model compactness.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Stochastic differential equations (SDEs) driven by fractional Brownian motion (fBm) are increasingly used to model systems with rough dynamics and long-range dependence, such as those arising in quantitative finance and reliability engineering. However, these processes are non-Markovian and lack a semimartingale structure, rendering many classical parameter estimation techniques inapplicable or computationally intractable beyond very specific cases. This work investigates two central questions: (i) whether integrating path signatures into deep learning architectures can improve the trade-off between estimation accuracy and model complexity, and (ii) what constitutes an effective architecture for leveraging signatures as feature maps. We introduce SigMA (Signature Multi-head Attention), a neural architecture that integrates path signatures with multi-head self-attention, supported by a convolutional preprocessing layer and a multilayer perceptron for effective feature encoding. SigMA learns model parameters from synthetically generated paths of fBm-driven SDEs, including fractional Brownian motion, fractional Ornstein-Uhlenbeck, and rough Heston models, with a particular focus on estimating the Hurst parameter and on joint multi-parameter inference, and it generalizes robustly to unseen trajectories. Extensive experiments on synthetic data and two real-world datasets (i.e., equity-index realized volatility and Li-ion battery degradation) show that SigMA consistently outperforms CNN, LSTM, vanilla Transformer, and Deep Signature baselines in accuracy, robustness, and model compactness. These results demonstrate that combining signature transforms with attention-based architectures provides an effective and scalable framework for parameter inference in stochastic systems with rough or persistent temporal structure.

Related papers

Hierarchical Inference and Closure Learning via Adaptive Surrogates for ODEs and PDEs [15.38864225184245]
Inverse problems are the task of calibrating models to match data.<n>We develop a principled methodology for leveraging data from collections of distinct yet related physical systems.<n>We learn the shared unknown dynamics in the form of an ML-based closure model.
arXiv Detail & Related papers (2026-03-04T10:30:08Z)
Adapformer: Adaptive Channel Management for Multivariate Time Series Forecasting [49.40321003932633]
Adapformer is an advanced Transformer-based framework that merges the benefits of CI and CD methodologies through effective channel management.<n>Adapformer achieves superior performance over existing models, enhancing both predictive accuracy and computational efficiency.
arXiv Detail & Related papers (2025-11-18T16:24:05Z)
Merging Memory and Space: A State Space Neural Operator [8.378604588491394]
State Space Neural Operator (SS-NO) is a compact architecture for learning solution operators of time-dependent partial differential equations.<n>We show that SS-NO achieves state-of-the-art performance across diverse PDE benchmarks.
arXiv Detail & Related papers (2025-07-31T11:09:15Z)
Scalable Machine Learning Algorithms using Path Signatures [4.441866681085518]
This thesis investigates how to harness the expressive power of path signatures within scalable machine learning pipelines.<n>It introduces a suite of models that combine theoretical robustness with computational efficiency, bridging rough path theory with probabilistic modelling, deep learning, and kernel methods.
arXiv Detail & Related papers (2025-06-21T08:36:34Z)
Instruction-Guided Autoregressive Neural Network Parameter Generation [49.800239140036496]
We propose IGPG, an autoregressive framework that unifies parameter synthesis across diverse tasks and architectures.<n>By autoregressively generating neural network weights' tokens, IGPG ensures inter-layer coherence and enables efficient adaptation across models and datasets.<n>Experiments on multiple datasets demonstrate that IGPG consolidates diverse pretrained models into a single, flexible generative framework.
arXiv Detail & Related papers (2025-04-02T05:50:19Z)
Parameter Estimation of Long Memory Stochastic Processes with Deep Neural Networks [0.0]
We present a purely deep neural network-based approach for estimating long memory parameters of time series models. Parameters, such as the Hurst exponent, are critical in characterizing the long-range dependence, roughness, and self-similarity of processes.
arXiv Detail & Related papers (2024-10-03T03:14:58Z)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction. SMILE allows for the upscaling of source models into an MoE model without extra data or further training. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
Learning minimal representations of stochastic processes with variational autoencoders [52.99137594502433]
We introduce an unsupervised machine learning approach to determine the minimal set of parameters required to describe a process. Our approach enables for the autonomous discovery of unknown parameters describing processes.
arXiv Detail & Related papers (2023-07-21T14:25:06Z)
Switching Autoregressive Low-rank Tensor Models [12.461139675114818]
We show how to switch autoregressive low-rank tensor (SALT) models. SALT parameterizes the tensor of an ARHMM with a low-rank factorization to control the number of parameters. We prove theoretical and discuss practical connections between SALT, linear dynamical systems, and SLDSs.
arXiv Detail & Related papers (2023-06-05T22:25:28Z)
End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
Pretraining Without Attention [114.99187017618408]
This work explores pretraining without attention by using recent advances in sequence routing based on state-space models (SSMs) BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4096 tokens without approximation.
arXiv Detail & Related papers (2022-12-20T18:50:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.