Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions
- URL: http://arxiv.org/abs/2506.05678v2
- Date: Mon, 09 Jun 2025 03:50:59 GMT
- Title: Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions
- Authors: Haotian Jiang, Zeyu Bao, Shida Wang, Qianxiao Li
- Abstract summary: We propose a synthetic benchmarking framework to evaluate how effectively different sequence models capture distinct temporal structures. The core of this approach is to generate synthetic targets, each characterized by a memory function and a parameter that determines the strength of temporal dependence. Experiments on several sequence modeling architectures confirm existing theoretical insights and reveal new findings.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The evolution of sequence modeling architectures, from recurrent neural networks and convolutional models to Transformers and structured state-space models, reflects ongoing efforts to address the diverse temporal dependencies inherent in sequential data. Despite this progress, systematically characterizing the strengths and limitations of these architectures remains a fundamental challenge. In this work, we propose a synthetic benchmarking framework to evaluate how effectively different sequence models capture distinct temporal structures. The core of this approach is to generate synthetic targets, each characterized by a memory function and a parameter that determines the strength of temporal dependence. This setup allows us to produce a continuum of tasks that vary in temporal complexity, enabling fine-grained analysis of model behavior concerning specific memory properties. We focus on four representative memory functions, each corresponding to a distinct class of temporal structures. Experiments on several sequence modeling architectures confirm existing theoretical insights and reveal new findings. These results demonstrate the effectiveness of the proposed method in advancing theoretical understanding and highlight the importance of using controllable targets with clearly defined structures for evaluating sequence modeling architectures.
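The abstract's core construction can be sketched in a few lines. This is a minimal illustration under my own assumptions (the function names and the exponential-decay choice are mine, not the authors' code): a synthetic target is a linear functional of the input history, y_t = sum_{s<=t} rho(t-s) x_s, where the memory function rho and its decay parameter control the strength of temporal dependence.

```python
import numpy as np

def exponential_memory(lam):
    """One possible memory function: exponential decay.
    Larger lam means faster decay, i.e. weaker temporal dependence."""
    return lambda k: np.exp(-lam * k)

def synthetic_target(x, rho):
    """Generate the target y_t = sum_{s<=t} rho(t-s) * x_s
    for an input sequence x and memory function rho."""
    T = len(x)
    y = np.zeros(T)
    for t in range(T):
        lags = np.arange(t + 1)              # lags 0 .. t
        y[t] = np.sum(rho(lags) * x[t - lags])
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(64)                  # synthetic input sequence
y = synthetic_target(x, exponential_memory(0.5))
```

Sweeping the decay parameter (here `lam`) produces the continuum of tasks of varying temporal complexity that the paper uses; other memory functions (e.g. power-law or oscillatory decay) slot into the same template.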
Related papers
- Multivariate Long-term Time Series Forecasting with Fourier Neural Filter [55.09326865401653]
We introduce FNF as the backbone and DBD as architecture to provide excellent learning capabilities and optimal learning pathways for spatial-temporal modeling. We show that FNF unifies local time-domain and global frequency-domain information processing within a single backbone that extends naturally to spatial modeling.
arXiv Detail & Related papers (2025-06-10T18:40:20Z) - Generalized Factor Neural Network Model for High-dimensional Regression [50.554377879576066]
We tackle the challenges of modeling high-dimensional data sets with latent low-dimensional structures hidden within complex, non-linear, and noisy relationships. Our approach enables a seamless integration of concepts from non-parametric regression, factor models, and neural networks for high-dimensional regression.
arXiv Detail & Related papers (2025-02-16T23:13:55Z) - Test-time regression: a unifying framework for designing sequence models with associative memory [24.915262407519876]
We introduce a unifying framework to understand and derive sequence models. We formalize associative recall as a two-step process, memorization and retrieval, and cast it as a regression problem. Our work bridges sequence modeling with classic regression methods, paving the way for developing more powerful and theoretically principled architectures.
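The two-step view above can be made concrete with a toy linear associative memory. This is a hedged sketch (the variable names and the linear model are my own choices, not the paper's code): memorization fits a map from keys to values by least-squares regression, and retrieval applies it to a query.

```python
import numpy as np

rng = np.random.default_rng(0)
keys = rng.standard_normal((4, 4))    # 4 stored keys, dimension 4
values = rng.standard_normal((4, 3))  # associated 3-d values

# Memorization: solve the regression keys @ W ~= values by least squares.
W, *_ = np.linalg.lstsq(keys, values, rcond=None)

# Retrieval: querying with a stored key recovers its value, since the
# square Gaussian key matrix is (generically) invertible.
recalled = keys[0] @ W
```

Richer sequence models correspond, in this framing, to richer regressor classes and weighting schemes for the memorization step.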
arXiv Detail & Related papers (2025-01-21T18:32:31Z) - Behavioral Sequence Modeling with Ensemble Learning [8.241486511994202]
We present a framework for sequence modeling using Ensembles of Hidden Markov Models.
Our ensemble-based scoring method enables robust comparison across sequences of different lengths.
We demonstrate the effectiveness of our method with results on a longitudinal human behavior dataset.
arXiv Detail & Related papers (2024-11-04T15:34:28Z) - Benchmark on Drug Target Interaction Modeling from a Structure Perspective [48.60648369785105]
Drug-target interaction prediction is crucial to drug discovery and design.
Recent methods, such as those based on graph neural networks (GNNs) and Transformers, demonstrate exceptional performance across various datasets.
We conduct a comprehensive survey and benchmark for drug-target interaction modeling from a structure perspective, by integrating tens of explicit (i.e., GNN-based) and implicit (i.e., Transformer-based) structure learning algorithms.
arXiv Detail & Related papers (2024-07-04T16:56:59Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction.
We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z) - Efficient hierarchical Bayesian inference for spatio-temporal regression models in neuroimaging [6.512092052306553]
Examples include M/EEG inverse problems, encoding neural models for task-based fMRI analyses, and temperature monitoring schemes.
We devise a novel flexible hierarchical Bayesian framework within which the intrinsic spatio-temporal dynamics of model parameters and noise are modeled.
arXiv Detail & Related papers (2021-11-02T15:50:01Z) - Approximation Theory of Convolutional Architectures for Time Series Modelling [15.42770933459534]
We study the approximation properties of convolutional architectures applied to time series modelling.
Recent results reveal an intricate connection between approximation efficiency and memory structures in the data generation process.
arXiv Detail & Related papers (2021-07-20T09:19:26Z) - Redefining Neural Architecture Search of Heterogeneous Multi-Network Models by Characterizing Variation Operators and Model Components [71.03032589756434]
We investigate the effect of different variation operators in a complex domain, that of multi-network heterogeneous neural models.
We characterize the variation operators according to their effect on the complexity and performance of the model, and the models themselves via diverse metrics that estimate the quality of their component parts.
arXiv Detail & Related papers (2021-06-16T17:12:26Z) - Supporting Optimal Phase Space Reconstructions Using Neural Network Architecture for Time Series Modeling [68.8204255655161]
We propose an artificial neural network with a mechanism to implicitly learn the phase space's properties.
Our approach is either as competitive as or better than most state-of-the-art strategies.
arXiv Detail & Related papers (2020-06-19T21:04:47Z) - Semi-Structured Distributional Regression -- Extending Structured Additive Models by Arbitrary Deep Neural Networks and Data Modalities [0.0]
We propose a general framework to combine structured regression models and deep neural networks into a unifying network architecture.
We demonstrate the framework's efficacy in numerical experiments and illustrate its special merits in benchmarks and real-world applications.
arXiv Detail & Related papers (2020-02-13T21:01:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.