A statistical perspective on transformers for small longitudinal cohort data
- URL: http://arxiv.org/abs/2602.16914v1
- Date: Wed, 18 Feb 2026 22:03:59 GMT
- Title: A statistical perspective on transformers for small longitudinal cohort data
- Authors: Kiana Farhadyar, Maren Hackenberg, Kira Ahrens, Charlotte Schenk, Bianca Kollmann, Oliver Tüscher, Klaus Lieb, Michael M. Plichta, Andreas Reif, Raffael Kalisch, Martin Wolkewitz, Moritz Hess, Harald Binder
- Abstract summary: We present a simplified transformer architecture that retains the core attention mechanism while reducing the number of parameters to be estimated. We use an autoregressive model as a starting point and incorporate attention as a kernel-based operation with temporal decay. This also enables a permutation-based statistical testing procedure for identifying contextual patterns.
- Score: 0.058079925240809634
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modeling of longitudinal cohort data typically involves complex temporal dependencies between multiple variables. In this setting, the transformer architecture, which has been highly successful in language and vision applications, allows us to account for the fact that the most recently observed time points in an individual's history may not always be the most important for the immediate future. This is achieved by assigning attention weights to observations of an individual based on a transformation of their values. One reason why these ideas have not yet been fully leveraged for longitudinal cohort data is that large datasets are typically required. Therefore, we present a simplified transformer architecture that retains the core attention mechanism while reducing the number of parameters to be estimated, making it more suitable for small datasets with few time points. Guided by a statistical perspective on transformers, we use an autoregressive model as a starting point and incorporate attention as a kernel-based operation with temporal decay, where the aggregation of multiple transformer heads, i.e., different candidate weighting schemes, is expressed as accumulating evidence on different types of underlying characteristics of individuals. This also enables a permutation-based statistical testing procedure for identifying contextual patterns. In a simulation study, the approach is shown to recover contextual dependencies even with a small number of individuals and time points. In an application to data from a resilience study, we identify temporal patterns in the dynamics of stress and mental health. This indicates that properly adapted transformers can not only achieve competitive predictive performance, but also uncover complex context dependencies in small data settings.
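To make the kernel-with-temporal-decay idea concrete, here is a minimal Python sketch in which each attention head scores an individual's past observations by a function of their values, discounts them by their distance in time, and head outputs are averaged as accumulated evidence. The linear score, exponential decay, and simple averaging are illustrative assumptions, not the paper's exact specification; the permutation-based test is likewise not reproduced.

```python
import numpy as np

def decayed_attention_forecast(x, head_params, decay_rates):
    """One-step-ahead forecast for a single individual's series x (length T).

    Each head scores past observations by a kernel of their values and
    downweights distant time points with an exponential temporal decay;
    head outputs are averaged as accumulated evidence. The linear kernel,
    decay form, and averaging are illustrative assumptions."""
    T = len(x)
    lags = np.arange(T - 1, -1, -1)            # distance of each point from "now"
    preds = []
    for (a, b), lam in zip(head_params, decay_rates):
        scores = a * x + b - lam * lags        # value-based score with temporal decay
        w = np.exp(scores - scores.max())
        w /= w.sum()                           # softmax -> attention weights
        preds.append(np.dot(w, x))             # weighted aggregation of the history
    return float(np.mean(preds))               # aggregate evidence across heads

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=8))              # short series, as in small cohorts
print(decayed_attention_forecast(x, head_params=[(0.5, 0.0), (-0.5, 0.0)],
                                 decay_rates=[0.3, 0.1]))
```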
Related papers
- Patch-Level Tokenization with CNN Encoders and Attention for Improved Transformer Time-Series Forecasting [0.0]
This paper proposes a two-stage forecasting framework that separates local temporal representation learning from global dependency modelling. A convolutional neural network operates on fixed-length temporal patches to extract short-range temporal dynamics and non-linear feature interactions. Token-level self-attention is applied during representation learning to refine these embeddings, after which a Transformer encoder models inter-patch temporal dependencies to generate forecasts.
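A minimal PyTorch sketch of such a two-stage design follows: a small CNN summarises each fixed-length patch, and a Transformer encoder then models dependencies across patch tokens. Layer sizes, pooling, and the forecast head are assumptions for illustration, and the paper's token-level refinement stage is omitted.

```python
import torch
import torch.nn as nn

class PatchCNNTransformer(nn.Module):
    """Two-stage forecaster: a CNN encodes fixed-length patches locally,
    then a Transformer encoder models dependencies across patches.
    Layer sizes and the forecast head are illustrative assumptions."""
    def __init__(self, patch_len=16, d_model=64, horizon=24):
        super().__init__()
        self.patch_len = patch_len
        self.local = nn.Sequential(             # short-range temporal dynamics
            nn.Conv1d(1, d_model, kernel_size=3, padding=1),
            nn.GELU(),
            nn.AdaptiveAvgPool1d(1),            # one embedding per patch
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.global_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x):                       # x: (batch, series_len)
        b, _ = x.shape
        patches = x.unfold(1, self.patch_len, self.patch_len)  # (b, n_patches, patch_len)
        n = patches.shape[1]
        z = self.local(patches.reshape(b * n, 1, self.patch_len))
        z = z.reshape(b, n, -1)                 # patch tokens
        z = self.global_enc(z)                  # inter-patch dependencies
        return self.head(z[:, -1])              # forecast from the last patch token

model = PatchCNNTransformer()
print(model(torch.randn(2, 128)).shape)         # torch.Size([2, 24])
```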
arXiv Detail & Related papers (2026-01-18T16:16:01Z)
- Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting [50.298817606660826]
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay. Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks. Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
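The following numpy sketch illustrates the general reweighting idea: standard causal attention weights are multiplied by a smooth power-law decay in the query-key lag and renormalised. The particular decay family and normalisation are assumptions and may differ from Powerformer's exact formulation.

```python
import numpy as np

def powerlaw_causal_attention(scores, alpha=1.5):
    """Causal attention reweighted by a smooth heavy-tailed (power-law)
    decay in query-key time distance; the decay family and normalisation
    are illustrative, not necessarily Powerformer's exact choice."""
    T = scores.shape[0]
    lag = np.arange(T)[:, None] - np.arange(T)[None, :]    # query index minus key index
    causal = lag >= 0                                      # keys may not lie in the future
    decay = (1.0 + np.maximum(lag, 0)) ** (-alpha)         # heavy-tailed weight in the lag
    w = np.where(causal, np.exp(scores - scores.max(axis=1, keepdims=True)), 0.0)
    w = w * decay                                          # reweight, then renormalise rows
    return w / w.sum(axis=1, keepdims=True)

A = powerlaw_causal_attention(np.random.default_rng(1).normal(size=(5, 5)))
print(np.allclose(A.sum(axis=1), 1.0), float(np.triu(A, 1).max()) == 0.0)
```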
arXiv Detail & Related papers (2025-02-10T04:42:11Z)
- Ister: Inverted Seasonal-Trend Decomposition Transformer for Explainable Multivariate Time Series Forecasting [10.32586981170693]
This paper presents the Inverted Seasonal-Trend Decomposition Transformer (Ister) and introduces a novel Dot-attention mechanism that improves interpretability, computational efficiency, and predictive accuracy. Ister enables intuitive visualization of component contributions, shedding light on the model's decision process and enhancing transparency in prediction results.
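For orientation, here is a minimal sketch of the classic moving-average seasonal-trend split that decomposition-based forecasters of this kind build on; Ister's own decomposition and Dot-attention modules are not reproduced here.

```python
import numpy as np

def seasonal_trend_split(x, window=24):
    """Moving-average trend plus residual 'seasonal' component, the classic
    decomposition that decomposition-based forecasters build on. Window
    length and edge padding are illustrative assumptions."""
    pad = window // 2
    padded = np.pad(x, pad, mode="edge")
    trend = np.convolve(padded, np.ones(window) / window, mode="same")[pad:-pad]
    return trend, x - trend

t = np.arange(200)
x = 0.05 * t + np.sin(2 * np.pi * t / 24)     # linear trend + daily seasonality
trend, seasonal = seasonal_trend_split(x)
print(trend[:3].round(2), seasonal[:3].round(2))
```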
arXiv Detail & Related papers (2024-12-25T06:37:19Z)
- EDformer: Embedded Decomposition Transformer for Interpretable Multivariate Time Series Predictions [4.075971633195745]
This paper introduces an embedded decomposition transformer, 'EDformer', for time series forecasting tasks. Without altering the fundamental elements, it reuses the Transformer architecture and exploits the capabilities of its constituent parts. The model obtains state-of-the-art prediction results in terms of accuracy and efficiency on complex real-world time series datasets.
arXiv Detail & Related papers (2024-12-16T11:13:57Z)
- UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
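A minimal PyTorch sketch of the flattening idea: patch tokens from all variables are concatenated into a single sequence, so that one self-attention can relate any patch of any series to any other. Dimensions and the random patch tokens stand in for a real patch embedding and are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Flattened-token sketch: per-variable patch tokens are merged into one
# sequence so a single attention captures intra- and inter-series
# dependencies jointly. Sizes are illustrative assumptions.
d_model, n_vars, n_patches = 32, 3, 10
tokens = torch.randn(4, n_vars, n_patches, d_model)        # per-variable patch tokens
flat = tokens.reshape(4, n_vars * n_patches, d_model)      # unified token sequence
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
out, weights = attn(flat, flat, flat)                      # one attention over all pairs
print(out.shape, weights.shape)                            # (4, 30, 32), (4, 30, 30)
```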
arXiv Detail & Related papers (2024-06-07T14:39:28Z)
- Rethinking Urban Mobility Prediction: A Super-Multivariate Time Series Forecasting Approach [71.67506068703314]
Long-term urban mobility predictions play a crucial role in the effective management of urban facilities and services.
Traditionally, urban mobility data has been structured as videos, treating longitude and latitude as fundamental pixels.
In our research, we introduce a fresh perspective on urban mobility prediction.
Instead of oversimplifying urban mobility data as traditional video data, we regard it as a complex time series.
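The representational shift is essentially a reshape, as the following sketch with assumed grid and horizon sizes illustrates: a (T, H, W) stack of city-grid frames becomes an (H*W)-variate time series of length T.

```python
import numpy as np

# Sketch of the representational shift: instead of a (T, H, W) "video" of
# city-grid mobility frames, treat each grid cell as one series in an
# (H*W)-variate time series of length T. Shapes are illustrative.
T, H, W = 168, 20, 20                      # e.g. hourly counts on a 20x20 grid
video = np.random.poisson(5.0, size=(T, H, W))
series = video.reshape(T, H * W)           # super-multivariate time series: (T, 400)
print(video.shape, "->", series.shape)
```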
arXiv Detail & Related papers (2023-12-04T07:39:05Z)
- TimelyGPT: Extrapolatable Transformer Pre-training for Long-term Time-Series Forecasting in Healthcare [14.14872125241069]
We present the Timely Generative Pre-trained Transformer (TimelyGPT).
TimelyGPT employs an extrapolatable position (xPos) embedding to encode trend and periodic patterns into time-series representations.
It also integrates recurrent attention and temporal convolution modules to effectively capture global-local temporal dependencies.
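As a generic illustration of combining global and local temporal modeling, the following PyTorch block adds a self-attention branch (long-range structure) to a depthwise temporal convolution branch (local structure). This is a simplified stand-in, not TimelyGPT's recurrent attention or xPos embedding.

```python
import torch
import torch.nn as nn

class GlobalLocalBlock(nn.Module):
    """Illustrative global-local block: self-attention captures long-range
    (global) structure, a depthwise temporal convolution captures local
    structure. A generic sketch, not TimelyGPT's exact formulation."""
    def __init__(self, d_model=64, kernel_size=5):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                    # x: (batch, time, d_model)
        g, _ = self.attn(x, x, x)            # global dependencies
        l = self.conv(x.transpose(1, 2)).transpose(1, 2)  # local dependencies
        return self.norm(x + g + l)          # residual combination

print(GlobalLocalBlock()(torch.randn(2, 50, 64)).shape)   # torch.Size([2, 50, 64])
```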
arXiv Detail & Related papers (2023-11-29T19:09:28Z)
- iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [62.40166958002558]
We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions.
The iTransformer model achieves state-of-the-art performance on challenging real-world datasets.
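The inversion itself is compact enough to sketch: instead of one token per time step, each variate's entire series is embedded as a single token, and attention then operates across variates. Sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Inverted-dimension sketch: each variate's whole series becomes one token,
# so attention relates variates rather than time steps. Sizes are assumed.
batch, seq_len, n_vars, d_model = 8, 96, 7, 64
x = torch.randn(batch, seq_len, n_vars)
variate_tokens = nn.Linear(seq_len, d_model)(x.transpose(1, 2))  # (batch, n_vars, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
out = layer(variate_tokens)              # attention over variates, not time steps
print(out.shape)                         # torch.Size([8, 7, 64])
```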
arXiv Detail & Related papers (2023-10-10T13:44:09Z)
- AnomalyBERT: Self-Supervised Transformer for Time Series Anomaly Detection using Data Degradation Scheme [0.7216399430290167]
Anomaly detection for time series, especially for unlabeled data, is a challenging problem.
We address it by applying a suitable data degradation scheme to self-supervised model training.
Inspired by the self-attention mechanism, we design a Transformer-based architecture to recognize the temporal context.
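A sketch of what such a degradation scheme can look like: random windows of a normal series are replaced by flat, noisy, or amplified segments, yielding pseudo anomaly labels for self-supervised training. The specific degradations below are assumptions, not necessarily AnomalyBERT's exact scheme.

```python
import numpy as np

def degrade(x, rng, max_len=20):
    """Create a self-supervised training pair by degrading a random window
    of a normal series; a model is then trained to flag the degraded
    positions. The degradation modes here are illustrative assumptions."""
    y = x.copy()
    labels = np.zeros(len(x), dtype=int)
    start = rng.integers(0, len(x) - max_len)
    length = rng.integers(2, max_len)
    mode = rng.choice(["constant", "noise", "scale"])
    if mode == "constant":
        y[start:start + length] = y[start]                   # flat segment
    elif mode == "noise":
        y[start:start + length] += rng.normal(0, 3, length)  # noisy segment
    else:
        y[start:start + length] *= 5.0                       # amplified segment
    labels[start:start + length] = 1                         # pseudo anomaly labels
    return y, labels

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 12, 200))
y, labels = degrade(x, rng)
print(labels.sum(), "positions degraded")
```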
arXiv Detail & Related papers (2023-05-08T05:42:24Z)
- DuETT: Dual Event Time Transformer for Electronic Health Records [14.520791492631114]
We introduce the DuETT architecture, an extension of Transformers designed to attend over both time and event type dimensions.
DuETT uses an aggregated input where sparse time series are transformed into a regular sequence with fixed length.
Our model outperforms state-of-the-art deep learning models on multiple downstream tasks from the MIMIC-IV and PhysioNet-2012 EHR datasets.
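The aggregation step can be sketched as binning an irregular event stream onto a regular (time-bin x event-type) grid, which a model can then attend over in both dimensions. Mean aggregation and the extra count matrix below are illustrative assumptions.

```python
import numpy as np

def aggregate_events(times, values, event_types, n_types, n_bins, t_max):
    """Turn an irregular event stream into a fixed-length regular grid:
    a (time-bin x event-type) matrix of aggregated values plus a count
    matrix, in the spirit of DuETT's aggregated input. Mean aggregation
    and the count channel are illustrative assumptions."""
    sums = np.zeros((n_bins, n_types))
    counts = np.zeros((n_bins, n_types))
    bins = np.minimum((times / t_max * n_bins).astype(int), n_bins - 1)
    for b, v, e in zip(bins, values, event_types):
        sums[b, e] += v
        counts[b, e] += 1
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return means, counts                     # both attendable over time and type

times = np.array([0.3, 2.1, 2.2, 7.9])
means, counts = aggregate_events(times, values=np.array([1.0, 2.0, 4.0, 3.0]),
                                 event_types=np.array([0, 1, 1, 0]),
                                 n_types=2, n_bins=4, t_max=8.0)
print(means); print(counts)
```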
arXiv Detail & Related papers (2023-04-25T17:47:48Z)
- Causal Transformer for Estimating Counterfactual Outcomes [18.640006398066188]
Estimating counterfactual outcomes over time from observational data is relevant for many applications.
We develop a novel Causal Transformer for estimating counterfactual outcomes over time.
Our model is specifically designed to capture complex, long-range dependencies among time-varying confounders.
arXiv Detail & Related papers (2022-04-14T22:40:09Z)
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
Estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
- Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
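A toy sketch of a THP-style conditional intensity follows: an attention-pooled summary of past event times feeds a softplus intensity that evolves between events. The scoring and parameterisation are illustrative assumptions, not the published THP equations.

```python
import numpy as np

def softplus(z):
    # numerically stable softplus: max(z, 0) + log(1 + exp(-|z|))
    return np.maximum(z, 0.0) + np.log1p(np.exp(-abs(z)))

def thp_style_intensity(event_times, t, w=0.5, alpha=-0.1, b=0.1):
    """Sketch of a THP-style conditional intensity: a toy attention-pooled
    history embedding feeds a softplus intensity that varies continuously
    between events. Scores and parameters are illustrative assumptions."""
    past = event_times[event_times < t]
    if len(past) == 0:
        return softplus(b)
    scores = -(t - past)                    # attend more to recent events (assumed)
    a = np.exp(scores - scores.max())
    a /= a.sum()                            # attention weights over past events
    h = np.dot(a, past)                     # attention-pooled history embedding
    return softplus(w * h + alpha * (t - past[-1]) + b)

events = np.array([0.5, 1.3, 2.0, 2.1])
print([round(float(thp_style_intensity(events, t)), 3) for t in (0.2, 1.5, 2.5)])
```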
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.