UTICA: Multi-Objective Self-Distillation Foundation Model Pretraining for Time Series Classification
- URL: http://arxiv.org/abs/2603.01348v1
- Date: Mon, 02 Mar 2026 01:02:09 GMT
- Title: UTICA: Multi-Objective Self-Distillation Foundation Model Pretraining for Time Series Classification
- Authors: Yessin Moakher, Youssef Attia El Hili, Vasilii Feofanov,
- Abstract summary: We adapt DINOv2-style self-distillation to pretrain a time series foundation model. We build on the Mantis tokenizer and transformer encoder architecture as our backbone. Our method achieves state-of-the-art classification performance on both UCR and UEA benchmarks.
- Score: 5.071106490524274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised foundation models have achieved remarkable success across domains, including time series. However, the potential of non-contrastive methods, a paradigm that has driven significant advances in computer vision, remains underexplored for time series. In this work, we adapt DINOv2-style self-distillation to pretrain a time series foundation model, building on the Mantis tokenizer and transformer encoder architecture as our backbone. Through a student-teacher framework, our method Utica learns representations that capture both temporal invariance via augmented crops and fine-grained local structure via patch masking. Our approach achieves state-of-the-art classification performance on both UCR and UEA benchmarks. These results suggest that non-contrastive methods are a promising and complementary pretraining strategy for time series foundation models.
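The student-teacher objective described in the abstract (augmented crops for temporal invariance, patch masking for local structure) can be illustrated with a small NumPy sketch of the generic DINO-style losses. This is not UTICA's implementation: the temperatures (0.1 student, 0.04 teacher), the momenta, the running-mean centering, and every function name (`dino_loss`, `masked_patch_loss`, `update_center`, `ema_update`) are assumptions borrowed from the original DINO recipe or chosen for illustration, and the abstract does not say which variants UTICA uses. The sketch only computes loss values; a real training loop would use an autodiff framework and stop gradients through the teacher.

```python
import numpy as np

# Hypothetical sketch of DINO-style self-distillation objectives (not UTICA's code).
# Default temperatures and momenta follow the original DINO recipe and are assumptions here.


def softmax(logits, temperature):
    """Softmax over the last axis with a temperature; max-subtraction keeps exp() stable."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def dino_loss(student_logits, teacher_logits, center, t_student=0.1, t_teacher=0.04):
    """Crop-level loss. teacher_logits: (n_global, batch, K), from global crops only;
    student_logits: (n_views, batch, K), whose first n_global views are the same global crops.
    The teacher output is centred, then sharpened with a low temperature."""
    p_teacher = softmax(teacher_logits - center, t_teacher)
    log_p_student = np.log(softmax(student_logits, t_student) + 1e-12)
    total, n_pairs = 0.0, 0
    for i in range(p_teacher.shape[0]):
        for j in range(log_p_student.shape[0]):
            if i == j:
                continue  # do not match a crop against itself
            total += -(p_teacher[i] * log_p_student[j]).sum(axis=-1).mean()
            n_pairs += 1
    return total / n_pairs


def masked_patch_loss(student_logits, teacher_logits, mask, center, t_student=0.1, t_teacher=0.04):
    """Patch-level (iBOT-style) loss. The student sees the series with masked patches,
    the teacher sees it unmasked; cross-entropy is taken only at masked positions.
    Logits: (batch, n_patches, K); mask: boolean (batch, n_patches)."""
    p_teacher = softmax(teacher_logits - center, t_teacher)
    log_p_student = np.log(softmax(student_logits, t_student) + 1e-12)
    per_patch = -(p_teacher * log_p_student).sum(axis=-1)  # (batch, n_patches)
    return per_patch[mask].mean()


def update_center(center, teacher_logits, momentum=0.9):
    """Running mean of teacher outputs; subtracting it keeps one dimension from dominating."""
    batch_mean = teacher_logits.reshape(-1, teacher_logits.shape[-1]).mean(axis=0)
    return momentum * center + (1.0 - momentum) * batch_mean


def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights are an exponential moving average of the student weights."""
    return {k: momentum * teacher_params[k] + (1.0 - momentum) * student_params[k]
            for k in teacher_params}


# Toy usage: random logits stand in for encoder + projection-head outputs.
rng = np.random.default_rng(0)
K = 16
center = np.zeros(K)
teacher_logits = rng.normal(size=(2, 8, K))  # 2 global crops, batch of 8
student_logits = rng.normal(size=(4, 8, K))  # the same 2 global crops + 2 local crops
loss = dino_loss(student_logits, teacher_logits, center)
center = update_center(center, teacher_logits)
```

Centering plus a low teacher temperature is the DINO mechanism for avoiding collapse without negative pairs, which is what makes the method non-contrastive; a "multi-objective" setup would combine the crop-level and patch-level terms, with weights this sketch does not attempt to guess.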
Related papers
- Joint Embeddings Go Temporal [5.2741154046624255]
Joint-Embedding Predictive Architectures (JEPA) were introduced with the aim of performing self-supervised learning in the latent space. Time Series JEPA (TS-JEPA) is an architecture specifically adapted for time series representation learning. We show that TS-JEPA can match or surpass current state-of-the-art baselines on different standard datasets.
(2025-09-29T19:57:37Z)
- RATFM: Retrieval-augmented Time Series Foundation Model for Anomaly Detection [0.6524530902514115]
We propose a retrieval-augmented time series foundation model (RATFM) to incorporate examples of test-time adaptation. RATFM achieves performance comparable to that of in-domain fine-tuning while avoiding domain-dependent fine-tuning.
(2025-06-02T10:25:35Z)
- StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning [79.44594332189018]
Class-Incremental Learning (CIL) seeks to develop models that continuously learn new action categories over time without forgetting previously acquired knowledge. Existing approaches either rely on exemplar rehearsal, raising concerns over memory and privacy, or adapt static image-based methods that neglect temporal modeling. We propose a unified and exemplar-free video class-incremental learning (VCIL) framework that explicitly disentangles and preserves information.
(2025-05-20T06:46:51Z)
- UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines [64.84631333071728]
We introduce UniSTD, a unified Transformer-based framework for spatio-temporal modeling. Our work demonstrates that a task-specific vision-text can build a generalizable model for spatio-temporal learning. We also introduce a temporal module to incorporate temporal dynamics explicitly.
(2025-03-26T17:33:23Z)
- Towards Generalisable Time Series Understanding Across Domains [10.350643783811174]
We introduce a novel pre-training paradigm specifically designed to handle time series heterogeneity. We propose a tokeniser with learnable domain signatures, a dual masking strategy, and a normalised cross-correlation loss. Our code and pre-trained weights are available at https://www.oetu.com/oetu/otis.
(2024-10-09T17:09:30Z)
- TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling [67.02157180089573]
Time series pre-training has recently garnered wide attention for its potential to reduce labeling expenses and benefit various downstream tasks.
This paper proposes TimeSiam, a simple but effective self-supervised pre-training framework for time series based on Siamese networks.
(2024-02-04T13:10:51Z)
- TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting [24.834846119163885]
We propose a novel framework, TEMPO, that can effectively learn time series representations.
TEMPO expands the capability for dynamically modeling real-world temporal phenomena from data within diverse domains.
(2023-10-08T00:02:25Z)
- Toward a Foundation Model for Time Series Data [34.1973242428317]
A foundation model is a machine learning model trained on a large and diverse set of data.
We develop an effective time series foundation model by leveraging unlabeled samples from multiple domains.
(2023-10-05T21:44:50Z)
- Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs [50.25683648762602]
We introduce Koopman VAE, a new generative framework that is based on a novel design for the model prior.
Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map.
KoVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks.
(2023-10-04T07:14:43Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
(2023-05-18T16:28:29Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
(2022-07-17T07:05:39Z)
- Semi-supervised Facial Action Unit Intensity Estimation with Contrastive Learning [54.90704746573636]
Our method does not require manually selecting key frames, and produces state-of-the-art results with as little as 2% of annotated frames.
We experimentally validate that our method outperforms existing methods when working with as little as 2% of randomly chosen data.
(2020-11-03T17:35:57Z)
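The cross-domain pretraining entry above (Towards Generalisable Time Series Understanding Across Domains) lists a normalised cross-correlation loss. As a hedged illustration only, a generic zero-lag NCC loss can be written as follows; the function names are hypothetical, and how that paper actually applies the loss (per channel, per patch, or on masked reconstructions) is not stated in its summary. Because NCC is invariant to positive rescaling and offsets, such a loss compares the shape of two series rather than their amplitude, which is one plausible reason to use it on heterogeneous data.

```python
import numpy as np

# Hypothetical sketch: zero-lag normalised cross-correlation (NCC) used as a loss.
# The exact formulation in the cited paper may differ.


def normalised_cross_correlation(x, y, eps=1e-8):
    """NCC along the last (time) axis; values lie in [-1, 1]."""
    x = x - x.mean(axis=-1, keepdims=True)
    y = y - y.mean(axis=-1, keepdims=True)
    num = (x * y).sum(axis=-1)
    den = np.sqrt((x ** 2).sum(axis=-1) * (y ** 2).sum(axis=-1)) + eps
    return num / den


def ncc_loss(prediction, target):
    """1 - NCC, averaged over leading axes: ~0 for a match up to positive scale and
    offset, ~2 for a sign-inverted match."""
    return float(np.mean(1.0 - normalised_cross_correlation(prediction, target)))


t = np.linspace(0.0, 2.0 * np.pi, 100)
x = np.sin(t)
print(ncc_loss(2.0 * x + 3.0, x))  # ~0: rescaled and shifted copy
print(ncc_loss(-x, x))             # ~2: sign-inverted copy
```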