Related papers: UTICA: Multi-Objective Self-Distillation Foundation Model Pretraining for Time Series Classification

URL: http://arxiv.org/abs/2603.01348v1
Date: Mon, 02 Mar 2026 01:02:09 GMT
Title: UTICA: Multi-Objective Self-Distillation Foundation Model Pretraining for Time Series Classification
Authors: Yessin Moakher, Youssef Attia El Hili, Vasilii Feofanov,
Abstract summary: We adapt DINOv2-style self-distillation to pretrain a time series foundation model. We build on the Mantis tokenizer and transformer encoder architecture as our backbone. Our method achieves state-of-the-art classification performance on both UCR and UEA benchmarks.
Score: 5.071106490524274
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Self-supervised foundation models have achieved remarkable success across domains, including time series. However, the potential of non-contrastive methods, a paradigm that has driven significant advances in computer vision, remains underexplored for time series. In this work, we adapt DINOv2-style self-distillation to pretrain a time series foundation model, building on the Mantis tokenizer and transformer encoder architecture as our backbone. Through a student-teacher framework, our method Utica learns representations that capture both temporal invariance via augmented crops and fine-grained local structure via patch masking. Our approach achieves state-of-the-art classification performance on both UCR and UEA benchmarks. These results suggest that non-contrastive methods are a promising and complementary pretraining strategy for time series foundation models.
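The student-teacher self-distillation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the generic DINO-style recipe (a centered, sharpened teacher distribution, a cross-entropy loss on the student, and an exponential-moving-average teacher update). The function names, temperatures, and momentum value are hypothetical, and the crop and patch-masking augmentations are omitted.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def distillation_loss(student_logits, teacher_logits, center,
                      student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy from the teacher distribution to the student distribution.

    The teacher logits are centered and sharpened with a low temperature;
    both temperatures are illustrative assumptions, not values from the paper.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(lp) for lp in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


def ema_update(teacher_params, student_params, momentum=0.996):
    """Move each teacher parameter toward the student; the teacher gets no gradients."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


if __name__ == "__main__":
    teacher = [2.0, 0.0, -1.0]   # teacher output for one augmented view (toy values)
    center = [0.0, 0.0, 0.0]     # running mean of teacher outputs (toy values)
    aligned = distillation_loss([2.0, 0.0, -1.0], teacher, center)
    misaligned = distillation_loss([-1.0, 0.0, 2.0], teacher, center)
    print(aligned < misaligned)  # a student that agrees with the teacher scores lower
```

In a full pipeline the student would be trained by gradient descent on this loss across pairs of augmented views, with `ema_update` applied to the teacher weights after each step.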

Related papers

Joint Embeddings Go Temporal [5.2741154046624255]
Joint-Embedding Predictive Architectures (JEPA) have been introduced with the aim of performing self-supervised learning in the latent space. Time Series JEPA (TS-JEPA) is an architecture specifically adapted for time series representation learning. We show that TS-JEPA can match or surpass current state-of-the-art baselines on different standard datasets.
arXiv Detail & Related papers (2025-09-29T19:57:37Z)
RATFM: Retrieval-augmented Time Series Foundation Model for Anomaly Detection [0.6524530902514115]
We propose a retrieval-augmented time series foundation model (RATFM) to incorporate examples of test-time adaptation. RATFM achieves a performance comparable to that of in-domain fine-tuning while avoiding domain-dependent fine-tuning.
arXiv Detail & Related papers (2025-06-02T10:25:35Z)
StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning [79.44594332189018]
Video Class-Incremental Learning (VCIL) seeks to develop models that continuously learn new action categories over time without forgetting previously acquired knowledge. Existing approaches either rely on stored exemplars, raising concerns over memory and privacy, or adapt static image-based methods that neglect temporal modeling. We propose a unified and exemplar-free VCIL framework that explicitly disentangles and preserves spatiotemporal information.
arXiv Detail & Related papers (2025-05-20T06:46:51Z)
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines [64.84631333071728]
We introduce UniSTD, a unified Transformer-based framework for spatio-temporal modeling. Our work demonstrates that a task-specific vision-text can build a generalizable model for spatio-temporal learning. We also introduce a temporal module to incorporate temporal dynamics explicitly.
arXiv Detail & Related papers (2025-03-26T17:33:23Z)
Towards Generalisable Time Series Understanding Across Domains [10.350643783811174]
We introduce a novel pre-training paradigm specifically designed to handle time series heterogeneity. We propose a tokeniser with learnable domain signatures, a dual masking strategy, and a normalised cross-correlation loss. Our code and pre-trained weights are available at https://www.oetu.com/oetu/otis.
arXiv Detail & Related papers (2024-10-09T17:09:30Z)
TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling [67.02157180089573]
Time series pre-training has recently garnered wide attention for its potential to reduce labeling expenses and benefit various downstream tasks. This paper proposes TimeSiam, a simple but effective self-supervised pre-training framework for time series based on Siamese networks.
arXiv Detail & Related papers (2024-02-04T13:10:51Z)
TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting [24.834846119163885]
We propose a novel framework, TEMPO, that can effectively learn time series representations. TEMPO expands the capability for dynamically modeling real-world temporal phenomena from data within diverse domains.
arXiv Detail & Related papers (2023-10-08T00:02:25Z)
Toward a Foundation Model for Time Series Data [34.1973242428317]
A foundation model is a machine learning model trained on a large and diverse set of data. We develop an effective time series foundation model by leveraging unlabeled samples from multiple domains.
arXiv Detail & Related papers (2023-10-05T21:44:50Z)
Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs [50.25683648762602]
We introduce Koopman VAE, a new generative framework that is based on a novel design for the model prior. Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map. KoVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks.
arXiv Detail & Related papers (2023-10-04T07:14:43Z)
Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models. We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models. Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage. We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
Semi-supervised Facial Action Unit Intensity Estimation with Contrastive Learning [54.90704746573636]
Our method does not require manually selecting key frames, and produces state-of-the-art results with as little as 2% of annotated frames. We experimentally validate that our method outperforms existing methods when working with as little as 2% of randomly chosen data.
arXiv Detail & Related papers (2020-11-03T17:35:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.