ViTs: Teaching Machines to See Time Series Anomalies Like Human Experts
- URL: http://arxiv.org/abs/2510.04710v1
- Date: Mon, 06 Oct 2025 11:24:53 GMT
- Title: ViTs: Teaching Machines to See Time Series Anomalies Like Human Experts
- Authors: Zexin Wang, Changhua Pei, Yang Liu, Hengyue Jiang, Quan Zhou, Haotian Si, Hang Cui, Jianhui Li, Gaogang Xie, Jingjing Li, Dan Pei
- Abstract summary: "Train once, infer across scenarios" remains a fundamental challenge for time series anomaly detection models. We propose ViTs, a Vision-Language Model (VLM)-based framework that converts time series curves into visual representations.
- Score: 21.498848897981173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Web service administrators must ensure the stability of multiple systems by promptly detecting anomalies in Key Performance Indicators (KPIs). Achieving the goal of "train once, infer across scenarios" remains a fundamental challenge for time series anomaly detection models. Beyond improving zero-shot generalization, such models must also flexibly handle sequences of varying lengths during inference, ranging from one hour to one week, without retraining. Conventional approaches rely on sliding-window encoding and self-supervised learning, which restrict inference to fixed-length inputs. Large Language Models (LLMs) have demonstrated remarkable zero-shot capabilities across general domains. However, when applied to time series data, they face inherent limitations due to context length. To address this issue, we propose ViTs, a Vision-Language Model (VLM)-based framework that converts time series curves into visual representations. By rescaling time series images, temporal dependencies are preserved while maintaining a consistent input size, thereby enabling efficient processing of arbitrarily long sequences without context constraints. Training VLMs for this purpose introduces unique challenges, primarily due to the scarcity of aligned time series image-text data. To overcome this, we employ an evolutionary algorithm to automatically generate thousands of high-quality image-text pairs and design a three-stage training pipeline consisting of: (1) time series knowledge injection, (2) anomaly detection enhancement, and (3) anomaly reasoning refinement. Extensive experiments demonstrate that ViTs substantially enhance the ability of VLMs to understand and detect anomalies in time series data. All datasets and code will be publicly released at: https://anonymous.4open.science/r/ViTs-C484/.
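The fixed-input-size idea in the abstract can be illustrated with a minimal sketch. This is not the authors' rendering pipeline: the paper describes rescaling time series images, whereas the code below assumes a simpler stand-in that buckets samples into pixel columns and draws each column's min-max envelope. The function name `rasterize` and the `width` and `height` parameters are hypothetical, chosen only for illustration.

```python
import math


def rasterize(series, width=64, height=32):
    """Render a 1-D series as a height x width grid of 0/1 pixels.

    Illustrative assumption: each pixel column covers a contiguous slice of
    the series and is filled from that slice's minimum to its maximum, so a
    short spike is still visible after heavy downscaling.
    """
    if not series:
        raise ValueError("series must be non-empty")
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat series
    n = len(series)
    image = [[0] * width for _ in range(height)]
    for col in range(width):
        start = col * n // width
        end = max(start + 1, (col + 1) * n // width)  # never an empty slice
        chunk = series[start:end]
        # Row 0 is the top of the image, so larger values map to smaller rows.
        top = round((hi - max(chunk)) / span * (height - 1))
        bottom = round((hi - min(chunk)) / span * (height - 1))
        for row in range(top, bottom + 1):
            image[row][col] = 1
    return image


# One hour and one week of minute-level samples (synthetic data).
hour = [math.sin(i / 10) for i in range(60)]
week = [math.sin(i / 1600) for i in range(10080)]
week[5000] = 5.0  # injected spike standing in for an anomaly

hour_img = rasterize(hour)
week_img = rasterize(week)
```

Both inputs produce a grid of the same shape, which is the property that lets a vision model accept sequences of very different lengths. The min-max envelope is one possible design choice for keeping short spikes visible; plain averaging per column would tend to smooth them away.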
Related papers
- LEFT: Learnable Fusion of Tri-view Tokens for Unsupervised Time Series Anomaly Detection [53.191369031661885]
Unsupervised time series anomaly detection aims to build a model for identifying abnormal timestamps without assuming the availability of annotations. We present Learnable Fusion of Tri-view Tokens (LEFT), a unified unsupervised TSAD framework that models anomalies as inconsistencies across complementary representations. Experiments on real-world benchmarks show that LEFT yields the best detection accuracy against SOTA baselines, while achieving a 5x reduction on FLOPs and 8x speed-up for training.
arXiv Detail & Related papers (2026-02-09T13:33:49Z)
- T2S: High-resolution Time Series Generation with Text-to-Series Diffusion Models [51.08566687549047]
We introduce Text-to-Series (T2S), a diffusion-based framework that bridges the gap between natural language and time series. T2S employs a length-adaptive variational autoencoder to encode time series of varying lengths into consistent latent embeddings. We train T2S in an interleaved paradigm across multiple lengths, allowing it to generate sequences of any desired length.
arXiv Detail & Related papers (2025-05-05T07:22:54Z)
- VISTA: Unsupervised 2D Temporal Dependency Representations for Time Series Anomaly Detection [42.694234312755285]
Time Series Anomaly Detection (TSAD) is essential for uncovering rare and potentially harmful events in unlabeled time series data. We introduce VISTA, a training-free, unsupervised TSAD algorithm designed to overcome these challenges.
arXiv Detail & Related papers (2025-04-03T11:20:49Z)
- Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts [103.725112190618]
This paper introduces Moirai-MoE, using a single input/output projection layer while delegating the modeling of diverse time series patterns to the sparse mixture of experts.
Extensive experiments on 39 datasets demonstrate the superiority of Moirai-MoE over existing foundation models in both in-distribution and zero-shot scenarios.
arXiv Detail & Related papers (2024-10-14T13:01:11Z)
- Training-Free Time-Series Anomaly Detection: Leveraging Image Foundation Models [0.0]
We propose an image-based, training-free time-series anomaly detection (ITF-TAD) approach.
ITF-TAD converts time-series data into images using wavelet transform and compresses them into a single representation, leveraging image foundation models for anomaly detection.
arXiv Detail & Related papers (2024-08-27T03:12:08Z)
- TSLANet: Rethinking Transformers for Time Series Representation Learning [19.795353886621715]
Time series data is characterized by its intrinsic long and short-range dependencies.
We introduce a novel Time Series Lightweight Network (TSLANet) as a universal convolutional model for diverse time series tasks.
Our experiments demonstrate that TSLANet outperforms state-of-the-art models in various tasks spanning classification, forecasting, and anomaly detection.
arXiv Detail & Related papers (2024-04-12T13:41:29Z)
- Graph Spatiotemporal Process for Multivariate Time Series Anomaly Detection with Missing Values [67.76168547245237]
We introduce a novel framework called GST-Pro, which utilizes a graph spatiotemporal process and anomaly scorer to detect anomalies.
Our experimental results show that the GST-Pro method can effectively detect anomalies in time series data and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-11T10:10:16Z)
- EdgeConvFormer: Dynamic Graph CNN and Transformer based Anomaly Detection in Multivariate Time Series [7.514010315664322]
We propose a novel anomaly detection method, named EdgeConvFormer, which integrates stacked Time2vec embedding, dynamic graph CNN, and Transformer to extract global and local spatial-time information.
Experiments demonstrate that EdgeConvFormer can learn the spatial-temporal modeling from multivariate time series data and achieve better anomaly detection performance than the state-of-the-art approaches on many real-world datasets of different scales.
arXiv Detail & Related papers (2023-12-04T08:38:54Z)
- Large Language Models Are Zero-Shot Time Series Forecasters [48.73953666153385]
By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text.
We find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks.
arXiv Detail & Related papers (2023-10-11T19:01:28Z)
- Time-LLM: Time Series Forecasting by Reprogramming Large Language Models [110.20279343734548]
Time series forecasting holds significant importance in many real-world dynamic systems.
We present Time-LLM, a reprogramming framework to repurpose large language models for time series forecasting.
Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models.
arXiv Detail & Related papers (2023-10-03T01:31:25Z)
- CARLA: Self-supervised Contrastive Representation Learning for Time Series Anomaly Detection [53.83593870825628]
One main challenge in time series anomaly detection (TSAD) is the lack of labelled data in many real-life scenarios.
Most of the existing anomaly detection methods focus on learning the normal behaviour of unlabelled time series in an unsupervised manner.
We introduce a novel end-to-end self-supervised ContrAstive Representation Learning approach for time series anomaly detection.
arXiv Detail & Related papers (2023-08-18T04:45:56Z)
- HyperTime: Implicit Neural Representation for Time Series [131.57172578210256]
Implicit neural representations (INRs) have recently emerged as a powerful tool that provides an accurate and resolution-independent encoding of data.
In this paper, we analyze the representation of time series using INRs, comparing different activation functions in terms of reconstruction accuracy and training convergence speed.
We propose a hypernetwork architecture that leverages INRs to learn a compressed latent representation of an entire time series dataset.
arXiv Detail & Related papers (2022-08-11T14:05:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.