VETime: Vision Enhanced Zero-Shot Time Series Anomaly Detection
- URL: http://arxiv.org/abs/2602.16681v1
- Date: Wed, 18 Feb 2026 18:22:22 GMT
- Title: VETime: Vision Enhanced Zero-Shot Time Series Anomaly Detection
- Authors: Yingyuan Yang, Tian Lan, Yifei Gao, Yimeng Lu, Wenjun He, Meng Wang, Chenghao Liu, Chen Zhang
- Abstract summary: Time-series anomaly detection (TSAD) requires identifying both immediate Point Anomalies and long-range Context Anomalies. We propose VETime, the first TSAD framework that unifies temporal and visual modalities through fine-grained visual-temporal alignment and dynamic fusion. VETime significantly outperforms state-of-the-art models in zero-shot scenarios, achieving superior localization precision with lower computational overhead than current vision-based approaches.
- Score: 36.10754425277683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Time-series anomaly detection (TSAD) requires identifying both immediate Point Anomalies and long-range Context Anomalies. However, existing foundation models face a fundamental trade-off: 1D temporal models provide fine-grained pointwise localization but lack a global contextual perspective, while 2D vision-based models capture global patterns but suffer from information bottlenecks due to a lack of temporal alignment and coarse-grained pointwise detection. To resolve this dilemma, we propose VETime, the first TSAD framework that unifies temporal and visual modalities through fine-grained visual-temporal alignment and dynamic fusion. VETime introduces a Reversible Image Conversion and a Patch-Level Temporal Alignment module to establish a shared visual-temporal timeline, preserving discriminative details while maintaining temporal sensitivity. Furthermore, we design an Anomaly Window Contrastive Learning mechanism and a Task-Adaptive Multi-Modal Fusion to adaptively integrate the complementary perceptual strengths of both modalities. Extensive experiments demonstrate that VETime significantly outperforms state-of-the-art models in zero-shot scenarios, achieving superior localization precision with lower computational overhead than current vision-based approaches. Code available at: https://github.com/yyyangcoder/VETime.
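The abstract names a Reversible Image Conversion module but does not specify it; as a loose illustration only (an assumption, not the paper's actual module), the idea of a lossless series-to-image mapping can be sketched as a period-based reshape whose padding is recorded so the conversion can be exactly inverted:

```python
import numpy as np

def series_to_image(x, period):
    """Pad a 1D series to a multiple of `period`, then reshape it into a
    2D (rows x period) array. Returns the image and the original length,
    which is all that is needed to invert the conversion exactly."""
    n = len(x)
    pad = (-n) % period
    padded = np.pad(x, (0, pad), mode="edge")  # repeat last value as padding
    return padded.reshape(-1, period), n

def image_to_series(img, n):
    """Invert series_to_image: flatten row-major and drop the padding."""
    return img.reshape(-1)[:n]

x = np.sin(np.linspace(0, 20, 103))  # length deliberately not a multiple of 16
img, n = series_to_image(x, period=16)
x_rec = image_to_series(img, n)
assert np.array_equal(x, x_rec)  # the round trip is lossless
```

This only demonstrates what "reversible" means in a 1D-to-2D conversion; VETime's actual module, alignment, and fusion are described in the paper and code repository.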
Related papers
- TimeOmni-VL: Unified Models for Time Series Understanding and Generation [66.55423802406078]
TimeOmni-VL is a vision-centric framework that unifies time series understanding and generation. TimeOmni-VL is the first to leverage time series understanding as an explicit control signal for high-fidelity generation. Experiments confirm that this unified approach significantly improves both semantic understanding and numerical precision.
arXiv Detail & Related papers (2026-02-19T07:50:11Z)
- DARTs: A Dual-Path Robust Framework for Anomaly Detection in High-Dimensional Multivariate Time Series [22.29889788385778]
Multivariate time series anomaly detection (MTSAD) aims to accurately identify and localize complex abnormal patterns in large-scale industrial control systems. Existing approaches excel at recognizing distinct patterns under low-dimensional representations, but fail to robustly capture long-range dependencies when learning from high-dimensional time series.
arXiv Detail & Related papers (2025-12-14T07:40:23Z)
- Temporal-Visual Semantic Alignment: A Unified Architecture for Transferring Spatial Priors from Vision Models to Zero-Shot Temporal Tasks [19.299293037292113]
TimeArtist is a temporal-visual conversion framework that pioneers semantic-level alignment between time series fluctuations and visual concepts. Our work establishes a new paradigm for cross-modal generation, bridging the gap between temporal dynamics and visual semantics.
arXiv Detail & Related papers (2025-11-25T02:35:48Z)
- MambaTAD: When State-Space Models Meet Long-Range Temporal Action Detection [94.12444452690329]
This paper presents MambaTAD, a new state-space TAD model that introduces long-range modeling and global feature detection capabilities. MambaTAD achieves superior TAD performance consistently across multiple public benchmarks.
arXiv Detail & Related papers (2025-11-22T06:04:29Z)
- DAMS: Dual-Branch Adaptive Multiscale Spatiotemporal Framework for Video Anomaly Detection [7.117824587276951]
This study offers a dual-path architecture called the Dual-Branch Adaptive Multiscale Spatiotemporal Framework (DAMS), which is based on multilevel feature decoupling and fusion. The main processing path integrates the Adaptive Multiscale Time Pyramid Network (AMTPN) with the Convolutional Block Attention Mechanism (CBAM).
arXiv Detail & Related papers (2025-07-28T08:42:00Z)
- OptiCorNet: Optimizing Sequence-Based Context Correlation for Visual Place Recognition [2.3093110834423616]
This paper presents OptiCorNet, a novel sequence modeling framework. It unifies spatial feature extraction and temporal differencing into a differentiable, end-to-end trainable module. Our approach outperforms state-of-the-art baselines under challenging seasonal and viewpoint variations.
arXiv Detail & Related papers (2025-07-19T04:29:43Z)
- Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z)
- USTEP: Spatio-Temporal Predictive Learning under A Unified View [62.58464029270846]
We introduce USTEP (Unified Spatio-TEmporal Predictive learning), an innovative framework that reconciles recurrent-based and recurrent-free methods by integrating both micro-temporal and macro-temporal scales.
arXiv Detail & Related papers (2023-10-09T16:17:42Z)
- CARLA: Self-supervised Contrastive Representation Learning for Time Series Anomaly Detection [53.83593870825628]
One main challenge in time series anomaly detection (TSAD) is the lack of labelled data in many real-life scenarios.
Most of the existing anomaly detection methods focus on learning the normal behaviour of unlabelled time series in an unsupervised manner.
We introduce a novel end-to-end self-supervised ContrAstive Representation Learning approach for time series anomaly detection.
arXiv Detail & Related papers (2023-08-18T04:45:56Z)
- Learning Monocular Depth in Dynamic Environment via Context-aware Temporal Attention [9.837958401514141]
We present CTA-Depth, a Context-aware Temporal Attention guided network for multi-frame monocular Depth estimation.
Our approach achieves significant improvements over state-of-the-art approaches on three benchmark datasets.
arXiv Detail & Related papers (2023-05-12T11:48:32Z)
- Local-Global Temporal Difference Learning for Satellite Video Super-Resolution [53.03380679343968]
We propose to exploit the well-defined temporal difference for efficient and effective temporal compensation. To fully utilize the local and global temporal information within frames, we systematically model the short-term and long-term temporal discrepancies. Rigorous objective and subjective evaluations conducted across five mainstream video satellites demonstrate that our method performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2023-04-10T07:04:40Z)
- Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision community.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.