Computer Vision Self-supervised Learning Methods on Time Series
- URL: http://arxiv.org/abs/2109.00783v4
- Date: Fri, 26 Jan 2024 22:16:07 GMT
- Title: Computer Vision Self-supervised Learning Methods on Time Series
- Authors: Daesoo Lee, Erlend Aune
- Abstract summary: Self-supervised learning (SSL) has had great success in computer vision.
Most of the current mainstream computer vision SSL frameworks are based on a Siamese network architecture.
We show that the computer vision SSL frameworks can be effective even for time series.
- Score: 1.30536490219656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) has had great success in computer vision.
Most of the current mainstream computer vision SSL frameworks are based on a
Siamese network architecture. These approaches often rely on cleverly crafted
loss functions and training setups to avoid feature collapse. In this study, we
evaluate whether those computer vision SSL frameworks are also effective on a
different modality (\textit{i.e.,} time series). The effectiveness is evaluated
on the UCR and UEA archives, and we show that the
computer vision SSL frameworks can be effective even for time series. In
addition, we propose a new method that improves on the recently proposed VICReg
method. Our method improves on the \textit{covariance} term proposed in VICReg,
and in addition we augment the head of the architecture with an iterative
normalization layer that accelerates the convergence of the model.
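To make the ingredients above concrete, here is a minimal PyTorch-style sketch of a VICReg-style objective (invariance, variance, and covariance terms) together with a simplified iterative-normalization projector head. This is an illustration only: the loss below is the vanilla VICReg form, not the modified covariance term proposed in the paper, and `IterNormHead` is a simplified stand-in for the iterative normalization layer mentioned in the abstract; all names and hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def vicreg_style_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """Vanilla VICReg-style objective (invariance + variance + covariance).
    Note: this is the standard covariance penalty, not the modified term
    proposed in the paper."""
    n, d = z_a.shape

    # Invariance: pull the two augmented views of each sample together.
    sim_loss = F.mse_loss(z_a, z_b)

    # Variance: hinge loss keeping each embedding dimension's std above 1.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = F.relu(1.0 - std_a).mean() + F.relu(1.0 - std_b).mean()

    # Covariance: penalise off-diagonal covariance to decorrelate dimensions.
    za = z_a - z_a.mean(dim=0)
    zb = z_b - z_b.mean(dim=0)
    cov_a = (za.T @ za) / (n - 1)
    cov_b = (zb.T @ zb) / (n - 1)
    off = lambda m: m - torch.diag(torch.diag(m))
    cov_loss = off(cov_a).pow(2).sum() / d + off(cov_b).pow(2).sum() / d

    return sim_w * sim_loss + var_w * var_loss + cov_w * cov_loss


class IterNormHead(torch.nn.Module):
    """Projector ending in Newton-iteration whitening: a simplified stand-in
    for the iterative normalization layer described in the abstract."""

    def __init__(self, in_dim, out_dim, n_iter=5, eps=1e-5):
        super().__init__()
        self.linear = torch.nn.Linear(in_dim, out_dim)
        self.n_iter, self.eps = n_iter, eps

    def forward(self, x):                 # x: (batch, in_dim), batch > 1 assumed
        z = self.linear(x)
        z = z - z.mean(dim=0)             # centre over the batch
        d = z.shape[1]
        cov = (z.T @ z) / (z.shape[0] - 1) + self.eps * torch.eye(d, device=z.device)
        tr = torch.diagonal(cov).sum()
        cov_n = cov / tr                  # trace-normalised covariance
        p = torch.eye(d, device=z.device)
        for _ in range(self.n_iter):      # Newton-Schulz iterations for cov^{-1/2}
            p = 1.5 * p - 0.5 * p @ p @ p @ cov_n
        return z @ (p / torch.sqrt(tr))   # whitened embeddings
```

Both components push the embedding dimensions toward decorrelation, which is how this family of Siamese SSL methods avoids feature collapse without negative pairs.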
Related papers
- Scaling Language-Free Visual Representation Learning [62.31591054289958]
Visual Self-Supervised Learning (SSL) currently underperforms Contrastive Language-Image Pretraining (CLIP) in multimodal settings such as Visual Question Answering (VQA).
This multimodal gap is often attributed to the semantics introduced by language supervision, even though visual SSL and CLIP models are often trained on different data.
We study this question by training both visual SSL and CLIP models on the same MetaCLIP data, and leveraging VQA as a diverse testbed for vision encoders.
arXiv Detail & Related papers (2025-04-01T17:59:15Z) - Efficient Hierarchical Contrastive Self-supervising Learning for Time Series Classification via Importance-aware Resolution Selection [0.7373617024876725]
We propose an efficient way to train hierarchical contrastive learning models.
Inspired by the fact that the data embeddings at different resolutions are highly dependent, we introduce an importance-aware resolution selection based training framework.
arXiv Detail & Related papers (2025-02-14T21:32:50Z) - A New Perspective on Time Series Anomaly Detection: Faster Patch-based Broad Learning System [59.38402187365612]
Time series anomaly detection (TSAD) has been a research hotspot in both academia and industry in recent years.
Deep learning is not strictly required for TSAD, given limitations such as the slow speed of deep models.
We propose the Contrastive Patch-based Broad Learning System (CBLS).
arXiv Detail & Related papers (2024-12-07T01:58:18Z) - Locality Alignment Improves Vision-Language Models [55.275235524659905]
Vision language models (VLMs) have seen growing adoption in recent years, but many still struggle with basic spatial reasoning errors.
We propose a new efficient post-training stage for ViTs called locality alignment.
We show that locality-aligned backbones improve performance across a range of benchmarks.
arXiv Detail & Related papers (2024-10-14T21:01:01Z) - Efficient infusion of self-supervised representations in Automatic Speech Recognition [1.2972104025246092]
Self-supervised learned (SSL) models such as Wav2vec and HuBERT yield state-of-the-art results on speech-related tasks.
We propose two simple approaches that use (1) framewise addition and (2) cross-attention mechanisms to efficiently incorporate the representations from the SSL model into the ASR architecture.
Our approach results in faster training and yields significant performance gains on the Librispeech and Tedlium datasets.
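As an illustration of the two fusion mechanisms named above (framewise addition and cross-attention), the sketch below shows how frame-level SSL features might be injected into an ASR encoder. It is a hedged sketch, not the paper's implementation: the module name `SSLFusion`, the dimensions, and the residual/LayerNorm placement are all assumptions.

```python
import torch
import torch.nn as nn

class SSLFusion(nn.Module):
    """Fuse frame-level SSL features (e.g. wav2vec 2.0 / HuBERT outputs)
    into ASR encoder states. Names and dimensions are illustrative."""

    def __init__(self, asr_dim=256, ssl_dim=768, mode="cross_attention", n_heads=4):
        super().__init__()
        self.mode = mode
        self.proj = nn.Linear(ssl_dim, asr_dim)   # project SSL features to ASR width
        if mode == "cross_attention":
            self.attn = nn.MultiheadAttention(asr_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(asr_dim)

    def forward(self, asr_states, ssl_feats):
        # asr_states: (batch, T_asr, asr_dim); ssl_feats: (batch, T_ssl, ssl_dim)
        ssl = self.proj(ssl_feats)
        if self.mode == "framewise_add":
            # (1) framewise addition: assumes the two sequences are frame-aligned
            fused = asr_states + ssl
        else:
            # (2) cross-attention: ASR states attend over the SSL feature sequence
            attended, _ = self.attn(query=asr_states, key=ssl, value=ssl)
            fused = asr_states + attended
        return self.norm(fused)

# usage sketch
fusion = SSLFusion(mode="cross_attention")
out = fusion(torch.randn(2, 100, 256), torch.randn(2, 120, 768))  # -> (2, 100, 256)
```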
arXiv Detail & Related papers (2024-04-19T05:01:12Z) - What Constitutes Good Contrastive Learning in Time-Series Forecasting? [10.44543726728613]
Self-supervised contrastive learning (SSCL) has demonstrated remarkable improvements in representation learning across various domains.
This paper aims to conduct a comprehensive analysis of the effectiveness of various SSCL algorithms, learning strategies, model architectures, and their interplay.
We demonstrate that the end-to-end training of a Transformer model using the Mean Squared Error (MSE) loss and SSCL emerges as the most effective approach in time series forecasting.
arXiv Detail & Related papers (2023-06-21T08:05:05Z) - Understanding and Improving the Role of Projection Head in
Self-Supervised Learning [77.59320917894043]
Self-supervised learning (SSL) aims to produce useful feature representations without access to human-labeled data annotations.
Current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective.
This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training?
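For context, the setup the question refers to looks roughly like the following sketch (a simplified SimCLR-style formulation, assumed rather than taken from the paper): a backbone produces a representation h, a small MLP projection head maps it to z, the InfoNCE loss is computed on z, and after pretraining the head is discarded while h is used downstream.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """MLP head appended to the backbone; typically discarded after pretraining."""

    def __init__(self, in_dim=512, hidden_dim=512, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, h):          # h: backbone features, (batch, in_dim)
        return self.net(h)

def info_nce(z_a, z_b, temperature=0.1):
    """Simplified InfoNCE over projected embeddings of two views of a batch:
    matching rows of z_a / z_b are positives, other rows act as negatives."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = (z_a @ z_b.T) / temperature                 # (batch, batch) similarities
    labels = torch.arange(z_a.shape[0], device=z_a.device)
    return F.cross_entropy(logits, labels)

# After pretraining, only the backbone output h is kept; the projection head is dropped.
```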
arXiv Detail & Related papers (2022-12-22T05:42:54Z) - Weighted Ensemble Self-Supervised Learning [67.24482854208783]
Ensembling has proven to be a powerful technique for boosting model performance.
We develop a framework that permits data-dependent weighted cross-entropy losses.
Our method outperforms the compared baselines on multiple evaluation metrics on ImageNet-1K.
arXiv Detail & Related papers (2022-11-18T02:00:17Z) - Effective Self-supervised Pre-training on Low-compute Networks without
Distillation [6.530011859253459]
Reported performance of self-supervised learning has trailed behind standard supervised pre-training by a large margin.
Most prior works attribute this poor performance to the capacity bottleneck of the low-compute networks.
We take a closer look at the detrimental factors causing the practical limitations, and whether they are intrinsic to the self-supervised low-compute setting.
arXiv Detail & Related papers (2022-10-06T10:38:07Z) - Pretraining the Vision Transformer using self-supervised methods for
vision based Deep Reinforcement Learning [0.0]
We study pretraining a Vision Transformer using several state-of-the-art self-supervised methods and assess the quality of the learned representations.
Our results show that all methods are effective in learning useful representations and avoiding representational collapse.
The encoder pretrained with the temporal order verification task shows the best results across all experiments.
arXiv Detail & Related papers (2022-09-22T10:18:59Z) - Adapting Self-Supervised Vision Transformers by Probing
Attention-Conditioned Masking Consistency [7.940705941237998]
We propose PACMAC, a simple two-stage adaptation algorithm for self-supervised ViTs.
Our simple approach leads to consistent performance gains over competing methods.
arXiv Detail & Related papers (2022-06-16T14:46:10Z) - Self-Supervised Class Incremental Learning [51.62542103481908]
Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework sensitive to data labels.
When updating them based on the new class data, they suffer from catastrophic forgetting: the model cannot discern old class data clearly from the new.
In this paper, we explore the performance of Self-Supervised representation learning in Class Incremental Learning (SSCIL) for the first time.
arXiv Detail & Related papers (2021-11-18T06:58:19Z) - Unsupervised Monocular Depth Learning with Integrated Intrinsics and
Spatio-Temporal Constraints [61.46323213702369]
This work presents an unsupervised learning framework that is able to predict at-scale depth maps and egomotion.
Our results demonstrate strong performance when compared to the current state-of-the-art on multiple sequences of the KITTI driving dataset.
arXiv Detail & Related papers (2020-11-02T22:26:58Z) - Learning Monocular Visual Odometry via Self-Supervised Long-Term
Modeling [106.15327903038705]
Monocular visual odometry (VO) suffers severely from error accumulation during frame-to-frame pose estimation.
We present a self-supervised learning method for VO with special consideration for consistency over longer sequences.
We train the networks with purely self-supervised losses, including a cycle consistency loss that mimics the loop closure module in geometric VO.
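To illustrate what a cycle-consistency loss of this kind computes, here is a small sketch (an illustration under stated assumptions, not the authors' loss): compose the predicted frame-to-frame poses around a closed loop of frames and penalize the deviation of the composed transform from the identity, mimicking loop closure.

```python
import torch

def pose_cycle_consistency_loss(relative_poses):
    """relative_poses: (N, 4, 4) homogeneous transforms T_{0->1}, T_{1->2}, ...,
    T_{N-1->0} forming a closed loop; their composition should be the identity."""
    composed = torch.eye(4, dtype=relative_poses.dtype, device=relative_poses.device)
    for T in relative_poses:                       # accumulate around the loop
        composed = composed @ T
    identity = torch.eye(4, dtype=composed.dtype, device=composed.device)
    return torch.linalg.matrix_norm(composed - identity)   # Frobenius deviation

# usage sketch: near-identity poses with tiny translation noise
poses = torch.eye(4).repeat(8, 1, 1)
poses[:, :3, 3] += 0.01 * torch.randn(8, 3)
loss = pose_cycle_consistency_loss(poses)          # small but non-zero
```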
arXiv Detail & Related papers (2020-07-21T17:59:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.