Related papers: PFML: Self-Supervised Learning of Time-Series Data Without Representation Collapse

URL: http://arxiv.org/abs/2411.10087v1
Date: Fri, 15 Nov 2024 10:16:38 GMT
Title: PFML: Self-Supervised Learning of Time-Series Data Without Representation Collapse
Authors: Einari Vaaras, Manu Airaksinen, Okko Räsänen
Abstract summary: Self-supervised learning (SSL) is a data-driven learning approach that utilizes the innate structure of the data to guide the learning process. This paper introduces a novel SSL algorithm for time-series data called Prediction of Functionals from Masked Latents (PFML). PFML operates by predicting statistical functionals of the input signal corresponding to masked embeddings, given a sequence of unmasked embeddings.
Score: 10.364808650788357
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Self-supervised learning (SSL) is a data-driven learning approach that utilizes the innate structure of the data to guide the learning process. In contrast to supervised learning, which depends on external labels, SSL utilizes the inherent characteristics of the data to produce its own supervisory signal. However, one frequent issue with SSL methods is representation collapse, where the model outputs a constant input-invariant feature representation. This issue hinders the potential application of SSL methods to new data modalities, as trying to avoid representation collapse wastes researchers' time and effort. This paper introduces a novel SSL algorithm for time-series data called Prediction of Functionals from Masked Latents (PFML). Instead of predicting masked input signals or their latent representations directly, PFML operates by predicting statistical functionals of the input signal corresponding to masked embeddings, given a sequence of unmasked embeddings. The algorithm is designed to avoid representation collapse, rendering it straightforwardly applicable to different time-series data domains, such as novel sensor modalities in clinical data. We demonstrate the effectiveness of PFML through complex, real-life classification tasks across three different data modalities: infant posture and movement classification from multi-sensor inertial measurement unit data, emotion recognition from speech data, and sleep stage classification from EEG data. The results show that PFML is superior to a conceptually similar pre-existing SSL method and competitive against the current state-of-the-art SSL method, while also being conceptually simpler and without suffering from representation collapse.
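Illustration: the abstract's core idea, predicting statistical functionals of the input signal at masked positions, can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation. The choice of functionals (mean, variance, min, max), the frame length, the hop size, the 30% masking ratio, and the helper names `frame_signal`, `functionals`, and `masked_mse` are all assumptions made for illustration, and the model output is replaced by a zero placeholder.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def functionals(frames):
    """Per-frame statistical functionals (illustrative set: mean, variance, min, max)."""
    return np.stack(
        [frames.mean(axis=1), frames.var(axis=1), frames.min(axis=1), frames.max(axis=1)],
        axis=1,
    )

def masked_mse(pred, target, mask):
    """Mean squared error computed over the masked frames only."""
    diff = (pred - target)[mask]
    return float(np.mean(diff ** 2))

# Toy input: a random signal stands in for real time-series data.
rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)

frames = frame_signal(signal, frame_len=100, hop=50)
targets = functionals(frames)            # training targets, one row per frame

# Assumed masking scheme: mask each frame independently with probability 0.3.
mask = rng.random(len(frames)) < 0.3
mask[0] = True                           # guarantee at least one masked frame

# Placeholder for the output of a model that sees only unmasked embeddings.
pred = np.zeros_like(targets)
loss = masked_mse(pred, targets, mask)
```

In a real setup, `pred` would come from an encoder and prediction head operating on the masked embedding sequence; because the targets are computed from the input signal rather than from the model's own latents, a constant output cannot trivially minimise the loss.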

Related papers

Revisiting semi-supervised learning in the era of foundation models [28.414667991336067]
Semi-supervised learning (SSL) leverages abundant unlabeled data alongside limited labeled data to enhance learning. We develop new SSL benchmark datasets where frozen vision foundation models (VFMs) underperform and systematically evaluate representative SSL methods. We make a surprising observation: parameter-efficient fine-tuning (PEFT) using only labeled data often matches SSL performance, even without leveraging unlabeled data. To overcome the notorious issue of noisy pseudo-labels, we propose ensembling multiple PEFT approaches and VFM backbones to produce more robust pseudo-labels.
arXiv Detail & Related papers (2025-03-12T18:01:10Z)
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels. We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
Boosting Transformer's Robustness and Efficacy in PPG Signal Artifact Detection with Self-Supervised Learning [0.0]
This study addresses the underutilization of abundant unlabeled data by employing self-supervised learning (SSL) to extract latent features from this data. Our experiments demonstrate that SSL significantly enhances the Transformer model's ability to learn representations. This approach holds promise for broader applications in PICU environments, where annotated data is often limited.
arXiv Detail & Related papers (2024-01-02T04:00:48Z)
Semi-Supervised Class-Agnostic Motion Prediction with Pseudo Label Regeneration and BEVMix [59.55173022987071]
We study the potential of semi-supervised learning for class-agnostic motion prediction. Our framework adopts a consistency-based self-training paradigm, enabling the model to learn from unlabeled data. Our method exhibits comparable performance to weakly and some fully supervised methods.
arXiv Detail & Related papers (2023-12-13T09:32:50Z)
Making Self-supervised Learning Robust to Spurious Correlation via Learning-speed Aware Sampling [26.444935219428036]
Self-supervised learning (SSL) has emerged as a powerful technique for learning rich representations from unlabeled data. In real-world settings, spurious correlations between some attributes (e.g. race, gender and age) and labels for downstream tasks often exist. We propose a learning-speed aware SSL (LA-SSL) approach, in which we sample each training data with a probability that is inversely related to its learning speed.
arXiv Detail & Related papers (2023-11-27T22:52:45Z)
Progressive Feature Adjustment for Semi-supervised Learning from Pretrained Models [39.42802115580677]
Semi-supervised learning (SSL) can leverage both labeled and unlabeled data to build a predictive model. Recent literature suggests that naively applying state-of-the-art SSL with a pretrained model fails to unleash the full potential of training data. We propose to use pseudo-labels from the unlabelled data to update the feature extractor that is less sensitive to incorrect labels.
arXiv Detail & Related papers (2023-09-09T01:57:14Z)
CroSSL: Cross-modal Self-Supervised Learning for Time-series through Latent Masking [11.616031590118014]
CroSSL allows for handling missing modalities and end-to-end cross-modal learning. We evaluate our method on a wide range of data, including motion sensors.
arXiv Detail & Related papers (2023-07-31T17:10:10Z)
Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects [84.6945070729684]
Self-supervised learning (SSL) has recently achieved impressive performance on various time series tasks. This article reviews current state-of-the-art SSL methods for time series data.
arXiv Detail & Related papers (2023-06-16T18:23:10Z)
Self-Supervised PPG Representation Learning Shows High Inter-Subject Variability [3.8036939971290007]
We propose a Self-Supervised Learning (SSL) method with a pretext task of signal reconstruction to learn an informative generalized PPG representation. Results show that in a very limited label data setting (10 samples per class or less), using SSL is beneficial. SSL may pave the way for the broader use of machine learning models on PPG data in label-scarce regimes.
arXiv Detail & Related papers (2022-12-07T19:02:45Z)
OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning [110.40285771431687]
Semi-supervised learning (SSL) is one of the dominant approaches to address the annotation bottleneck of supervised learning. Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on a small set of labeled data. This work introduces OpenLDN that utilizes a pairwise similarity loss to discover novel classes.
arXiv Detail & Related papers (2022-07-05T18:51:05Z)
Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of Semi-Supervised Learning and Active Learning [60.26659373318915]
Active learning (AL) and semi-supervised learning (SSL) are two effective, but often isolated, means to alleviate the data-hungry problem. We propose an innovative Inconsistency-based virtual aDvErsarial algorithm to further investigate SSL-AL's potential superiority. Two real-world case studies visualize the practical industrial value of applying and deploying the proposed data sampling algorithm.
arXiv Detail & Related papers (2022-06-07T13:28:43Z)
Adaptive neighborhood Metric learning [184.95321334661898]
We propose a novel distance metric learning algorithm, named adaptive neighborhood metric learning (ANML). ANML can be used to learn both linear and deep embeddings. The log-exp mean function proposed in our method gives a new perspective from which to review deep metric learning methods.
arXiv Detail & Related papers (2022-01-20T17:26:37Z)
Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning [59.58381904522967]
We propose a novel embedding based generative model with a tight visual-semantic coupling constraint. We learn a unified latent space that calibrates the embedded parametric distributions of both visual and semantic spaces. Our method can be easily extended to transductive ZSL setting by generating labels for unseen images.
arXiv Detail & Related papers (2020-09-16T03:54:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.