PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision
- URL: http://arxiv.org/abs/2411.15127v2
- Date: Tue, 21 Jan 2025 13:55:17 GMT
- Title: PRIMUS: Pretraining IMU Encoders with Multimodal Self-Supervision
- Authors: Arnav M. Das, Chi Ian Tang, Fahim Kawsar, Mohammad Malekzadeh,
- Abstract summary: Unlabeled or weakly labeled IMU data can be used to model human motions.<n>We propose PRIMUS: a method for PRetraining IMU encoderS that uses a novel pretraining objective.<n> PRIMUS improves test accuracy by up to 15%, compared to state-of-the-art baselines.
- Score: 7.896850422430362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sensing human motions through Inertial Measurement Units (IMUs) embedded in personal devices has enabled significant applications in health and wellness. Labeled IMU data is scarce, however, unlabeled or weakly labeled IMU data can be used to model human motions. For video or text modalities, the "pretrain and adapt" approach utilizes large volumes of unlabeled or weakly labeled data to build a strong feature extractor, followed by adaptation to specific tasks using limited labeled data. However, pretraining methods are poorly understood for IMU data, and pipelines are rarely evaluated on out-of-domain tasks. We propose PRIMUS: a method for PRetraining IMU encoderS that uses a novel pretraining objective that is empirically validated based on downstream performance on both in-domain and out-of-domain datasets. The PRIMUS objective effectively enhances downstream performance by combining self-supervision, multimodal, and nearest-neighbor supervision. With fewer than 500 labeled samples per class, PRIMUS improves test accuracy by up to 15%, compared to state-of-the-art baselines. To benefit the broader community, we have open-sourced our code at github.com/nokia-bell-labs/pretrained-imu-encoders.
Related papers
- Saga: Capturing Multi-granularity Semantics from Massive Unlabelled IMU Data for User Perception [16.9766171115035]
In this paper, we propose a novel fine-grained user perception approach, called Saga, which only needs a small amount of labelled IMU data to achieve stunning user perception accuracy.
The core idea of Saga is to first pre-train a backbone feature extraction model, utilizing the rich semantic information of different levels embedded in the massive unlabelled IMU data.
Saga can achieve over 90% accuracy of the full-fledged model trained on over ten thousands training samples with no additional system overhead.
arXiv Detail & Related papers (2025-04-16T03:03:42Z) - Prism: Mining Task-aware Domains in Non-i.i.d. IMU Data for Flexible User Perception [20.61555898129175]
We propose a novel scheme, called Prism, which can obtain high FUP accuracy on mobile devices.
The core of Prism is to discover task-aware domains embedded in IMU dataset, and to train a domain-aware model on each identified domain.
Results demonstrate that Prism can achieve the best FUP performance with a low latency.
arXiv Detail & Related papers (2025-01-03T02:07:42Z) - Federated Learning for Misbehaviour Detection with Variational Autoencoders and Gaussian Mixture Models [0.2999888908665658]
Federated Learning (FL) has become an attractive approach to collaboratively train Machine Learning (ML) models.
This work proposes a novel unsupervised FL approach for the identification of potential misbehavior in vehicular environments.
We leverage the computing capabilities of public cloud services for model aggregation purposes.
arXiv Detail & Related papers (2024-05-16T08:49:50Z) - PUMA: margin-based data pruning [51.12154122266251]
We focus on data pruning, where some training samples are removed based on the distance to the model classification boundary (i.e., margin)
We propose PUMA, a new data pruning strategy that computes the margin using DeepFool.
We show that PUMA can be used on top of the current state-of-the-art methodology in robustness, and it is able to significantly improve the model performance unlike the existing data pruning strategies.
arXiv Detail & Related papers (2024-05-10T08:02:20Z) - Rethinking Transformers Pre-training for Multi-Spectral Satellite
Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amount of unlabelled data.
In this paper, we re-visit transformers pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z) - ECAP: Extensive Cut-and-Paste Augmentation for Unsupervised Domain
Adaptive Semantic Segmentation [4.082799056366928]
We propose an extensive cut-and-paste strategy (ECAP) to leverage reliable pseudo-labels through data augmentation.
ECAP maintains a memory bank of pseudo-labeled target samples throughout training and cut-and-pastes the most confident ones onto the current training batch.
We implement ECAP on top of the recent method MIC and boost its performance on two synthetic-to-real domain adaptation benchmarks.
arXiv Detail & Related papers (2024-03-06T17:06:07Z) - FedMM: Federated Multi-Modal Learning with Modality Heterogeneity in
Computational Pathology [3.802258033231335]
Federated Multi-Modal (FedMM) is a learning framework that trains multiple single-modal feature extractors to enhance subsequent classification performance.
FedMM notably outperforms two baselines in accuracy and AUC metrics.
arXiv Detail & Related papers (2024-02-24T16:58:42Z) - Task-customized Masked AutoEncoder via Mixture of Cluster-conditional
Experts [104.9871176044644]
Masked Autoencoder(MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training.
We propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE)
MoCE trains each expert only with semantically relevant images by using cluster-conditional gates.
arXiv Detail & Related papers (2024-02-08T03:46:32Z) - Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised
Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z) - Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training [20.98770732015944]
Few-shot intent detection involves training a deep learning model to classify utterances based on their underlying intents using only a small amount of labeled data.
We show that continual pre-training may not be essential, since the overfitting problem of PLMs on this task may not be as serious as expected.
To maximize the utilization of the limited available data, we propose a context augmentation method and leverage sequential self-distillation to boost performance.
arXiv Detail & Related papers (2023-06-08T15:26:52Z) - Convolutional Monge Mapping Normalization for learning on sleep data [63.22081662149488]
We propose a new method called Convolutional Monge Mapping Normalization (CMMN)
CMMN consists in filtering the signals in order to adapt their power spectrum density (PSD) to a Wasserstein barycenter estimated on training data.
Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent from the neural network architecture.
arXiv Detail & Related papers (2023-05-30T08:24:01Z) - PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user.
We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z) - Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning
with Hierarchical Aggregation [16.308470947384134]
HA-Fedformer is a novel transformer-based model that empowers unimodal training with only a unimodal dataset at the client.
We develop an uncertainty-aware aggregation method for the local encoders with layer-wise Markov Chain Monte Carlo sampling.
Our experiments on popular sentiment analysis benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms state-of-the-art multimodal models.
arXiv Detail & Related papers (2023-03-27T07:07:33Z) - MAPS: A Noise-Robust Progressive Learning Approach for Source-Free
Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z) - Towards All-in-one Pre-training via Maximizing Multi-modal Mutual
Information [77.80071279597665]
We propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training)
Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation.
arXiv Detail & Related papers (2022-11-17T18:59:49Z) - IMG2IMU: Translating Knowledge from Large-Scale Images to IMU Sensing
Applications [6.865654843241631]
We propose IMG2IMU that adapts pre-trained representation from large-scale images to diverse IMU sensing tasks.
We convert the sensor data into visually interpretable spectrograms for the model to utilize the knowledge gained from vision.
IMG2IMU outperforms the baselines pre-trained on sensor data by an average of 9.6%p F1-score.
arXiv Detail & Related papers (2022-09-02T11:00:23Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z) - Neural Semi-supervised Learning for Text Classification Under
Large-Scale Pretraining [51.19885385587916]
We conduct studies on semi-supervised learning in the task of text classification under the context of large-scale LM pretraining.
Our work marks an initial step in understanding the behavior of semi-supervised learning models under the context of large-scale pretraining.
arXiv Detail & Related papers (2020-11-17T13:39:05Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.