Beyond Just Vision: A Review on Self-Supervised Representation Learning
on Multimodal and Temporal Data
- URL: http://arxiv.org/abs/2206.02353v2
- Date: Wed, 8 Jun 2022 03:13:04 GMT
- Title: Beyond Just Vision: A Review on Self-Supervised Representation Learning
on Multimodal and Temporal Data
- Authors: Shohreh Deldari, Hao Xue, Aaqib Saeed, Jiayuan He, Daniel V. Smith,
Flora D. Salim
- Abstract summary: The popularity of self-supervised learning is driven by the fact that traditional models typically require a huge amount of well-annotated data for training.
Self-supervised methods have been introduced to improve training-data efficiency through discriminative pre-training of models.
We aim to provide the first comprehensive review of multimodal self-supervised learning methods for temporal data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Self-Supervised Representation Learning (SSRL) has attracted much
attention in the fields of computer vision, speech, natural language processing
(NLP), and, more recently, other modalities, including time series from
sensors. The popularity of self-supervised learning is driven by the fact that
traditional models typically require a huge amount of well-annotated data for
training, and acquiring annotated data can be a difficult and costly process.
Self-supervised methods improve training-data efficiency through
discriminative pre-training of models using supervisory signals obtained
freely from the raw data. Unlike existing reviews of SSRL that have
predominantly focused upon methods in the fields of CV or NLP for a single
modality, we aim to provide the first comprehensive review of multimodal
self-supervised learning methods for temporal data. To this end, we 1) provide
a comprehensive categorization of existing SSRL methods, 2) introduce a
generic pipeline by defining the key components of an SSRL framework, 3)
compare existing models in terms of their objective function, network
architecture and potential applications, and 4) review existing multimodal
techniques in each category and across various modalities. Finally, we present
existing weaknesses and future opportunities. We believe our work develops a
perspective on the requirements of SSRL in domains that utilise multimodal
and/or temporal data.
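The abstract describes the generic SSRL pipeline only at a high level. As a concrete, hedged illustration, below is a minimal sketch of one common instantiation: a cross-modal contrastive pretext task over two time-aligned sensor streams. The encoder design, the InfoNCE objective, the modality names, and all hyperparameters are illustrative assumptions, not details taken from the survey.

```python
# Minimal sketch of a cross-modal contrastive SSRL step (illustrative only;
# architecture, modalities, and hyperparameters are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalEncoder(nn.Module):
    """1D-conv encoder mapping a (batch, channels, time) signal to a unit embedding."""
    def __init__(self, in_channels: int, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, embed_dim, kernel_size=5, padding=2),
            nn.AdaptiveAvgPool1d(1),  # pool over the time axis
        )
        self.proj = nn.Linear(embed_dim, embed_dim)  # projection head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.net(x).squeeze(-1)
        return F.normalize(self.proj(z), dim=-1)

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Cross-modal InfoNCE: time-aligned pairs are positives, all others negatives."""
    logits = z_a @ z_b.t() / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(z_a.size(0))    # i-th row matches i-th column
    return F.cross_entropy(logits, targets)

# Toy pretraining step on random "sensor" data: no labels are used; the
# supervisory signal is the known temporal alignment of the two streams.
enc_a = TemporalEncoder(in_channels=3)  # e.g. accelerometer (x, y, z)
enc_b = TemporalEncoder(in_channels=3)  # e.g. gyroscope (x, y, z)
opt = torch.optim.Adam(list(enc_a.parameters()) + list(enc_b.parameters()), lr=1e-3)

x_a = torch.randn(16, 3, 128)  # (batch, channels, time)
x_b = torch.randn(16, 3, 128)
loss = info_nce(enc_a(x_a), enc_b(x_b))
opt.zero_grad(); loss.backward(); opt.step()
print(f"contrastive loss: {loss.item():.3f}")
```

The point the sketch captures is that no labels appear anywhere: the supervisory signal is the temporal alignment between the two modalities, which comes for free with the raw data.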
Related papers
- An Information Criterion for Controlled Disentanglement of Multimodal Data [39.601584166020274]
Multimodal representation learning seeks to relate and decompose information inherent in multiple modalities.
Disentangled Self-Supervised Learning (DisentangledSSL) is a novel self-supervised approach for learning disentangled representations.
arXiv Detail & Related papers (2024-10-31T14:57:31Z)
- Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework [58.362064122489166]
This paper introduces the Cross-modal Few-Shot Learning task, which aims to recognize instances from multiple modalities when only a few labeled examples are available.
We propose a Generative Transfer Learning framework consisting of two stages: the first involves training on abundant unimodal data, and the second focuses on transfer learning to adapt to novel data.
Our findings demonstrate that GTL achieves superior performance compared to state-of-the-art methods across four distinct multi-modal datasets.
arXiv Detail & Related papers (2024-10-14T16:09:38Z)
- A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language.
To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates.
We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
arXiv Detail & Related papers (2024-08-26T17:59:01Z)
- Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z)
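The MiDl summary above leaves the method's objective unspecified; the toy sketch below shows only the general pattern of adapting at test time when a modality drops out, using a generic entropy-minimization update (an assumption standing in for MiDl's actual criterion, not a description of it). The fusion scheme, classifier head, and data shapes are hypothetical.

```python
# Toy test-time adaptation under a missing modality (NOT MiDl's actual method;
# a generic entropy-minimization update is assumed for illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F

head = nn.Linear(64, 10)  # hypothetical classifier over 64-d embeddings
opt = torch.optim.SGD(head.parameters(), lr=1e-2)

def predict(z_audio, z_video):
    """Fuse by averaging whichever modality embeddings are present."""
    feats = [z for z in (z_audio, z_video) if z is not None]
    return head(torch.stack(feats).mean(dim=0))

# Audio stream unavailable at test time: adapt on video-only predictions,
# with no labels and no retraining of the deployed model.
z_video = torch.randn(8, 64)
logits = predict(None, z_video)
probs = F.softmax(logits, dim=-1)
entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
opt.zero_grad(); entropy.backward(); opt.step()
```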
- Continual Learning with Pre-Trained Models: A Survey [61.97613090666247]
Continual Learning aims to overcome catastrophic forgetting of previously acquired knowledge when learning new tasks.
This paper presents a comprehensive survey of the latest advancements in pre-trained model (PTM)-based CL.
arXiv Detail & Related papers (2024-01-29T18:27:52Z)
- Reinforcement Learning Based Multi-modal Feature Fusion Network for Novel Class Discovery [47.28191501836041]
In this paper, we employ a Reinforcement Learning framework to simulate the cognitive processes of humans.
We also deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information.
We demonstrate the performance of our approach in both the 3D and 2D domains by employing the OS-MN40, OS-MN40-Miss, and CIFAR-10 datasets.
arXiv Detail & Related papers (2023-08-26T07:55:32Z)
- CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation [128.00940554196976]
Vision-Language Continual Pretraining (VLCP) has shown impressive results on diverse downstream tasks by offline training on large-scale datasets.
To support the study of VLCP, we first contribute a comprehensive and unified benchmark dataset P9D.
Treating the data of each industry as an independent task supports continual learning, and the dataset conforms to the real-world long-tail distribution to simulate pretraining on web data.
arXiv Detail & Related papers (2023-08-14T13:53:18Z)
- Ex-Model: Continual Learning from a Stream of Trained Models [12.27992745065497]
We argue that continual learning systems should exploit the availability of compressed information in the form of trained models.
We introduce and formalize a new paradigm named "Ex-Model Continual Learning" (ExML), where an agent learns from a sequence of previously trained models instead of raw data.
arXiv Detail & Related papers (2021-12-13T09:46:16Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We achieve new state-of-the-art results in both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- A Survey on Self-supervised Pre-training for Sequential Transfer Learning in Neural Networks [1.1802674324027231]
Self-supervised pre-training for transfer learning is becoming an increasingly popular technique to improve state-of-the-art results using unlabeled data.
We provide an overview of the taxonomy for self-supervised learning and transfer learning, and highlight some prominent methods for designing pre-training tasks across different domains.
arXiv Detail & Related papers (2020-07-01T22:55:48Z)
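The last entry describes the pretrain-then-transfer recipe in general terms. As a hedged illustration, the sketch below shows the common linear-probe variant of sequential transfer: a self-supervised encoder (here a randomly initialized stand-in) is frozen, and only a small labeled head is fit on the downstream task. The architecture, class count, and data shapes are assumptions for the sketch.

```python
# Minimal linear-probe transfer sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn

encoder = nn.Sequential(  # stands in for any self-supervised pretrained encoder
    nn.Conv1d(3, 64, kernel_size=5, padding=2),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
)
for p in encoder.parameters():
    p.requires_grad = False  # linear-probe protocol: freeze pretrained weights

head = nn.Linear(64, 5)  # new downstream task with 5 classes and few labels
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

x, y = torch.randn(32, 3, 128), torch.randint(0, 5, (32,))
loss = nn.functional.cross_entropy(head(encoder(x)), y)
opt.zero_grad(); loss.backward(); opt.step()  # only the head is updated
```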