Related papers: A survey on Self Supervised learning approaches for improving Multimodal representation learning

A survey on Self Supervised learning approaches for improving Multimodal representation learning

URL: http://arxiv.org/abs/2210.11024v1
Date: Thu, 20 Oct 2022 05:19:49 GMT
Title: A survey on Self Supervised learning approaches for improving Multimodal representation learning
Authors: Naman Goyal
Abstract summary: This paper gives an overview for best self supervised learning approaches for multimodal learning. Cross modal generation, cross modal pretraining, cyclic translation, and generating unimodal labels in self supervised fashion are discussed.
Score: 13.581713668241552
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently self supervised learning has seen explosive growth and use in variety of machine learning tasks because of its ability to avoid the cost of annotating large-scale datasets. This paper gives an overview for best self supervised learning approaches for multimodal learning. The presented approaches have been aggregated by extensive study of the literature and tackle the application of self supervised learning in different ways. The approaches discussed are cross modal generation, cross modal pretraining, cyclic translation, and generating unimodal labels in self supervised fashion.

Related papers

RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks. Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs. In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
Machine Unlearning in Contrastive Learning [3.6218162133579694]
We introduce a novel gradient constraint-based approach for training the model to effectively achieve machine unlearning. Our approach demonstrates proficient performance not only on contrastive learning models but also on supervised learning models.
arXiv Detail & Related papers (2024-05-12T16:09:01Z)
Beyond Unimodal Learning: The Importance of Integrating Multiple Modalities for Lifelong Learning [23.035725779568587]
We study the role and interactions of multiple modalities in mitigating forgetting in deep neural networks (DNNs) Our findings demonstrate that leveraging multiple views and complementary information from multiple modalities enables the model to learn more accurate and robust representations. We propose a method for integrating and aligning the information from different modalities by utilizing the relational structural similarities between the data points in each modality.
arXiv Detail & Related papers (2024-05-04T22:02:58Z)
Towards Generalized Multi-stage Clustering: Multi-view Self-distillation [10.368796552760571]
Existing multi-stage clustering methods independently learn the salient features from multiple views and then perform the clustering task. This paper proposes a novel multi-stage deep MVC framework where multi-view self-distillation (DistilMVC) is introduced to distill dark knowledge of label distribution.
arXiv Detail & Related papers (2023-10-29T03:35:34Z)
Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. We pose the problem of unseen modality interaction and introduce a first solution. It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
Multi-View Class Incremental Learning [57.14644913531313]
Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance. This paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views.
arXiv Detail & Related papers (2023-06-16T08:13:41Z)
Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability. We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side. During training, the task learner instructs the light-weight active learner which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z)
Contrastive Learning with Cross-Modal Knowledge Mining for Multimodal Human Activity Recognition [1.869225486385596]
We explore the hypothesis that leveraging multiple modalities can lead to better recognition. We extend a number of recent contrastive self-supervised approaches for the task of Human Activity Recognition. We propose a flexible, general-purpose framework for performing multimodal self-supervised learning.
arXiv Detail & Related papers (2022-05-20T10:39:16Z)
Self-Supervised Representation Learning: Introduction, Advances and Challenges [125.38214493654534]
Self-supervised representation learning methods aim to provide powerful deep feature learning without the requirement of large annotated datasets. This article introduces this vibrant area including key concepts, the four main families of approach and associated state of the art, and how self-supervised methods are applied to diverse modalities of data.
arXiv Detail & Related papers (2021-10-18T13:51:22Z)
A Survey on Contrastive Self-supervised Learning [0.0]
Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. Contrastive learning has recently become a dominant component in self-supervised learning methods for computer vision, natural language processing (NLP), and other domains. This paper provides an extensive review of self-supervised methods that follow the contrastive approach.
arXiv Detail & Related papers (2020-10-31T21:05:04Z)
Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME) We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model. We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.