Related papers: MV-MR: multi-views and multi-representations for self-supervised learning and knowledge distillation

URL: http://arxiv.org/abs/2303.12130v2
Date: Sun, 2 Jun 2024 15:50:37 GMT
Title: MV-MR: multi-views and multi-representations for self-supervised learning and knowledge distillation
Authors: Vitaliy Kinakh, Mariia Drozdova, Slava Voloshynovskiy
Abstract summary: We present a new method of self-supervised learning and knowledge distillation based on multi-views and multi-representations (MV-MR). MV-MR is based on maximizing the dependence between learnable embeddings from augmented and non-augmented views. We show that the proposed method can be used for efficient self-supervised classification and model-agnostic knowledge distillation.
Score: 4.156535226615695
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a new method of self-supervised learning and knowledge distillation based on multi-views and multi-representations (MV-MR). MV-MR is based on the maximization of dependence between learnable embeddings from augmented and non-augmented views, jointly with the maximization of dependence between learnable embeddings from the augmented view and multiple non-learnable representations from the non-augmented view. We show that the proposed method can be used for efficient self-supervised classification and model-agnostic knowledge distillation. Unlike other self-supervised techniques, our approach does not use any contrastive learning, clustering, or stop gradients. MV-MR is a generic framework that allows constraints to be imposed on the learnable embeddings by using image multi-representations as regularizers. Along this line, knowledge distillation is considered a particular case of such regularization. MV-MR provides state-of-the-art performance on the STL10 and ImageNet-1K datasets among non-contrastive and clustering-free methods. We show that a lower-complexity ResNet50 model pretrained using the proposed knowledge distillation based on the CLIP ViT model achieves state-of-the-art performance on STL10 linear evaluation. The code is available at: https://github.com/vkinakh/mv-mr
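The objective described in the abstract can be illustrated with a small sketch. The abstract does not name the dependence measure, so the code below assumes squared sample distance correlation as a stand-in; the function names (`squared_distance_correlation`, `mvmr_style_loss`) and the equal weighting of the terms are hypothetical and are not taken from the authors' repository. It uses plain Python lists rather than a deep-learning framework, so it shows only how the loss terms are combined, not how they are optimized.

```python
import math


def _centered_distances(points):
    """Pairwise Euclidean distance matrix with row, column and grand means removed."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(r) / n for r in d]  # equals the column means (d is symmetric)
    grand_mean = sum(row_mean) / n
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def squared_distance_correlation(x, y):
    """Squared sample distance correlation between two batches of vectors.

    x and y are lists with one vector per sample; the vector dimensions
    may differ between x and y. The result lies in [0, 1].
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)

    def mean_product(u, v):
        return sum(u[i][j] * v[i][j] for i in range(n) for j in range(n)) / (n * n)

    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    return mean_product(a, b) / denom if denom > 0 else 0.0


def mvmr_style_loss(z_aug, z_clean, fixed_representations):
    """Negative sum of dependence terms, so that minimizing it maximizes dependence.

    z_aug: learnable embeddings of the augmented view.
    z_clean: learnable embeddings of the non-augmented view.
    fixed_representations: list of non-learnable representations of the
        non-augmented view (hypothetical examples: hand-crafted descriptors,
        or teacher embeddings in the distillation case).
    Equal weights for all terms are an assumption of this sketch.
    """
    total = squared_distance_correlation(z_aug, z_clean)
    for r in fixed_representations:
        total += squared_distance_correlation(z_aug, r)
    return -total


if __name__ == "__main__":
    z_aug = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
    z_clean = [[2 * u + 1, 2 * v - 1] for u, v in z_aug]  # scaled and shifted copy
    teacher = [[u + v] for u, v in z_aug]                 # a 1-D fixed representation
    print(mvmr_style_loss(z_aug, z_clean, [teacher]))
```

Under this reading, knowledge distillation corresponds to passing the frozen teacher's embeddings as one of the `fixed_representations`, which is why the measure only needs matching batch sizes and not matching embedding dimensions.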

Related papers

RS-MTDF: Multi-Teacher Distillation and Fusion for Remote Sensing Semi-Supervised Semantic Segmentation [43.991262005295596]
We introduce RS-MTDF (Multi-Teacher Distillation and Fusion), a novel framework to guide semi-supervised learning in remote sensing. RS-MTDF employs multiple frozen Vision Foundation Models (VFMs) as expert teachers, utilizing feature-level distillation to align student features with their robust representations. Our method outperforms existing approaches across various label ratios on LoveDA and secures the highest IoU in the majority of semantic categories.
arXiv Detail & Related papers (2025-06-10T13:15:15Z)
Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation [64.15918654558816]
A self-distillation (SSD) training strategy is introduced that filters and weights teacher representations so that the student distills from task-relevant representations only. Experimental results on real-world affective computing, wearable/biosignal datasets from the UCR Archive, the HAR dataset, and image classification datasets show that the proposed SSD method can outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-04-19T14:08:56Z)
Delving Deep into Semantic Relation Distillation [40.89593967999198]
This paper introduces a novel methodology, Semantics-based Relation Knowledge Distillation (SeRKD). SeRKD reimagines knowledge distillation through a semantics-relation lens among samples. It integrates superpixel-based semantic extraction with relation-based knowledge distillation for model compression and distillation.
arXiv Detail & Related papers (2025-03-27T08:50:40Z)
Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation [61.64052577026623]
Real-world multi-view datasets are often heterogeneous and imperfect. We propose a novel robust MVL method (namely RML) with simultaneous representation fusion and alignment. In experiments, we employ it in unsupervised multi-view clustering, noise-label classification, and as a plug-and-play module for cross-modal hashing retrieval.
arXiv Detail & Related papers (2025-03-06T07:01:08Z)
Multi-Level Decoupled Relational Distillation for Heterogeneous Architectures [6.231548250160585]
Multi-Level Decoupled Relational Knowledge Distillation (MLDR-KD) improves student model performance, with gains of up to 4.86% on CIFAR-100 and 2.78% on Tiny-ImageNet.
arXiv Detail & Related papers (2025-02-10T06:41:20Z)
Balanced Multi-view Clustering [56.17836963920012]
Multi-view clustering (MvC) aims to integrate information from different views to enhance the capability of the model in capturing the underlying data structures. The widely used joint training paradigm in MvC may not fully leverage the multi-view information. We propose a novel balanced multi-view clustering (BMvC) method, which introduces a view-specific contrastive regularization (VCR) to modulate the optimization of each view.
arXiv Detail & Related papers (2025-01-05T14:42:47Z)
A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels. We present a generative latent variable model for self-supervised learning. We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
Towards Generalized Multi-stage Clustering: Multi-view Self-distillation [10.368796552760571]
Existing multi-stage clustering methods independently learn the salient features from multiple views and then perform the clustering task. This paper proposes a novel multi-stage deep MVC framework where multi-view self-distillation (DistilMVC) is introduced to distill dark knowledge of label distribution.
arXiv Detail & Related papers (2023-10-29T03:35:34Z)
MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning [53.90744622542961]
Reasoning in mathematical domains remains a significant challenge for small language models (LMs). We introduce a new method that exploits existing mathematical problem datasets with diverse annotation styles. Experimental results show that our strategy enables a LLaMA-7B model to outperform prior approaches.
arXiv Detail & Related papers (2023-07-16T05:41:53Z)
Multimodal Distillation for Egocentric Action Recognition [41.821485757189656]
Egocentric video understanding involves modelling hand-object interactions. Standard models, e.g. CNNs or Vision Transformers, which receive RGB frames as input, perform well, but their performance improves further when additional input modalities that provide complementary cues are employed. The goal of this work is to retain the performance of such a multimodal approach while using only the RGB frames as input at inference time.
arXiv Detail & Related papers (2023-07-14T17:07:32Z)
Semi-supervised multi-view concept decomposition [30.699496411869834]
Concept Factorization (CF) has demonstrated superior performance in multi-view clustering tasks. We propose a novel semi-supervised multi-view concept factorization model, named SMVCF. We conduct experiments on four diverse datasets to evaluate the performance of SMVCF.
arXiv Detail & Related papers (2023-07-03T10:50:44Z)
Multi-View Class Incremental Learning [57.14644913531313]
Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance. This paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views.
arXiv Detail & Related papers (2023-06-16T08:13:41Z)
Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms. We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in a SSL setting. The co-evolution during pre-training of both dense and gated encoder offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
KD-MVS: Knowledge Distillation Based Self-supervised Learning for Multi-view Stereo [18.52931570395043]
Supervised multi-view stereo (MVS) methods have achieved remarkable progress in terms of reconstruction quality, but suffer from the challenge of collecting large-scale ground-truth depth. We propose a novel self-supervised training pipeline for MVS based on knowledge distillation, termed KD-MVS.
arXiv Detail & Related papers (2022-07-21T11:41:53Z)
Domain-Agnostic Clustering with Self-Distillation [21.58831206727797]
We propose a new self-distillation based algorithm for domain-agnostic clustering. We empirically demonstrate that knowledge distillation can improve unsupervised representation learning. Preliminary experiments also suggest that self-distillation improves the convergence of DeepCluster-v2.
arXiv Detail & Related papers (2021-11-23T21:56:54Z)
Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network. We show that the seemingly different self-supervision task can serve as a simple yet powerful solution. By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME). We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model. We conduct extensive experiments and demonstrate that our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.