Learning Task-Agnostic Representations through Multi-Teacher Distillation
- URL: http://arxiv.org/abs/2510.18680v1
- Date: Tue, 21 Oct 2025 14:36:33 GMT
- Title: Learning Task-Agnostic Representations through Multi-Teacher Distillation
- Authors: Philippe Formont, Maxime Darrin, Banafsheh Karimian, Jackie CK Cheung, Eric Granger, Ismail Ben Ayed, Mohammadhadi Shateri, Pablo Piantanida
- Abstract summary: We introduce a task-agnostic framework based on a "majority vote" objective function. We demonstrate that this function is bounded by the mutual information between student and teachers' embeddings. Our method effectively leverages teacher diversity, resulting in representations enabling better performance for a wide range of downstream tasks.
- Score: 59.488314181423284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Casting complex inputs into tractable representations is a critical step across various fields. Diverse embedding models emerge from differences in architectures, loss functions, input modalities and datasets, each capturing unique aspects of the input. Multi-teacher distillation leverages this diversity to enrich representations but often remains tailored to specific tasks. In this paper, we introduce a task-agnostic framework based on a "majority vote" objective function. We demonstrate that this function is bounded by the mutual information between student and teachers' embeddings, leading to a task-agnostic distillation loss that eliminates dependence on task-specific labels or prior knowledge. Our evaluations across text, vision models, and molecular modeling show that our method effectively leverages teacher diversity, resulting in representations enabling better performance for a wide range of downstream tasks such as classification, clustering, or regression. Additionally, we train and release state-of-the-art embedding models, enhancing downstream performance in various modalities.
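The abstract's link between a label-free distillation loss and mutual information can be illustrated with the standard Barber-Agakov variational bound: for any conditional density q, I(S; T) >= H(T) + E[log q(T | S)]. Because H(T) does not depend on the student, minimizing the negative log-likelihood of each teacher embedding given the student embedding raises a lower bound on I(S; T). The sketch below is NOT the authors' "majority vote" loss; it assumes, hypothetically, one linear-Gaussian head per teacher, and the names `gaussian_nll`, `multi_teacher_loss`, and `fit_head` are illustrative only.

```python
import numpy as np

def gaussian_nll(student, teacher, W, b, log_var):
    """Mean NLL of teacher embeddings under a hypothetical diagonal Gaussian
    q(t | s) = N(W^T s + b, diag(exp(log_var)))."""
    mean = student @ W + b                    # (n, d_t) predicted teacher embedding
    var = np.exp(log_var)                     # (d_t,) per-dimension variance
    sq = ((teacher - mean) ** 2 / var).sum(axis=1)
    d_t = teacher.shape[1]
    per_sample = 0.5 * (sq + log_var.sum() + d_t * np.log(2 * np.pi))
    return per_sample.mean()

def multi_teacher_loss(student, teachers, heads):
    """Sum of per-teacher NLLs; no task labels are involved."""
    return sum(gaussian_nll(student, t, *h) for t, h in zip(teachers, heads))

def fit_head(student, teacher):
    """Closed-form maximum-likelihood fit of one linear-Gaussian head."""
    X = np.hstack([student, np.ones((len(student), 1))])  # append bias column
    coef, *_ = np.linalg.lstsq(X, teacher, rcond=None)
    resid = teacher - X @ coef
    return coef[:-1], coef[-1], np.log(resid.var(axis=0))

# Toy data: one teacher is a noisy linear function of the student, one is independent.
rng = np.random.default_rng(0)
n, d_s, d_t = 512, 8, 4
student = rng.normal(size=(n, d_s))
related = student @ rng.normal(size=(d_s, d_t)) + 0.1 * rng.normal(size=(n, d_t))
unrelated = rng.normal(size=(n, d_t))
teachers = [related, unrelated]
heads = [fit_head(student, t) for t in teachers]

nll_related = gaussian_nll(student, related, *heads[0])
nll_unrelated = gaussian_nll(student, unrelated, *heads[1])
total = multi_teacher_loss(student, teachers, heads)
```

In this toy setup the student's NLL is much lower for the teacher it actually predicts, which is the behavior a mutual-information-driven distillation loss rewards; in a real student the embedding itself would also be trained to lower `total`.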
Related papers
- Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning [7.412307614007383]
Multimodal learning models are designed to bridge different modalities, such as images and text, by learning a shared representation space. These models often exhibit a modality gap, where different modalities occupy distinct regions within the shared representation space. We identify the critical roles of mismatched data pairs and a learnable temperature parameter in causing and perpetuating the modality gap during training.
Date: 2024-12-10T20:36:49Z
- Investigating Self-Supervised Methods for Label-Efficient Learning [27.029542823306866]
We study different self-supervised pretext tasks, namely contrastive learning, clustering, and masked image modelling, for their low-shot capabilities.
We introduce a framework that uses both masked image modelling and clustering as pretext tasks, which performs better across all low-shot downstream tasks.
When testing the model on full scale datasets, we show performance gains in multi-class classification, multi-label classification and semantic segmentation.
Date: 2024-06-25T10:56:03Z
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework in which multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful with classification tasks that have little or non-overlapping annotation.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
Date: 2024-01-02T14:18:11Z
- MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations [2.94944680995069]
We propose a multi-task self-supervised framework (MT-SLVR) that learns both variant and invariant features in a parameter-efficient manner.
We evaluate our approach on few-shot classification tasks drawn from a variety of audio domains and demonstrate improved classification performance.
Date: 2023-05-29T09:10:50Z
- Leveraging sparse and shared feature activations for disentangled representation learning [112.22699167017471]
We propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation.
We validate our approach on six real-world distribution-shift benchmarks and across different data modalities.
Date: 2023-04-17T01:33:24Z
- Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
Date: 2022-09-30T19:12:58Z
- X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation [71.51719469058666]
We propose a representation learning framework called X-Learner.
X-Learner learns universal features for multiple vision tasks, supervised by various sources.
X-Learner achieves strong performance on different tasks without extra annotations, modalities and computational costs.
Date: 2022-03-16T17:23:26Z
- Representation Consolidation for Training Expert Students [54.90754502493968]
We show that a multi-head, multi-task distillation method is sufficient to consolidate representations from task-specific teacher(s) and improve downstream performance.
Our method can also combine the representational knowledge of multiple teachers trained on one or multiple domains into a single model.
Date: 2021-07-16T17:58:18Z