Contrastive Distillation Is a Sample-Efficient Self-Supervised Loss
Policy for Transfer Learning
- URL: http://arxiv.org/abs/2212.11353v1
- Date: Wed, 21 Dec 2022 20:43:46 GMT
- Title: Contrastive Distillation Is a Sample-Efficient Self-Supervised Loss
Policy for Transfer Learning
- Authors: Chris Lengerich, Gabriel Synnaeve, Amy Zhang, Hugh Leather, Kurt
Shuster, François Charton, Charysse Redwood
- Abstract summary: We propose a self-supervised loss policy called contrastive distillation which manifests latent variables with high mutual information with both source and target tasks.
We show how this outperforms common methods of transfer learning and suggests a useful design axis of trading off compute for generalizability in online transfer.
- Score: 20.76863234714442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional approaches to RL have focused on learning decision policies
directly from episodic decisions, while slowly and implicitly learning the
semantics of compositional representations needed for generalization. While
some approaches have been adopted to refine representations via auxiliary
self-supervised losses while simultaneously learning decision policies,
learning compositional representations from hand-designed and
context-independent self-supervised losses (multi-view) still adapts relatively
slowly to the real world, which contains many non-IID subspaces requiring rapid
distribution shift in both time and spatial attention patterns at varying
levels of abstraction. In contrast, supervised language model cascades have
shown the flexibility to adapt to many diverse manifolds, and hints of
self-learning needed for autonomous task transfer. However, to date, transfer
methods for language models like few-shot learning and fine-tuning still
require human supervision and transfer learning using self-learning methods has
been underexplored. We propose a self-supervised loss policy called contrastive
distillation which manifests latent variables with high mutual information with
both source and target tasks from weights to tokens. We show how this
outperforms common methods of transfer learning and suggests a useful design
axis of trading off compute for generalizability for online transfer.
Contrastive distillation is improved through sampling from memory and suggests
a simple algorithm for more efficiently sampling negative examples for
contrastive losses than random sampling.
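The abstract does not spell out the loss, so the following is a minimal sketch under stated assumptions: an InfoNCE-style contrastive distillation objective in which a student (target-task) encoder is pulled toward teacher (source-task) embeddings as positives, while negatives are drawn from a memory buffer rather than sampled at random. All names here (contrastive_distillation_loss, sample_hard_negatives, memory_bank, temperature) are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a contrastive distillation loss with memory-based negatives.
# Assumes an InfoNCE formulation; names and hyperparameters are hypothetical.
import torch
import torch.nn.functional as F


def contrastive_distillation_loss(student_emb, teacher_emb, memory_bank, temperature=0.07):
    """Pull each student embedding toward its matching teacher embedding
    (positive) and push it away from embeddings retrieved from a memory
    buffer (negatives).

    student_emb: (B, D) embeddings from the model being transferred.
    teacher_emb: (B, D) embeddings from the source model (positives).
    memory_bank: (M, D) stored embeddings used as negatives.
    """
    student = F.normalize(student_emb, dim=-1)
    teacher = F.normalize(teacher_emb, dim=-1)
    negatives = F.normalize(memory_bank, dim=-1)

    # Similarity to the positive (one per example) and to all memory negatives.
    pos_logits = (student * teacher).sum(dim=-1, keepdim=True)  # (B, 1)
    neg_logits = student @ negatives.t()                        # (B, M)

    logits = torch.cat([pos_logits, neg_logits], dim=1) / temperature
    # The positive sits at index 0 of each row.
    targets = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, targets)


def sample_hard_negatives(query_emb, memory_bank, k=256):
    """One plausible non-random sampling rule: keep the k memory entries most
    similar to the current queries, which tend to be more informative than
    uniformly sampled negatives. This is an assumption, not the algorithm
    specified in the paper."""
    sims = F.normalize(query_emb, dim=-1) @ F.normalize(memory_bank, dim=-1).t()
    top_idx = sims.mean(dim=0).topk(min(k, memory_bank.size(0))).indices
    return memory_bank[top_idx]
```

Retrieving the most similar memory entries as negatives is only one way to instantiate "sampling from memory"; the paper's actual sampling rule may differ.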
Related papers
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically-identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- On minimal variations for unsupervised representation learning [19.055611167696238]
Unsupervised representation learning aims at describing raw data efficiently to solve various downstream tasks.
Revealing minimal variations as a guiding principle behind unsupervised representation learning paves the way to better practical guidelines for self-supervised learning algorithms.
arXiv Detail & Related papers (2022-11-07T18:57:20Z)
- Weakly-supervised Temporal Path Representation Learning with Contrastive Curriculum Learning -- Extended Version [35.86394282979721]
A temporal path (TP), which includes temporal information such as departure time, is fundamental to enabling such applications.
Existing methods fail to achieve this goal since, for example, supervised methods require large amounts of task-specific labels during training and thus fail to generalize the obtained TPRs to other tasks.
We propose a Weakly-Supervised Contrastive (WSC) learning model that encodes both the spatial and temporal information of a temporal path into a TPR.
arXiv Detail & Related papers (2022-03-30T07:36:20Z)
- Learning from Heterogeneous Data Based on Social Interactions over Graphs [58.34060409467834]
This work proposes a decentralized architecture, where individual agents aim at solving a classification problem while observing streaming features of different dimensions.
We show that the proposed strategy enables the agents to learn consistently under this highly heterogeneous setting.
arXiv Detail & Related papers (2021-12-17T12:47:18Z)
- Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks [79.13089902898848]
Self-supervised learning is a powerful paradigm for representation learning on unlabelled images.
We show that different tasks in computer vision require features to encode different (in)variances.
arXiv Detail & Related papers (2021-11-22T18:16:35Z)
- Episodic Self-Imitation Learning with Hindsight [7.743320290728377]
Episodic self-imitation learning is a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function.
A selection module is introduced to filter uninformative samples from each episode of the update.
Episodic self-imitation learning has the potential to be applied to real-world problems that have continuous action spaces.
arXiv Detail & Related papers (2020-11-26T20:36:42Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)