The SSL Interplay: Augmentations, Inductive Bias, and Generalization
- URL: http://arxiv.org/abs/2302.02774v2
- Date: Thu, 1 Jun 2023 14:17:16 GMT
- Title: The SSL Interplay: Augmentations, Inductive Bias, and Generalization
- Authors: Vivien Cabannes, Bobak T. Kiani, Randall Balestriero, Yann LeCun,
Alberto Bietti
- Abstract summary: Self-supervised learning has emerged as a powerful framework to learn representations from raw data without supervision.
Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training.
We propose a theory to shed light on the complex interplay between data augmentation, network architecture, and training algorithm.
- Score: 24.787356572850317
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Self-supervised learning (SSL) has emerged as a powerful framework to learn
representations from raw data without supervision. Yet in practice, engineers
face issues such as instability in tuning optimizers and collapse of
representations during training. Such challenges motivate the need for a theory
to shed light on the complex interplay between the choice of data augmentation,
network architecture, and training algorithm. We study such an interplay with a
precise analysis of generalization performance on both pretraining and
downstream tasks in a theory friendly setup, and highlight several insights for
SSL practitioners that arise from our theory.
Related papers
- An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research [25.564440860986757]
Self-Supervised Learning (SSL) powers many current AI systems.
The Platonic view of SSL suggests that, despite different methods and engineering approaches, all representations converge to the same Platonic ideal.
We propose expanding Identifiability Theory (IT) into what we term Singular Identifiability Theory (SITh).
arXiv Detail & Related papers (2025-04-17T17:10:33Z)
- Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 [53.894789613838654]
We introduce SEED-Bench-R1, a benchmark designed to evaluate post-training methods for MLLMs in video understanding.
It includes intricate real-world videos and complex everyday planning tasks in the format of multiple-choice questions.
Using Qwen2-VL-Instruct-7B as a base model, we compare RL with supervised fine-tuning (SFT).
Our detailed analysis reveals that RL enhances visual perception but often produces less coherent reasoning chains.
arXiv Detail & Related papers (2025-03-31T17:55:23Z)
- On the Discrimination and Consistency for Exemplar-Free Class Incremental Learning [19.898602404329697]
Exemplar-free class incremental learning (EF-CIL) is a nontrivial task that requires continuously enriching model capability with new classes while maintaining previously learned knowledge without storing and replaying any old class exemplars.
An emerging theory-guided framework for CIL trains task-specific models for a shared network, shifting the pressure of forgetting to task-id prediction.
In EF-CIL, task-id prediction is more challenging due to the lack of inter-task interaction (e.g., replays of exemplars).
arXiv Detail & Related papers (2025-01-26T08:50:33Z)
- Federated Learning from Vision-Language Foundation Models: Theoretical Analysis and Method [7.261306002808739]
We construct a theoretical analysis framework for prompt-based federated learning via feature learning theory.
Specifically, we monitor the evolution of signal learning and noise memorization in prompt-based federated learning.
We show that performance can be assessed by the ratio of task-relevant to task-irrelevant coefficients.
arXiv Detail & Related papers (2024-09-29T08:31:26Z)
- Mask-Encoded Sparsification: Mitigating Biased Gradients in Communication-Efficient Split Learning [15.78336840511033]
This paper introduces a novel framework designed to achieve a high compression ratio in Split Learning (SL) scenarios.
Our investigations demonstrate that compressing feature maps within SL leads to biased gradients that can negatively impact the convergence rates.
We employ a narrow bit-width encoded mask to compensate for the sparsification error without increasing the order of time complexity.
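The paper's exact encoding is not reproduced here; as a rough illustration of the general idea, sending only the top-k entries of a feature map plus a 1-bit-per-element mask (so the receiver can restore positions without explicit indices) might be sketched as follows. The function names and the top-k criterion are assumptions for this sketch, not the paper's method.

```python
import numpy as np

def topk_sparsify(features: np.ndarray, k: int):
    """Keep the k largest-magnitude entries of a feature map.

    Returns the surviving values (in index order) and a boolean mask
    packed to 1 bit per element with np.packbits, so positions travel
    cheaply alongside the values.
    """
    flat = features.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[idx] = True
    values = flat[mask]           # surviving entries, in index order
    packed = np.packbits(mask)    # 1 bit per element on the wire
    return values, packed

def topk_restore(values: np.ndarray, packed: np.ndarray, shape):
    """Rebuild the dense feature map from values and the packed mask."""
    n = int(np.prod(shape))
    mask = np.unpackbits(packed, count=n).astype(bool)
    out = np.zeros(n, dtype=values.dtype)
    out[mask] = values
    return out.reshape(shape)
```

A round trip (`topk_restore(*topk_sparsify(x, k), x.shape)`) preserves the k dominant entries and zeroes the rest; the bias the paper analyzes comes from those zeroed entries.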
arXiv Detail & Related papers (2024-08-25T09:30:34Z)
- Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity [84.12126298229866]
We show that zero-shot generalization during instruction tuning happens very early.
We also show that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization.
For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
arXiv Detail & Related papers (2024-06-17T16:40:21Z)
- Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
- Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction [0.45060992929802207]
Self-supervised learning (SSL) has emerged as a promising paradigm for learning flexible speech representations from unlabeled data.
This study provides an empirical analysis of Barlow Twins (BT), an SSL technique inspired by theories of redundancy reduction in human perception.
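The Barlow Twins objective mentioned here pushes the cross-correlation matrix of two augmented views toward the identity: on-diagonal terms enforce invariance to the augmentation, off-diagonal terms reduce redundancy across embedding dimensions. A minimal NumPy sketch of that loss (the trade-off weight `lam` is an assumed value, not tuned to the speech setting studied in the paper):

```python
import numpy as np

def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray, lam: float = 5e-3) -> float:
    """Barlow Twins loss on two batches of embeddings of shape (batch, dim)."""
    n = z_a.shape[0]
    # standardize each embedding dimension over the batch
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    c = z_a.T @ z_b / n                              # cross-correlation (dim, dim)
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()        # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return float(on_diag + lam * off_diag)
```

Identical views drive the invariance term to zero, so the loss for matched views should be far below that for unrelated ones.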
arXiv Detail & Related papers (2023-09-07T10:23:59Z)
- Reverse Engineering Self-Supervised Learning [17.720366509919167]
Self-supervised learning (SSL) is a powerful tool in machine learning.
This paper presents an in-depth empirical analysis of SSL-trained representations.
arXiv Detail & Related papers (2023-05-24T23:15:28Z)
- ArCL: Enhancing Contrastive Learning with Augmentation-Robust Representations [30.745749133759304]
We develop a theoretical framework to analyze the transferability of self-supervised contrastive learning.
We show that contrastive learning fails to learn domain-invariant features, which limits its transferability.
Based on these theoretical insights, we propose a novel method called Augmentation-robust Contrastive Learning (ArCL).
arXiv Detail & Related papers (2023-03-02T09:26:20Z)
- Understanding and Improving the Role of Projection Head in Self-Supervised Learning [77.59320917894043]
Self-supervised learning (SSL) aims to produce useful feature representations without access to human-labeled data annotations.
Current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective.
This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training?
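The setup this question refers to — backbone features passed through a small projection head and trained with the InfoNCE objective, with the head discarded afterward — can be sketched generically as below. This is a minimal illustration with a linear head and cosine similarities, not the paper's code; the function and parameter names are assumptions.

```python
import numpy as np

def info_nce_loss(h_a, h_b, w_proj, temperature=0.1):
    """InfoNCE on backbone features run through a linear projection head.

    h_a, h_b : (batch, feat) backbone features for two augmented views.
    w_proj   : (feat, proj) projection head, used only during pretraining
               and thrown away before the downstream task.
    """
    z_a = h_a @ w_proj
    z_b = h_b @ w_proj
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature           # (batch, batch) similarities
    # cross-entropy with the matching view (the diagonal) as the positive
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

When the two views agree, the diagonal dominates the softmax and the loss is small; the paper asks why optimizing this through a discardable head helps the backbone representation.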
arXiv Detail & Related papers (2022-12-22T05:42:54Z)
- Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness [69.39073806630583]
Adversarial training (AT) for robust representation learning and self-supervised learning (SSL) for unsupervised representation learning are two active research fields.
We propose a two-stage framework termed Decoupled Adversarial Contrastive Learning (DeACL)
arXiv Detail & Related papers (2022-07-22T06:30:44Z)
- A Scaling Law for Synthetic-to-Real Transfer: A Measure of Pre-Training [52.93808218720784]
Synthetic-to-real transfer learning is a framework in which we pre-train models with synthetically generated images and ground-truth annotations for real tasks.
Although synthetic images overcome the data scarcity issue, it remains unclear how the fine-tuning performance scales with pre-trained models.
We observe a simple and general scaling law that consistently describes learning curves in various tasks, models, and complexities of synthesized pre-training data.
arXiv Detail & Related papers (2021-08-25T02:29:28Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.