Self-Supervised Learning via Maximum Entropy Coding
- URL: http://arxiv.org/abs/2210.11464v1
- Date: Thu, 20 Oct 2022 17:58:30 GMT
- Title: Self-Supervised Learning via Maximum Entropy Coding
- Authors: Xin Liu, Zhongdao Wang, Yali Li, Shengjin Wang
- Abstract summary: We propose Maximum Entropy Coding (MEC) as a principled objective that explicitly optimizes the structure of the representation.
MEC learns a more generalizable representation than previous methods based on specific pretext tasks.
It achieves state-of-the-art performance consistently on various downstream tasks, including not only ImageNet linear probe, but also semi-supervised classification, object detection, instance segmentation, and object tracking.
- Score: 57.56570417545023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A mainstream type of current self-supervised learning methods pursues a
general-purpose representation that can be well transferred to downstream
tasks, typically by optimizing on a given pretext task such as instance
discrimination. In this work, we argue that existing pretext tasks inevitably
introduce biases into the learned representation, which in turn leads to biased
transfer performance on various downstream tasks. To cope with this issue, we
propose Maximum Entropy Coding (MEC), a more principled objective that
explicitly optimizes on the structure of the representation, so that the
learned representation is less biased and thus generalizes better to unseen
downstream tasks. Inspired by the principle of maximum entropy in information
theory, we hypothesize that a generalizable representation should be the one
that admits the maximum entropy among all plausible representations. To make
the objective end-to-end trainable, we propose to leverage the minimal coding
length in lossy data coding as a computationally tractable surrogate for the
entropy, and further derive a scalable reformulation of the objective that
allows fast computation. Extensive experiments demonstrate that MEC learns a
more generalizable representation than previous methods based on specific
pretext tasks. It achieves state-of-the-art performance consistently on various
downstream tasks, including not only ImageNet linear probe, but also
semi-supervised classification, object detection, instance segmentation, and
object tracking. Interestingly, we show that existing batch-wise and
feature-wise self-supervised objectives could be seen as equivalent to low-order
approximations of MEC. Code and pre-trained models are available at
https://github.com/xinliu20/MEC.
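As an illustration of the scalable reformulation described in the abstract, below is a minimal PyTorch sketch of a truncated log-determinant coding-length objective. It is written only from the description above, not from the released code; the function name mec_loss, the distortion level eps, and the Taylor truncation order are illustrative assumptions.

```python
# Minimal sketch of a maximum-entropy-coding-style loss, based on the abstract's
# description (log-det coding length approximated by a truncated Taylor series).
# NOT the authors' released implementation; eps and order are illustrative choices.
import torch
import torch.nn.functional as F


def mec_loss(z1: torch.Tensor, z2: torch.Tensor, eps: float = 0.06, order: int = 4) -> torch.Tensor:
    """z1, z2: (m, d) feature batches from two augmented views of the same images."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    m, d = z1.shape
    lam = d / (m * eps ** 2)        # scaling suggested by the lossy-coding-length surrogate
    mu = (m + d) / 2.0              # overall coefficient of the coding-length term
    c = lam * (z1 @ z2.T)           # (m, m) cross-view similarity matrix

    # log det(I + C) = Tr(log(I + C)) ~ Tr(sum_k (-1)^(k+1) C^k / k), truncated at `order`,
    # replacing an explicit log-determinant with a few matrix products.
    power = c
    series = torch.zeros_like(c)
    for k in range(1, order + 1):
        if k > 1:
            power = power @ c
        series = series + ((-1) ** (k + 1)) / k * power
    entropy_surrogate = mu * torch.trace(series)
    return -entropy_surrogate       # minimizing this maximizes the entropy surrogate
```

With order=1 the surrogate collapses to a scaled sum of cosine similarities between paired views, which is one way to read the abstract's remark that existing batch-wise objectives resemble low-order approximations of MEC.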
Related papers
- Transformers are Minimax Optimal Nonparametric In-Context Learners [36.291980654891496]
In-context learning of large language models has proven to be a surprisingly effective method of learning a new task from only a few demonstrative examples.
We develop approximation and generalization error bounds for a transformer composed of a deep neural network and one linear attention layer.
We show that sufficiently trained transformers can achieve -- and even improve upon -- the minimax optimal estimation risk in context.
arXiv Detail & Related papers (2024-08-22T08:02:10Z)
- The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide an analysis using a theoretical data model and show that, while more diverse pre-training data yield more diverse features for different tasks, they put less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z)
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
- Efficient Iterative Amortized Inference for Learning Symmetric and Disentangled Multi-Object Representations [8.163697683448811]
We introduce EfficientMORL, an efficient framework for the unsupervised learning of object-centric representations.
We show that optimization challenges caused by requiring both symmetry and disentanglement can be addressed by high-cost iterative amortized inference.
We demonstrate strong object decomposition and disentanglement on the standard multi-object benchmark while achieving nearly an order of magnitude faster training and test time inference.
arXiv Detail & Related papers (2021-06-07T14:02:49Z)
- How Fine-Tuning Allows for Effective Meta-Learning [50.17896588738377]
We present a theoretical framework for analyzing representations derived from a MAML-like algorithm.
We provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure, and we establish a separation from methods with "frozen representation" objectives.
This separation result underscores the benefit of fine-tuning-based methods, such as MAML, in few-shot learning.
arXiv Detail & Related papers (2021-05-05T17:56:00Z)
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion-based MAML, or Dif-MAML.
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z)
- An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction [84.49035467829819]
We show that it is possible to better manage the trade-off between rationale conciseness and end-task performance by optimizing a bound on the Information Bottleneck (IB) objective.
Our fully unsupervised approach jointly learns an explainer that predicts sparse binary masks over sentences, and an end-task predictor that considers only the extracted rationale.
arXiv Detail & Related papers (2020-05-01T23:26:41Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)