Convergence of End-to-End Training in Deep Unsupervised Contrastive
Learning
- URL: http://arxiv.org/abs/2002.06979v3
- Date: Sun, 30 May 2021 17:23:28 GMT
- Authors: Zixin Wen
- Abstract summary: Unsupervised contrastive learning has proven to be a powerful method for learning representations from unlabeled data.
This study provides theoretical insights into the practical success of these unsupervised methods.
- Score: 3.8073142980733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised contrastive learning has gained increasing attention in the
latest research and has proven to be a powerful method for learning
representations from unlabeled data. However, little theoretical analysis was
known for this framework. In this paper, we study the optimization of deep
unsupervised contrastive learning. We prove that, by applying end-to-end
training that simultaneously updates two deep over-parameterized neural
networks, one can find an approximate stationary solution for the non-convex
contrastive loss. This result is inherently different from the existing
over-parameterized analysis in the supervised setting because, in contrast to
learning a specific target function, unsupervised contrastive learning tries to
encode the unlabeled data distribution into the neural networks, which
generally has no optimal solution. Our analysis provides theoretical insights
into the practical success of these unsupervised pretraining methods.
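The setting described in the abstract, two networks updated simultaneously by gradient descent on a non-convex contrastive loss, can be illustrated with a small sketch. This is not the paper's construction: the abstract does not state the exact loss or architecture, so the example assumes an InfoNCE-style loss, two linear encoders in place of deep over-parameterized networks, and synthetic data. The names `contrastive_loss_and_grads` and `train`, and all hyperparameters, are hypothetical. The sketch tracks the loss and the gradient norm, the quantity that measures how close the iterate is to an approximate stationary point.

```python
import numpy as np


def contrastive_loss_and_grads(Wf, Wg, X, Y, tau=1.0):
    """InfoNCE-style loss for two linear encoders, with analytic gradients.

    Assumption: row i of X and row i of Y form a positive pair; every other
    row of Y acts as a negative for row i of X.
    """
    n = X.shape[0]
    idx = np.arange(n)
    U = X @ Wf                     # embeddings from the first encoder, (n, k)
    V = Y @ Wg                     # embeddings from the second encoder, (n, k)
    S = (U @ V.T) / tau            # pairwise similarity logits, (n, n)
    S = S - S.max(axis=1, keepdims=True)   # shift rows for numerical stability
    E = np.exp(S)
    row_sum = E.sum(axis=1, keepdims=True)
    # loss_i = -S_ii + log sum_j exp(S_ij), averaged over the batch
    loss = np.mean(np.log(row_sum[:, 0]) - S[idx, idx])
    # dL/dS = (softmax(S) - I) / n
    G = E / row_sum
    G[idx, idx] -= 1.0
    G /= n
    grad_Wf = X.T @ (G @ V) / tau
    grad_Wg = Y.T @ (G.T @ U) / tau
    return loss, grad_Wf, grad_Wg


def train(n=32, d=8, k=4, steps=300, lr=0.05, tau=1.0, seed=0):
    """Update both encoders at every step (end-to-end) on synthetic pairs."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))               # first view of each sample
    Y = X + 0.1 * rng.normal(size=(n, d))     # second view: a noisy copy
    Wf = 0.1 * rng.normal(size=(d, k))
    Wg = 0.1 * rng.normal(size=(d, k))
    history = []
    for _ in range(steps):
        loss, g_f, g_g = contrastive_loss_and_grads(Wf, Wg, X, Y, tau)
        grad_norm = np.sqrt(np.sum(g_f ** 2) + np.sum(g_g ** 2))
        history.append((loss, grad_norm))
        Wf = Wf - lr * g_f                    # simultaneous update of
        Wg = Wg - lr * g_g                    # both parameter sets
    return Wf, Wg, history


if __name__ == "__main__":
    _, _, history = train()
    print("first step (loss, grad norm):", history[0])
    print("last step  (loss, grad norm):", history[-1])
```

Because the loss is bounded below by zero but need not attain a minimum, the natural convergence measure in this toy setting is the gradient norm rather than the distance to an optimal solution, which mirrors the abstract's remark that the problem generally has no optimal solution.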
Related papers
- Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data [38.44734564565478]
We provide a theoretical understanding of adversarial examples and adversarial training algorithms from the perspective of feature learning theory.
We show that the adversarial training method can provably strengthen the robust feature learning and suppress the non-robust feature learning.
arXiv Detail & Related papers (2024-10-11T03:59:49Z)
- Learning Latent Graph Structures and their Uncertainty [63.95971478893842]
Graph Neural Networks (GNNs) use relational information as an inductive bias to enhance the model's accuracy.
As task-relevant relations might be unknown, graph structure learning approaches have been proposed to learn them while solving the downstream prediction task.
arXiv Detail & Related papers (2024-05-30T10:49:22Z)
- On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z)
- Leveraging Unlabeled Data for 3D Medical Image Segmentation through Self-Supervised Contrastive Learning [3.7395287262521717]
Current 3D semi-supervised segmentation methods face significant challenges such as limited consideration of contextual information.
We introduce two distinct subnetworks designed to explore and exploit the discrepancies between them, ultimately correcting the erroneous prediction results.
We employ a self-supervised contrastive learning paradigm to distinguish between reliable and unreliable predictions.
arXiv Detail & Related papers (2023-11-21T14:03:16Z)
- An Analytic End-to-End Deep Learning Algorithm based on Collaborative Learning [5.710971447109949]
This paper presents a convergence analysis for end-to-end deep learning of fully connected neural networks (FNN) with smooth activation functions.
The proposed method avoids any potential chattering problem, and it also does not easily lead to gradient vanishing problems.
arXiv Detail & Related papers (2023-05-26T08:09:03Z)
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
- How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis [93.37576644429578]
This work establishes the first theoretical analysis for the known iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments from shallow neural networks to deep neural networks are also provided to justify the correctness of our established theoretical insights on self-training.
arXiv Detail & Related papers (2022-01-21T02:16:52Z)
- Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The fragility of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
- Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss [0.0]
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks.
We analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations.
arXiv Detail & Related papers (2020-02-11T15:42:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.